As usual: talk is cheap, show me the code.
Ⅰ. What
1.1 numpy.ndarray
- array object: the data stored in the array;
- data-type object: metadata describing that data.
- multidimensional
- homogeneous (every element has the same data type)
- fixed-size
An array object represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)
class numpy.ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)
import numpy as np
# nda = np.array(range(12)).reshape(3, -1)  # same result as the line below
nda = np.arange(12).reshape(3, -1)
nda[1, 1] == nda[1][1]  # True
>>> nda
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> type(nda)
Out[73]: numpy.ndarray
>>> nda.shape
Out[74]: (3, 4)
>>> nda.dtype
Out[75]: dtype('int32')  # platform-dependent: int32 here, int64 on most Linux/macOS builds
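The three properties listed above (multidimensional, homogeneous, fixed-size) can be checked directly on an array. The following is a minimal sketch rather than part of the original post; the variable names are illustrative, and the dtype is set to int64 explicitly so the results do not depend on the platform.

```python
import numpy as np

# Fix the dtype explicitly so the example behaves the same on every platform.
nda = np.arange(12, dtype=np.int64).reshape(3, -1)

print(nda.ndim)      # number of dimensions: 2
print(nda.shape)     # (3, 4)
print(nda.dtype)     # int64
print(nda.itemsize)  # bytes per element: 8
print(nda.strides)   # bytes to step along each axis: (32, 8)

# Homogeneous: mixed inputs are coerced to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Fixed-size items: a float stored into an integer array is truncated.
nda[0, 0] = 7.9
print(nda[0, 0])     # 7
```

The strides follow from the item size: moving one row skips 4 elements of 8 bytes each, and moving one column skips a single 8-byte element.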
1.2 pandas.Series
A one-dimensional array with axis labels. Unlike an ndarray, the values in a Series may have different data types.
Official documentation: pandas.Series
One-dimensional ndarray with axis labels (including time series).
Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. If data is a dict, argument order is maintained.
index : array-like or Index (1d)
    Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If data is dict-like and index is None, then the keys in the data are used as the index. If the index is not None, the resulting Series is reindexed with the index values.
dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.
name : str, optional
    The name to give to the Series.
copy : bool, default False
    Copy input data.
import pandas as pd

d1 = {'a': 1, 'b': 2, 'c': 3}
# d1 = {'a': 1, 'b': 2, 'c': 'hello'}  # value types may differ, but this is generally not recommended
ser1 = pd.Series(data=d1, index=['a', 'b', 'c', 'd'])
d2 = [['python', 10, 99, 'male'], ['java', 14, 92, 'female'], ['c', 18, 97, 'male'], ['go', 22, 90, 'female']]
ser2 = pd.Series(data=d2, index=['1st', '2nd', '3rd', '4th'])
>>> ser1
a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64
>>> ser2
1st    [python, 10, 99, male]
2nd    [java, 14, 92, female]
3rd         [c, 18, 97, male]
4th      [go, 22, 90, female]
dtype: object
>>> type(ser1)
Out[92]: pandas.core.series.Series
>>> type(ser2)
Out[92]: pandas.core.series.Series
>>> ser1.iloc[1]  # positional access; plain ser1[1] is deprecated for label-indexed Series in recent pandas
Out[1]: 2.0
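The documentation quoted above mentions label-based and integer-based indexing, non-unique labels, and statistics that skip missing data. The sketch below illustrates each point; it is an illustrative example with made-up variable names, not code from the original post.

```python
import pandas as pd

ser = pd.Series({'a': 1, 'b': 2, 'c': 3}, index=['a', 'b', 'c', 'd'])

by_label = ser.loc['b']       # label-based lookup
by_position = ser.iloc[1]     # position-based lookup
print(by_label, by_position)  # 2.0 2.0

# 'd' has no entry in the dict, so reindexing fills it with NaN.
print(ser.isna().sum())       # 1

# Statistical methods skip NaN by default.
print(ser.sum())              # 6.0
print(ser.mean())             # 2.0

# Labels need not be unique; a repeated label returns every match.
dup = pd.Series([10, 20, 30], index=['x', 'x', 'y'])
print(dup.loc['x'].tolist())  # [10, 20]
```

Using `.loc` and `.iloc` explicitly avoids the ambiguity of plain `ser[1]`, which could mean either a label or a position.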
1.3 pandas.DataFrame
Official documentation: pandas.DataFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order.
index : Index or array-like
    Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided.
columns : Index or array-like
    Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input.
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
d2 = [['python', 10, 99, 'male'], ['java', 14, 92, 'female'], ['c', 18, 97, 'male'], ['go', 22, 90, 'female']]
df = pd.DataFrame(data=d2, columns=['lang', 'age', 'popular', 'sex'], index=['1st', '2nd', '3rd', '4th'])
>>> df
Out[110]:
       lang  age  popular     sex
1st  python   10       99    male
2nd    java   14       92  female
3rd       c   18       97    male
4th      go   22       90  female
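The description above calls a DataFrame a size-mutable, dict-like container of Series whose arithmetic aligns on labels. The sketch below shows those behaviours on small, made-up frames; the names `rows`, `left`, and `right` are illustrative assumptions.

```python
import pandas as pd

rows = [['python', 10, 99], ['java', 14, 92]]
df = pd.DataFrame(rows, columns=['lang', 'age', 'popular'], index=['r1', 'r2'])

# Heterogeneous: each column keeps its own dtype.
print(df.dtypes)

# Dict-like: selecting one column returns a Series sharing the row index.
age = df['age']
print(type(age).__name__)   # Series
print(list(age.index))      # ['r1', 'r2']

# Size-mutable: columns can be added after construction.
df['age_next'] = df['age'] + 1
print(df.shape)             # (2, 4)

# Arithmetic aligns on labels; labels missing on either side yield NaN.
left = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
right = pd.DataFrame({'x': [10, 20]}, index=['r2', 'r3'])
total = left + right
print(total['x'].tolist())  # [nan, 12.0, nan]
```

Only `r2` exists in both frames, so it is the only row with a numeric result; the sum becomes float because NaN cannot be stored in an integer column.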
Ⅱ. How
Convert between the three with np.array(), pd.Series(), or pd.DataFrame().
# ndarray => Series
npa = np.arange(12)
ser = pd.Series(npa)
# Series => ndarray
npa_s = np.array(ser)
# ndarray => DataFrame
npa2 = npa.reshape(3, -1)
df = pd.DataFrame(npa2)
# DataFrame => ndarray
npa_d = np.array(df)
npa_v = df.values  # npa_d and npa_v hold the same data
# DataFrame -> Series
type(df[0]) # pandas.core.series.Series
# Series -> DataFrame
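For the Series -> DataFrame direction, one option is `Series.to_frame()`; passing the Series to `pd.DataFrame()` gives the same result. The sketch below is an assumption about what fits here rather than the author's own code, and the Series name `'value'` is made up for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(4), name='value')

# Series -> DataFrame: a single column named after the Series.
df_a = ser.to_frame()
df_b = pd.DataFrame(ser)             # equivalent constructor form
df_c = pd.DataFrame({'value': ser})  # column name given explicitly

print(df_a.shape)                    # (4, 1)
print(list(df_a.columns))            # ['value']
print(df_a.equals(df_b), df_a.equals(df_c))  # True True

# DataFrame -> ndarray: the pandas docs recommend to_numpy() over .values.
arr = df_a.to_numpy()
print(arr.shape)                     # (4, 1)
```

Giving the Series a name matters here: an unnamed Series becomes a column labelled `0`.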