Here are a few options you can choose from:
```python
import numpy as np
import pandas as pd

index = ['x', 'y']
columns = ['a', 'b', 'c']

# Option 1: Set the column names in the structured array's dtype
dtype = [('a', 'int32'), ('b', 'float32'), ('c', 'float32')]
values = np.zeros(2, dtype=dtype)
df = pd.DataFrame(values, index=index)

# Option 2: Alter the structured array's column names after it has been created
values = np.zeros(2, dtype='int32, float32, float32')
values.dtype.names = columns
df2 = pd.DataFrame(values, index=index, columns=columns)

# Option 3: Alter the DataFrame's column names after it has been created
values = np.zeros(2, dtype='int32, float32, float32')
df3 = pd.DataFrame(values, index=index)
df3.columns = columns

# Option 4: Use a dict of arrays, each of the right dtype
df4 = pd.DataFrame(
    {'a': np.zeros(2, dtype='int32'),
     'b': np.zeros(2, dtype='float32'),
     'c': np.zeros(2, dtype='float32')}, index=index, columns=columns)

# Option 5: Concatenate DataFrames of the simple dtypes
# (pass index=index to both pieces so the rows are labeled x, y like the other options)
df5 = pd.concat([
    pd.DataFrame(np.zeros((2,), dtype='int32'), index=index, columns=['a']),
    pd.DataFrame(np.zeros((2, 2), dtype='float32'), index=index, columns=['b', 'c'])], axis=1)

# Option 6: Alter the dtypes after the DataFrame has been formed.
# (This is not very efficient: every column is built as float64 first, then converted.)
values2 = np.zeros((2, 3))
df6 = pd.DataFrame(values2, index=index, columns=columns)
for col, col_dtype in zip(df6.columns, 'int32 float32 float32'.split()):
    df6[col] = df6[col].astype(col_dtype)
```
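The structured-dtype approach of Option 1 generalizes to any list of names and dtypes. A minimal sketch, assuming you want zero-filled columns; typed_zeros_frame is a hypothetical helper name, not a pandas or NumPy function:

```python
import numpy as np
import pandas as pd

def typed_zeros_frame(index, columns, dtypes):
    """Hypothetical helper: build a zero-filled DataFrame whose columns have
    the given names and dtypes, using a structured array as in Option 1."""
    # Pair each column name with its dtype to form a structured dtype.
    structured = np.dtype(list(zip(columns, dtypes)))
    values = np.zeros(len(index), dtype=structured)
    # The field names become the column names, so no columns= argument is needed.
    return pd.DataFrame(values, index=index)

df = typed_zeros_frame(['x', 'y'], ['a', 'b', 'c'], ['int32', 'float32', 'float32'])
print(df.dtypes)
```

Because the names travel inside the dtype, there is nothing for pandas to look up or match, which is why this route avoids the NaN problem discussed below.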
Each option produces the same result:

```
   a    b    c
x  0  0.0  0.0
y  0  0.0  0.0
```

with dtypes:

```
a      int32
b    float32
c    float32
dtype: object
```
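To check that the options really agree, you can compare a few of them with pandas' own testing helper. This sketch covers Options 1, 4 and 6 only, and writes Option 6 with the dict form of astype instead of the loop, which should give the same dtypes:

```python
import numpy as np
import pandas as pd

index = ['x', 'y']
columns = ['a', 'b', 'c']

# Option 1: names carried by the structured dtype
values = np.zeros(2, dtype=[('a', 'int32'), ('b', 'float32'), ('c', 'float32')])
df1 = pd.DataFrame(values, index=index)

# Option 4: dict of arrays, each already of the right dtype
df4 = pd.DataFrame({'a': np.zeros(2, dtype='int32'),
                    'b': np.zeros(2, dtype='float32'),
                    'c': np.zeros(2, dtype='float32')}, index=index, columns=columns)

# Option 6: build as float64, then cast each column
df6 = pd.DataFrame(np.zeros((2, 3)), index=index, columns=columns)
df6 = df6.astype({'a': 'int32', 'b': 'float32', 'c': 'float32'})

# assert_frame_equal checks values, row/column labels and per-column dtypes.
pd.testing.assert_frame_equal(df1, df4)
pd.testing.assert_frame_equal(df1, df6)
print(df1.dtypes)
```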
Why pd.DataFrame(values, index=index, columns=columns) produces a DataFrame full of NaN:
values is a structured array whose field (column) names are f0, f1 and f2:
```
In [171]: values
Out[171]:
array([(0, 0.0, 0.0), (0, 0.0, 0.0)],
      dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '<f4')])
```
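If you pass columns=['a', 'b', 'c'] to pd.DataFrame, pandas looks for fields with those names in the structured array values. When it cannot find them, it fills those columns with NaN to mark the values as missing.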
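The behavior described above can be reproduced directly. A sketch, assuming the DataFrame constructor treats columns= as a selection from the structured array's field names (missing names become all-NaN columns, existing ones are picked in the requested order):

```python
import numpy as np
import pandas as pd

index = ['x', 'y']
# Without explicit names, NumPy calls the fields f0, f1, f2.
values = np.zeros(2, dtype='int32, float32, float32')
print(values.dtype.names)  # ('f0', 'f1', 'f2')

# 'a', 'b', 'c' are not field names of values, so every column is missing -> NaN.
df_nan = pd.DataFrame(values, index=index, columns=['a', 'b', 'c'])
print(df_nan)

# Names that do exist are selected (and reordered) from the structured array.
df_sel = pd.DataFrame(values, index=index, columns=['f2', 'f0'])
print(df_sel.dtypes)
```

So columns= here acts as a lookup, not a rename; to rename, change the dtype's field names (Options 1 and 2) or the DataFrame's columns afterwards (Option 3).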
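It would be good to know how this works, so we're not just copying and pasting the solution. Thanks! – rocarvaj
@rocarvaj: What do you think needs explanation? – unutbu
When to use the standard DataFrame constructor and when to use from_records. – rocarvaj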