2017-09-21 129 views
2

I have a list of dataframes and want to add the missing columns to each one

DFA:

item a  A    
A  1  2 
B  1  3   
C  0  4   

DFB:

item a  B 
E  1  2 
F  0  6 

DFC:

item a  C 
G  1  3 
H  0  4 

I want to add to each dataframe the columns it is missing.

This is what I want:

DFA:

item a  A B C   
A  1  2 0 0 
B  1  3 0 0 
C  0  4 0 0 

DFB:

item a  A B C 
E  1  0 2 0 
F  0  0 6 0 

DFC:

item a  A B C 
G  1  0 0 3 
H  0  0 0 4 

https://stackoverflow.com/questions/39050539/adding-multiple-columns-to-pandas-simultaneously Hope the link helps. – Wen

Answers

3

You can create a combined list of columns like this:

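# DataFrame.append was removed in pandas 2.0; pd.concat builds the same union of columns
col_list = pd.concat([df1, df2, df3], sort=True).columns.tolist()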

Now add the missing columns to each dataframe:

# .loc raises KeyError for missing labels in current pandas; reindex adds them as NaN
df1 = df1.reindex(columns=col_list).fillna(0)
print(df1)

   A    B    C  a item
0  2  0.0  0.0  1    A
1  3  0.0  0.0  1    B
2  4  0.0  0.0  0    C


df2 = df2.reindex(columns=col_list).fillna(0)
print(df2)

     A  B    C  a item
0  0.0  2  0.0  1    E
1  0.0  6  0.0  0    F

df3 = df3.reindex(columns=col_list).fillna(0)
print(df3)

     A    B  C  a item
0  0.0  0.0  3  1    G
1  0.0  0.0  4  0    H
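
The same idea can be sketched for any number of frames. This is a minimal sketch, not the answerer's exact code: it assumes the sample data from the question, collects the columns in order of first appearance (so `item` and `a` stay in front), and passes `fill_value=0` to `reindex` instead of calling `fillna` afterwards.

```python
import pandas as pd

# Sample frames matching the question (the names df1..df3 follow this answer).
df1 = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
df2 = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})
df3 = pd.DataFrame({"item": ["G", "H"], "a": [1, 0], "C": [3, 4]})

frames = [df1, df2, df3]

# Collect column names in order of first appearance, without duplicates.
col_list = list(dict.fromkeys(c for df in frames for c in df.columns))

# reindex adds the columns each frame lacks and fills them with 0.
df1, df2, df3 = (df.reindex(columns=col_list, fill_value=0) for df in frames)

print(df2)
```

Because the column list is built from a plain loop over `frames`, the same three lines work unchanged for a longer list.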
2

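One way is to use `pd.merge` with `reduce` (in Python 3, run `from functools import reduce` first), listing the dataframe you want to extend first: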

In [1932]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'), 
           [dfA, dfB, dfC]).fillna(0) 
Out[1932]: 
  item  a  A    B    C
0    A  1  2  0.0  0.0
1    B  1  3  0.0  0.0
2    C  0  4  0.0  0.0

In [1933]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'), 
        [dfB, dfA, dfC]).fillna(0) 
Out[1933]: 
  item  a  B    A    C
0    E  1  2  0.0  0.0
1    F  0  6  0.0  0.0

In [1934]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'), 
        [dfC, dfA, dfB]).fillna(0) 
Out[1934]: 
  item  a  C    A    B
0    G  1  3  0.0  0.0
1    H  0  4  0.0  0.0
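
The three calls above differ only in which frame comes first, so they can be wrapped in a small helper. This is a sketch under assumptions: `with_all_columns` is a hypothetical name, and the approach relies on no `(item, a)` pair appearing in more than one frame, as in the sample data. If a pair did match, the left merge would pull in the other frame's values rather than 0.

```python
from functools import reduce

import pandas as pd

dfA = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
dfB = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})
dfC = pd.DataFrame({"item": ["G", "H"], "a": [1, 0], "C": [3, 4]})


def with_all_columns(target, others):
    """Left-merge `others` onto `target` so it gains their columns (hypothetical helper)."""
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=["item", "a"], how="left"),
        [target, *others],
    )
    # Rows of `target` have no match in the other frames, so the new columns are NaN.
    return merged.fillna(0)


out_B = with_all_columns(dfB, [dfA, dfC])
print(out_B)
```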
2

Option 1
Align both axes
With functools.partial

from functools import partial 

(_, dfA), (dfC, dfB) = list(map(
    partial(dfC.align, fill_value=0), 
    dfA.align(dfB, fill_value=0) 
)) 

Option 1B
Align columns only

from functools import partial 

(_, dfA), (dfC, dfB) = list(map(
    partial(dfC.align, fill_value=0, axis=1), 
    dfA.align(dfB, fill_value=0, axis=1) 
)) 
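
To see what a single `align` call does before chaining three frames together, here is a minimal two-frame sketch. It assumes the sample data from the question. `align` returns a pair (left, right) that share the same columns, with the missing ones filled by `fill_value`.

```python
import pandas as pd

dfA = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
dfB = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})

# axis=1 aligns columns only, so each frame keeps its own rows.
left, right = dfA.align(dfB, join="outer", axis=1, fill_value=0)

print(left)
print(right)
```

The `partial(dfC.align, ...)` in the option above repeats this step for each of the two results, which is why the unpacking discards one of the two aligned copies of `dfC`.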

Option 2
Align both axes
With pd.DataFrame.reindex

from functools import reduce  

lod = [dfA, dfB, dfC] 
idx = reduce(pd.Index.union, (d.index for d in lod)) 
col = reduce(pd.Index.union, (d.columns for d in lod)) 
# index/columns must be passed by keyword in current pandas
dfA, dfB, dfC = (d.reindex(index=idx, columns=col, fill_value=0) for d in lod) 

Option 2B
Align columns only

lod = [dfA, dfB, dfC] 
col = reduce(pd.Index.union, (d.columns for d in lod)) 
dfA, dfB, dfC = (d.reindex(columns=col, fill_value=0) for d in lod) 
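
The difference between aligning both axes and aligning columns only shows up in the row count. The sketch below assumes the sample data from the question and uses two frames for brevity. Reindexing on the union of the indexes gives the shorter frame an extra all-zero row, while the columns-only variant keeps its original rows, which is what the question asks for.

```python
from functools import reduce

import pandas as pd

dfA = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
dfB = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})

lod = [dfA, dfB]
idx = reduce(pd.Index.union, (d.index for d in lod))
col = reduce(pd.Index.union, (d.columns for d in lod))

# Both axes: dfB is padded to the longest index, so it gains a zero-filled row.
both_axes = dfB.reindex(index=idx, columns=col, fill_value=0)

# Columns only: dfB keeps its two rows.
cols_only = dfB.reindex(columns=col, fill_value=0)

print(both_axes.shape)  # (3, 4)
print(cols_only.shape)  # (2, 4)
```

Note that `pd.Index.union` returns the labels sorted, so the resulting column order is `A, B, a, item` rather than the order shown in the question.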

Setup

dfA = pd.DataFrame(**{ 
    'columns': ['item', 'a', 'A'], 
    'data': [['A', 1, 2], ['B', 1, 3], ['C', 0, 4]], 
    'index': [0, 1, 2]}) 

dfB = pd.DataFrame(**{ 
    'columns': ['item', 'a', 'B'], 
    'data': [['E', 1, 2], ['F', 0, 6]], 
    'index': [0, 1]}) 

dfC = pd.DataFrame(**{ 
    'columns': ['item', 'a', 'C'], 
    'data': [['G', 1, 3], ['H', 0, 4]], 
    'index': [0, 1]})