2017-04-10

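Summing a large number of DataFrames

I have a large number of pandas DataFrames with exactly the same index keys and column names. Their data look like this: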

z1.ix[0] 
val1  [1, 5, 3, 4] 
val2  47 
Name: 2017-01-01 01:00:00, dtype: object 

z2.ix[0] 
val1  [11, 5, 53, 5] 
val2  4 
Name: 2017-01-01 01:00:00, dtype: object 

z3.ix[0] 
val1  [1, 25, 3, 4] 
val2  7 
Name: 2017-01-01 01:00:00, dtype: object 

I tried the following:

summedDf = z1 + z2 + z3 

This gives:

summedDf.ix[0] 
val1  [1, 5, 3, 4, 11, 5, 53, 5, 1, 25, 3, 4] 
val2  58 
Name: 2017-01-01 01:00:00, dtype: object 

But I would like to get the following instead:

summedDf.ix[0] 
val1  [13, 35, 59, 13] 
val2  58 
Name: 2017-01-01 01:00:00, dtype: object 
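The difference between the two outputs above comes from the cell types: val1 is an object column whose cells are Python lists, so pandas applies `+` cell by cell and list addition means concatenation, while val2 holds plain numbers and is summed normally. A minimal sketch of this, using hypothetical one-row Series `a` and `b` built from the val1 lists of z1 and z2:

```python
import numpy as np
import pandas as pd

# Hypothetical one-row Series holding the val1 lists from z1 and z2.
a = pd.Series([[1, 5, 3, 4]])
b = pd.Series([[11, 5, 53, 5]])

# The cells are Python lists, so "+" on the two object Series evaluates
# list + list for each cell, which concatenates.
concatenated = (a + b).iloc[0]
print(concatenated)  # [1, 5, 3, 4, 11, 5, 53, 5]

# After turning each cell into a NumPy array, the same "+" adds element-wise.
summed = (a.apply(np.array) + b.apply(np.array)).iloc[0]
print(summed.tolist())  # [12, 10, 56, 9]
```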

Also, how can I extend this addition to about 500 DataFrames?
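One possible way to scale this to ~500 frames, sketched under the assumption that every frame has the same index, the same columns, and equal-length lists in val1: convert the lists to NumPy arrays and fold the frames together with `functools.reduce`. The helpers `with_array_lists` and `sum_frames` are hypothetical names, and z1, z2 and z3 are rebuilt here as one-row stand-ins for the data above.

```python
from functools import reduce

import numpy as np
import pandas as pd


def with_array_lists(df):
    # Copy so the caller's frame is not modified, then turn each list in
    # val1 into a NumPy array so that "+" adds element-wise.
    out = df.copy()
    out["val1"] = out["val1"].apply(np.array)
    return out


def sum_frames(frames):
    # frames can be any iterable, e.g. a generator that loads one
    # DataFrame at a time, so all ~500 need not be in memory at once.
    return reduce(lambda total, df: total + df,
                  (with_array_lists(df) for df in frames))


# Hypothetical one-row stand-ins for z1, z2, z3.
idx = pd.to_datetime(["2017-01-01 01:00:00"])
z1 = pd.DataFrame({"val1": [[1, 5, 3, 4]], "val2": [47]}, index=idx)
z2 = pd.DataFrame({"val1": [[11, 5, 53, 5]], "val2": [4]}, index=idx)
z3 = pd.DataFrame({"val1": [[1, 25, 3, 4]], "val2": [7]}, index=idx)

summed = sum_frames([z1, z2, z3])
print(summed["val1"].iloc[0].tolist(), summed["val2"].iloc[0])
```

Converting inside the generator keeps the conversion lazy, so each frame is only copied when `reduce` reaches it.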

Edit: val1 and val2 are separate column names. val1 stores a list and val2 stores a single value for each index.

I think you could concatenate them into one 'df' and then use df.sum along an axis. – Divakar
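A rough sketch of the concatenate-then-sum idea in the comment above, assuming all frames share the same index in the same order and that every val1 list has the same length. val2 can be summed directly after `pd.concat` with `groupby(level=0)`. val1 is stacked into a 3-D NumPy array first, because `+` on Python lists concatenates rather than adds. The frames z1, z2 and z3 are hypothetical one-row stand-ins for the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row stand-ins for z1, z2, z3 from the question.
idx = pd.to_datetime(["2017-01-01 01:00:00"])
z1 = pd.DataFrame({"val1": [[1, 5, 3, 4]], "val2": [47]}, index=idx)
z2 = pd.DataFrame({"val1": [[11, 5, 53, 5]], "val2": [4]}, index=idx)
z3 = pd.DataFrame({"val1": [[1, 25, 3, 4]], "val2": [7]}, index=idx)
dfs = [z1, z2, z3]

# Scalar column: concatenate all frames, then sum rows sharing an index label.
val2 = pd.concat([df["val2"] for df in dfs]).groupby(level=0).sum()

# List column: stack into an array of shape (frames, rows, list length)
# and sum over the first axis, i.e. across frames.
stacked = np.stack([np.array(df["val1"].tolist()) for df in dfs])
val1 = pd.Series(stacked.sum(axis=0).tolist(), index=dfs[0].index)

summed = pd.DataFrame({"val1": val1, "val2": val2})
print(summed)
```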

Are these lists stored in a column? Or is val2 repeated for each val1 item? Please show the full frame rather than a slice. – Parfait

Answer

Probably not the most efficient approach, but this will get you started:

import pandas as pd 
import numpy as np 


# gen test data 
df1 = pd.DataFrame({'val1':[[1,2,3],[4,5,6]], 'val2': [1,2]}) 
df1 

gives:

        val1  val2
0  [1, 2, 3]     1
1  [4, 5, 6]     2

Another DataFrame:

def check(x):
    # Double every element of a list, or the value itself if it is a scalar.
    if isinstance(x, list):
        output = [i * 2 for i in x]
    else:
        output = x * 2
    return output

df2 = df1.applymap(check)
df2

gives:

          val1  val2
0    [2, 4, 6]     2
1  [8, 10, 12]     4

Adding the DataFrames:

def add_cols(df1, df2, col):
    # If the column holds lists (or arrays), convert them to NumPy arrays so
    # that addition is element-wise instead of list concatenation.
    s1, s2 = df1[col], df2[col]
    if isinstance(s1.iloc[0], (list, np.ndarray)):
        s1 = s1.apply(np.asarray)
        s2 = s2.apply(np.asarray)
    return s1.add(s2)


def add_dfs(df1, df2):
    # Build a new DataFrame so that neither input is modified in place.
    return pd.DataFrame({c: add_cols(df1, df2, c) for c in df1.columns})


# you can use a generator to read dataframes on the fly
# instead of loading all into a list
dfs = [df1, df2]


for e, df in enumerate(dfs):
    if e == 0:
        df_sum = df.copy()
    else:
        # Add each frame to the running total.
        df_sum = add_dfs(df_sum, df)

gives the desired element-wise sum:

           val1  val2
0     [3, 6, 9]     3
1  [12, 15, 18]     6