Pandas: DataFrames to an unbalanced Panel

dictDF = {0: df0, 1: df1, 2: df2}

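Each of the DataFrames df0, df1, df2 represents a table at a specific date. The first column identifies a person (like a social security number), and the other columns are characteristics of that person, such as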

DataFrame df0 

id  Name    Age  Gender  Job        Income
10  Daniel  40   Male    Scientist  100
5   Anna    39   Female  Doctor     250

DataFrame df1 

id  Name     Age  Gender  Job       Income
67  Guto     35   Male    Engineer  100
7   Anna     39   Female  Doctor    300
9   Melissa  26   Female  Student   36

DataFrame df2 

id  Name      Age  Gender  Job      Income
77  Patricia  30   Female  Dentist  300
9   Melissa   27   Female  Dentist  250
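For anyone who wants to reproduce this, the three sample tables can be built roughly as follows. This is only a sketch: the `columns` list and the choice of plain integer dtypes are assumptions, not part of the original data source.

```python
import pandas as pd

# Column layout shared by all three tables (assumed from the listings above)
columns = ["id", "Name", "Age", "Gender", "Job", "Income"]

df0 = pd.DataFrame(
    [[10, "Daniel", 40, "Male", "Scientist", 100],
     [5, "Anna", 39, "Female", "Doctor", 250]],
    columns=columns,
)
df1 = pd.DataFrame(
    [[67, "Guto", 35, "Male", "Engineer", 100],
     [7, "Anna", 39, "Female", "Doctor", 300],
     [9, "Melissa", 26, "Female", "Student", 36]],
    columns=columns,
)
df2 = pd.DataFrame(
    [[77, "Patricia", 30, "Female", "Dentist", 300],
     [9, "Melissa", 27, "Female", "Dentist", 250]],
    columns=columns,
)

# One table per date, keyed by period number
dictDF = {0: df0, 1: df1, 2: df2}
```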

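Within the table for a given date, the id (social security number) identifies the person exactly. For instance, the same "Melissa" (id 9) appears in two different DataFrames. However, there are two different "Annas" (ids 5 and 7).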

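In these DataFrames, both the people and the number of people vary over time. Some people are represented at all dates; others appear only at particular dates.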

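Is there a simple way to convert this dictionary of DataFrames into an (unbalanced) Panel object, where every id appears at all dates and, if the data for a given id are not available, they are replaced by NaN?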

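Of course, I can do this by making a list of all ids and then checking, for each date, whether a given id is represented. If it is, I copy the data. Otherwise, I just write NaN.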
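That manual approach might look something like the sketch below. It uses `reindex` to do the "copy or write NaN" step; the trimmed-down tables (only `id` and `Income`) and the names `all_ids` and `aligned` are illustrative assumptions.

```python
import pandas as pd

# Trimmed-down stand-ins for the three tables (only id and Income kept)
df0 = pd.DataFrame({"id": [10, 5], "Income": [100, 250]})
df1 = pd.DataFrame({"id": [67, 7, 9], "Income": [100, 300, 36]})
df2 = pd.DataFrame({"id": [77, 9], "Income": [300, 250]})
dictDF = {0: df0, 1: df1, 2: df2}

# Step 1: collect every id that shows up at any date
all_ids = sorted(set().union(*(df["id"] for df in dictDF.values())))

# Step 2: align each date's table on the full id list;
# ids missing at that date become rows filled with NaN
aligned = {
    period: df.set_index("id").reindex(all_ids)
    for period, df in dictDF.items()
}

print(aligned[0])
```

Every table in `aligned` then has the same six rows, so the tables can be compared date by date.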

I wonder whether there is an easy way to do this with pandas tools.

Source

2016-02-12 DanielTheRocketMan

I would recommend using a MultiIndex instead of a Panel.

First, add the period to each DataFrame:

for n, df in dictDF.items():  # Python 3; on Python 2 use .iteritems()
    df['period'] = n  # tags every row with its period (modifies df in place)

Then concatenate into one big DataFrame:

big_df = pd.concat(list(dictDF.values()), ignore_index=True)
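If you would rather not modify the original DataFrames, the two steps can be combined with `DataFrame.assign`, which returns a copy with the extra column. This is a variant sketch, not the code above; the small two-column tables are stand-ins for the real data.

```python
import pandas as pd

# Small stand-ins for the real tables
df0 = pd.DataFrame({"id": [10, 5], "Income": [100, 250]})
df1 = pd.DataFrame({"id": [67, 7, 9], "Income": [100, 300, 36]})
df2 = pd.DataFrame({"id": [77, 9], "Income": [300, 250]})
dictDF = {0: df0, 1: df1, 2: df2}

# assign() returns a new frame, so df0, df1 and df2 stay untouched
big_df = pd.concat(
    [df.assign(period=n) for n, df in dictDF.items()],
    ignore_index=True,
)

print(big_df)
```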

Now index on period and id, and you are guaranteed to have a unique index:

>>> big_df.set_index(['period', 'id'])
               Name  Age  Gender        Job  Income
period id
0      10    Daniel   40    Male  Scientist     100
       5       Anna   39  Female     Doctor     250
1      67      Guto   35    Male   Engineer     100
       7       Anna   39  Female     Doctor     300
       9    Melissa   26  Female    Student      36
2      77  Patricia   30  Female    Dentist     300
       9    Melissa   27  Female    Dentist     250
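If you want pandas to check the uniqueness claim rather than take it on trust, something like the following should work. It is a sketch on a cut-down copy of the data; `verify_integrity=True` makes `set_index` raise if any (period, id) pair is duplicated.

```python
import pandas as pd

# Cut-down version of the concatenated data
big_df = pd.DataFrame({
    "period": [0, 0, 1, 1, 1, 2, 2],
    "id": [10, 5, 67, 7, 9, 77, 9],
    "Income": [100, 250, 100, 300, 36, 300, 250],
})

# Raises ValueError if some (period, id) pair occurs more than once
indexed = big_df.set_index(["period", "id"], verify_integrity=True)

print(indexed.index.is_unique)   # the (period, id) pairs
print(big_df["id"].is_unique)    # id alone repeats: id 9 is in periods 1 and 2
```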

You can also reverse the order:

>>> big_df.set_index(['id', 'period']).sort_index()
               Name  Age  Gender        Job  Income
id period
5  0           Anna   39  Female     Doctor     250
7  1           Anna   39  Female     Doctor     300
9  1        Melissa   26  Female    Student      36
   2        Melissa   27  Female    Dentist     250
10 0         Daniel   40    Male  Scientist     100
67 1           Guto   35    Male   Engineer     100
77 2       Patricia   30  Female    Dentist     300
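With the index in this order, pulling out one person's history or one date's cross-section is a short lookup. A minimal sketch on a cut-down copy of the data (the names `by_id`, `melissa` and `period_1` are illustrative):

```python
import pandas as pd

# Cut-down version of the concatenated data
big_df = pd.DataFrame({
    "period": [0, 0, 1, 1, 1, 2, 2],
    "id": [10, 5, 67, 7, 9, 77, 9],
    "Name": ["Daniel", "Anna", "Guto", "Anna", "Melissa", "Patricia", "Melissa"],
    "Income": [100, 250, 100, 300, 36, 300, 250],
})

by_id = big_df.set_index(["id", "period"]).sort_index()

# All periods for one person: the result is indexed by period
melissa = by_id.loc[9]

# Everyone observed in one period: the result is indexed by id
period_1 = by_id.xs(1, level="period")

print(melissa)
print(period_1)
```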

You can even unstack the data quite easily:

>>> big_df.set_index(['id', 'period'])[['Income']].unstack('period')
       Income
period      0    1    2
id
5         250  NaN  NaN
7         NaN  300  NaN
9         NaN   36  250
10        100  NaN  NaN
67        NaN  100  NaN
77        NaN  NaN  300
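If you want the long (MultiIndex) form to contain every id at every period, with NaN where there is no observation, one option is to reindex against the full product of ids and periods. This is a sketch under the assumption that the set of periods and ids is exactly what appears in the data; `full_index` and `balanced` are illustrative names.

```python
import pandas as pd

# Cut-down version of the concatenated data
big_df = pd.DataFrame({
    "period": [0, 0, 1, 1, 1, 2, 2],
    "id": [10, 5, 67, 7, 9, 77, 9],
    "Income": [100, 250, 100, 300, 36, 300, 250],
})

indexed = big_df.set_index(["id", "period"])

# Every (id, period) combination, whether observed or not
full_index = pd.MultiIndex.from_product(
    [sorted(big_df["id"].unique()), sorted(big_df["period"].unique())],
    names=["id", "period"],
)

# Unobserved combinations become rows of NaN
balanced = indexed.reindex(full_index)

print(balanced)
```

The result has one row per id per period, which is the NaN-padded layout asked for in the question, just stored as a MultiIndexed DataFrame instead of a Panel.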

Source

2016-02-12 02:57:22 Alexander
