2015-12-01 205 views
1

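Reshaping a pandas DataFrame

Suppose I have the following DataFrame, which holds some variable for 2 different seasons, 2 different years, and 3 different locations. The data is currently structured so that each row is a season/location combination, with one count column per year. It looks like this: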

>>> import pandas as pd
>>> df = pd.DataFrame([['Summer', 'A', 1, 2],
...                    ['Winter', 'A', 3, 4],
...                    ['Summer', 'B', 5, 6],
...                    ['Winter', 'B', 7, 8],
...                    ['Summer', 'C', 9, 10],
...                    ['Winter', 'C', 11, 12]],
...                   columns=['Season', 'Location', 'Count_2014', 'Count_2015'])
>>> df 
   Season Location  Count_2014  Count_2015
0  Summer        A           1           2
1  Winter        A           3           4
2  Summer        B           5           6
3  Winter        B           7           8
4  Summer        C           9          10
5  Winter        C          11          12

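I would like to restructure the data so that I have one row for each season, location, and year combination (which means I will end up with 2 × 3 × 2 = 12 rows). My current approach is certainly not the most efficient (see below). Any suggestions on the best way to restructure this dataset?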

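df.set_index(['Season', 'Location'], inplace=True)
ListOfDFs = []
# Build one single-column frame per year, then concatenate them
for Year in [x[-4:] for x in df.columns]:
    # .copy() avoids a SettingWithCopyWarning when the Year column is added
    SubD = df[['Count_' + Year]].copy()
    SubD.columns = ['Count']
    SubD['Year'] = Year
    SubD.set_index('Year', append=True, inplace=True)
    ListOfDFs.append(SubD)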

df2=pd.concat(ListOfDFs) 
>>> df2 
                      Count
Season Location Year
Summer A        2014      1
Winter A        2014      3
Summer B        2014      5
Winter B        2014      7
Summer C        2014      9
Winter C        2014     11
Summer A        2015      2
Winter A        2015      4
Summer B        2015      6
Winter B        2015      8
Summer C        2015     10
Winter C        2015     12

Answers

4

You are looking for the melt functionality, which lets you do this in essentially one line:

df_new = pd.melt(df,id_vars=['Season', 'Location'], value_vars=['Count_2014', 'Count_2015'], 
     var_name='Year', 
     value_name='Count') 

You can then use apply (there may well be something better) to get the output you showed above:

df_new['Year'] = df_new['Year'].apply(lambda x: x[-4:]) 

Output:

    Season Location  Year  Count
0   Summer        A  2014      1
1   Winter        A  2014      3
2   Summer        B  2014      5
3   Winter        B  2014      7
4   Summer        C  2014      9
5   Winter        C  2014     11
6   Summer        A  2015      2
7   Winter        A  2015      4
8   Summer        B  2015      6
9   Winter        B  2015      8
10  Summer        C  2015     10
11  Winter        C  2015     12
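
The answer notes that there may be something better than apply for trimming the column names. One possibility, offered here as a minimal sketch rather than as the answerer's method, is the vectorized .str accessor. The name df_long and the conversion of the years to integers are assumptions made for illustration; the original keeps the years as strings.

```python
import pandas as pd

df = pd.DataFrame(
    [['Summer', 'A', 1, 2],
     ['Winter', 'A', 3, 4],
     ['Summer', 'B', 5, 6],
     ['Winter', 'B', 7, 8],
     ['Summer', 'C', 9, 10],
     ['Winter', 'C', 11, 12]],
    columns=['Season', 'Location', 'Count_2014', 'Count_2015'])

# Wide -> long: every column not listed in id_vars is melted,
# so value_vars can be omitted here.
df_long = pd.melt(df, id_vars=['Season', 'Location'],
                  var_name='Year', value_name='Count')

# Vectorized replacement for apply(lambda x: x[-4:]):
# keep the last four characters of each label.
# astype(int) is an assumption; drop it to keep the years as strings.
df_long['Year'] = df_long['Year'].str[-4:].astype(int)

print(df_long)
```

Slicing with .str[-4:] relies on every column name ending in a four-digit year, which holds for the sample data but would need checking on a real dataset.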

Comment: instead of the apply call, the year can be extracted with the .str accessor: df_new['Year'] = df_new['Year'].str.split('_').str.get(1)

0

As another option, it looks like stack() also does the job nicely.

>>> df=pd.DataFrame([['Summer','A',1,2],['Winter','A',3,4],['Summer','B',5,6],['Winter','B',7,8],['Summer','C',9,10],['Winter','C',11,12]], columns=['Season','Location','Count_2014','Count_2015']) 
>>> 
>>> df.set_index(['Season','Location'], inplace=True) 
>>> df.columns=pd.MultiIndex.from_tuples([(col[-4:],col[:-5]) for col in df.columns], names=['Year','Count']) 
>>> df=df.stack(level=0) 
>>> df 
Count                 Count
Season Location Year
Summer A        2014      1
                2015      2
Winter A        2014      3
                2015      4
Summer B        2014      5
                2015      6
Winter B        2014      7
                2015      8
Summer C        2014      9
                2015     10
Winter C        2014     11
                2015     12
>>>
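
The same stack() idea can be sketched without building a MultiIndex on the columns, by renaming the columns to the bare year first. This is a hedged variant, not the answerer's code; the names wide and df_stacked are hypothetical, and the years stay as strings as in the session above.

```python
import pandas as pd

df = pd.DataFrame(
    [['Summer', 'A', 1, 2],
     ['Winter', 'A', 3, 4],
     ['Summer', 'B', 5, 6],
     ['Winter', 'B', 7, 8],
     ['Summer', 'C', 9, 10],
     ['Winter', 'C', 11, 12]],
    columns=['Season', 'Location', 'Count_2014', 'Count_2015'])

# Move the identifier columns into the index (non-mutating form)
wide = df.set_index(['Season', 'Location'])

# Strip the 'Count_' prefix so the column labels are just the years,
# and name the column axis so the stacked level is called 'Year'
wide.columns = wide.columns.str[-4:].rename('Year')

# stack() pivots the column labels into the innermost index level,
# returning a Series; name it and turn it back into a one-column frame
df_stacked = wide.stack().rename('Count').to_frame()

print(df_stacked)
```

Because the columns have a single level here, stack() returns a Series, which is why the result is renamed and converted with to_frame().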