2014-09-30

Dear awesome hackers: how do I pivot a Python pandas DataFrame?

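I'm a newbie and can't figure out which Python/pandas functions can do the "transformation" I want. I think (and hope) that showing you what I have ("original") and the result I want ("desired") is better than a long description.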

import pandas as pd 

Original DataFrame input:

df_orig = pd.DataFrame() 
df_orig["Treatment"] = ["C", "C", "D", "D", "C", "C", "D", "D"] 
df_orig["TimePoint"] = [24, 48, 24, 48, 24, 48, 24, 48] 
df_orig["AN"] = ["ALF234","ALF234","ALF234","ALF234","XYK987","XYK987","XYK987","XYK987"] 
df_orig["Bincode"] = [33,33,33,33,44,44,44,44] 
df_orig["BC_all"] = ["33.7","33.7","33.7","33.7","44.9","44.9","44.9","44.9"] 
df_orig["RIA_avg"] = [0.202562419159333,0.281521224788666, 0.182828319454333,0.294909088002333, 
        0.105941322218833,0.247949961707,0.1267545610749,0.159711714967666] 
df_orig["sum14N_avg"] = [4120031.79121666,3742633.37033333,4659315.47073666,4345668.76408666, 
        26307312.1188333,24089229.9177999,35367286.7322666,34093045.3129] 

[Screenshot: the original DataFrame as displayed]

Desired DataFrame input:

df_wanted = pd.DataFrame() 
df_wanted["AN"] = ["ALF234","XYK987"] 
df_wanted["Bincode"] = [33,44] 
df_wanted["BC_all"] = ["33.7","44.9"] 
df_wanted["C_24_RIA_avg"] = [0.202562419159333, 0.105941322218833] 
df_wanted["C_48_RIA_avg"] = [0.281521224788666,0.247949961707] 
df_wanted["D_24_RIA_avg"] = [0.182828319454333,0.1267545610749] 
df_wanted["D_48_RIA_avg"] = [0.294909088002333, 0.159711714967666] 
df_wanted["C_24_sum14N_avg"] = [4120031.791, 26307312.12] 
df_wanted["C_48_sum14N_avg"] = [3742633.37, 24089229.92] 
df_wanted["D_24_sum14N_avg"] = [4659315.471, 35367286.73] 
df_wanted["D_48_sum14N_avg"] = [4345668.764, 34093045.31] 

[Screenshot: the desired DataFrame as displayed]

Thank you very much for your support!

Answers


I believe you want pd.pivot_table to do the pivot. See the examples on pivot tables for a better understanding of how it works.

The following should give you what you want.

df_wanted = pd.pivot_table(
    df_orig, 
    index=['AN', 'Bincode', 'BC_all'], 
    columns=['Treatment', 'TimePoint'], 
    values=['RIA_avg', 'sum14N_avg'] 
) 
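A self-contained version of the call above, as a sketch: it rebuilds df_orig from the question as a single dict (same values) and uses the column name TimePoint spelled exactly as in the question, since pandas column lookups are case-sensitive. pivot_table's default aggfunc is the mean, which is harmless here because each (AN, Treatment, TimePoint) combination occurs in only one row; with duplicate rows the values would be averaged.

```python
import pandas as pd

# The question's df_orig, rebuilt in one step (same values).
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666, 0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707, 0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333, 4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999, 35367286.7322666, 34093045.3129],
})

# One row per (AN, Bincode, BC_all); one column per (value, Treatment, TimePoint).
pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)
print(pivoted)
```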

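Note that the column names won't come out exactly as in your desired output. Instead, both the rows and the columns will have a hierarchical index (MultiIndex), which should be more convenient to work with.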
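If you do want the flat names from the question (C_24_RIA_avg and so on), one option, sketched here rather than taken from the answer, is to join the column levels yourself and then move the row index back into ordinary columns with reset_index. The f-string order (treatment, time point, value name) is an assumption chosen to match df_wanted.

```python
import pandas as pd

df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666, 0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707, 0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333, 4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999, 35367286.7322666, 34093045.3129],
})

pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)

# Each column label is a tuple (value_name, treatment, time_point);
# reorder and join it into a flat name such as "C_24_RIA_avg".
pivoted.columns = [f"{treat}_{tp}_{val}" for val, treat, tp in pivoted.columns]

# Turn the AN / Bincode / BC_all index levels back into ordinary columns.
flat = pivoted.reset_index()
print(flat)
```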

To get rows, columns, or values out of this format, you can use .loc:

df_wanted.loc['XYK987', :] 
df_wanted.loc[:, ('sum14N_avg')] 
df_wanted.loc['ALF234', ('RIA_avg', 'C', 24)] 
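A runnable sketch of the .loc selections above, plus DataFrame.xs as one way to pick out a single TimePoint across all treatments (xs is not part of the answer; it is shown only as an extra option). Getting a single scalar needs the full row key, because 'ALF234' alone still leaves the Bincode and BC_all levels in the result.

```python
import pandas as pd

df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666, 0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707, 0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333, 4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999, 35367286.7322666, 34093045.3129],
})

pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)

# Partial row key: every column for AN == "XYK987" (Bincode/BC_all stay as index levels).
xyk = pivoted.loc["XYK987", :]

# Top column level only: all sum14N_avg columns, indexed by (Treatment, TimePoint).
sums = pivoted.loc[:, "sum14N_avg"]

# Full row key plus full column key returns a single scalar.
value = pivoted.loc[("ALF234", 33, "33.7"), ("RIA_avg", "C", 24)]

# xs selects on an inner column level by name: every column with TimePoint == 24.
at_24 = pivoted.xs(24, level="TimePoint", axis=1)

print(xyk, sums, value, at_24, sep="\n\n")
```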

Thank you very much! – tryptofame 2014-09-30 19:09:22


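Your output isn't aligned properly, so it's hard to follow. But it looks like df.groupby('AN').mean() or something similar would work. Read the documentation on Group By.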
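To illustrate the group-by route, here is a sketch of what "something similar" could look like (an assumption, not the commenter's code). Grouping by AN alone averages the four Treatment/TimePoint rows into one number per AN. To get the wide layout you can instead group by all key columns and then unstack Treatment and TimePoint into the columns. Only the two numeric value columns are selected before .mean(), because pandas 2.x may raise a TypeError when asked to average string columns such as BC_all.

```python
import pandas as pd

df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666, 0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707, 0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333, 4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999, 35367286.7322666, 34093045.3129],
})

# groupby("AN") alone collapses Treatment/TimePoint: one averaged row per AN.
collapsed = df_orig.groupby("AN")[["RIA_avg", "sum14N_avg"]].mean()

# Grouping by every key column keeps one row per combination; unstack then
# moves Treatment and TimePoint from the row index into the columns.
keys = ["AN", "Bincode", "BC_all", "Treatment", "TimePoint"]
wide = (
    df_orig.groupby(keys)[["RIA_avg", "sum14N_avg"]]
    .mean()
    .unstack(["Treatment", "TimePoint"])
)

print(collapsed)
print(wide)
```

The result has the same shape and MultiIndex layout as the pivot_table output; pivot_table is essentially a convenience wrapper around this group-then-unstack pattern.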