Pivoting "stacked" data with multiple indexes?
I have some "stacked" or "record format" data (from a database) that looks like this:
"recid","code","value","exam_num"
"101703034","k_rat1","17/18","1"
"200907062","e_mas1","AC YES","6"
"203004134","k_rat1","5/18","5"
"303505091","k_gtrdsc","Foo","1"
"303505091","k_rat1","4/18","2"
and I want to pivot it so that it looks like this:
recid,exam_num,k_rat1,e_mas1,k_gtrdsc
101703034,1,"17/18",,
200907062,6,,"AC YES",
203004134,5,"5/18",,
303505091,1,,,Foo
303505091,2,"4/18",,
I can get it to work with just one index (recid), like this:
from pandas import read_csv
my_df = read_csv("data.csv")
pivoted = my_df.pivot(index="recid", columns="code", values="value")
That gives me this (note the missing exam_num column):
recid,e_mas1,k_gtrdsc,k_rat1
101703034,,,17/18
200907062,AC YES,,
203004134,,,5/18
303505091,,Foo,4/18
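But when I try to specify more than one index, or most anything else, I get all sorts of errors. I have read http://pandas.pydata.org/pandas-docs/stable/reshaping.html, but I can't see a way to do what I'm after.
Any help would be greatly appreciated!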
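For reference, here is a minimal sketch of one way the two-key pivot might be done. It is not taken from the thread: it swaps `pivot` for `pivot_table` with `aggfunc="first"`, which accepts a list of index columns. The sample rows are inlined through `io.StringIO` so the snippet runs on its own, and the names `raw` and `wide` are only illustrative.

```python
import io

import pandas as pd

# Sample rows copied from the question, inlined so the sketch is self-contained.
raw = '''"recid","code","value","exam_num"
"101703034","k_rat1","17/18","1"
"200907062","e_mas1","AC YES","6"
"203004134","k_rat1","5/18","5"
"303505091","k_gtrdsc","Foo","1"
"303505091","k_rat1","4/18","2"
'''

df = pd.read_csv(io.StringIO(raw))

# Assumption: each (recid, exam_num, code) combination occurs at most once,
# so taking the "first" value per group loses nothing.
wide = df.pivot_table(
    index=["recid", "exam_num"],
    columns="code",
    values="value",
    aggfunc="first",
)

# Turn the two index levels back into ordinary columns and drop the
# leftover "code" label on the column axis.
wide = wide.reset_index()
wide.columns.name = None

print(wide.to_csv(index=False))
```

The code columns come out in alphabetical order (e_mas1, k_gtrdsc, k_rat1) rather than in the order shown in the desired output; selecting the columns explicitly afterwards would reorder them if that matters.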
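Sorry, it's not clear to me how I should set up my multi-level index. Could you elaborate? When I run your code after read_csv, it doesn't seem to do anything. – MGeary
Assign the result to a variable and look at the result – Boud
I'm still not clear on this. I'm writing the result out to a CSV file, and it is basically identical to the input CSV file: 'my_df = read_csv("data.csv")' 'my_df.set_index(['recid','exam_num', 'code']).unstack('code')' 'my_df.to_csv("out.csv")' – MGeary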
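A sketch of what "assign the result to a variable" might look like in practice: `set_index` and `unstack` return new objects and leave `my_df` untouched, so writing `my_df` back out reproduces the input. The sample rows are inlined here, and the names `unstacked`, `result` and `out` are illustrative only.

```python
import io

import pandas as pd

# Sample rows copied from the question, inlined so the sketch is self-contained.
raw = '''"recid","code","value","exam_num"
"101703034","k_rat1","17/18","1"
"200907062","e_mas1","AC YES","6"
"203004134","k_rat1","5/18","5"
"303505091","k_gtrdsc","Foo","1"
"303505091","k_rat1","4/18","2"
'''

my_df = pd.read_csv(io.StringIO(raw))

# Neither call modifies my_df in place, so the returned frame must be kept.
unstacked = my_df.set_index(["recid", "exam_num", "code"]).unstack("code")

# The columns are now a two-level index: ("value", <code>).
# Selecting "value" leaves one plain column per code.
result = unstacked["value"].reset_index()
result.columns.name = None

# Serialise the reshaped frame, not my_df; passing a path such as "out.csv"
# instead would write a file.
out = result.to_csv(index=False)
print(out)
```

Note that `unstack` raises an error if the same (recid, exam_num, code) combination appears more than once, so this sketch assumes those three columns together identify a row uniquely.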