2015-09-17 39 views
3

Detect duplicates and create a summary row

I have a CSV that comes in regularly and looks like this (simplified):

Published Station   TypeFuel Price 
1/09/2015 BP Seaford  ULP   129.9 
1/09/2015 BP Seaford  Diesel  133.9 
1/09/2015 BP Seaford  Gas   156.9 
1/09/2015 Shell Newhaven ULP   139.9 
1/09/2015 Shell Newhaven Diesel  150.9 
1/09/2015 7-Eleven Malaga ULP   135.9 
1/09/2015 7-Eleven Malaga Diesel  155.9 
2/10/2015 BP Seaford  ULP   138.9 
2/10/2015 BP Seaford  Diesel  133.6 
2/10/2015 BP Seaford  Gas   157.9 

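...with many more rows hidden. We're looking at about 200 stations, each reporting daily for 20-30 days.

I need to summarize it down so it looks like this: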

Published Station   ULP  Diesel Gas 
1/09/2015 BP Seaford  129.9 133.9 156.9 
1/09/2015 Shell Newhaven 139.9 150.9 
1/09/2015 7-Eleven Malaga 135.9 155.9 
2/10/2015 BP Seaford  138.9 133.6 157.9 

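I'm only taking baby steps with the pandas tutorials and I'm fairly new to Python as well, but I believe the two together should get me through this task.

I believe I need to iterate over the CSV and, whenever Published and Station match, create a new row that turns the ULP/Diesel/Gas prices into new columns.

Answer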

6

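You are looking for DataFrame.pivot_table(): pivot on the columns 'Published' and 'Station', take the values of the TypeFuel column as the new columns of the pivoted table, and use the values of Price as its values. Example -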

In [5]: df 
Out[5]: 
    Published   Station TypeFuel Price 
0 1/09/2015  BP Seaford  ULP 129.9 
1 1/09/2015  BP Seaford Diesel 133.9 
2 1/09/2015  BP Seaford  Gas 156.9 
3 1/09/2015 Shell Newhaven  ULP 139.9 
4 1/09/2015 Shell Newhaven Diesel 150.9 
5 1/09/2015 7-Eleven Malaga  ULP 135.9 
6 1/09/2015 7-Eleven Malaga Diesel 155.9 
7 2/10/2015  BP Seaford  ULP 138.9 
8 2/10/2015  BP Seaford Diesel 133.6 
9 2/10/2015  BP Seaford  Gas 157.9 

In [7]: df.pivot_table(index=['Published','Station'],columns=['TypeFuel'],values='Price') 
Out[7]: 
TypeFuel     Diesel Gas ULP 
Published Station 
1/09/2015 7-Eleven Malaga 155.9 NaN 135.9 
      BP Seaford  133.9 156.9 129.9 
      Shell Newhaven 150.9 NaN 139.9 
2/10/2015 BP Seaford  133.6 157.9 138.9 
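For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same call. The DataFrame is rebuilt by hand from the sample rows in the question; the variable names `rows` and `summary` are mine, not from the original post.

```python
import pandas as pd

# Sample rows copied from the question (Published, Station, TypeFuel, Price).
rows = [
    ("1/09/2015", "BP Seaford", "ULP", 129.9),
    ("1/09/2015", "BP Seaford", "Diesel", 133.9),
    ("1/09/2015", "BP Seaford", "Gas", 156.9),
    ("1/09/2015", "Shell Newhaven", "ULP", 139.9),
    ("1/09/2015", "Shell Newhaven", "Diesel", 150.9),
    ("1/09/2015", "7-Eleven Malaga", "ULP", 135.9),
    ("1/09/2015", "7-Eleven Malaga", "Diesel", 155.9),
    ("2/10/2015", "BP Seaford", "ULP", 138.9),
    ("2/10/2015", "BP Seaford", "Diesel", 133.6),
    ("2/10/2015", "BP Seaford", "Gas", 157.9),
]
df = pd.DataFrame(rows, columns=["Published", "Station", "TypeFuel", "Price"])

# One output row per (Published, Station) pair, one column per fuel type.
# Fuel types a station did not report come out as NaN.
summary = df.pivot_table(
    index=["Published", "Station"],
    columns="TypeFuel",
    values="Price",
)
print(summary)
```

One thing to keep in mind: pivot_table() aggregates with the mean by default, so if the same station reported the same fuel twice on one day, the two prices would be averaged rather than raising an error.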

If you do not want Published and Station to be the index, you can call .reset_index() on the result of pivot_table() to reset the index. Example -

In [8]: df.pivot_table(index=['Published','Station'],columns=['TypeFuel'],values='Price').reset_index() 
Out[8]: 
TypeFuel Published   Station Diesel Gas ULP 
0   1/09/2015 7-Eleven Malaga 155.9 NaN 135.9 
1   1/09/2015  BP Seaford 133.9 156.9 129.9 
2   1/09/2015 Shell Newhaven 150.9 NaN 139.9 
3   2/10/2015  BP Seaford 133.6 157.9 138.9 
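To get closer to the layout asked for in the question, a sketch along these lines may help. It assumes the incoming file is comma-separated with the column names shown in the question (the real file may differ), and it reads from an in-memory string so that it runs as-is. The names `csv_text` and `flat` are illustrative.

```python
import io

import pandas as pd

# Assumption: the CSV is comma-separated and uses the question's column names.
csv_text = """Published,Station,TypeFuel,Price
1/09/2015,BP Seaford,ULP,129.9
1/09/2015,BP Seaford,Diesel,133.9
1/09/2015,BP Seaford,Gas,156.9
1/09/2015,Shell Newhaven,ULP,139.9
1/09/2015,Shell Newhaven,Diesel,150.9
1/09/2015,7-Eleven Malaga,ULP,135.9
1/09/2015,7-Eleven Malaga,Diesel,155.9
2/10/2015,BP Seaford,ULP,138.9
2/10/2015,BP Seaford,Diesel,133.6
2/10/2015,BP Seaford,Gas,157.9
"""
# For a real file, pass its path to read_csv instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

flat = df.pivot_table(
    index=["Published", "Station"],
    columns="TypeFuel",
    values="Price",
).reset_index()

# Drop the leftover "TypeFuel" label that sits above the column names.
flat.columns.name = None

# Put the fuel columns in the order used in the question.
flat = flat[["Published", "Station", "ULP", "Diesel", "Gas"]]
print(flat.to_string(index=False))
```

From here, flat.to_csv() can write the summary back out if a file is needed.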
+0

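If I have some additional fields (address, postcode, etc.) that I'd like to have as columns before/after the fuel prices, can you point me to where they should go in your statement(s)? I assume they should all be the same, so putting them in the index makes sense, but that would fall apart if some of the data were missing – Simon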

+1

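I think in 'index', because in any case, if you want those values in the result table, you will need to include them among the columns on which the grouping is based. –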
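A rough sketch of that suggestion, with a workaround for the missing-data worry. The Postcode column and its values are made up for illustration, as is the "unknown" placeholder. As far as I know, the grouping behind pivot_table() leaves out rows whose index columns contain NaN by default, so the sketch fills those gaps before pivoting.

```python
import pandas as pd

# Hypothetical data: "Postcode" and its values are invented for this example.
df = pd.DataFrame({
    "Published": ["1/09/2015"] * 4,
    "Station": ["BP Seaford", "BP Seaford", "Shell Newhaven", "Shell Newhaven"],
    "Postcode": ["P1", "P1", None, None],  # missing for Shell Newhaven
    "TypeFuel": ["ULP", "Diesel", "ULP", "Diesel"],
    "Price": [129.9, 133.9, 139.9, 150.9],
})

# Extra descriptive fields go into the index list alongside the keys.
key_cols = ["Published", "Station", "Postcode"]

# Fill gaps in the key columns so those rows are not lost in the grouping.
df[key_cols] = df[key_cols].fillna("unknown")

wide = df.pivot_table(
    index=key_cols,
    columns="TypeFuel",
    values="Price",
).reset_index()
wide.columns.name = None
print(wide)
```

If the extra fields are not guaranteed to be identical for every row of a station, each distinct combination ends up as its own output row, so it is worth checking them for consistency first.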