2014-03-13 16 views
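pandas: groupby produces an unwanted format. How do I get groupby().sum() to return a flat table?

I searched the pandas documentation and, unfortunately, could not find the answer.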

Essentially, after some data wrangling, I have this DataFrame:

ticker_id   close_date   sector sector_index 
0   1 2014-02-28 00:00:00 Consumer Goods  31.106653 
1   1 2014-02-27 00:00:00 Consumer Goods  30.951213 
2   2 2014-02-28 00:00:00 Consumer Goods  19.846387 
3   2 2014-02-27 00:00:00 Consumer Goods  19.671747 
4   3 2014-02-28 00:00:00 Consumer Goods 1208.552000 
5   3 2014-02-27 00:00:00 Consumer Goods 1193.352000 
6   4 2014-02-28 00:00:00 Consumer Goods  9.893989 
7   4 2014-02-27 00:00:00 Consumer Goods  9.857385 
8   5 2014-02-28 00:00:00 Consumer Goods  52.196757 
9   5 2014-02-27 00:00:00 Consumer Goods  53.101520 
10   6 2014-02-28 00:00:00   Services  5.449554 
11   6 2014-02-27 00:00:00   Services  5.440019 
12   7 2014-02-28 00:00:00 Basic Materials 4149.237000 
13   7 2014-02-27 00:00:00 Basic Materials 4130.704000 

I ran a groupby:

df_all2 = df_all.groupby(['close_date','sector']).sum() 
print(df_all2)

and the result was this:

      ticker_id sector_index 
close_date sector         
2014-02-27 Basic Materials   7 4130.704000 
      Consumer Goods   15 1306.933865 
      Services     6  5.440019 
2014-02-28 Basic Materials   7 4149.237000 
      Consumer Goods   15 1321.595786 
      Services     6  5.449554 

But in this form I cannot upload it to MySQL correctly. To upload it properly, I need to do the following, plus a few other things:

data2 = list(tuple(x) for x in df_all2.values) 

But data2 comes out as meaningless garbage.

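To make a long story short, how can I get groupby to give me the following result (where close_date is filled in on every row and the column headers form a single flat row)?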

close_date sector   ticker_id sector_index 
2014-02-27 Basic Materials   7 4130.704000 
2014-02-27 Consumer Goods   15 1306.933865 
2014-02-27 Services     6  5.440019 
2014-02-28 Basic Materials   7 4149.237000 
2014-02-28 Consumer Goods   15 1321.595786 
2014-02-28 Services     6  5.449554 

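Also, to help the community, how should I reword the title so that other pandas users who face this problem can find the solution too? I really appreciate your help.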

Answer

You have to call reset_index on the MultiIndex before using to_sql*:

In [11]: df.groupby(['close_date','sector']).sum().reset_index() 
Out[11]: 
    close_date   sector ticker_id sector_index 
0 2014-02-27 Basic Materials   7 4130.704000 
1 2014-02-27 Consumer Goods   15 1306.933865 
2 2014-02-27   Services   6  5.440019 
3 2014-02-28 Basic Materials   7 4149.237000 
4 2014-02-28 Consumer Goods   15 1321.595786 
5 2014-02-28   Services   6  5.449554 

Or you can pass as_index=False to groupby:

In [12]: df.groupby(['close_date','sector'], as_index=False).sum() 
Out[12]: 
    close_date   sector ticker_id sector_index 
0 2014-02-27 Basic Materials   7 4130.704000 
1 2014-02-27 Consumer Goods   15 1306.933865 
2 2014-02-27   Services   6  5.440019 
3 2014-02-28 Basic Materials   7 4149.237000 
4 2014-02-28 Consumer Goods   15 1321.595786 
5 2014-02-28   Services   6  5.449554 
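
To see both options side by side, here is a minimal, self-contained sketch. It assumes a small illustrative subset of the question's data, with close_date held as plain strings rather than datetimes. The names flat_reset, flat_noindex and rows are made up for this example.

```python
import pandas as pd

# Illustrative subset of the question's data (three tickers, two dates).
df = pd.DataFrame({
    "ticker_id": [1, 1, 2, 2, 6, 6],
    "close_date": ["2014-02-28", "2014-02-27"] * 3,
    "sector": ["Consumer Goods"] * 4 + ["Services"] * 2,
    "sector_index": [31.106653, 30.951213, 19.846387,
                     19.671747, 5.449554, 5.440019],
})

keys = ["close_date", "sector"]

# Option 1: aggregate, then move the MultiIndex levels back into columns.
flat_reset = df.groupby(keys).sum().reset_index()

# Option 2: tell groupby not to turn the keys into an index at all.
flat_noindex = df.groupby(keys, as_index=False).sum()

# One plain tuple per row, key columns included.
rows = list(flat_reset.itertuples(index=False, name=None))

print(flat_reset)
print(rows[0])
```

Unlike df_all2.values on the grouped frame, which holds only the ticker_id and sector_index columns, these tuples carry close_date and sector as well, because both are ordinary columns again.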

*Note: this should be fixed from 0.14 onwards, i.e. you should then be able to save a MultiIndex to SQL directly.
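
As a rough illustration of that note, the sketch below writes a grouped frame, MultiIndex and all, straight to a database. It assumes a pandas version where to_sql handles a MultiIndex, and it uses an in-memory SQLite database from the standard library as a stand-in, since a MySQL connection needs extra setup that is not shown here. The table name sector_totals is hypothetical.

```python
import sqlite3

import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame({
    "ticker_id": [1, 1, 2, 2, 6, 6],
    "close_date": ["2014-02-28", "2014-02-27"] * 3,
    "sector": ["Consumer Goods"] * 4 + ["Services"] * 2,
    "sector_index": [31.106653, 30.951213, 19.846387,
                     19.671747, 5.449554, 5.440019],
})

# The grouped result keeps close_date and sector as a MultiIndex.
grouped = df.groupby(["close_date", "sector"]).sum()

conn = sqlite3.connect(":memory:")  # stand-in for a real MySQL connection

# index=True is the default, so the index levels are written as columns
# named after the index levels.
grouped.to_sql("sector_totals", conn)

stored = conn.execute(
    "SELECT close_date, sector, ticker_id "
    "FROM sector_totals ORDER BY close_date, sector"
).fetchall()
conn.close()

print(stored)
```

If your pandas version does not accept the MultiIndex here, calling reset_index() first and passing index=False to to_sql should give the same table.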

See How to insert pandas dataframe via mysqldb into database?

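Thank you very much for the answer and for the link to "How to insert pandas dataframe via mysqldb into database?". I couldn't get it to work, so I'm using the pymysql package instead. Do you know whether pandas can work with pymysql? – vt2424253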

@vt2424253 I think some users on here have said so. –

p.s. If it helped, don't forget to upvote/accept! –