Python - aggregate by month and calculate average

I have a CSV file that looks like this:

Date,Sentiment 
2014-01-03,0.4 
2014-01-04,-0.03 
2014-01-09,0.0 
2014-01-10,0.07 
2014-01-12,0.0 
2014-02-24,0.0 
2014-02-25,0.0 
2014-02-25,0.0 
2014-02-26,0.0 
2014-02-28,0.0 
2014-03-01,0.1 
2014-03-02,-0.5 
2014-03-03,0.0 
2014-03-08,-0.06 
2014-03-11,-0.13 
2014-03-22,0.0 
2014-03-23,0.33 
2014-03-23,0.3 
2014-03-25,-0.14 
2014-03-28,-0.25 
etc

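My goal is to aggregate the dates by month and calculate the monthly average. The dates do not necessarily start on the 1st or in January. The catch is that I have a lot of data, so it spans several years. For that reason I want to find the earliest date (month) and start counting months and computing averages from there. For example: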

Month count, average 
1, 0.4 (<= the earliest month) 
2, -0.3 
3, 0.0 
... 
12, 0.1 
13, -0.4 (<= new year but counting of month is continuing) 
14, 0.3

I use pandas to open the CSV:

data = pd.read_csv("pks.csv", sep=",")

So data['Date'] holds the dates and data['Sentiment'] holds the values. Any ideas how to do this?

Source

2014-05-25 Jaroslav Klimčík

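Probably the simplest way is to use the resample method. First, when reading the data, make sure to parse the dates and set the Date column as the index (ignore the StringIO part and the header argument; I am reading the sample data from a multi-line string):

>>> from io import StringIO
>>> import pandas as pd
>>> df = pd.read_csv(StringIO(data), header=0, parse_dates=['Date'],
...                  index_col='Date')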
>>> df 

      Sentiment 
Date 
2014-01-03  0.40 
2014-01-04  -0.03 
2014-01-09  0.00 
2014-01-10  0.07 
2014-01-12  0.00 
2014-02-24  0.00 
2014-02-25  0.00 
2014-02-25  0.00 
2014-02-26  0.00 
2014-02-28  0.00 
2014-03-01  0.10 
2014-03-02  -0.50 
2014-03-03  0.00 
2014-03-08  -0.06 
2014-03-11  -0.13 
2014-03-22  0.00 
2014-03-23  0.33 
2014-03-23  0.30 
2014-03-25  -0.14 
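2014-03-28  -0.25

>>> # older pandas accepted resample('M', how='mean'); how= has since been removed
>>> df.resample('M').mean()  # 'M' = month end; pandas 2.2+ spells it 'ME'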

      Sentiment 
2014-01-31  0.088 
2014-02-28  0.000 
2014-03-31  -0.035
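
For anyone running this on a current pandas release, here is a minimal self-contained sketch of the same monthly mean. It uses the sample rows from the question and, as an assumption of mine rather than part of the original answer, the month-start alias 'MS': the month-end alias is spelled 'M' in older releases and 'ME' from pandas 2.2 on, while 'MS' is accepted by both. The only visible difference is that each row is labelled with the first day of the month instead of the last.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question.
csv_text = """Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-02-25,0.0
2014-02-26,0.0
2014-02-28,0.0
2014-03-01,0.1
2014-03-02,-0.5
2014-03-03,0.0
2014-03-08,-0.06
2014-03-11,-0.13
2014-03-22,0.0
2014-03-23,0.33
2014-03-23,0.3
2014-03-25,-0.14
2014-03-28,-0.25
"""

# Parse the dates while reading and use them as the index,
# which is what resample needs.
df = pd.read_csv(StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# One row per calendar month, labelled with the first day of that month.
monthly = df.resample("MS").mean()

# Means are roughly 0.088 (Jan), 0.0 (Feb) and -0.035 (Mar).
print(monthly.round(3))
```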

If you want a month counter, you can add it after the resample:

>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg)) 
>>> agg 

      Sentiment cnt 
2014-01-31  0.088 0 
2014-02-28  0.000 1 
2014-03-31  -0.035 2
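
The counter above starts at 0 and relies on resample producing one row per calendar month. The question asked for a count that starts at 1 in the earliest month and keeps running across year boundaries; one way to sketch that is plain year/month arithmetic. The dates and values below are made up for illustration, and `month_count` is a hypothetical column name.

```python
import pandas as pd

# Made-up sample that crosses a year boundary and skips February 2015.
df = pd.DataFrame(
    {"Sentiment": [0.4, -0.2, 0.1, 0.3, -0.4]},
    index=pd.to_datetime(
        ["2014-11-03", "2014-11-20", "2014-12-15", "2015-01-07", "2015-03-09"]
    ),
)

# The earliest date in the data defines month number 1.
first = df.index.min()

# Number of whole calendar months since the earliest month, plus 1.
df["month_count"] = (
    (df.index.year - first.year) * 12 + (df.index.month - first.month) + 1
).to_numpy()

# Average sentiment per running month number.
result = df.groupby("month_count")["Sentiment"].mean()
print(result)
```

With this approach a month that has no rows (February 2015 here, which would be number 4) simply does not appear in the result, whereas resample would emit a NaN row for it. The numbering itself still advances, so March 2015 is month 5.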

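You can also do this with the groupby method and pd.Grouper: group by month, then call the mean convenience method that groupby provides. (Older pandas versions exposed this as pd.TimeGrouper, which has since been removed.)

>>> df.groupby(pd.Grouper(freq='M')).mean()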

      Sentiment 
2014-01-31  0.088 
2014-02-28  0.000 
2014-03-31  -0.035
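
If the dates are left as an ordinary column, as in the question's plain read_csv call, pd.Grouper (the current replacement for TimeGrouper) can be pointed at that column through its `key` argument. A minimal sketch, assuming a handful of rows taken from the question's sample and the month-start alias 'MS', which is my choice here because it is accepted by both old and new pandas:

```python
import pandas as pd

# A subset of the question's rows, with Date kept as a regular column.
data = pd.DataFrame(
    {
        "Date": [
            "2014-01-03", "2014-01-04", "2014-01-09", "2014-01-10",
            "2014-01-12", "2014-02-24", "2014-03-01", "2014-03-02",
        ],
        "Sentiment": [0.4, -0.03, 0.0, 0.07, 0.0, 0.0, 0.1, -0.5],
    }
)

# Grouper needs real datetimes, not strings.
data["Date"] = pd.to_datetime(data["Date"])

# Group on the Date column by calendar month and average each group.
monthly = data.groupby(pd.Grouper(key="Date", freq="MS"))["Sentiment"].mean()
print(monthly)
```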

Source

2014-05-25 21:15:48

Great, that is exactly what I needed. Thank you very much!
