2017-06-07 16 views
1

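How to group 4 years of data by season using pandas

I have a CSV file containing 4 years of data, and I need to group the data by season across those 4 years. Here is a look at my data: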

timestamp,heure,lat,lon,impact,type 
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1 
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1 
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2 
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1 
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
.... 
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 

And here is the output I want:

winter  (the mean value of impacts) 
summer  (the mean value of impacts) 
autumn  .... 
spring  ..... 

So I expect 4 rows, with all months across the 4 years summarized into the 4 seasons. I started as follows:

data['impact'] = data['impact'].abs() 
yearly = data.groupby(data.index.month)['impact'].mean() 

Any ideas?


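Do you treat a season as whole months, e.g. winter: December, January, February? Or as exact dates? If the latter, note that spring sometimes starts on March 20 and sometimes on March 21. – steboc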


I can treat it as whole months, unless there is a way to distinguish the exact dates. –

Answers

1

With exact dates

import pandas as pd

# Approximate season boundaries expressed as day-of-year ranges
spring = range(80, 172)
summer = range(172, 264)
fall = range(264, 355)

def season(x):
    # Map a day of the year (1-366) to a season label
    if x in spring:
        return 'Spring'
    if x in summer:
        return 'Summer'
    if x in fall:
        return 'Fall'
    return 'Winter'

# Example frame: one row per day of 2016 (a leap year, so 366 rows)
df = pd.DataFrame({'_date': pd.date_range(start='2016-01-01', end='2016-12-31', freq='D'),
                   'impact': range(0, 366)})

df['SEASON'] = df['_date'].dt.dayofyear.apply(season)
df.groupby('SEASON')['impact'].mean()

Thanks, but can you explain this line? df['SEASON'] = df.YOURDATE.dt.dayofyear.apply(lambda x: season(x)). What does df['SEASON'] refer to? I don't have it in my data. –


That syntax creates a new column named 'SEASON' and assigns each record the correct season label as its value. – bdiamante
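To make the point about the new column concrete, here is a minimal sketch on a tiny made-up frame (the four dates and impact values are illustrative assumptions, not the asker's data). The `season` helper uses the same approximate day-of-year cutoffs as the answer above, written as comparisons.

```python
import pandas as pd

def season(day_of_year):
    # Same approximate day-of-year cutoffs as the answer above.
    if 80 <= day_of_year < 172:
        return 'Spring'
    if 172 <= day_of_year < 264:
        return 'Summer'
    if 264 <= day_of_year < 355:
        return 'Fall'
    return 'Winter'

# Illustrative data only: one date falling in each season.
df = pd.DataFrame({
    '_date': pd.to_datetime(['2016-01-15', '2016-04-15', '2016-07-15', '2016-10-15']),
    'impact': [10.0, 20.0, 30.0, 40.0],
})

# Assigning to a column name that does not exist yet creates that column.
df['SEASON'] = df['_date'].dt.dayofyear.apply(season)
print(df)
```

After the assignment, `df` has three columns (`_date`, `impact`, `SEASON`), and `SEASON` can be passed to `groupby` like any other column.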


Oh, I get it now, thank you very much –

2

Roughly, by whole months... assuming the timestamp is in the index.

mlist = [[12, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]] 
slist = ['winter', 'spring', 'summer', 'autumn']
sdict = {k: v for v, ks in zip(slist, mlist) for k in ks} 

df.groupby(df.index.month.map(sdict.get)).impact.mean() 

Setup

import pandas as pd 
from io import StringIO 

txt = """timestamp,heure,lat,lon,impact,type 
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1 
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1 
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2 
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1 
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
""" 

df = pd.read_csv(StringIO(txt), parse_dates=[0], index_col=0) 
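
Putting the pieces together, the following is a sketch of an end-to-end run on the sample rows from the question. It assumes, as in the asker's own attempt, that the mean should be taken over absolute impact values; the `season_of_month` dictionary and the final `reindex` (which forces all four rows to appear, with NaN for seasons absent from the sample) are additions for illustration, not part of the original answer.

```python
import pandas as pd
from io import StringIO

txt = """timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""

df = pd.read_csv(StringIO(txt), parse_dates=[0], index_col=0)

# Whole-month season definition (an assumption: Dec-Feb counts as winter, etc.).
season_of_month = {12: 'winter', 1: 'winter', 2: 'winter',
                   3: 'spring', 4: 'spring', 5: 'spring',
                   6: 'summer', 7: 'summer', 8: 'summer',
                   9: 'autumn', 10: 'autumn', 11: 'autumn'}

# One season label per row, in row order.
seasons = [season_of_month[m] for m in df.index.month]

# Mean of absolute impacts per season, always reported as four rows.
mean_by_season = (df['impact'].abs()
                  .groupby(seasons)
                  .mean()
                  .reindex(['winter', 'spring', 'summer', 'autumn']))
print(mean_by_season)
```

Because the sample rows only cover December, January and February, only the winter row holds a value here; with the full 4-year file every season would be populated.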

Thanks for the answer. I get the error: 'numpy.ndarray' object has no attribute 'map' –


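That's because you are using an older version of pandas, in which the `month` attribute does **not** return an Index object. Try `df.index.to_series().dt.month.map(sdict)` instead. I apologize for the confusion. – piRSquared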


Super! It works perfectly now. Thank you very much –
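
For readers hitting the same error, the workaround from the comment above can be sketched as follows. The small frame is made-up illustrative data (not the asker's file); `sdict` is built the same way as in the answer. Going through `to_series()` yields a Series, whose `.dt.month` accessor supports `.map` regardless of what `df.index.month` returns.

```python
import pandas as pd

mlist = [[12, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
slist = ['winter', 'spring', 'summer', 'autumn']
sdict = {k: v for v, ks in zip(slist, mlist) for k in ks}

# Illustrative data only: timestamps in the index, as the answer assumes.
df = pd.DataFrame(
    {'impact': [10.3, -17.1, 36.8, -48.9, 5.0]},
    index=pd.to_datetime(['2006-01-01', '2007-04-01', '2008-07-01',
                          '2008-07-02', '2011-10-01']),
)

# Convert the index to a Series first, then map month numbers to season names.
seasons = df.index.to_series().dt.month.map(sdict)
result = df.groupby(seasons)['impact'].mean()
print(result)
```

Note that this mirrors the answer and averages the signed values; apply `.abs()` to `impact` first if the mean of magnitudes is what you want.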