2013-11-23
4

Query a pandas DataFrame based on the index and data columns

I have a dataset that looks like this:

data="""cruiseid year station month day date  lat  lon   depth_w taxon      count 
     AA8704 1987 1  04  13 13-APR-87 35.85  -75.48  18  Centropages_typicus   75343 
     AA8704 1987 1  04  13 13-APR-87 35.85  -75.48  18  Gastropoda     0 
     AA8704 1987 1  04  13 13-APR-87 35.85  -75.48  18  Calanus_finmarchicus   2340 
     AA8704 1987 1  07  13 13-JUL-87 35.85  -75.48  18  Acartia_spp.     5616 
     AA8704 1987 1  07  13 13-JUL-87 35.85  -75.48  18  Metridia_lucens    468  
     AA8704 1987 1  08  13 13-AUG-87 35.85  -75.48  18  Evadne_spp.     0  
     AA8704 1987 1  08  13 13-AUG-87 35.85  -75.48  18  Salpa      0  
     AA8704 1987 1  08  13 13-AUG-87 35.85  -75.48  18  Oithona_spp.     468  
""" 
datafile = open('data.txt','w') 
datafile.write(data) 
datafile.close() 

I read it with pandas:

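import datetime as dt
import pandas as pd
parse = lambda x: dt.datetime.strptime(x, '%d-%m-%Y')  # defined here, but never passed to read_csv
df = pd.read_csv('data.txt', index_col=0, header=0, parse_dates={"Datetime": [1, 3, 4]}, skipinitialspace=True, sep=' ', skiprows=0)  # header=0: the first row holds the column names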
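
For readers on a recent pandas, here is a minimal sketch of another way to end up with the same kind of `Datetime` index. As far as I know, newer releases discourage combining columns through a `parse_dates` dict, so this version assembles the dates afterwards with `pd.to_datetime`. The two rows are copied from the sample above; the variable names `raw` and `dates` are mine, not from the question.

```python
import io

import pandas as pd

# Two rows copied from the sample data in the question.
raw = """cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
"""

# sep=r"\s+" splits on any run of whitespace, so the variable padding
# between columns is handled without skipinitialspace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# pd.to_datetime can assemble dates from columns named year, month and day.
dates = pd.to_datetime(df[["year", "month", "day"]])

# Replace the three date-part columns with a single DatetimeIndex.
df = df.drop(columns=["year", "month", "day"])
df = df.set_index(pd.DatetimeIndex(dates, name="Datetime"))

print(df.index.month.tolist())
```

With a `DatetimeIndex` in place, `df.index.month` is available for filtering, which is what the answers below rely on.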

How can I get a subset of this DataFrame containing all records from April where the taxon is 'Calanus_finmarchicus' or 'Gastropoda'?

I can query the DataFrame for rows where the taxon equals 'Calanus_finmarchicus' or 'Gastropoda' using:

df[(df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')] 

But I am having trouble querying on time. In numpy, something similar might look like this:

import numpy as np
data = np.genfromtxt('data.txt', dtype=[('cruiseid','S6'), ('year','i4'), ('station','i4'), ('month','i4'), ('day','i4'), ('date','S9'), ('lat','f8'), ('lon','f8'), ('depth_w','i8'), ('taxon','S60'), ('count','i8')], skip_header=1)
# The 'S' fields hold bytes, so compare against bytes literals (needed on Python 3).
# The taxon test is wrapped in its own brackets because & binds more tightly than |.
selection = np.where(((data['taxon']==b'Calanus_finmarchicus') | (data['taxon']==b'Gastropoda')) & ((data['month']==4) | (data['month']==3)))[0]
data[selection]

Here is a link to a notebook that reproduces the example.

Answers

0

I had not paid attention to the syntax (the order of the brackets) or to the dataframe.index attribute. This line gives me what I was looking for:

results = df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')) & (df.index.month == 4)]
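
The bracket order matters because `&` binds more tightly than `|` in Python, so `a | b & c` is read as `a | (b & c)`. A small sketch of the difference, using a made-up three-row frame (the data and the names `demo`, `loose` and `strict` are hypothetical, only there to show how the masks combine):

```python
import pandas as pd

# Hypothetical three-row frame: one Calanus record in July,
# one Gastropoda record in April, one Gastropoda record in July.
demo = pd.DataFrame(
    {"taxon": ["Calanus_finmarchicus", "Gastropoda", "Gastropoda"]},
    index=pd.to_datetime(["1987-07-13", "1987-04-13", "1987-07-13"]),
)

is_calanus = demo.taxon == "Calanus_finmarchicus"
is_gastropoda = demo.taxon == "Gastropoda"
in_april = demo.index.month == 4

# No extra brackets: evaluated as is_calanus | (is_gastropoda & in_april),
# so the July Calanus row slips through.
loose = demo[is_calanus | is_gastropoda & in_april]

# Taxon test grouped first, then restricted to April.
strict = demo[(is_calanus | is_gastropoda) & in_april]

print(len(loose), len(strict))  # 2 1
```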
5

You can refer to the datetime's month attribute:

>>> df.index.month 
array([4, 4, 4, 7, 7, 8, 8, 8], dtype=int32) 

>>> df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')) 
...  & (df.index.month == 4)] 

      cruiseid station  date lat lon depth_w \ 
Datetime 
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 

          taxon count Unnamed: 11 
Datetime 
1987-04-13   Gastropoda  0   NaN 
1987-04-13 Calanus_finmarchicus 2340   NaN 
+0

What if you have a multi-column index? Can you refer to the columns individually in the filter expression? – Mzzzzzz
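
One possible way to handle the multi-column case, sketched on a made-up frame: `MultiIndex.get_level_values` returns a single level as an ordinary Index, so each level can be tested on its own. The index layout (Datetime, station), the data, and the names `multi`, `months` and `stations` are assumptions for illustration, not something from the thread.

```python
import pandas as pd

# Hypothetical frame with a two-level index (Datetime, station).
idx = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["1987-04-13", "1987-04-13", "1987-07-13"]),
        [1, 2, 1],
    ],
    names=["Datetime", "station"],
)
multi = pd.DataFrame(
    {"taxon": ["Gastropoda", "Salpa", "Gastropoda"]},
    index=idx,
)

# Pull each level out separately; the Datetime level still exposes .month.
months = multi.index.get_level_values("Datetime").month
stations = multi.index.get_level_values("station")

# Combine the per-level tests with a column test, each in its own brackets.
subset = multi[(months == 4) & (stations == 1) & (multi.taxon == "Gastropoda")]
print(subset)
```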

1

Use the index's month attribute:

df[(df.index.month == 4) & ((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda'))] 
+0

Ah, snap! I got beaten to it... –

2

As others have said, you can filter by month through the index, but I would also recommend using pandas.Series.isin() to check your taxon condition:

>>> df[df.taxon.isin(['Calanus_finmarchicus', 'Gastropoda']) & (df.index.month == 4)] 
      cruiseid station  date lat lon depth_w \ 
Datetime               
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 

          taxon count Unnamed: 11 
Datetime            
1987-04-13   Gastropoda  0   NaN 
1987-04-13 Calanus_finmarchicus 2340   NaN
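
The same idea can be stretched to the month test when more than one month is wanted, as in the question's numpy attempt (March or April). A minimal sketch, assuming a pandas version recent enough that `df.index.month` returns an Index (which has its own `isin`) rather than a bare array; the four-row frame and the names `wanted_taxa` and `wanted_months` are made up for illustration:

```python
import pandas as pd

# Hypothetical four-row frame covering March, April and August.
demo = pd.DataFrame(
    {"taxon": ["Calanus_finmarchicus", "Gastropoda", "Salpa", "Gastropoda"]},
    index=pd.to_datetime(
        ["1987-03-13", "1987-04-13", "1987-04-13", "1987-08-13"]
    ),
)

wanted_taxa = ["Calanus_finmarchicus", "Gastropoda"]
wanted_months = [3, 4]

# Both conditions are membership tests, so no chains of == and | are needed.
mask = demo.taxon.isin(wanted_taxa) & demo.index.month.isin(wanted_months)
subset = demo[mask]
print(subset)
```

Keeping the wanted values in lists makes it easy to add another taxon or month later without touching the filter expression.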