Querying a pandas DataFrame based on index and data columns

I have a dataset that looks like this:

data="""cruiseid year station month day date  lat  lon   depth_w taxon      count 
     AA8704 1987 1  04  13 13-APR-87 35.85  -75.48  18  Centropages_typicus   75343 
     AA8704 1987 1  04  13 13-APR-87 35.85  -75.48  18  Gastropoda     0 
     AA8704 1987 1  04  13 13-APR-87 35.85  -75.48  18  Calanus_finmarchicus   2340 
     AA8704 1987 1  07  13 13-JUL-87 35.85  -75.48  18  Acartia_spp.     5616 
     AA8704 1987 1  07  13 13-JUL-87 35.85  -75.48  18  Metridia_lucens    468  
     AA8704 1987 1  08  13 13-AUG-87 35.85  -75.48  18  Evadne_spp.     0  
     AA8704 1987 1  08  13 13-AUG-87 35.85  -75.48  18  Salpa      0  
     AA8704 1987 1  08  13 13-AUG-87 35.85  -75.48  18  Oithona_spp.     468  
""" 
datafile = open('data.txt','w') 
datafile.write(data) 
datafile.close()

I read it with pandas:

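import datetime as dt
import pandas as pd
parse = lambda x: dt.datetime.strptime(x, '%d-%m-%Y') 
# header=0 takes the first line as column names (recent pandas rejects header=False)
df = pd.read_csv('data.txt', index_col=0, header=0, parse_dates={"Datetime" : [1,3,4]}, skipinitialspace=True, sep=' ', skiprows=0)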
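As a side note, the same Datetime index can be built without the dict form of `parse_dates`. The following is only a sketch under assumptions: it reads a trimmed three-row copy of the data from an in-memory string (the name `raw` is hypothetical) and relies on `pd.to_datetime` accepting a DataFrame with `year`, `month` and `day` columns.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of a few rows, whitespace-separated.
raw = """cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
"""

# Split on any run of whitespace instead of a single space.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Assemble a timestamp from the year/month/day columns and index by it.
df["Datetime"] = pd.to_datetime(df[["year", "month", "day"]])
df = df.set_index("Datetime")

print(df.index.month.tolist())
```

With a DatetimeIndex in place, `df.index.month` is available for filtering, which is what the answers below rely on.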

How can I get a subset of this DataFrame containing all records from April where the taxon is 'Calanus_finmarchicus' or 'Gastropoda'?

I can query the DataFrame for rows where the taxon equals 'Calanus_finmarchicus' or 'Gastropoda' using

df[(df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')]

But I am having trouble querying by time. In numpy, something similar could be done like this:

import numpy as np 
data = np.genfromtxt('data.txt', dtype=[('cruiseid','S6'), ('year','i4'), ('station','i4'), ('month','i4'), ('day','i4'), ('date','S9'), ('lat','f8'), ('lon','f8'), ('depth_w','i8'), ('taxon','S60'), ('count','i8')], skip_header=1) 
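# '&' binds tighter than '|', so the taxon test needs its own parentheses.
# 'S' fields hold bytes, hence the b'...' literals (required on Python 3).
selection = np.where(((data['taxon']==b'Calanus_finmarchicus') | (data['taxon']==b'Gastropoda')) & ((data['month']==4) | (data['month']==3)))[0] 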
data[selection]

Here is a link to a notebook that reproduces the example.

2013-11-23 user1013346

I had not paid attention to the syntax (the order of the brackets) or to the dataframe.index attribute. This line gives me what I was looking for:

results = df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')) & (df.index.month==4)] # [df.index.month==4)]
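The bracket order matters because Python's `&` binds more tightly than `|`. Here is a minimal sketch with made-up Series (`taxon` and `month` are illustrative stand-ins, not the question's data) showing how the two groupings differ:

```python
import pandas as pd

# Illustrative stand-ins for the taxon column and the index month.
taxon = pd.Series(["Calanus_finmarchicus", "Gastropoda", "Calanus_finmarchicus"])
month = pd.Series([4, 4, 7])

a = taxon == "Calanus_finmarchicus"
b = taxon == "Gastropoda"
c = month == 4

# Without extra parentheses this is parsed as a | (b & c),
# so the July Calanus row slips through.
unparenthesized = a | b & c

# Grouping the taxon test first applies the month filter to both taxa.
intended = (a | b) & c

print(unparenthesized.tolist())  # [True, True, True]
print(intended.tolist())         # [True, True, False]
```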

2013-11-23 17:46:49 user1013346

You can refer to the datetime's month attribute:

>>> df.index.month 
array([4, 4, 4, 7, 7, 8, 8, 8], dtype=int32) 

>>> df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')) 
...  & (df.index.month == 4)] 

      cruiseid station  date lat lon depth_w \ 
Datetime 
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 

          taxon count Unnamed: 11 
Datetime 
1987-04-13   Gastropoda  0   NaN 
1987-04-13 Calanus_finmarchicus 2340   NaN

2013-11-23 17:23:48 alko

What if you have a multi-column index? Can you reference the index columns individually in the filter expression? – Mzzzzzz
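One possible way to do this, offered only as a sketch and not taken from the thread, is `Index.get_level_values`, which pulls a single level out of a MultiIndex so it can be used in a boolean mask. The small frame below and its two-level index (`Datetime`, `station`) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame indexed by two levels: Datetime and station.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(["1987-04-13", "1987-04-13", "1987-07-13"]),
    "station": [1, 2, 1],
    "taxon": ["Gastropoda", "Calanus_finmarchicus", "Acartia_spp."],
    "count": [0, 2340, 5616],
}).set_index(["Datetime", "station"])

# Extract each index level separately.
dates = df.index.get_level_values("Datetime")
stations = df.index.get_level_values("station")

# Combine conditions on the individual levels.
subset = df[(dates.month == 4) & (stations == 1)]
print(subset)
```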

Use the month attribute of the index:

df[(df.index.month == 4) & ((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda'))]

2013-11-23 17:30:17

Ah, too quick! I got beaten to it... –

As others have said, you can filter by month via the index, but I would also suggest using pandas.Series.isin() to check your taxon condition:

>>> df[df.taxon.isin(['Calanus_finmarchicus', 'Gastropoda']) & (df.index.month == 4)] 
      cruiseid station  date lat lon depth_w \ 
Datetime               
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 
1987-04-13 AA8704  1 13-APR-87 35.85 -75.48  18 

          taxon count Unnamed: 11 
Datetime            
1987-04-13   Gastropoda  0   NaN 
1987-04-13 Calanus_finmarchicus 2340   NaN
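To check that `isin` selects the same rows as the chained `==` comparisons, here is a small self-contained sketch. The four-row frame is a hand-built stand-in for the question's data, and the name `wanted` is illustrative.

```python
import pandas as pd

# Hand-built stand-in for the question's frame, indexed by date.
df = pd.DataFrame(
    {
        "taxon": ["Centropages_typicus", "Gastropoda",
                  "Calanus_finmarchicus", "Acartia_spp."],
        "count": [75343, 0, 2340, 5616],
    },
    index=pd.DatetimeIndex(
        ["1987-04-13", "1987-04-13", "1987-04-13", "1987-07-13"],
        name="Datetime",
    ),
)

wanted = ["Calanus_finmarchicus", "Gastropoda"]

# Membership test: scales to any number of taxa without more '|' terms.
with_isin = df[df.taxon.isin(wanted) & (df.index.month == 4)]

# Equivalent chained comparison, as in the other answers.
with_or = df[((df.taxon == "Calanus_finmarchicus") | (df.taxon == "Gastropoda"))
             & (df.index.month == 4)]

print(with_isin.equals(with_or))
```

The `isin` form also sidesteps the operator-precedence issue, since there is no `|` left to group.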

2013-11-23 17:42:36
