Query a pandas DataFrame based on index and data columns
I have a dataset that looks like this:
data="""cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Calanus_finmarchicus 2340
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Metridia_lucens 468
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Evadne_spp. 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Salpa 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Oithona_spp. 468
"""
datafile = open('data.txt','w')
datafile.write(data)
datafile.close()
I read it with pandas:
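import datetime as dt
import pandas as pd
# Note: this parser is defined but never passed to read_csv
parse = lambda x: dt.datetime.strptime(x, '%d-%m-%Y')
# header=0 (not False) marks the first line as the header row
df = pd.read_csv('data.txt',index_col=0, header=0, parse_dates={"Datetime" : [1,3,4]}, skipinitialspace=True, sep=' ', skiprows=0)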
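How can I get a subset of this DataFrame containing all records from April where the taxon is 'Calanus_finmarchicus' or 'Gastropoda'?
I can query the DataFrame for rows where the taxon equals 'Calanus_finmarchicus' or 'Gastropoda' using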
df[(df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')]
But I am having trouble querying on the time. In numpy, something similar could be done like this:
import numpy as np
# 'U' (unicode) string fields so comparisons with str literals work on Python 3
data = np.genfromtxt('data.txt', dtype=[('cruiseid','U6'), ('year','i4'), ('station','i4'), ('month','i4'), ('day','i4'), ('date','U9'), ('lat','f8'), ('lon','f8'), ('depth_w','i8'), ('taxon','U60'), ('count','i8')], skip_header=1)
# The taxon test needs its own parentheses because & binds more tightly than |
selection = np.where(((data['taxon']=='Calanus_finmarchicus') | (data['taxon']=='Gastropoda')) & ((data['month']==4) | (data['month']==3)))[0]
data[selection]
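For comparison, here is a minimal sketch of what the equivalent selection might look like in pandas. It assumes the index is a DatetimeIndex built from the year, month and day columns, so that the month is available as `df.index.month`. The names `raw`, `wanted`, `in_april` and `subset` are only illustrative, and the data is read from an in-memory string rather than from data.txt so the example stands alone.

```python
import io
import pandas as pd

# Same sample rows as above, kept in memory (assumption: whitespace-separated)
raw = """cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Calanus_finmarchicus 2340
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Metridia_lucens 468
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Evadne_spp. 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Salpa 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Oithona_spp. 468
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Build a DatetimeIndex from the year/month/day columns
df.index = pd.DatetimeIndex(
    pd.to_datetime(df[["year", "month", "day"]]), name="Datetime"
)

wanted = ["Calanus_finmarchicus", "Gastropoda"]

# Boolean mask on the index (month) combined with a mask on a data column
in_april = df.index.month == 4
subset = df[in_april & df["taxon"].isin(wanted)]

print(subset[["taxon", "count"]])
```

Using `isin` avoids chaining several `==` tests with `|`, which also sidesteps the operator-precedence pitfall when the result is combined with `&`.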
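Here is a link to a notebook that reproduces the example.
What if you have a multi-column index? Can you reference the columns individually in the filter expression? – Mzzzzzz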
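One possible way to handle the multi-column index case, offered as a sketch rather than a definitive answer: each level of a MultiIndex can be pulled out with `get_level_values` and used in a boolean mask like an ordinary column. The small frame below is a hand-built subset of the sample data, and the choice of `Datetime` and `taxon` as index levels is an assumption made only for illustration.

```python
import pandas as pd

# Hypothetical frame indexed by two levels: Datetime and taxon
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            ["1987-04-13", "1987-04-13", "1987-07-13", "1987-08-13"]
        ),
        "taxon": ["Gastropoda", "Calanus_finmarchicus", "Acartia_spp.", "Salpa"],
        "count": [0, 2340, 5616, 0],
    }
).set_index(["Datetime", "taxon"])

# Refer to each index level individually when building the filter
dates = df.index.get_level_values("Datetime")
taxa = df.index.get_level_values("taxon")

mask = (dates.month == 4) & taxa.isin(["Calanus_finmarchicus", "Gastropoda"])
subset = df[mask]

print(subset)
```

Both `dates.month == 4` and `taxa.isin(...)` produce plain boolean arrays here, so they combine with `&` in the same way as column-based masks.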