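python pandas: filter a DataFrame by another Series, on multiple columns
Having obtained a Series that gives, for each date, the most liquid (highest-volume) delivery, how can I filter the original DataFrame down to just those days and deliveries? Given these two objects: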
most_liquid_contracts.head(20)
Out[32]:
2007-04-26 706
2007-04-27 706
2007-04-29 706
2007-04-30 706
2007-05-01 706
2007-05-02 706
2007-05-03 706
2007-05-04 706
2007-05-06 706
2007-05-07 706
2007-05-08 706
2007-05-09 706
2007-05-10 706
2007-05-11 706
2007-05-13 706
2007-05-14 706
2007-05-15 706
2007-05-16 706
2007-05-17 706
2007-05-18 706
dtype: int64
df.head(20)
Out[40]:
                           delivery  volume
2007-04-27 11:55:00+01:00       705       1
2007-04-27 13:46:00+01:00       705       1
2007-04-27 14:15:00+01:00       705       1
2007-04-27 14:33:00+01:00       705       1
2007-04-27 14:35:00+01:00       705       1
2007-04-27 17:05:00+01:00       705      16
2007-04-27 17:07:00+01:00       705       1
2007-04-27 17:12:00+01:00       705       1
2007-04-27 17:46:00+01:00       705       1
2007-04-27 18:25:00+01:00       705       2
2007-04-26 23:00:00+01:00       706      10
2007-04-26 23:01:00+01:00       706      12
2007-04-26 23:02:00+01:00       706       1
2007-04-26 23:05:00+01:00       706      21
2007-04-26 23:06:00+01:00       706      10
2007-04-26 23:07:00+01:00       706      19
2007-04-26 23:08:00+01:00       706       1
2007-04-26 23:13:00+01:00       706      10
2007-04-26 23:14:00+01:00       706      62
2007-04-26 23:15:00+01:00       706       3
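The question doesn't show how most_liquid_contracts was built. Assuming it holds, for each local calendar day, the delivery with the largest total volume, a Series of that shape could be sketched as follows. The rows are toy values modelled on the output above, and the 18:00 trade on 27 April is invented so that 706 wins that day.

```python
import pandas as pd

# Toy minute-level data modelled on df above (values are illustrative only).
df = pd.DataFrame(
    {"delivery": [706, 706, 705, 705, 706], "volume": [10, 21, 1, 16, 30]},
    index=pd.to_datetime([
        "2007-04-26 23:00:00+01:00",
        "2007-04-26 23:05:00+01:00",
        "2007-04-27 11:55:00+01:00",
        "2007-04-27 17:05:00+01:00",
        "2007-04-27 18:00:00+01:00",  # hypothetical extra row
    ]),
)

# Total volume per (local calendar day, delivery) pair.
daily = df.groupby([df.index.date, "delivery"])["volume"].sum()

# For each day, idxmax returns the (day, delivery) label with the largest
# total; keep just the delivery part of that label.
most_liquid = daily.groupby(level=0).idxmax().map(lambda pair: pair[1])
print(most_liquid)
```

Ties are resolved in favour of the first delivery encountered, which may or may not match how the original Series was built.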
I have tried:
liquid = df[df.index.date==most_liquid_contracts.index & df['delivery']==most_liquid_contracts]
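This one-liner cannot work as written. In Python, & binds more tightly than ==, so the expression is parsed as the chained comparison df.index.date == (most_liquid_contracts.index & df['delivery']) == most_liquid_contracts. Even with parentheses it would still compare arrays of different lengths (one entry per minute versus one per day). A minimal sketch of the boolean-mask idea, assuming most_liquid_contracts has a DatetimeIndex of days, first looks up the expected delivery for every row's calendar day and then compares row by row. The toy data is hypothetical.

```python
import pandas as pd

# Toy data shaped like the question's objects (values are illustrative only).
df = pd.DataFrame(
    {"delivery": [706, 706, 705, 705, 706], "volume": [10, 21, 1, 16, 30]},
    index=pd.to_datetime([
        "2007-04-26 23:00:00+01:00",
        "2007-04-26 23:05:00+01:00",
        "2007-04-27 11:55:00+01:00",
        "2007-04-27 17:05:00+01:00",
        "2007-04-27 18:00:00+01:00",
    ]),
)
most_liquid_contracts = pd.Series(
    [706, 706], index=pd.to_datetime(["2007-04-26", "2007-04-27"])
)

# Map each day (as a datetime.date) to its most liquid delivery.
day_to_contract = dict(
    zip(most_liquid_contracts.index.date, most_liquid_contracts.to_numpy())
)

# For every minute row, the delivery that is most liquid on that row's local
# day (NaN if the day is missing from most_liquid_contracts).
expected = pd.Series(df.index.date, index=df.index).map(day_to_contract)

# A single comparison, so no precedence trap; NaN never compares equal.
mask = (df["delivery"] == expected).to_numpy()
liquid = df[mask]
others = df[~mask]
print(liquid)
```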
Perhaps I need a merge? That doesn't seem very elegant, and I'm not sure it's right. I have tried:
# ATTEMPT 1
most_liquid_contracts.index = pd.to_datetime(most_liquid_contracts.index, unit='d')
df['days'] = pd.to_datetime(df.index.date, unit='d')
mlc = most_liquid_contracts.to_frame(name='delivery')
mlc['days'] = mlc.index.date
data = pd.merge(mlc, df, on=['delivery', 'days'], left_index=True)
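Attempt 1 fails before any joining happens: pandas refuses to combine on= with left_index=True and raises a MergeError saying you can pass either on or the index arguments, not both. A sketch of the merge route that keeps the minute-level rows, assuming the same hypothetical toy data and that the day key is the local calendar date of each timestamp, moves the timestamps into a column, merges on both keys, and then restores the index. The column names time and day are illustrative choices.

```python
import pandas as pd

# Toy data shaped like the question's objects (values are illustrative only).
df = pd.DataFrame(
    {"delivery": [706, 706, 705, 705, 706], "volume": [10, 21, 1, 16, 30]},
    index=pd.to_datetime([
        "2007-04-26 23:00:00+01:00",
        "2007-04-26 23:05:00+01:00",
        "2007-04-27 11:55:00+01:00",
        "2007-04-27 17:05:00+01:00",
        "2007-04-27 18:00:00+01:00",
    ]),
)
df.index.name = "time"
most_liquid_contracts = pd.Series(
    [706, 706], index=pd.to_datetime(["2007-04-26", "2007-04-27"])
)

# One row per (day, delivery) pair to keep; days as datetime.date objects.
mlc = pd.DataFrame({
    "day": most_liquid_contracts.index.date,
    "delivery": most_liquid_contracts.to_numpy(),
})

# Move the timestamps into an ordinary column so the merge keeps them,
# and derive the matching local calendar day for every row.
left = df.reset_index()
left["day"] = left["time"].dt.date

# Inner merge on BOTH keys; `on` names the keys, so no left_index/right_index.
liquid = (
    left.merge(mlc, on=["day", "delivery"], how="inner")
    .drop(columns="day")
    .set_index("time")
)
print(liquid)
```

Because each day appears only once in mlc, every minute row matches at most one row, so the result can only shrink, never grow.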
# ATTEMPT 2
liquid = pd.merge(mlc, df, on='delivery', how='inner', left_index=True)
# This gets me closer (i.e. it retains the minute-level granularity), but somehow it seems to behave like an outer join? It includes the union but not the intersection. The result should be a subset of df, but instead it has about 50x the rows, at around 195B; df originally has 4B.
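The blow-up in attempt 2 is expected merge behaviour rather than an outer join. Merging on 'delivery' alone pairs every minute row of a contract with every day on which that contract was the most liquid, so the day is never checked. Since 706 is the most liquid contract on many consecutive days, each 706 row is repeated once per such day. A small demonstration with hypothetical rows, plus merge's validate argument, which makes pandas raise instead of silently multiplying rows:

```python
import pandas as pd

# Four hypothetical minute rows, all for contract 706.
df = pd.DataFrame({"delivery": [706, 706, 706, 706], "volume": [10, 12, 1, 21]})

# Contract 706 is the most liquid contract on three different days.
mlc = pd.DataFrame({
    "day": ["2007-04-26", "2007-04-27", "2007-04-29"],
    "delivery": [706, 706, 706],
})

# Joining on 'delivery' alone pairs every 706 row with every 706 day: 4 x 3 rows.
blown_up = df.merge(mlc, on="delivery", how="inner")
print(len(df), len(blown_up))  # 4 12

# validate="many_to_one" requires the right-hand keys to be unique.
try:
    df.merge(mlc, on="delivery", how="inner", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True
```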
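But I can't seem to retain the minute-level granularity I need from the original df. Basically, I just need df restricted to the most liquid contracts (from the most_liquid_contracts Series; e.g. on 27 April include only contracts labelled 706, and on 29 April only contracts labelled 706). Then a second DataFrame that is the exact opposite: a DataFrame of all the other contracts (i.e. those that are not the most liquid).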
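To get both DataFrames from one boolean mask, one option (a sketch, not the only way) is to build a (day, delivery) key for every row and test membership against the (day, delivery) pairs in most_liquid_contracts with MultiIndex.isin; the complement is then just ~mask. This assumes most_liquid_contracts has a DatetimeIndex of days, and the toy rows are hypothetical.

```python
import pandas as pd

# Toy data shaped like the question's objects (values are illustrative only).
df = pd.DataFrame(
    {"delivery": [706, 706, 705, 705, 706], "volume": [10, 21, 1, 16, 30]},
    index=pd.to_datetime([
        "2007-04-26 23:00:00+01:00",
        "2007-04-26 23:05:00+01:00",
        "2007-04-27 11:55:00+01:00",
        "2007-04-27 17:05:00+01:00",
        "2007-04-27 18:00:00+01:00",
    ]),
)
most_liquid_contracts = pd.Series(
    [706, 706], index=pd.to_datetime(["2007-04-26", "2007-04-27"])
)

# (local day, delivery) key for every minute row ...
row_keys = pd.MultiIndex.from_arrays([df.index.date, df["delivery"].to_numpy()])
# ... and the (day, delivery) pairs that count as most liquid.
wanted = pd.MultiIndex.from_arrays(
    [most_liquid_contracts.index.date, most_liquid_contracts.to_numpy()]
)

mask = row_keys.isin(wanted)  # NumPy boolean array, one entry per row
liquid = df[mask]    # only the most liquid contract on each day
others = df[~mask]   # every other contract
print(liquid)
print(others)
```

Both halves keep the original minute-level index, and together they partition df exactly.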
Update: more information
I've tried using combined = df.join(mlc, on='days', how='left'), but I get this error: ValueError: columns overlap but no suffix specified: Index([u'delivery', u'days'], dtype='object'). I've posted a picture in my original question.
Get rid of the 'days' column in mlc, since it is redundant with the index. Alternatively, specify rsuffix and/or lsuffix in the join. – wflynny
That worked! Thanks. I definitely wasn't clear on how to use join(), but this helps a lot.
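For reference, a sketch of the join route discussed in the comments. It assumes mlc keeps only the delivery column and is indexed by plain dates to match df['days'], with hypothetical toy rows. Because delivery appears in both frames, the join still needs a suffix for one copy; rsuffix='_liquid' is an arbitrary choice.

```python
import pandas as pd

# Toy data shaped like the question's objects (values are illustrative only).
df = pd.DataFrame(
    {"delivery": [706, 706, 705, 705, 706], "volume": [10, 21, 1, 16, 30]},
    index=pd.to_datetime([
        "2007-04-26 23:00:00+01:00",
        "2007-04-26 23:05:00+01:00",
        "2007-04-27 11:55:00+01:00",
        "2007-04-27 17:05:00+01:00",
        "2007-04-27 18:00:00+01:00",
    ]),
)
most_liquid_contracts = pd.Series(
    [706, 706], index=pd.to_datetime(["2007-04-26", "2007-04-27"])
)

# Join key on the minute-level frame: each row's local calendar day.
df["days"] = df.index.date

# mlc keeps only the delivery column, indexed by plain dates to match df["days"].
mlc = most_liquid_contracts.to_frame(name="delivery")
mlc.index = mlc.index.date

# 'delivery' is in both frames, so give the right-hand copy a suffix.
# join(on=...) matches df["days"] against mlc's index and keeps df's index.
combined = df.join(mlc, on="days", how="left", rsuffix="_liquid")

is_liquid = (combined["delivery"] == combined["delivery_liquid"]).to_numpy()
liquid = combined.loc[is_liquid, ["delivery", "volume"]]
others = combined.loc[~is_liquid, ["delivery", "volume"]]
print(liquid)
```

Days missing from mlc get NaN in delivery_liquid, which never equals a delivery, so those rows land in others.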