2017-08-23 124 views
2

Selecting the rows for the first date of each year from a pandas DataFrame with a MultiIndex date level

I have a DataFrame like this:

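import numpy as np
import pandas as pd

# '4M' means every fourth month end; pandas 2.2+ prefers the spelling '4ME'.
dft2 = pd.DataFrame(np.random.randn(20, 1),
                    columns=['A'],
                    index=pd.MultiIndex.from_product(
                        [pd.date_range('20130101', periods=10, freq='4M'),
                         ['a', 'b']]))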

It looks like this when I print it.

Output:

                     A
2013-01-31 a  0.275921
           b  1.336497
2013-05-31 a  1.040245
           b  0.716865
2013-09-30 a -2.697420
           b -1.570267
2014-01-31 a  1.326194
           b -0.209718
2014-05-31 a -1.030777
           b  0.401654
2014-09-30 a  1.138958
           b -1.162370
2015-01-31 a  1.770279
           b  0.606219
2015-05-31 a -0.819126
           b -0.967827
2015-09-30 a -1.423667
           b  0.894103
2016-01-31 a  1.765187
           b -0.334844

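How can I filter it so that only the rows with the minimum (earliest) date of each year are selected? For example, 2013-01-31, 2014-01-31, and so on.

Thanks.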

Answers

1
# Create a dataframe from the dates in the first level of the index.
df = pd.DataFrame(dft2.index.get_level_values(0), columns=['date'], index=dft2.index)

# Add a `year` column that holds the year of each date.
df = df.assign(year=df['date'].dt.year)

# Find the minimum date of each year by grouping.
min_annual_dates = df.groupby('year')['date'].min().tolist()

# Filter the original dataframe based on these minimum dates by year.
dft2.loc[(min_annual_dates, slice(None)), :]

                     A
2013-01-31 a  1.087274
           b  1.488553
2014-01-31 a  0.119801
           b  0.922468
2015-01-31 a -0.262440
           b  0.642201
2016-01-31 a  1.144664
           b  0.410701
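
The grouping step above can also be expressed as a boolean mask. The following is only a sketch of that idea, not part of the original answer: it assumes a small deterministic frame (here called `demo`, with explicit dates and the values 0 to 11) in place of the random `dft2`, and the names `row_dates`, `year_min` and `mask` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dft2: explicit month-end dates, values 0..11.
dates = pd.to_datetime(["2013-01-31", "2013-05-31", "2013-09-30",
                        "2014-01-31", "2014-05-31", "2014-09-30"])
demo = pd.DataFrame({"A": np.arange(12.0)},
                    index=pd.MultiIndex.from_product([dates, ["a", "b"]]))

# One entry per row, holding that row's date and aligned on the MultiIndex.
row_dates = demo.index.get_level_values(0).to_series(index=demo.index)

# For each row, the earliest date that occurs in the same calendar year.
year_min = row_dates.groupby(row_dates.dt.year).transform("min")

# Keep the rows whose own date equals their year's minimum.
mask = row_dates == year_min
selected = demo[mask]
print(selected)
```

Because the mask is computed per row, this does not rely on the index being sorted, and it keeps every second-level entry ('a' and 'b') that shares the earliest date.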
0

Alternatively, you can try using isin:

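# Move the index levels into columns; the unnamed levels become level_0 and level_1.
dft1 = dft2.reset_index()
# Year of each date.
dft1['Year'] = dft1.level_0.dt.year
# Earliest date in each year.
dft1 = dft1.groupby('Year')['level_0'].min()
# Keep the rows whose date is one of those yearly minimums.
dft2[dft2.index.get_level_values(0).isin(dft1.values)]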

Out[2250]: 
                     A
2013-01-31 a -1.072400
           b  0.660115
2014-01-31 a -0.134245
           b  1.344941
2015-01-31 a  0.176067
           b -1.792567
2016-01-31 a  0.033230
           b -0.960175
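
The same isin idea can probably be written without calling reset_index, by working on the first index level directly. This is a minimal sketch under a few assumptions: `demo` is a hypothetical deterministic stand-in for `dft2` (explicit dates, values 0 to 11), and `level_dates` and `first_dates` are illustrative names that do not appear in the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dft2: explicit month-end dates, values 0..11.
dates = pd.to_datetime(["2013-01-31", "2013-05-31", "2013-09-30",
                        "2014-01-31", "2014-05-31", "2014-09-30"])
demo = pd.DataFrame({"A": np.arange(12.0)},
                    index=pd.MultiIndex.from_product([dates, ["a", "b"]]))

# Dates from the first index level, one entry per row.
level_dates = demo.index.get_level_values(0)

# Earliest date within each calendar year (grouped positionally by year).
first_dates = pd.Series(level_dates).groupby(level_dates.year).min()

# Keep the rows whose date is one of those yearly minimums.
result = demo[level_dates.isin(first_dates)]
print(result)
```

On this sample data the result holds the 'a' and 'b' rows for 2013-01-31 and 2014-01-31 only.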