2016-11-20 41 views
1

Delete rows in a multi-index DataFrame?

I have this DataFrame:

import numpy as np
import pandas as pd

temp = pd.DataFrame({'tic': ['IBM', 'AAPL', 'AAPL', 'IBM', 'AAPL'],
                     'industry': ['A', 'B', 'B', 'A', 'B'],
                     'price': [np.nan, 5, 6, 11, np.nan],
                     'shares': [100, 60, np.nan, 100, np.nan],
                     'dates': pd.to_datetime(['1990-01-01', '1990-01-01', '1990-04-01',
                                              '1990-04-01', '1990-08-01'])
                     })

# Index by ticker and date, then sort so each tic's rows are contiguous
temp.set_index(['tic', 'dates'], inplace=True)
temp.sort_index(inplace=True)

which yields:

                industry  price  shares
tic  dates
AAPL 1990-01-01        B    5.0    60.0
     1990-04-01        B    6.0     NaN
     1990-08-01        B    NaN     NaN
IBM  1990-01-01        A    NaN   100.0
     1990-04-01        A   11.0   100.0

How can I create a new column in the DataFrame that shows the number of observations for each tic? The new column would look like this:

          New column
AAPL ...           3
     ...           3
     ...           3
IBM  ...           2
     ...           2
+1

Please don't change your question: it invalidates the existing answers... – MaxU

+1

@MaxU My bad. It won't happen again. – st19297

Answers

2

You can use the .groupby(level=0).filter() method, which here keeps only the tics that have at least 3 rows:

In [79]: temp.groupby(level=0).filter(lambda x: len(x) >= 3)
Out[79]:
                industry  price  shares
tic  dates
AAPL 1990-01-01        B    5.0    60.0
     1990-04-01        B    6.0     NaN
     1990-08-01        B    NaN     NaN
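
As a possible alternative (a sketch, not part of the original answer), the same rows can be selected with a boolean mask built from transform('size'), which avoids calling a Python lambda once per group. The names rows_per_tic and filtered are made up for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the example is self-contained.
temp = pd.DataFrame({
    'tic': ['IBM', 'AAPL', 'AAPL', 'IBM', 'AAPL'],
    'industry': ['A', 'B', 'B', 'A', 'B'],
    'price': [np.nan, 5, 6, 11, np.nan],
    'shares': [100, 60, np.nan, 100, np.nan],
    'dates': pd.to_datetime(['1990-01-01', '1990-01-01', '1990-04-01',
                             '1990-04-01', '1990-08-01']),
}).set_index(['tic', 'dates']).sort_index()

# Number of rows per tic, broadcast back onto every row of the frame.
rows_per_tic = temp.groupby(level='tic')['industry'].transform('size')

# Keep only the rows whose tic has at least 3 observations.
filtered = temp[rows_per_tic >= 3]
print(filtered)
```

On this small frame the result should match the filter() call above. On frames with many groups the mask version is usually faster, though that is worth measuring on your own data.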

To answer your second question:

In [83]: temp['new'] = temp.groupby(level=0)['industry'].transform('size')

In [84]: temp
Out[84]:
                industry  price  shares  new
tic  dates
AAPL 1990-01-01        B    5.0    60.0    3
     1990-04-01        B    6.0     NaN    3
     1990-08-01        B    NaN     NaN    3
IBM  1990-01-01        A    NaN   100.0    2
     1990-04-01        A   11.0   100.0    2
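
One thing worth keeping in mind, offered as a sketch rather than as part of the original answer: 'size' counts every row in a group, including rows that hold NaN, whereas 'count' counts only the non-null values in the selected column. The column names n_rows and n_prices below are hypothetical:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the example is self-contained.
temp = pd.DataFrame({
    'tic': ['IBM', 'AAPL', 'AAPL', 'IBM', 'AAPL'],
    'industry': ['A', 'B', 'B', 'A', 'B'],
    'price': [np.nan, 5, 6, 11, np.nan],
    'shares': [100, 60, np.nan, 100, np.nan],
    'dates': pd.to_datetime(['1990-01-01', '1990-01-01', '1990-04-01',
                             '1990-04-01', '1990-08-01']),
}).set_index(['tic', 'dates']).sort_index()

by_tic = temp.groupby(level='tic')

# 'size' counts all rows per tic, NaN or not: AAPL -> 3, IBM -> 2.
n_rows = by_tic['price'].transform('size')

# 'count' counts only non-null prices per tic: AAPL -> 2, IBM -> 1.
n_prices = by_tic['price'].transform('count')

# assign() returns a new frame and leaves temp unchanged.
result = temp.assign(n_rows=n_rows, n_prices=n_prices)
print(result)
```

If "number of observations" means rows, 'size' is the one you want. If it means non-missing prices, 'count' on the price column is closer.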