2017-02-09 95 views
4

Select rows before and after rows of interest in pandas

Say I have a time series DataFrame with a categorical variable and a value:

In [4]: df = pd.DataFrame(data={'category': np.random.choice(['A', 'B', 'C', 'D'], 11), 'value': np.random.rand(11)}, index=pd.date_range('2015-04-20','2015-04-30')) 

In [5]: df 
Out[5]: 
      category  value 
2015-04-20  D 0.220804 
2015-04-21  A 0.992445 
2015-04-22  A 0.743648 
2015-04-23  B 0.337535 
2015-04-24  B 0.747340 
2015-04-25  B 0.839823 
2015-04-26  D 0.292628 
2015-04-27  D 0.906340 
2015-04-28  B 0.244044 
2015-04-29  A 0.070764 
2015-04-30  D 0.132221 

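If I'm interested in the rows with category A, filtering to isolate them is trivial. But what if I'm also interested in the n rows before each category A row? With n = 2, I'd like to see something like: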

In [5]: df[some boolean indexing] 
Out[5]: 
      category  value 
2015-04-20  D 0.220804 
2015-04-21  A 0.992445 
2015-04-22  A 0.743648 
2015-04-27  D 0.906340 
2015-04-28  B 0.244044 
2015-04-29  A 0.070764 

Similarly, what if I'm interested in the n rows around each category A row? Again with n = 2, I'd like to see this:

In [5]: df[some other boolean indexing] 
Out[5]: 
      category  value 
2015-04-20  D 0.220804 
2015-04-21  A 0.992445 
2015-04-22  A 0.743648 
2015-04-23  B 0.337535 
2015-04-24  B 0.747340 
2015-04-27  D 0.906340 
2015-04-28  B 0.244044 
2015-04-29  A 0.070764 
2015-04-30  D 0.132221 

Thanks!

+2

You may find this helpful: http://stackoverflow.com/questions/28837633

Answers

1

The n rows around each category A row:

In [223]: idx = df.index.get_indexer_for(df[df.category=='A'].index) 

In [224]: n = 1 

In [225]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df))) 
              for i in idx]))] 
Out[225]: 
      category  value 
2015-04-20  D 0.220804 
2015-04-21  A 0.992445 
2015-04-22  A 0.743648 
2015-04-23  B 0.337535 
2015-04-28  B 0.244044 
2015-04-29  A 0.070764 
2015-04-30  D 0.132221 

In [226]: n = 2 

In [227]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df))) 
              for i in idx]))] 
Out[227]: 
      category  value 
2015-04-20  D 0.220804 
2015-04-21  A 0.992445 
2015-04-22  A 0.743648 
2015-04-23  B 0.337535 
2015-04-24  B 0.747340 
2015-04-27  D 0.906340 
2015-04-28  B 0.244044 
2015-04-29  A 0.070764 
2015-04-30  D 0.132221 
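
The positional approach above can also be wrapped in a small helper. This is only a sketch: the name `rows_around` and its `before` / `after` parameters are made up for illustration, and the frame is rebuilt with the fixed values from the question so that the result is reproducible. It takes the integer positions straight from a boolean mask, adds a window of offsets to each one, and drops anything that falls off either end of the frame.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with the values printed in the question.
df = pd.DataFrame(
    {
        "category": list("DAABBBDDBAD"),
        "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
                  0.292628, 0.906340, 0.244044, 0.070764, 0.132221],
    },
    index=pd.date_range("2015-04-20", "2015-04-30"),
)


def rows_around(frame, mask, before, after):
    """Rows where mask is True, plus `before` rows above and `after` rows below each.

    Hypothetical helper, not part of pandas.
    """
    hits = np.flatnonzero(np.asarray(mask))  # integer positions of the matching rows
    if hits.size == 0:
        return frame.iloc[0:0]
    # Broadcast the offsets -before..after against every hit position.
    offsets = np.arange(-before, after + 1)
    positions = (hits[:, None] + offsets).ravel()
    # Discard positions outside the frame, then de-duplicate (np.unique also sorts).
    positions = positions[(positions >= 0) & (positions < len(frame))]
    return frame.iloc[np.unique(positions)]


around = rows_around(df, df["category"] == "A", before=2, after=2)
only_before = rows_around(df, df["category"] == "A", before=2, after=0)
```

With `after=0` the same helper covers the "n rows before" case from the question, so both requested outputs come from one function.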
4

To answer your first question:

# i = 0 keeps the A rows themselves; i = 1..n adds the n rows before each of them
df[pd.concat([df.category.shift(-i)=='A' for i in range(n + 1)], axis=1).any(axis=1)] 
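
The expression above ORs together shifted copies of the `category == 'A'` test. A minimal sketch of the same idea, written as a loop and with rows after the match included as well, might look like the following. The helper name `near_mask` and its `before` / `after` parameters are hypothetical, and it assumes a pandas version where `Series.shift` accepts `fill_value`. Unlike the one-liner, it shifts the boolean condition rather than the category column.

```python
import pandas as pd

# Rebuild the example frame with the values printed in the question.
df = pd.DataFrame(
    {
        "category": list("DAABBBDDBAD"),
        "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
                  0.292628, 0.906340, 0.244044, 0.070764, 0.132221],
    },
    index=pd.date_range("2015-04-20", "2015-04-30"),
)


def near_mask(condition, before=0, after=0):
    """Boolean mask that is True within `before` rows above or `after` rows below a match.

    Hypothetical helper, not part of pandas.
    """
    mask = pd.Series(False, index=condition.index)
    for k in range(-before, after + 1):
        # A negative shift pulls later rows up, marking rows *before* a match;
        # a positive shift pushes earlier rows down, marking rows *after* one.
        mask |= condition.shift(k, fill_value=False)
    return mask


is_a = df["category"] == "A"
before_a = df[near_mask(is_a, before=2)]
around_a = df[near_mask(is_a, before=2, after=2)]
```

Because the result is an ordinary boolean Series, it can be combined with other conditions using `&` and `|` before indexing.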

You should be able to extend the same (admittedly somewhat clumsy) approach to cover more cases. Around a category