2015-04-29 65 views
1

pandas: how to extract rows that match filter 1 or filter 2

I have a pandas DataFrame that looks like this example:

label   Y88_N   diff  div  fold 
0  25273.626713 17348.581851 2.016404 2.016404 
1  29139.510491 -4208.868050 0.604304 -0.604304 
2  34388.439717 -30147.834699 0.458903 -0.458903 
3  69704.254089 -32976.152490 0.116894 -0.116894 
4  193717.440783 -71359.494098 0.286045 -0.286045 
5  28996.634708 10934.944533 2.031293 2.031293 
6  45021.782930 680.437629 1.056383 1.056383 

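but with several thousand rows. I would like to get a new DataFrame containing the rows where the value in the 'fold' column is >= 2 or <= -0.6. So the final DataFrame should look like this: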

label   Y88_N   diff  div  fold 
0  25273.626713 17348.581851 2.016404 2.016404 
1  29139.510491 -4208.868050 0.604304 -0.604304 
5  28996.634708 10934.944533 2.031293 2.031293 

I have tried different things, such as:

def ranged(start, end, step):
    x = start
    while x < end:
        yield x
        x += step
df2 = df[~df['fold'].isin(ranged(-0.6, 2, 0.000001))]

df2 = df[(df['fold'] >= 2) & (df['fold'] <= -0.6)] 

But nothing seems to work. Is there a simple way to select the rows whose value in a column matches either filter 1 or filter 2? Thanks

Just a technical point: `df2 = df[(df['fold'] >= 2) & (df['fold'] <= -0.6)]` is logically incorrect, because a value cannot be both less than or equal to -0.6 and greater than or equal to 2 – EdChum

Answers

3

You can do

In [276]: df[(df['fold'] >= 2) | (df['fold'] <= -0.6)] 
Out[276]: 
    label   Y88_N   diff  div  fold 
0  0 25273.626713 17348.581851 2.016404 2.016404 
1  1 29139.510491 -4208.868050 0.604304 -0.604304 
5  5 28996.634708 10934.944533 2.031293 2.031293 
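
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The question does not show how `df` was created, so the constructor below is only an assumption that rebuilds the seven sample rows by hand.

```python
import pandas as pd

# Sample rows copied from the question. How the asker actually built df is
# not shown, so this constructor is an assumption for illustration only.
df = pd.DataFrame({
    "label": [0, 1, 2, 3, 4, 5, 6],
    "Y88_N": [25273.626713, 29139.510491, 34388.439717, 69704.254089,
              193717.440783, 28996.634708, 45021.782930],
    "diff": [17348.581851, -4208.868050, -30147.834699, -32976.152490,
             -71359.494098, 10934.944533, 680.437629],
    "div": [2.016404, 0.604304, 0.458903, 0.116894,
            0.286045, 2.031293, 1.056383],
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# Each comparison gets its own parentheses because | binds more tightly
# than >= and <= in plain Python.
mask = (df["fold"] >= 2) | (df["fold"] <= -0.6)
df2 = df[mask]
print(df2)
```

The boolean mask keeps the original index, so the result still carries the row labels 0, 1 and 5.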

Or use the query method, like

In [277]: df.query('fold >=2 | fold <=-0.6') 
Out[277]: 
    label   Y88_N   diff  div  fold 
0  0 25273.626713 17348.581851 2.016404 2.016404 
1  1 29139.510491 -4208.868050 0.604304 -0.604304 
5  5 28996.634708 10934.944533 2.031293 2.031293 
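
As a small sketch of the same idea, `query` strings also accept the keyword `or`, which some people find easier to read than `|`. The two-column frame below is a cut-down stand-in for the question's data (an assumption, not the asker's real frame).

```python
import pandas as pd

# Cut-down stand-in for the question's data: only the columns needed here.
df = pd.DataFrame({
    "label": [0, 1, 2, 3, 4, 5, 6],
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# Inside a query string, | and the keyword "or" select the same rows here.
by_symbol = df.query("fold >= 2 | fold <= -0.6")
by_keyword = df.query("fold >= 2 or fold <= -0.6")
print(by_keyword)
```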

Also, pd.eval() works well with expressions containing large arrays

In [278]: df[pd.eval('df.fold >=2 | df.fold <=-0.6')] 
Out[278]: 
    label   Y88_N   diff  div  fold 
0  0 25273.626713 17348.581851 2.016404 2.016404 
1  1 29139.510491 -4208.868050 0.604304 -0.604304 
5  5 28996.634708 10934.944533 2.031293 2.031293 
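
A hedged, self-contained variant of the `pd.eval()` call: here the frame is passed explicitly through `local_dict` instead of relying on name lookup in the calling scope, and the comparisons are parenthesized for readability. Both choices are mine, not part of the answer above, and the small frame is again a stand-in for the real data.

```python
import pandas as pd

# Stand-in data with only the column that the filter needs.
df = pd.DataFrame({
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# pd.eval evaluates the string and returns a boolean Series;
# local_dict tells it explicitly what the name "df" refers to.
mask = pd.eval("(df.fold >= 2) | (df.fold <= -0.6)", local_dict={"df": df})
df2 = df[mask]
print(df2)
```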

This looks good, but what are the pros and cons of each method (running speed, memory, etc.)? – EOL

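Thanks a lot for this answer, it is really complete. I don't know why, but I tried something like df[(df['fold'] >= 2) | (df['fold'] <= -0.6)] at the beginning without success. I probably missed something at the time. –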

1

You just need to use | (OR) instead of & (AND) in your second example:

df2 = df[(df['fold'] >= 2) | (df['fold'] <= -0.6)] 

df2 
Out[6]: 
    label   Y88_N   diff  div  fold 
0  0 25273.626713 17348.581851 2.016404 2.016404 
1  1 29139.510491 -4208.868050 0.604304 -0.604304 
5  5 28996.634708 10934.944533 2.031293 2.031293
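
To make the difference concrete, here is a short sketch comparing the two operators on the sample `fold` values. The one-column frame is a stand-in for the question's data, and the last line (negating the "inside the range" test) is an extra variant I am adding as an assumption about what the `ranged` attempt was aiming for.

```python
import pandas as pd

# Stand-in data: just the fold values from the question.
df = pd.DataFrame({
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# AND: no number is both >= 2 and <= -0.6, so nothing is selected.
both = df[(df["fold"] >= 2) & (df["fold"] <= -0.6)]

# OR: rows satisfying at least one of the two conditions.
either = df[(df["fold"] >= 2) | (df["fold"] <= -0.6)]

# Same selection written as "not strictly between -0.6 and 2".
# Note: unlike the OR form, this form would also keep NaN rows.
outside = df[~((df["fold"] > -0.6) & (df["fold"] < 2))]

print(len(both), len(either), len(outside))
```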