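Pandas DataFrame filtering

Say I have a DataFrame with four columns, each with its own threshold, and I want to compare the DataFrame's values against those thresholds.

I simply want the minimum of each DataFrame value and its column's threshold.

For example: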

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

>>> df.head()
          A         B         C         D
0 -2.060410 -1.390896 -0.595792 -0.374427
1  0.660580  0.726795 -1.326431 -1.488186
2 -0.955792 -1.852701 -0.895178 -1.353669
3 -1.002576 -0.321210  1.711597 -0.063274
4  1.217197  0.202063 -1.407561  0.940371

thresholds = pd.Series({'A': 1, 'B': 1.1, 'C': 1.2, 'D': 1.3})

This solution works (A4 and C3 are capped at their thresholds), but there must be an easier way:

df_filtered = df.lt(thresholds).multiply(df) + df.gt(thresholds).multiply(thresholds) 

>>> df_filtered.head()
          A         B         C         D
0 -2.060410 -1.390896 -0.595792 -0.374427
1  0.660580  0.726795 -1.326431 -1.488186
2 -0.955792 -1.852701 -0.895178 -1.353669
3 -1.002576 -0.321210  1.200000 -0.063274
4  1.000000  0.202063 -1.407561  0.940371
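One caveat with the mask arithmetic: a value exactly equal to its threshold satisfies neither lt nor gt, so both masks are False and the sum comes out as 0. With random floats this is unlikely to show up, but it matters for rounded data. The sketch below uses a small made-up frame (the names small, limit, strict and inclusive are illustrative only) and swaps lt for le as one possible fix.

```python
import pandas as pd

small = pd.DataFrame({'A': [0.5, 1.0, 2.0]})
limit = pd.Series({'A': 1.0})

# Strict masks: the row equal to the limit matches neither mask and becomes 0.
strict = small.lt(limit).multiply(small) + small.gt(limit).multiply(limit)

# Using le for the "keep" mask preserves values that sit exactly on the limit.
inclusive = small.le(limit).multiply(small) + small.gt(limit).multiply(limit)

print(strict['A'].tolist())
print(inclusive['A'].tolist())
```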

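Ideally I would like to use .loc to filter in place, but I haven't managed to figure it out. I'm using pandas 0.14.1 (cannot upgrade).

In response, here are my timing tests of the initially suggested alternatives: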
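For the in-place variant, one possible approach is a short loop that caps one column at a time with .loc. This is a minimal sketch written against a recent pandas; it has not been checked on 0.14.1, where Series.items may need to be replaced by iteritems. The seeded RandomState is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(100, 4), columns=list('ABCD'))
thresholds = pd.Series({'A': 1, 'B': 1.1, 'C': 1.2, 'D': 1.3})

# Cap each column in place: only rows above the column's limit are overwritten.
for col, limit in thresholds.items():
    df.loc[df[col] > limit, col] = limit

# Every column maximum should now be at or below its threshold.
print(df.max())
```

The loop touches only the cells that exceed their limit, so no intermediate full-size frame is built, at the cost of one pass per column.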

%%timeit 
df.lt(thresholds).multiply(df) + df.gt(thresholds).multiply(thresholds) 
1000 loops, best of 3: 990 µs per loop 

%%timeit 
np.minimum(df, thresholds) # <--- Simple, fast, and returns DataFrame! 
10000 loops, best of 3: 110 µs per loop 

%%timeit 
df[df < thresholds].fillna(thresholds, inplace=True) 
1000 loops, best of 3: 1.36 ms per loop
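Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper names mask_arithmetic and numpy_minimum and the call count of 200 are arbitrary choices for illustration, and the thresholds are passed to np.minimum as a plain array in column order. Absolute numbers will differ from those above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(100, 4), columns=list('ABCD'))
thresholds = pd.Series({'A': 1, 'B': 1.1, 'C': 1.2, 'D': 1.3})
limits = thresholds.reindex(df.columns).values  # plain array in column order

def mask_arithmetic():
    return df.lt(thresholds).multiply(df) + df.gt(thresholds).multiply(thresholds)

def numpy_minimum():
    return np.minimum(df, limits)

# Total seconds for 200 calls of each candidate.
t_mask = timeit.timeit(mask_arithmetic, number=200)
t_min = timeit.timeit(numpy_minimum, number=200)
print("mask arithmetic: %.1f us per call" % (t_mask / 200 * 1e6))
print("np.minimum:      %.1f us per call" % (t_min / 200 * 1e6))
```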

2015-04-06 Alexander

This is quite fast (and returns a DataFrame):

np.minimum(df, [1.0,1.1,1.2,1.3])

It is a pleasant surprise that numpy is so well suited to this without any reshaping or explicit conversion...
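The reason no reshaping is needed is broadcasting: an array of shape (4,) is matched against each row of an (n, 4) frame. A small sketch with made-up values follows; the names limits and capped are illustrative. The thresholds are reindexed to the column order and converted to a plain array first, which avoids relying on how a given pandas version aligns a Series inside a ufunc.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.5, 2.0, -1.0, 1.5],
                   [1.5, 1.0, 1.3, 0.0]], columns=list('ABCD'))
thresholds = pd.Series({'A': 1.0, 'B': 1.1, 'C': 1.2, 'D': 1.3})

# Order the limits to match the columns, then drop to a plain array.
limits = thresholds.reindex(df.columns).values

# Shapes (2, 4) and (4,) broadcast: each row is compared with the same limits.
capped = np.minimum(df, limits)
print(capped)
```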

2015-04-06 01:51:04 JohnE

How about:

df[df < thresholds].fillna(thresholds, inplace=True)

2015-04-06 01:44:58

Better than my approach, but it still creates a copy of the data (df[df < thresholds] creates a copy, which is then modified immediately). – Alexander 2015-04-06 01:49:59
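To make the copy explicit: df[df < thresholds] produces a new object with NaN wherever a value is not below its limit, so with inplace=True the fill is applied to that temporary object and df itself is not modified. A minimal sketch of the same idea without inplace, using made-up values and the illustrative names masked and result, keeps the filled frame by assigning it:

```python
import pandas as pd

df = pd.DataFrame({'A': [0.5, 1.5], 'B': [2.0, -1.0]})
thresholds = pd.Series({'A': 1.0, 'B': 1.1})

# A new object: values at or above their column's limit become NaN.
masked = df[df < thresholds]

# Without inplace, fillna returns the filled frame; a Series fills per column.
result = masked.fillna(thresholds)

print(result)
print(df)  # the original frame is untouched
```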
