Evaluate a pandas DataFrame based on two string values

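I have a Pandas DataFrame "table" with a column named "OPINION" that is populated with string values. I would like to create a new column named "cond5" that is TRUE for every row where "OPINION" is either "buy" or "neutral".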

I have tried

table["cond5"]= table.OPINION == "buy" or table.OPINION == "neutral"

which gives me an error, and

table["cond5"]= table.OPINION.all() in ("buy", "neutral")

which returns FALSE for all rows.


2015-02-24 baconwichsand

As EdChum points out, you can use the isin method:

table['cond5'] = table['OPINION'].isin(['buy', 'neutral'])

isin checks for exact equality. It is probably the simplest and most readable option.
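As a small illustration of the isin approach, here is a sketch on a made-up five-row table. The sample values are an assumption, since the asker's data is not shown:

```python
import pandas as pd

# Hypothetical sample data; the real table is not shown in the question.
table = pd.DataFrame({"OPINION": ["buy", "sell", "neutral", "hold", "buy"]})

# True where OPINION is exactly "buy" or "neutral", False otherwise.
table["cond5"] = table["OPINION"].isin(["buy", "neutral"])
print(table)
```

Values such as "hold" that are not in the list simply come out as False.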

To fix

table["cond5"] = table.OPINION == "buy" or table.OPINION == "neutral"

use

table["cond5"] = (table['OPINION'] == "buy") | (table['OPINION'] == "neutral")

The parentheses are necessary because | has higher precedence (binding power) than ==.

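x or y requires x and y to be usable as single boolean values, because Python has to decide whether x is true or false.

(table['OPINION'] == "buy") or (table['OPINION'] == "neutral")

raises an error, since a Series cannot be reduced to a single boolean value.

So use the | operator instead, which takes the or of the two Series element-wise.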
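The difference between the two operators can be seen in a short sketch. The three-row table and the variable names are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data.
table = pd.DataFrame({"OPINION": ["buy", "sell", "neutral"]})

is_buy = table["OPINION"] == "buy"
is_neutral = table["OPINION"] == "neutral"

or_raised = False
try:
    # `or` asks Python for bool(is_buy), which pandas refuses to compute.
    result = is_buy or is_neutral
except ValueError as exc:
    or_raised = True
    print("or failed:", exc)

# `|` is applied row by row and returns a boolean Series.
combined = is_buy | is_neutral
print(combined.tolist())
```

Here the two comparisons are stored in variables first, so no extra parentheses are needed around them.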

Another alternative is

import numpy as np 
table["cond5"] = np.logical_or.reduce([(table['OPINION'] == val) for val in ('buy', 'neutral')])

which may be useful if ('buy', 'neutral') were a longer tuple.
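To show how this scales, here is a sketch with a three-value tuple. The extra value "accumulate" and the name wanted are assumptions added only for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and a hypothetical longer tuple of accepted values.
table = pd.DataFrame({"OPINION": ["buy", "sell", "neutral", "hold", "accumulate"]})
wanted = ("buy", "neutral", "accumulate")

# One boolean mask per accepted value...
masks = [table["OPINION"] == val for val in wanted]

# ...combined with an element-wise logical or across all masks.
table["cond5"] = np.logical_or.reduce(masks)
print(table)
```

The list comprehension grows with the tuple, so nothing else has to change when more values are added.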

Yet another option is to use pandas' vectorized string method, str.contains:

table["cond5"] = table['OPINION'].str.contains(r'buy|neutral')

str.contains performs a regex search for the pattern r'buy|neutral' on each item of table['OPINION'] in a Cythonized loop.
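One thing to keep in mind is that a regex search matches substrings, so unlike isin it would also flag a value like "strong buy". A sketch of the difference, using made-up data that includes such a value and a missing entry, with an anchored pattern as one possible way to require a full match:

```python
import pandas as pd

# Hypothetical sample data, including a substring match and a missing value.
table = pd.DataFrame({"OPINION": ["buy", "strong buy", "neutral", "sell", None]})

# Unanchored search: matches anywhere in the string. na=False maps missing values to False.
loose = table["OPINION"].str.contains(r"buy|neutral", na=False)

# Anchored pattern: the whole string has to be "buy" or "neutral".
strict = table["OPINION"].str.contains(r"^(?:buy|neutral)$", na=False)

print(loose.tolist())
print(strict.tolist())
```

If the column only ever holds the exact labels, both patterns give the same answer.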

Now, how do you decide which one to use? Here is a timeit benchmark using IPython:

In [10]: table = pd.DataFrame({'OPINION':np.random.choice(['buy','neutral','sell',''], size=10**6)}) 

In [11]: %timeit (table['OPINION'] == "buy") | (table['OPINION'] == "neutral") 
10 loops, best of 3: 121 ms per loop 

In [12]: %timeit np.logical_or.reduce([(table['OPINION'] == val) for val in ('buy', 'neutral')]) 
1 loops, best of 3: 204 ms per loop 

In [13]: %timeit table['OPINION'].str.contains(r'buy|neutral') 
1 loops, best of 3: 474 ms per loop 

In [14]: %timeit table['OPINION'].isin(['buy', 'neutral']) 
10 loops, best of 3: 40 ms per loop

So it looks like isin is the fastest.
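Before comparing speed it may be worth confirming that the four approaches agree on this kind of data. The following sketch does that on a smaller random sample; the seed, the sample size and the names results and all_agree are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

# Same value set as the benchmark, but a smaller sample and a fixed (arbitrary) seed.
rng = np.random.default_rng(0)
table = pd.DataFrame({"OPINION": rng.choice(["buy", "neutral", "sell", ""], size=10_000)})

col = table["OPINION"]
results = {
    "or_operator": (col == "buy") | (col == "neutral"),
    "logical_or_reduce": np.logical_or.reduce([col == val for val in ("buy", "neutral")]),
    "str_contains": col.str.contains(r"buy|neutral"),
    "isin": col.isin(["buy", "neutral"]),
}

# Compare every result against the isin result, element by element.
reference = results["isin"].to_numpy()
all_agree = all(np.array_equal(np.asarray(r), reference) for r in results.values())
print(all_agree)
```

Timings will differ between machines and pandas versions, so the numbers above are best read as a relative ranking.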


2015-02-24 12:51:30 unutbu

Another method would be table['OPINION'].isin(['buy', 'neutral']) – EdChum 2015-02-24 13:01:35

@EdChum: Yes, thanks, I missed that one.. – unutbu 2015-02-24 13:03:03
