Iterating through pandas rows efficiently

I have a list that looks like this:

lst = ['a','b','c']

and a DataFrame that looks like this:

id col1 
1 ['a','c'] 
2 ['b'] 
3 ['b', 'a']

I'm looking to create a new column in the DataFrame that holds the length of the intersection of each row's col1 with the list:

id col1   intersect 
1 ['a','c'] 2 
2 ['b']  1 
3 ['d', 'a'] 1

Currently my code looks like this:

df['intersection'] = np.nan
for i, r in df.iterrows():
    # If-statement to skip NaNs in col1 (NaN is never equal to itself)
    if r['col1'] == r['col1']:
        # .at writes to the frame directly and avoids chained assignment
        df.at[i, 'intersection'] = len(set(r['col1']).intersection(set(lst)))

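The problem is that this code is very slow on my dataset of 200K rows, where each row is intersected with a list of 200 elements. Is there a more efficient way to do this?

Thanks!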

Asked 2016-07-26 by eljusticiero67

Why is the if statement needed? It looks like it is always true to me. – Psidom

It checks for NaNs. If x is NaN, x == x returns False. – eljusticiero67
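For anyone puzzled by that check, here is a minimal sketch of how it behaves. It assumes a missing cell is stored as a float NaN; the helper names `is_present` and `is_list_cell` are made up for illustration.

```python
def is_present(value):
    """Self-comparison trick: NaN is the one value that is not equal to itself."""
    return value == value


def is_list_cell(value):
    """A more explicit alternative: only treat real lists as usable cells."""
    return isinstance(value, list)


nan_cell = float("nan")   # assumed representation of a missing col1 value
list_cell = ['a', 'c']

print(is_present(nan_cell))    # False
print(is_present(list_cell))   # True
print(is_list_cell(nan_cell))  # False
print(is_list_cell(list_cell)) # True
```

The `isinstance` form says what is meant a bit more directly, but both give the same answer for list and NaN cells.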

Have you tried this?

lstset = set(lst) 
df['intersection'] = df['col1'].apply(lambda x: len(set(x).intersection(lstset)))
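One caveat: since the question mentions NaNs in col1, note that `set(x)` raises a `TypeError` when `x` is a float NaN. Below is a rough, self-contained sketch of a NaN-tolerant variant. The sample frame and the helper name `intersection_size` are assumptions for illustration, not the asker's real data.

```python
import numpy as np
import pandas as pd

lst = ['a', 'b', 'c']
lstset = set(lst)  # build the set once, outside the per-row function

# Hypothetical sample data; the last row has a missing col1 value.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'col1': [['a', 'c'], ['b'], ['d', 'a'], np.nan],
})


def intersection_size(cell):
    # Non-list cells (such as NaN) get a missing result instead of raising.
    if not isinstance(cell, list):
        return np.nan
    # set.intersection accepts any iterable, so the cell need not be converted first.
    return len(lstset.intersection(cell))


df['intersection'] = df['col1'].apply(intersection_size)
print(df)
```

Because one row yields NaN, the new column comes out as floats, which matches what the original loop produced after initialising the column with `np.nan`.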

Another possibility is

df['intersection'] = df['col1'].apply(lambda x: len([1 for item in x if item in lst]))
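Two things worth knowing about this variant, sketched below in plain Python: `item in lst` scans the whole list on every lookup, so testing against a set should be cheaper with a 200-element list, and counting this way counts duplicates in the cell each time, which is not quite a set intersection. The helper name `count_members` is made up for the example.

```python
lst = ['a', 'b', 'c']
lstset = set(lst)  # set membership tests avoid scanning the list each time


def count_members(cell):
    # Counts every item of the cell found in lstset; repeated items count repeatedly.
    return sum(1 for item in cell if item in lstset)


print(count_members(['a', 'c']))            # 2
print(count_members(['a', 'a', 'd']))       # 2, because 'a' is counted twice
print(len(set(['a', 'a', 'd']) & lstset))   # 1, the true intersection size
```

If col1 never contains repeated items, the two approaches agree.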

Answered 2016-07-26 21:21:09

Ugh! I'm such a dummy!!!!!! – eljusticiero67
