Remove rows from a pandas DataFrame where the row contains a string present in a list?

I know how to remove rows from a pandas DataFrame based on a single column ("From") when the row contains a given string. Given df and someString:

df = df[~df.From.str.contains(someString)]

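Now I want to do the same thing, but this time remove rows that contain a string found in any element of another list. If I weren't using pandas, I would use a for loop with if ... not ... in. But how can I use pandas' own functionality to achieve this? Given the list of items to remove, ignorethese, extracted from the file EMAILS_TO_IGNORE of comma-separated strings, I tried: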

with open(EMAILS_TO_IGNORE) as emails: 
     ignorethese = emails.read().split(', ') 
     df = df[~df.From.isin(ignorethese)]

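Does splitting the file into a list first overcomplicate matters? Given that it is a plain-text file of comma-separated values, could I skip that step with something simpler?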

2015-09-18 Pyderman

The attempt above does seem to remove a row, but (i) I don't know which row was removed, and (ii) it should have removed more. – Pyderman

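Series.str.contains supports regular expressions, so you can build a regex from your list of emails to ignore, using | to OR them together, and then pass it to contains. Example -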

df[~df.From.str.contains('|'.join(ignorethese))]

Demo -

In [109]: df 
Out[109]: 
             From 
0   Grey Caulfu <[email protected]> 
1 Deren Torculas <[email protected]> 
2 Charlto Youna <[email protected]> 

In [110]: ignorelist = ['[email protected]','[email protected]'] 

In [111]: ignorere = '|'.join(ignorelist) 

In [112]: df[~df.From.str.contains(ignorere)] 
Out[112]: 
             From 
2 Charlto Youna <[email protected]>
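
One possible reason the isin attempt in the question removed fewer rows than expected is that isin compares whole cell values, while the From values in the demo look like "Name <address>". The sketch below contrasts the two approaches. The sample names, the addresses, and the raw string standing in for the file contents are all made up for illustration, and stripping whitespace around each item is an assumption about how the file is formatted.

```python
import pandas as pd

# Hypothetical sample data; names and addresses are made up for illustration.
df = pd.DataFrame({
    "From": [
        "Alice Example <alice@example.com>",
        "Bob Example <bob@example.com>",
        "carol@example.com",
    ]
})

# Stands in for the text read from the ignore file (assumed comma-separated).
raw = "alice@example.com, carol@example.com\n"

# Split on commas and strip whitespace/newlines so each item is a clean address.
ignorethese = [item.strip() for item in raw.split(",") if item.strip()]

# isin() tests whole-value equality, so only the bare address in row 2 matches.
kept_isin = df[~df["From"].isin(ignorethese)]

# str.contains() searches inside each value, so rows 0 and 2 both match.
kept_contains = df[~df["From"].str.contains("|".join(ignorethese))]

print(kept_isin)
print(kept_contains)
```

Splitting the file into a list is still needed here, since the list is what gets joined into the pattern.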

Note that the documentation mentions that it uses re.search().
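
Because the pattern is treated as a regular expression, characters such as . in an email address act as metacharacters. A minimal sketch of escaping each item with re.escape before joining, assuming the items are meant to be matched literally (the sample addresses are hypothetical):

```python
import re

import pandas as pd

# Hypothetical sample data chosen to show the effect of an unescaped '.'.
df = pd.DataFrame({
    "From": [
        "Alice Example <a.b@example.com>",
        "Bob Example <axb@example.com>",
    ]
})
ignorethese = ["a.b@example.com"]

# Unescaped: '.' matches any character, so "axb@example.com" also matches.
unescaped = "|".join(ignorethese)
kept_unescaped = df[~df["From"].str.contains(unescaped)]

# Escaped: each address is matched literally.
escaped = "|".join(re.escape(item) for item in ignorethese)
kept_escaped = df[~df["From"].str.contains(escaped)]

print(kept_unescaped)
print(kept_escaped)
```

If the column can hold missing values, passing na=False to contains may be needed so that the resulting mask is purely boolean before it is negated.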

2015-09-18 06:03:36

Once again, I'm inspired by the out-of-the-box thinking and the elegant solution. Thank you. – Pyderman
