Return non-matching rows from a regex pattern

I have a pandas DataFrame that looks like this:

   Sequence  Rating
0  HYHIVQKF       1
1  YGEIFEKF       2
2  TYGGSWKF       3
3  YLESFYKF       4
4  YYNTAVKL       5
5  WPDVIHSF       6

This is the code I use to return the rows that match the pattern \b.[YF]\w+[LFI]\b:

pat = r'\b.[YF]\w+[LFI]\b' 
new_df.Sequence.str.contains(pat) 

new_df[new_df.Sequence.str.contains(pat)]

The code above returns the rows that match the pattern, but what can I use to return the rows that do not match?

Expected output:

   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6

Source

2017-08-07 nobodyAskedYouPatrice

You can negate your existing boolean Series:

df[~df.Sequence.str.contains(pat)]

This gives you the desired output:

   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6

Brief explanation:

df.Sequence.str.contains(pat)

returns a boolean Series:

0     True
1    False
2     True
3    False
4     True
5    False
Name: Sequence, dtype: bool

Applying ~ inverts each value:

~df.Sequence.str.contains(pat) 

0    False
1     True
2    False
3     True
4    False
5     True
Name: Sequence, dtype: bool

This is another boolean Series, which you can use to index your original DataFrame.
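For readers who want to run these steps end to end, here is a minimal sketch. It rebuilds the question's frame by hand under the question's name new_df (the answer's df refers to the same frame); the names mask and non_matching are illustrative. The na=False argument is an addition that is not in the original answer: assuming the column might contain missing values, it makes contains() report False for them instead of NaN, so that ~ still operates on a purely boolean Series.

```python
import pandas as pd

# The question's data, rebuilt by hand.
new_df = pd.DataFrame({
    "Sequence": ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
                 "YLESFYKF", "YYNTAVKL", "WPDVIHSF"],
    "Rating": [1, 2, 3, 4, 5, 6],
})

pat = r'\b.[YF]\w+[LFI]\b'

# True where the pattern is found in the string.
# na=False (an assumption, see above) maps missing values to False.
mask = new_df.Sequence.str.contains(pat, na=False)

# ~ flips every element, so indexing keeps only the non-matching rows.
non_matching = new_df[~mask]
print(non_matching)
```

Keeping the mask in its own variable also makes it easy to retrieve both halves of the frame, new_df[mask] and new_df[~mask], without evaluating the regex twice.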

Source

2017-08-07 23:36:40 Cleb

You can use ~ as an element-wise not:

pat = r'\b.[YF]\w+[LFI]\b' 
new_df[~new_df.Sequence.str.contains(pat)] 

#    Sequence  Rating
# 1  YGEIFEKF       2
# 3  YLESFYKF       4
# 5  WPDVIHSF       6
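To illustrate why ~ is used here rather than Python's not keyword: not asks for a single truth value of the whole Series, which pandas refuses to give, whereas ~ is applied element by element. The sketch below uses a hand-written boolean Series whose values mirror the matches in the question; the names matches, inverted and error_message are illustrative.

```python
import pandas as pd

# Hand-written stand-in for the result of str.contains(pat).
matches = pd.Series([True, False, True, False, True, False], name="Sequence")

# ~ negates each element and returns a new boolean Series.
inverted = ~matches

# `not` needs one truth value for the whole Series, so pandas raises.
try:
    not matches
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(inverted.tolist())
print(error_message)
```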

Source

2017-08-07 23:35:28 Psidom

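Psidom's answer is more elegant, but another way to solve this is to wrap the regex in a negative lookahead assertion and then use match() instead of contains(). match() tests the pattern only at the start of each string, so the lookahead succeeds exactly when the original pattern does not match there: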

>>> pat = r'\b.[YF]\w+[LFI]\b'
>>> not_pat = r'(?!{})'.format(pat)

>>> new_df[new_df.Sequence.str.match(pat)]
   Sequence  Rating
0  HYHIVQKF       1
2  TYGGSWKF       3
4  YYNTAVKL       5

>>> new_df[new_df.Sequence.str.match(not_pat)]
   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6
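Why match() is needed for the lookahead version can be seen with Python's re module alone. This sketch relies on str.match() behaving like re.match() (pattern tested only at the start of the string) and str.contains() behaving like re.search() (pattern tested at every position); the list sequences simply copies the question's data, and the names anchored and unanchored are illustrative.

```python
import re

pat = r'\b.[YF]\w+[LFI]\b'
not_pat = r'(?!{})'.format(pat)

sequences = ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
             "YLESFYKF", "YYNTAVKL", "WPDVIHSF"]

# Anchored at position 0: the negative lookahead succeeds only
# when pat does NOT match at the start of the string.
anchored = [s for s in sequences if re.match(not_pat, s)]

# Unanchored: the lookahead can succeed at some later position
# in every string, so no sequence is filtered out.
unanchored = [s for s in sequences if re.search(not_pat, s)]

print(anchored)
print(unanchored)
```

With the question's data, the anchored list contains only the three non-matching sequences, while the unanchored list contains all six, which is why the negated pattern should not be passed to contains().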

Source

2017-08-07 23:45:25 mhawke
