2017-08-03

Filter rows from a pandas DataFrame using a regular expression

Suppose I have a pandas DataFrame like this:

            Word  Ratings
    0   TLYSFFPK        1
    1  SVLENFVGR        2
    2  SVFNHAIRK        3
    3  KAGEVFIHK        4

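How can I use a regular expression in pandas to keep only the rows whose Word matches the pattern below, while preserving the DataFrame format? The regex pattern is: \b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b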

Expected output:

            Word  Ratings
    1  SVLENFVGR        2
    2  SVFNHAIRK        3

Answer


Demo:

In [2]: df 
Out[2]: 
        Word  Ratings
0   TLYSFFPK        1
1  SVLENFVGR        2
2  SVFNHAIRH        3
3  KAGEVFIHK        4

In [3]: pat = r'\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b' 
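
Read left to right, the pattern asks for a word boundary, any one character, one of V/I/F/Y, one of M/L/F/Y/I/A, one or more word characters, one of L/I/Y/V/F, any one character, and finally K or R followed by a word boundary. As a rough sanity check outside pandas, the same pattern can be tried with the standard `re` module; the `words` list and `matches` dict below are illustrative names only, with the words copied from the demo DataFrame above.

```python
import re

# Same pattern as in the demo, compiled once for reuse.
pat = re.compile(r"\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b")

# Words copied from the demo DataFrame (row 2 ends in "H" there).
words = ["TLYSFFPK", "SVLENFVGR", "SVFNHAIRH", "KAGEVFIHK"]

# search() looks for the pattern anywhere in the string,
# which is also how a regex-based "contains" check behaves.
matches = {w: bool(pat.search(w)) for w in words}
print(matches)
```

Only SVLENFVGR passes here: TLYSFFPK and KAGEVFIHK fail at the second character, and SVFNHAIRH fails because it does not end in K or R.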

In [4]: df.Word.str.contains(pat) 
Out[4]: 
0    False
1     True
2    False
3    False
Name: Word, dtype: bool 

In [5]: df[df.Word.str.contains(pat)] 
Out[5]: 
        Word  Ratings
1  SVLENFVGR        2
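
For reference, here is a minimal self-contained sketch of the same approach as a plain script. It uses the data exactly as written in the question, where row 2 is SVFNHAIRK rather than the SVFNHAIRH used in the demo, so both rows 1 and 2 should pass the filter. The variable names `mask` and `filtered` and the `na=False` argument are additions for illustration, not part of the original demo.

```python
import pandas as pd

# Data as given in the question (row 2 ends in "K").
df = pd.DataFrame({
    "Word": ["TLYSFFPK", "SVLENFVGR", "SVFNHAIRK", "KAGEVFIHK"],
    "Ratings": [1, 2, 3, 4],
})

pat = r"\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b"

# str.contains returns a boolean Series aligned with df's index;
# na=False makes missing values count as non-matches instead of NaN.
mask = df["Word"].str.contains(pat, regex=True, na=False)

# Boolean indexing keeps the matching rows and the DataFrame structure.
filtered = df[mask]
print(filtered)
```

Because the result of boolean indexing is itself a DataFrame, the original index labels and both columns are preserved.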