How can I use a regular expression to get matches within a given range?

I am using the code below to return every match that falls within a given range. A sample of my data:

 comment 
0  [intj74, you're, whipping, people, is, a, grea... 
1  [home, near, kcil2, meniaga, who, intj47, a, l... 
2  [thematic, budget, kasi, smooth, sweep] 
3  [budget, 2, intj69, most, people, think, of, e...

The result I want is (when the given range is intj1 to intj75):

  comment 
0  [intj74] 
1  [intj47]  
2  [nan] 
3  [intj69]

My code is:

df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74']) 
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]

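I don't know how to use a regular expression to match a whole range instead of the single value in t=='intj74'. Or are there any other ideas for doing this?

Thanks in advance,

Pandas/Python newbie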

Source

2016-09-15 Suhairi Suhaimin

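'intj\d+' matches 'intj' followed by one or more digits. – Maroun

@Maroun Thanks for your reply. Unfortunately it doesn't work. It returns [nan] for everything... or how should I apply your suggestion? –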

You can replace [t for t in x if t=='intj74'] with, for example,

[t for t in x if re.match('intj[0-9]+$', t)]

or even

[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]

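The second form also handles the case where nothing matches, so the explicit check with df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]] is no longer needed. The "trick" here is that an empty list evaluates to False, so `or` returns its right operand in that case.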
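Note that the pattern above accepts any number after intj, not only those in the range from the question. If the range intj1 to intj75 has to be enforced, one possible approach is to capture the digits and compare them as an integer. The following is only a sketch under assumptions: the helper name `in_range`, its `low`/`high` parameters, and the sample rows (including the extra out-of-range token intj80) are made up for illustration.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data modeled on the question's tokenized comments.
df = pd.DataFrame({
    'comment': [
        ['intj74', "you're", 'whipping', 'people'],
        ['home', 'near', 'kcil2', 'meniaga', 'who', 'intj47'],
        ['thematic', 'budget', 'kasi', 'smooth', 'sweep'],
        ['budget', '2', 'intj69', 'intj80', 'most', 'people'],
    ]
})

# Capture the digits so they can be compared numerically.
PATTERN = re.compile(r'intj([0-9]+)$')


def in_range(token, low=1, high=75):
    """Return True if token looks like 'intj<N>' with low <= N <= high."""
    m = PATTERN.match(token)
    return m is not None and low <= int(m.group(1)) <= high


# Keep only in-range tokens; an empty list falls back to [np.nan].
df['comment'] = df['comment'].apply(
    lambda x: [t for t in x if in_range(t)] or [np.nan]
)
print(df)
```

Comparing the captured number as an integer avoids encoding the range in the pattern itself, which tends to get hard to read once the bounds change.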

Source

2016-09-15 08:49:26 ewcz

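Yess!!! After importing re, the solution re.match('intj[0-9]+$', t) works well. Thank you very much @ewcz –

Thanks again @ewcz for sharing the "trick". I tried it and it works, and it even shortens my code. –

I'm new to pandas. You have probably already initialized your DataFrame. Anyway, this is what I have: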

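import pandas as pd

data = {
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj69, most, people, think, of"
    ]
}
# Build the DataFrame from the sample data before using it.
df = pd.DataFrame(data)
# Extract the first 'intj' followed by digits from each comment string.
print(df.comment.str.extract(r'(intj\d+)'))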

Source

2016-09-15 08:55:00

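Thanks for suggesting .str.extract, which is another approach. However, I get this FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame) if __name__ == '__main__':. All of my results are NaN. –

You can pass the expand argument explicitly: `df.comment.str.extract(r'(intj\d+)', expand=True)`. True returns a DataFrame. False returns a Series. Use whichever suits you. –

Oh, I see. Thanks for the explanation @arvindpdmn –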
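To illustrate the `expand` argument discussed in the comments above, here is a minimal sketch on string data like that in the answer. The variable names are illustrative. The last part rests on an assumption: `.str.extract` works on strings, so if the column holds lists of tokens, as the sample in the question suggests, that may be why every result came back as NaN. Joining the tokens into one string first is one possible workaround.

```python
import pandas as pd

df = pd.DataFrame({
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "thematic, budget, kasi, smooth, sweep",
    ]
})

# expand=False with a single capture group returns a Series.
as_series = df.comment.str.extract(r'(intj\d+)', expand=False)
# expand=True returns a DataFrame with one column per capture group.
as_frame = df.comment.str.extract(r'(intj\d+)', expand=True)

print(type(as_series).__name__)
print(type(as_frame).__name__)

# Assumption: the original column holds lists of tokens, not strings.
tokens = pd.Series([['intj74', 'whipping'], ['budget', 'kasi']])
# Join each list into a single string so .str.extract can search it.
joined = tokens.str.join(' ')
extracted = joined.str.extract(r'(intj\d+)', expand=False)
print(extracted)
```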
