2016-12-11 82 views

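Python RE (regular expression) to match a specific string containing letters, hyphens, and digits

I am using Python's regular expression package, re, in Python 2.7, and I am having trouble coming up with a regular expression that matches the following strings: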

https://www.this.com/john-smith/e5609239 
https://www.this.com/jane-johnson/e426609216 
https://www.this.com/wendy-saad/e172645609215 
https://www.this.com/nick-madison/e7265609214 
https://www.this.com/tom-taylor/e17265709211 
https://www.this.com/james-bates/e9212 

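So the prefix is fixed, "https://www.this.com/", followed by a variable number of lowercase letters, then a "-", then more lowercase letters, then "/e", and finally a variable number of digits.

This is what I have tried, to no avail: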

href=re.compile("https://www.this.com/people-search/[a-z]+[\-](?P<firstNumBlock>\d+)/") 

href=re.compile("https://www.this.com/people-search/[a-z][\-][a-z]+/e[0-9]+") 

Thanks for your help!

Answers

1

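Part of the trouble comes from how special characters are escaped. Because you are not using a raw string, backslashes have a special meaning in the string literal itself; \- and \d happen to pass through unchanged, but a raw string (r"...") is the safer habit. Also, a literal hyphen does not need its own character class or an escape, so [\-] can be written as a plain -. Note that in your second pattern [a-z] without a + matches exactly one letter, which is why a name like "john" never matches. You can simplify the expression as follows: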

import re

expression = r"https://www.this.com/people-search/[a-z]+-[a-z]+/e\d+"

With the following data:

strings = ['https://www.this.com/people-search/john-smith/e5609239',
           'https://www.this.com/people-search/jane-johnson/e426609216',
           'https://www.this.com/people-search/wendy-saad/e172645609215',
           'https://www.this.com/people-search/nick-madison/e7265609214',
           'https://www.this.com/people-search/tom-taylor/e17265709211',
           'https://www.this.com/people-search/james-bates/e9212']

Result:

>>> for s in strings:
...     print(re.match(expression, s))
...
<_sre.SRE_Match object; span=(0, 54), match='https://www.this.com/people-search/john-smith/e56>
<_sre.SRE_Match object; span=(0, 58), match='https://www.this.com/people-search/jane-johnson/e>
<_sre.SRE_Match object; span=(0, 59), match='https://www.this.com/people-search/wendy-saad/e17>
<_sre.SRE_Match object; span=(0, 59), match='https://www.this.com/people-search/nick-madison/e>
<_sre.SRE_Match object; span=(0, 58), match='https://www.this.com/people-search/tom-taylor/e17>
<_sre.SRE_Match object; span=(0, 52), match='https://www.this.com/people-search/james-bates/e9>
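
To see where the two attempts from the question go wrong, it may help to run them next to the simplified pattern on one sample URL. This is only a sketch: the variable names are made up for illustration, the patterns are written against the www.this.com/people-search/ prefix, and the attempts are given as raw strings, which produce the same regex as the non-raw originals because \- and \d are not recognized string escapes.

```python
import re

url = "https://www.this.com/people-search/john-smith/e5609239"

# First attempt: after "john-" it expects digits and a trailing "/",
# but the URL continues with the letters of the surname.
attempt_1 = re.compile(
    r"https://www.this.com/people-search/[a-z]+[\-](?P<firstNumBlock>\d+)/"
)

# Second attempt: "[a-z]" without "+" matches exactly one letter,
# so "john" cannot be consumed before the hyphen.
attempt_2 = re.compile(
    r"https://www.this.com/people-search/[a-z][\-][a-z]+/e[0-9]+"
)

# Simplified pattern: letters, a hyphen, letters, "/e", digits.
fixed = re.compile(r"https://www.this.com/people-search/[a-z]+-[a-z]+/e\d+")

results = {
    "attempt_1": attempt_1.match(url) is not None,
    "attempt_2": attempt_2.match(url) is not None,
    "fixed": fixed.match(url) is not None,
}
print(results)

# The second attempt does match when the first name is a single letter.
single_letter = attempt_2.match(
    "https://www.this.com/people-search/j-smith/e1"
) is not None
```

Only the simplified pattern matches the sample URL; the second attempt succeeds only for a one-letter first name, which points at the missing + as the culprit.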
1
re.compile(r'https://www.this.com/[a-z-]+/e\d+') 

[a-z-]+ matches john-smith, and e\d+ matches e5609239.
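
If the two parts are needed separately, the same idea can be written with named groups. The following is a minimal sketch, not part of the original answer: the group names name and ident are hypothetical, and the dots in the host are escaped here so that they only match a literal ".".

```python
import re

# "name" and "ident" are illustrative group names, not from the question.
pattern = re.compile(r"https://www\.this\.com/(?P<name>[a-z-]+)/e(?P<ident>\d+)")

m = pattern.match("https://www.this.com/john-smith/e5609239")
name = m.group("name")    # the letters-and-hyphens part
ident = m.group("ident")  # the digits after "e"
print(name, ident)

# The class [a-z-]+ is permissive: it accepts any mix of lowercase letters
# and hyphens, including a segment made of hyphens only.
loose = pattern.match("https://www.this.com/--/e1") is not None
```

The trade-off is that [a-z-]+ is shorter but looser than [a-z]+-[a-z]+, so it also accepts names with no hyphen, several hyphens, or only hyphens.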

1
import re
from pprint import pprint

text = '''https://www.this.com/john-smith/e5609239 
https://www.this.com/jane-johnson/e426609216 
https://www.this.com/wendy-saad/e172645609215 
https://www.this.com/nick-madison/e7265609214 
https://www.this.com/tom-taylor/e17265709211 
https://www.this.com/james-bates/e9212''' 
href = re.compile(r'https://www\.this\.com/[a-zA-Z]+\-[a-zA-Z]+/e[0-9]+') 
m = href.findall(text) 
pprint(m) 

Output:

['https://www.this.com/john-smith/e5609239', 
'https://www.this.com/jane-johnson/e426609216', 
'https://www.this.com/wendy-saad/e172645609215', 
'https://www.this.com/nick-madison/e7265609214', 
'https://www.this.com/tom-taylor/e17265709211', 
'https://www.this.com/james-bates/e9212'] 
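
The escaped dots (\.) in this pattern matter because an unescaped . matches any character. As a rough illustration, assuming a made-up lookalike URL, the sketch below compares an unescaped prefix with one built by re.escape; the names PREFIX, lookalike, and genuine are only for this example.

```python
import re

PREFIX = "https://www.this.com/"
TAIL = r"[a-z]+-[a-z]+/e[0-9]+"

# Dots left unescaped: each "." matches any single character.
unescaped = re.compile(r"https://www.this.com/" + TAIL)

# re.escape() escapes the regex metacharacters in the fixed prefix.
escaped = re.compile(re.escape(PREFIX) + TAIL)

genuine = "https://www.this.com/john-smith/e5609239"
lookalike = "https://wwwxthis-com/john-smith/e5609239"  # invented test input

unescaped_hits = [bool(unescaped.match(u)) for u in (genuine, lookalike)]
escaped_hits = [bool(escaped.match(u)) for u in (genuine, lookalike)]
print(unescaped_hits, escaped_hits)
```

With the escaped prefix only the genuine URL matches, while the unescaped version also accepts the lookalike.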