Regex search in an if condition

I am trying to find links that contain the whole word pid, but some links also contain it as part of another word. This is my code:

for a in self.soup.find_all(href=True):
    # Plain substring test: also true when "pid" is only part of a longer word
    if 'pid' in a['href']:
        href = a['href']
        # Skip empty or one-character links and javascript: pseudo-links
        if not href or len(href) <= 1:
            continue
        elif 'javascript:' in href.lower():
            continue
        else:
            href = href.strip()
        # Turn relative links into absolute ones using domain_link
        if href[0] == '/':
            href = (domain_link + href).strip()
        elif href[:4] == 'http':
            href = href.strip()
        else:
            href = (domain_link + '/' + href).strip()
        # Drop the '#fragment' part
        if '#' in href:
            indx = href.index('#')
            href = href[:indx].strip()
        if href in links:
            continue

        links.append(self.re_encode(href))


2015-09-05 Dhrubo Naskar

Sorry, I meant regex. –

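I'm not clear on what's wrong here. Can you say clearly which part of the code you are having trouble with, specifically how it behaves now and how you want it to behave? – larsks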

I think this may be a duplicate of [Test a string for a substring](http://stackoverflow.com/questions/5473014/test-a-string-for-a-substring) – C8H10N4O2

If you mean that you want it to match strings like /pid/0002 but not /rapid.html, then you need to exclude word characters on either side. Something like:

>>> import re
>>> re.search(r'\Wpid\W', '/pid/0002')
<_sre.SRE_Match object; span=(0, 5), match='/pid/'>
>>> print(re.search(r'\Wpid\W', '/rapid/123'))
None

If "pid" might be at the start or end of the string, you need an extra condition: match either the start/end of the string or a non-word character:

>>> re.search(r'(^|\W)pid($|\W)', 'pid/123') 
<_sre.SRE_Match object; span=(0, 4), match='pid/'>
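To compare the plain substring test from the question with the two patterns above, here is a small check on a few made-up hrefs (the sample list is hypothetical). It also tries \b, Python's word-boundary assertion, which this answer does not use; for a word like pid it selects the same strings as (^|\W)pid($|\W), but without consuming the neighbouring characters.

```python
import re

# Hypothetical sample hrefs: the first three contain "pid" as a whole word,
# the last one only contains it inside the word "rapid".
hrefs = ['/pid/0002', 'pid/123', '/items/pid', '/rapid/123']

substring = [h for h in hrefs if 'pid' in h]
inner_only = [h for h in hrefs if re.search(r'\Wpid\W', h)]
with_anchors = [h for h in hrefs if re.search(r'(^|\W)pid($|\W)', h)]
word_boundary = [h for h in hrefs if re.search(r'\bpid\b', h)]

print(substring)      # ['/pid/0002', 'pid/123', '/items/pid', '/rapid/123']
print(inner_only)     # ['/pid/0002']
print(with_anchors)   # ['/pid/0002', 'pid/123', '/items/pid']
print(word_boundary)  # ['/pid/0002', 'pid/123', '/items/pid']
```

The substring test accepts "/rapid/123", and \Wpid\W misses "pid" at either end of the string, which is why the anchored version is needed.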

For more information on special characters, see the docs.

You could use it like this:

import re

pattern = re.compile(r'(^|\W)pid($|\W)')
if pattern.search(a['href']) is not None:
    ...
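Folded into a link-collecting loop like the one in the question, the check might look like the sketch below. It is only a sketch under assumptions: the function name collect_pid_links is hypothetical, hrefs stands in for the a['href'] values from self.soup.find_all(href=True), base_url plays the role of domain_link, and the re_encode step is left out. It also swaps the manual string concatenation for urllib.parse.urljoin and urldefrag, which resolve relative links the way a browser does; that differs from plain concatenation when base_url itself contains a path.

```python
import re
from urllib.parse import urljoin, urldefrag

# "pid" as a whole word: preceded by start or a non-word char, followed by end or a non-word char
PID_PATTERN = re.compile(r'(^|\W)pid($|\W)')

def collect_pid_links(hrefs, base_url):
    """Hypothetical helper: absolute, de-duplicated links whose href contains the word 'pid'."""
    links = []
    for href in hrefs:
        href = href.strip()
        # Skip empty or one-character links and javascript: pseudo-links, as the original loop does
        if len(href) <= 1 or 'javascript:' in href.lower():
            continue
        if PID_PATTERN.search(href) is None:
            continue
        # urljoin resolves '/x', 'x' and absolute 'http...' hrefs against base_url;
        # urldefrag splits off the '#fragment' part
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in links:
            links.append(absolute)
    return links

sample = ['/pid/0002#top', 'pid/17', 'http://example.com/rapid/1',
          'javascript:void(0)', '/pid/0002']
print(collect_pid_links(sample, 'http://example.com'))
# ['http://example.com/pid/0002', 'http://example.com/pid/17']
```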


2015-09-05 03:18:40 z0r

There are actually three cases: one is pid=, one is like sid=tyy,4mr&icmpid, and another just uses id, as in Widget and so on. I only want to show the first one, only ?pid –

Thanks, I used this expression and it works: pattern = re.compile(r'(\?pid\=)') –
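A quick check of that pattern against hrefs shaped like the three cases in the comment above (the concrete URLs are hypothetical examples, not taken from the asker's site), plus one where pid is not the first query parameter:

```python
import re

# The asker's pattern: a literal '?' followed by 'pid='.
# ('=' needs no escaping in a regex, but '\=' is harmless.)
pattern = re.compile(r'(\?pid\=)')

hrefs = [
    '/item?pid=42',                # case 1: ?pid=
    '/item?sid=tyy,4mr&icmpid=7',  # case 2: "pid" inside icmpid
    '/Widget?id=3',                # case 3: only id
    '/item?sid=1&pid=42',          # pid present, but not the first parameter
]
for href in hrefs:
    print(href, pattern.search(href) is not None)
# /item?pid=42 True
# /item?sid=tyy,4mr&icmpid=7 False
# /Widget?id=3 False
# /item?sid=1&pid=42 False
```

The last line shows the limitation: a pid parameter introduced by '&' instead of '?' is not found.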

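Cool. But in that case you might want to do proper URL parsing. Python has helpful libraries: see [urllib.parse](https://docs.python.org/3/library/urllib.parse.html) (py3) and [urlparse](https://docs.python.org/2/library/urlparse.html) (py2). They make it easy to handle other cases, such as when the 'pid' parameter is not the first one ('&pid=...'). – z0r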
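A minimal Python 3 sketch of the URL-parsing approach, assuming the goal is "the query string has a parameter named exactly pid". The function name has_pid_param is hypothetical; urlsplit and parse_qs are real urllib.parse functions (on Python 2 the same two functions live in the urlparse module).

```python
from urllib.parse import urlsplit, parse_qs

def has_pid_param(url):
    """Hypothetical helper: True if the URL's query string has a parameter named exactly 'pid'."""
    # urlsplit separates path, query and fragment; parse_qs turns 'a=1&b=2'
    # into {'a': ['1'], 'b': ['2']}. keep_blank_values=True makes '?pid='
    # (empty value) still count as having the parameter.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return 'pid' in params

for url in ['/item?pid=42', '/item?sid=1&pid=42', '/item?sid=tyy,4mr&icmpid=7',
            '/Widget?id=3', '/rapid/123']:
    print(url, has_pid_param(url))
# /item?pid=42 True
# /item?sid=1&pid=42 True
# /item?sid=tyy,4mr&icmpid=7 False
# /Widget?id=3 False
# /rapid/123 False
```

Because parameter names are compared as whole keys, icmpid and id never match, and the position of pid in the query string no longer matters.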
