Matching the text surrounding a match in a group

The following text is an example:

<li><a href="link" target="_parent">1. Tips and tricks</a></li>

Regex:

/tips(?![^<]*>)/ig

This matches the word Tips.

What I want is to also match the surrounding text, possibly in another group?

So the match might be, e.g., ["1. Tips and tricks", "Tips"].

You can test it here


2014-07-09 viperfx

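Why not use an HTML parser instead of a regex? – jonrsharpe

What are you looking for? All the text between the tags? – Jerry

I'm trying to find only the text nodes, and I found that using a regex is much easier than traversing the DOM. The reason is that I'm translating the text into another language, so I only need the text content. – viperfx

I think you're trying to get this: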

>>> import re
>>> html = '<li><a href="link" target="_parent">1. Tips and tricks</a></li>'
>>> m = re.findall(r'((?<=>)\d+\.\s*(Tips)[^<]*)', html)
>>> m
[('1. Tips and tricks', 'Tips')]

>>> html = """
... <li>
... <a href="link" target="_parent">
... 1. Tips and tricks
... </a>
... </li>"""
>>> m = re.findall(r'\s*<a[^>]*>\n(\s*\S*\s*(\S*)[^\n]*)', html)
>>> m
[('1. Tips and tricks', 'Tips')]
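
A less input-specific variation on the patterns above, offered only as a sketch: it assumes the goal is "the whole run of text between two tags, plus the word inside it" and uses re.finditer with named groups. The group names context and word are illustrative and not part of the original answer.

```python
import re

html = '<li><a href="link" target="_parent">1. Tips and tricks</a></li>'

# Outer group "context": a run of text between '>' and '<' that contains the word.
# Inner group "word": the word itself, matched case-insensitively.
# Excluding '<' and '>' keeps the match inside a single text run.
pattern = re.compile(r'>(?P<context>[^<>]*?(?P<word>tips)[^<>]*)<', re.IGNORECASE)

matches = [(m.group('context'), m.group('word')) for m in pattern.finditer(html)]
print(matches)  # [('1. Tips and tricks', 'Tips')]
```

Because the pattern requires a '>' before the text and forbids '<' and '>' inside it, a word that appears only in an attribute value is not picked up in this sample. It is still a regex over markup, so malformed or unusual HTML can defeat it.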


2014-07-09 09:52:41

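I'm using re.finditer, and the first solution doesn't seem to return any results. – viperfx

The second solution doesn't work well, because some HTML tags are still left over. – viperfx

Can you post the actual input in a pastebin? –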

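The Python documentation for the re module states:

Subgroups are numbered from left to right, from 1 upward. Groups can be nested; to determine the number, just count the opening parenthesis characters, going from left to right.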
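
To make the numbering rule concrete, here is a small sketch in Python. The pattern and the sample string are made up for illustration; only the counting of opening parentheses matters.

```python
import re

# Opening parentheses, counted left to right:
#   group 1 -> the outer parentheses (the whole surrounding text)
#   group 2 -> (\d+)   the leading number
#   group 3 -> (Tips)  the target word
m = re.search(r'((\d+)\.\s*(Tips)[^<]*)', '1. Tips and tricks</a>')

print(m.group(1))  # 1. Tips and tricks
print(m.group(2))  # 1
print(m.group(3))  # Tips
print(m.groups())  # ('1. Tips and tricks', '1', 'Tips')
```

So wrapping an existing pattern in one more pair of parentheses makes the surrounding text group 1 and shifts every inner group up by one.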

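So, for example, the following (ugly) pattern matches the surrounding text in one group, together with the target word, from your example link: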

/[^\n\s](.*basics(?![^<]*>).*)\n/ig

You can refine it for your case!

Edit: parsing HTML with regular expressions is still a very bad idea; something like BeautifulSoup would be more robust.
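
As a rough illustration of the parser-based route, the sketch below uses the standard library's html.parser rather than BeautifulSoup, purely to avoid a third-party dependency. The names TextNodeCollector and find_in_text_nodes are hypothetical helpers, not from any library, and the sample markup is invented.

```python
import re
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects the non-empty text nodes of a document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        # handle_data is called for text content, never for attribute values.
        text = data.strip()
        if text:
            self.texts.append(text)


def find_in_text_nodes(html, word):
    """Return (text node, matched word) pairs for nodes containing `word`."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    pairs = []
    for text in collector.texts:
        found = pattern.search(text)
        if found:
            pairs.append((text, found.group(0)))
    return pairs


# The href deliberately contains "tips" to show that attributes are ignored.
sample = '<li><a href="tips.html" target="_parent">1. Tips and tricks</a></li>'
print(find_in_text_nodes(sample, 'tips'))  # [('1. Tips and tricks', 'Tips')]
```

The regex here only ever sees plain text, so no lookahead is needed to stay out of tags.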


2014-07-09 09:51:54

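I'm trying to find the text nodes, and I found that using a regex is much easier than traversing the DOM. The reason is that I'm translating the text into another language, so I only need the text content. I have already used BeautifulSoup, but I found it to be more work compared with how easy a regex is. – viperfx

Going by your comment, I think the simple way is to use BeautifulSoup and then clean up a bit with re.split: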

from bs4 import BeautifulSoup 
import re 

html = """<li class="selected "> 
<a href="http://localhost:8888/translate_url" target="_parent"> 
      Learn the Basics: get iniciared 
     </a> 
<ul class="subtopics"> 
<li> 
<a href="http://localhost:8888/translate_url" target="_parent"> 
       Tips and tricks 
       </a> 
</li> 
<li> 
<a href="http://localhost:8888/translate_url" target="_parent"> 
       Use bookmarks 
       </a> 
</li>""" 

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid bs4's warning
text = re.split(r'\s{2,}', soup.get_text().strip())
print(text)

Output:

['Learn the Basics: get iniciared', 'Tips and tricks', 'Use bookmarks']

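soup.get_text() gets all the text in the page. strip() then removes the leading and trailing whitespace, so that no empty strings appear in the resulting list of text.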
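
The effect of strip() on the split can be seen without BeautifulSoup at all. In the sketch below, raw is an assumed, simplified stand-in for the kind of string get_text() returns for indented markup; it is not the actual output for the HTML above.

```python
import re

# Assumed stand-in for get_text() output: text separated by runs of whitespace.
raw = "\n\n      Tips and tricks \n     \n\n\n      Use bookmarks \n\n"

# Without strip(), the leading and trailing whitespace runs each produce
# an empty string at the ends of the list.
unstripped = re.split(r'\s{2,}', raw)
print(unstripped)  # ['', 'Tips and tricks', 'Use bookmarks', '']

# With strip(), only the separators between the text items remain.
stripped = re.split(r'\s{2,}', raw.strip())
print(stripped)  # ['Tips and tricks', 'Use bookmarks']
```

Splitting on two or more whitespace characters leaves the single spaces inside each phrase untouched, which is why "Tips and tricks" stays in one piece.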


2014-07-09 10:19:21 Jerry
