BeautifulSoup, simple regex issue

I've just hit a wall with a regex and can't work out why it isn't matching.

Here is what the BeautifulSoup docs say:

soup.find_all(class_=re.compile("itl")) 
# [<p class="title"><b>The Dormouse's story</b></p>]

Here is my HTML:

<a href="exam.com" title="Keeper: Jay" class="pos_text">Aouate</a></span><span class="pos_text pos3_l_4">

I am trying to match the span tag (the last one in the snippet).

>>> if soup.find(class_=re.compile("pos_text pos3_l_\d{1}")): 
     print "Yes" 

# prints nothing - indicating there is no such pattern in the html

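So I'm just repeating what the BS4 docs do, except that my regex doesn't work. Sure enough, if I replace \d{1} with 4 (as it originally appears in the HTML), it succeeds.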


2013-04-09 nutship

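Try "\\d" in your regex. It may be interpreting "\d" as an attempt to escape the "d".

Alternatively, a raw string should work. Just put an "r" in front of the regex, like this: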

re.compile(r"pos_text pos3_l_\d{1}")
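
A quick check using only the standard library suggests that the escaping is probably not what breaks the match. The sketch below compares the two spellings proposed in this answer and then tries the pattern against the full class string from the question. Python leaves unrecognized escapes such as \d untouched in a plain string (newer versions emit a warning about it), so the asker's original spelling should compile to the same pattern as well.

```python
import re

# "\\d" (escaped backslash) and r"\d" (raw string) are the same two characters.
escaped = "\\d"
raw = r"\d"
print(escaped == raw, len(raw))

# Both spellings therefore produce an identical pattern string.
pattern = re.compile(r"pos_text pos3_l_\d{1}")
print(pattern.pattern == "pos_text pos3_l_\\d{1}")

# The pattern itself does match the full class attribute value,
# so the regex is fine when it is tested against the whole string.
print(bool(pattern.search("pos_text pos3_l_4")))
```

If all three checks come out true, the problem most likely lies in what the pattern is compared against, not in how the backslash is written.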


2013-04-09 18:19:27

Why would d need to be escaped to match the regex? – PuercoPop 2013-04-09 18:26:55

'd' doesn't need escaping. '\\' needs escaping. – 2013-04-09 18:27:44

What @JoeFrambach said. – 2013-04-09 18:28:10

I'm not entirely sure why, but this works for me:

soup.find(attrs={'class':re.compile('pos_text pos3_l_\d{1}')})


2013-04-09 18:19:54

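From the docs: there is a shortcut for class that is present in all versions of Beautiful Soup. The second argument to any find()-type method is called attrs, and passing in a string for attrs will search for that string as a CSS class: – PuercoPop 2013-04-09 18:24:54

Oh, neat. I never noticed that. – 2013-04-09 18:27:05

Hmm, it doesn't work for me, but thanks for the reply anyway. – nutship 2013-04-09 21:58:21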

You are not matching against a single class, but against a specific combination of classes in a specific order.

From the documentation:

You can also search for the exact string value of the class attribute: 

css_soup.find_all("p", class_="body strikeout") 
# [<p class="body strikeout"></p>]

But searching for variants of the string value won’t work:

css_soup.find_all("p", class_="strikeout body") 
# []

So you should probably first match pos_text, and then try to match your regex within those results.
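
The two-step approach suggested here could look roughly like the sketch below. It assumes bs4 is installed and uses the built-in html.parser. The names candidates, position_class and matches are made up for illustration, and the pattern ^pos3_l_\d$ is a guess at what the asker wants to match within a single class name.

```python
import re
from bs4 import BeautifulSoup

# HTML fragment from the question (the stray </span> is kept as posted).
html = ('<a href="exam.com" title="Keeper: Jay" class="pos_text">Aouate</a>'
        '</span><span class="pos_text pos3_l_4">')
soup = BeautifulSoup(html, "html.parser")

# Step 1: collect every tag that carries the pos_text class.
candidates = soup.find_all(class_="pos_text")

# Step 2: keep the tags that also have a class shaped like pos3_l_<digit>.
# The regex is applied to one class name at a time, never to the
# space-separated attribute value.
position_class = re.compile(r"^pos3_l_\d$")
matches = [
    tag for tag in candidates
    if any(position_class.match(name) for name in tag.get("class", []))
]

for tag in matches:
    print(tag.name, tag["class"])
```

Because each class name is checked separately, the order of the classes in the attribute no longer matters. Passing the single-class pattern straight to the search, as in soup.find("span", class_=position_class), should give the same tag in one call.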


2013-04-09 18:24:40 PuercoPop
