Matching id's in BeautifulSoup

I'm using BeautifulSoup, a Python module. I need to find every reference to an id of the form 'post-#'. For example:

<div id="post-45">...</div> 
<div id="post-334">...</div>

How can I filter for these?

html = '<div id="post-45">...</div> <div id="post-334">...</div>' 
soupHandler = BeautifulSoup(html) 
print soupHandler.findAll('div', id='post-*') 
> []

2010-05-13 Ockonal

Which version of BeautifulSoup are you using? – 2010-05-13 22:03:17

You can pass a function to findAll:

>>> print soupHandler.findAll('div', id=lambda x: x and x.startswith('post-')) 
[<div id="post-45">...</div>, <div id="post-334">...</div>]

Or a regular expression:

>>> print soupHandler.findAll('div', id=re.compile('^post-')) 
[<div id="post-45">...</div>, <div id="post-334">...</div>]

2010-05-13 21:46:28

AttributeError: 'NoneType' object has no attribute 'startswith' – Ockonal 2010-05-13 21:49:38

I've fixed the 'AttributeError'. – jfs 2010-05-14 08:11:50

+1 for the lambda function – 2011-05-31 04:08:50
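
For anyone wondering where that error came from: as far as I can tell, the filter function can be handed None for a tag that has no id attribute at all, which is why the lambda above checks `x and` first. Here is a minimal sketch of the same idea, assuming Python 3 with the bs4 package and the built-in 'html.parser'. The extra id-less div and the helper name `is_post_id` are my own additions for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, plus one div without an id (added for illustration).
html = ('<div id="post-45">...</div> '
        '<div>no id here</div> '
        '<div id="post-334">...</div>')
soup = BeautifulSoup(html, 'html.parser')

def is_post_id(value):
    # value may be None when the tag lacks the attribute,
    # so check for that before calling startswith.
    return value is not None and value.startswith('post-')

ids = [div['id'] for div in soup.find_all('div', id=is_post_id)]
print(ids)  # ['post-45', 'post-334']
```

Without the None check, the id-less div is the kind of tag that would trigger the AttributeError reported above.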

soupHandler.findAll('div', id=re.compile("^post-$"))

Looks right to me.

2010-05-14 07:59:03 Auston

Why did you put the '$' there? I don't think this will work the way the OP intended. – 2010-05-14 16:31:21

Since he is asking to match 'post-#somenumber#', it's better to be precise:
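
To illustrate the point about '$': it anchors the pattern to the end of the string, so "^post-$" only matches an id that is exactly "post-" with nothing after it. A quick check with plain `re`, no BeautifulSoup needed (the sample id list is made up for the demonstration):

```python
import re

anchored = re.compile(r'^post-$')   # pattern from the answer above
prefix = re.compile(r'^post-')      # prefix-only pattern

sample_ids = ['post-45', 'post-334', 'post-']

# search() is used here on the assumption that BeautifulSoup also searches
# attribute values with the compiled pattern rather than requiring a full match.
anchored_hits = [i for i in sample_ids if anchored.search(i)]
prefix_hits = [i for i in sample_ids if prefix.search(i)]

print(anchored_hits)  # ['post-']
print(prefix_hits)    # ['post-45', 'post-334', 'post-']
```

So with the '$' in place, neither "post-45" nor "post-334" would be found.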

import re 
[...] 
soupHandler.findAll('div', id=re.compile(r"^post-\d+"))
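
If you want to be stricter still, here is a rough sketch comparing that pattern with a variant that also anchors the end. It assumes Python 3 with bs4 and 'html.parser'. The ids "post-draft" and "post-45-comments" and the helper `matching_ids` are hypothetical, added only to show the difference.

```python
import re
from bs4 import BeautifulSoup

# The two divs from the question plus two made-up ids for comparison.
html = ('<div id="post-45">...</div> <div id="post-334">...</div> '
        '<div id="post-draft">...</div> <div id="post-45-comments">...</div>')
soup = BeautifulSoup(html, 'html.parser')

def matching_ids(pattern):
    # Collect the ids of all divs whose id attribute matches the pattern.
    return [div['id'] for div in soup.find_all('div', id=re.compile(pattern))]

loose_ids = matching_ids(r'^post-\d+')    # "post-" followed by at least one digit
strict_ids = matching_ids(r'^post-\d+$')  # digits must run to the end of the id

print(loose_ids)   # ['post-45', 'post-334', 'post-45-comments']
print(strict_ids)  # ['post-45', 'post-334']
```

Whether the trailing '$' is worth it depends on what other ids the page contains.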

2013-01-14 14:50:04 xiamx

This works for me:

from bs4 import BeautifulSoup 
import re 

html = '<div id="post-45">...</div> <div id="post-334">...</div>' 
soupHandler = BeautifulSoup(html, 'html.parser') 

for match in soupHandler.find_all('div', id=re.compile("post-")): 
    print(match.get('id'))

>>> 
post-45 
post-334

2015-03-09 17:12:51
