Parsing a table with BeautifulSoup in Python

I want to read the entries from a table in the following format:

<table cellspacing="0" cellpadding="4"> 

stuff 

</table>

This is my current approach:

pg = urllib2.urlopen(req).read() 
page = BeautifulSoup(pg) 
table = page.find('table', cellpadding = 4, cellspacing = 0)

I can't get it to read the table tag correctly. What is the best way to do this?


2013-06-27 Max Kim

Your code works for me. – TerryA

I tested this with both BeautifulSoup version 3 and version 4. Your code works with BS4, so you must be using version 3:

>>> from bs4 import BeautifulSoup as BS4 # Version 4 
>>> from BeautifulSoup import BeautifulSoup as BS3 # Version 3 
>>> bs3soup = BS3("""<table cellspacing="0" cellpadding="4"> 
... 
... stuff 
... 
... </table>""") 
>>> bs4soup = BS4("""<table cellspacing="0" cellpadding="4"> 
... 
... stuff 
... 
... </table>""") 
>>> bs3soup.find('table', cellpadding = 4, cellspacing = 0) # None 
>>> bs4soup.find('table', cellpadding = 4, cellspacing = 0) 
<table cellpadding="4" cellspacing="0"> 

stuff 

</table>

So, if you want to keep using BS3, this should fix it:

>>> bs3soup.find('table', cellpadding="4", cellspacing="0") # Notice how the integers are now strings, like in the HTML.

However, you should really be using version 4 (from bs4 import BeautifulSoup).
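For reference, here is a minimal sketch of locating the table with version 4 and then reading its entries. The rows and cells in the sample markup are made up for illustration (the question only shows "stuff" inside the table), and the choice of the built-in "html.parser" backend is an assumption, not something from the original post. Passing the attribute values as strings matches how they appear in the HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the original post does not show the table's contents.
html = """
<table cellspacing="0" cellpadding="4">
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Match on the attribute values as strings, exactly as written in the HTML.
table = soup.find("table", attrs={"cellpadding": "4", "cellspacing": "0"})

rows = []
if table is not None:
    for tr in table.find_all("tr"):
        # Collect the text of every header or data cell in this row.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)

print(rows)
```

Each entry in rows is then a list of cell strings for one table row, so you can index or unpack them as needed.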


2013-06-27 01:21:25 TerryA

Yes, that works. –
