2013-06-27 116 views
1

Parsing a table with BeautifulSoup in Python

I want to read the entries of a table in the following format:

<table cellspacing="0" cellpadding="4"> 

stuff 

</table> 

This is my current approach:

pg = urllib2.urlopen(req).read() 
page = BeautifulSoup(pg) 
table = page.find('table', cellpadding = 4, cellspacing = 0) 

table doesn't pick up the tag correctly. What is the best way to do this?

+1

Your code works for me – TerryA

Answers

1

I tested this with both BeautifulSoup version 3 and version 4. Your code works with BS4, so you must be using version 3.

>>> from bs4 import BeautifulSoup as BS4 # Version 4 
>>> from BeautifulSoup import BeautifulSoup as BS3 # Version 3 
>>> bs3soup = BS3("""<table cellspacing="0" cellpadding="4"> 
... 
... stuff 
... 
... </table>""") 
>>> bs4soup = BS4("""<table cellspacing="0" cellpadding="4"> 
... 
... stuff 
... 
... </table>""") 
>>> bs3soup.find('table', cellpadding = 4, cellspacing = 0) # None 
>>> bs4soup.find('table', cellpadding = 4, cellspacing = 0) 
<table cellpadding="4" cellspacing="0"> 

stuff 

</table> 

So, if you want to keep using BS3, this should fix it:

>>> bs3soup.find('table', cellpadding="4", cellspacing="0") # Note that the integers are now strings, as in the HTML.
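To see why the string form matches, it may help to look at what a parser actually hands back for those attributes. The sketch below is not BeautifulSoup: it uses Python 3's standard-library html.parser, and the AttrCollector class is a hypothetical helper written only for this illustration. It suggests that attribute values arrive as text, which is presumably why comparing against the integer 4 found nothing in BS3.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: remembers the attributes of the first <table> tag."""

    def __init__(self):
        super().__init__()
        self.table_attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the values are strings.
        if tag == "table" and self.table_attrs is None:
            self.table_attrs = dict(attrs)


parser = AttrCollector()
parser.feed('<table cellspacing="0" cellpadding="4">\n\nstuff\n\n</table>')
table_attrs = parser.table_attrs

print(table_attrs)                        # attribute values are str, not int
print(table_attrs["cellpadding"] == 4)    # comparing against an integer
print(table_attrs["cellpadding"] == "4")  # comparing against a string
```

The integer comparison is False and the string comparison is True, so passing "4" and "0" mirrors what is stored for the tag.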

However, you should really be using version 4 (from bs4 import BeautifulSoup).
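For reference, a minimal sketch of the same lookup under version 4, followed by reading the entries. The table rows here are hypothetical (the question only shows "stuff"), and the explicit "html.parser" argument and the attrs dictionary are my choices rather than something the question used. The attribute values are passed as strings so the call does not rely on any integer conversion.

```python
from bs4 import BeautifulSoup  # Version 4

# Hypothetical table body: the question elides the real contents as "stuff".
html = """<table cellspacing="0" cellpadding="4">
<tr><td>name</td><td>value</td></tr>
<tr><td>foo</td><td>1</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Match on the attribute values as strings, exactly as they appear in the HTML.
table = soup.find("table", attrs={"cellpadding": "4", "cellspacing": "0"})

# Collect the text of each cell, one inner list per table row.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]
print(rows)
```

If find returns None for a real page, the table probably has different attribute values than expected, so it is worth checking table before looping over it.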

+0

Yes, that works. –