Python BeautifulSoup: scraping tables from a web page

I want to collect information from a website that hosts a ship database.

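I am trying to get the information with BeautifulSoup, but at the moment it does not seem to be working. I have searched online and tried different solutions, but have not managed to get the code working.

I am wondering whether I have to change the line table = soup.find_all("table", { "class" : "table1" }), since the page has 5 tables with class='table1' but my code finds only 1.

Do I have to create a loop over the tables? When I tried this I could not get it to work. Also, the next line, table_body = table.find('tbody'), gives an error: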

AttributeError: 'ResultSet' object has no attribute 'find'

Is this a conflict between my code and the BeautifulSoup source code, in which ResultSet is a subclass of list? Do I have to iterate over that list?

from urllib import urlopen 

shipUrl = 'http://www.veristar.com/portal/veristarinfo/generalinfo/registers/seaGoingShips?portal:componentId=p_efff31ac-af4c-4e89-83bc-55e6d477d131&interactionstate=JBPNS_rO0ABXdRAAZudW1iZXIAAAABAAYwODkxME0AFGphdmF4LnBvcnRsZXQuYWN0aW9uAAAAAQAYc2hpcFNlYXJjaFJlc3VsdHNTZXRTaGlwAAdfX0VPRl9f&portal:type=action&portal:isSecure=false' 
shipPage = urlopen(shipUrl) 

from bs4 import BeautifulSoup 
soup = BeautifulSoup(shipPage) 
table = soup.find_all("table", { "class" : "table1" }) 
print table 
table_body = table.find('tbody') 
rows = table_body.find_all('tr') 
for tr in rows:
    cols = tr.find_all('td')
    for td in cols:
        print td
    print


2015-12-13 Gert Lõhmus

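So, what is the expected output?

I need the data from the columns in the tables, like Owner: TALLINK GRUPP AS; Flag: Estonia. In the end I want to save them all as a table in .csv or .txt.

Short answer: 'soup.find_all()' returns a list, which has no '.find()' method. You should use a for loop.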

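A few things:

As Kevin mentioned, you need a for loop to iterate over the list returned by find_all.

Not all of the tables have a tbody, in which case find returns None, so you have to wrap the code that uses the result of find in a try block.

When you print, use the .text attribute so that you print the text value rather than the tag itself.

Here is the modified code: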
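These three points can be tried offline on a small inline HTML string before running anything against the live page. The following is only a minimal sketch, written for Python 3 and the built-in "html.parser" (both are assumptions, since the thread uses Python 2 and does not name a parser). SAMPLE_HTML and extract_cells are hypothetical names made up for illustration, and the sample markup is a stand-in, not the real page. Instead of a try block it checks for None explicitly, which is an alternative to the approach described above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page: two tables with class "table1",
# only the first of which contains a <tbody>.
SAMPLE_HTML = """
<table class="table1">
  <tbody>
    <tr><td>Ship Name:</td><td>SUPERSTAR</td></tr>
    <tr><td>Call Sign:</td><td>ESIY</td></tr>
  </tbody>
</table>
<table class="table1">
  <tr><td>Flag:</td><td>Estonia</td></tr>
</table>
"""


def extract_cells(html):
    """Return one list of cell texts per table with class 'table1'."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a ResultSet (a list subclass), so iterate over it.
    tables = soup.find_all("table", {"class": "table1"})
    result = []
    for table in tables:
        # find returns None when the table has no <tbody>;
        # in that case search the table element itself.
        body = table.find("tbody")
        if body is None:
            body = table
        # get_text(strip=True) gives the text of the cell, not the tag.
        cells = [td.get_text(strip=True) for td in body.find_all("td")]
        result.append(cells)
    return result


if __name__ == "__main__":
    for cells in extract_cells(SAMPLE_HTML):
        print(cells)
```

Falling back to the table itself means rows in tables without a tbody are still read rather than skipped, which may or may not be what you want for this site.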

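from urllib import urlopen  # Python 2; in Python 3 this lives in urllib.request
from bs4 import BeautifulSoup

shipUrl = 'http://www.veristar.com/portal/veristarinfo/generalinfo/registers/seaGoingShips?portal:componentId=p_efff31ac-af4c-4e89-83bc-55e6d477d131&interactionstate=JBPNS_rO0ABXdRAAZudW1iZXIAAAABAAYwODkxME0AFGphdmF4LnBvcnRsZXQuYWN0aW9uAAAAAQAYc2hpcFNlYXJjaFJlc3VsdHNTZXRTaGlwAAdfX0VPRl9f&portal:type=action&portal:isSecure=false'
shipPage = urlopen(shipUrl)

soup = BeautifulSoup(shipPage)
# find_all returns a list-like ResultSet, so loop over every matching table
table = soup.find_all("table", { "class" : "table1" })
for mytable in table:
    table_body = mytable.find('tbody')  # None when the table has no tbody
    try:
        rows = table_body.find_all('tr')
        for tr in rows:
            cols = tr.find_all('td')
            for td in cols:
                print td.text  # text of the cell, not the tag
    except AttributeError:
        # table_body was None, so .find_all does not exist on it
        print "no tbody"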

This produces the following output:

Register Number: 
08910M 
IMO Number: 
9365398 
Ship Name: 
SUPERSTAR 
Call Sign: 
ESIY 
.....
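The asker mentioned wanting to save the values as .csv or .txt. Judging by the output above, the cells appear to alternate between a label and its value, so one possible approach is to pair them up and write them with the standard csv module. This is a rough sketch under that assumption, written for Python 3. pair_cells, to_csv, and the "field"/"value" header row are hypothetical choices for illustration, and the sample cells are copied from the output shown above.

```python
import csv
import io


def pair_cells(cells):
    """Turn ['Label:', 'value', ...] into [('Label', 'value'), ...].

    Assumes labels and values alternate; an unpaired trailing cell is dropped.
    """
    labels = cells[0::2]
    values = cells[1::2]
    return [
        (label.strip().rstrip(":").strip(), value.strip())
        for label, value in zip(labels, values)
    ]


def to_csv(pairs):
    """Render (field, value) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    # Cell texts as they appear in the output above.
    cells = ["Register Number: ", "08910M ", "IMO Number: ", "9365398 ",
             "Ship Name: ", "SUPERSTAR ", "Call Sign: ", "ESIY "]
    print(to_csv(pair_cells(cells)))
```

To write a file instead of a string, the same csv.writer can be pointed at a file opened with open(path, "w", newline=""), where path is whatever filename you choose.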


2015-12-13 18:33:23 dstudeba
