Python BeautifulSoup: iterating over a table

I want to convert table data into a CSV file. Unfortunately, I've hit a roadblock: the code below simply repeats the TDs from the first TR for every subsequent TR.

import urllib.request 
from bs4 import BeautifulSoup 

f = open('out.txt','w') 

url = "http://www.international.gc.ca/about-a_propos/atip-aiprp/reports-rapports/2012/02-atip_aiprp.aspx" 
page = urllib.request.urlopen(url) 

soup = BeautifulSoup(page) 

soup.unicode 

table1 = soup.find("table", border=1) 
table2 = soup.find('tbody') 
table3 = soup.find_all('tr') 

for td in table3: 
    rn = soup.find_all("td")[0].get_text() 
    sr = soup.find_all("td")[1].get_text() 
    d = soup.find_all("td")[2].get_text() 
    n = soup.find_all("td")[3].get_text() 

    print(rn + "," + sr + "," + d + ",", file=f)

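This is my first Python script, so any help would be appreciated! I have looked at the answers to other questions but can't figure out what I'm doing wrong here.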

2012-04-25 Will

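You're starting at the top level of the document each time you use find() or find_all(), so when you ask for, say, all the "td" tags, you get every "td" tag in the document, not just those in the table and row you searched for. You might as well not search for the table and rows at all, because the way your code is written the results are never used.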

I think you want to do something like this:

table1 = soup.find("table", border=1) 
table2 = table1.find('tbody') 
table3 = table2.find_all('tr')

Or, you know, something more like this, with more descriptive variable names to boot:

rows = soup.find("table", border=1).find("tbody").find_all("tr") 

for row in rows: 
    cells = row.find_all("td") 
    rn = cells[0].get_text() 
    # and so on
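
To make the difference concrete, here is a minimal sketch that runs both lookups against a small inline table. The HTML string and the names from_document and from_row are made up for illustration, and "html.parser" is just the parser bundled with Python; none of this comes from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page.
HTML = """
<table>
  <tbody>
    <tr><td>A-1</td><td>first</td></tr>
    <tr><td>A-2</td><td>second</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = soup.find("table").find("tbody").find_all("tr")

# Searching from soup ignores the current row: this is always the
# first <td> of the whole document.
from_document = [soup.find_all("td")[0].get_text() for row in rows]

# Searching from the row only sees that row's own cells.
from_row = [row.find_all("td")[0].get_text() for row in rows]

print(from_document)  # ['A-1', 'A-1']
print(from_row)       # ['A-1', 'A-2']
```

The first list repeats the same cell for every row, which is the symptom described in the question; the second list advances with the loop.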

2012-04-25 05:08:39 kindall

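The problem is that every time you try to narrow down your search (get the first td in this tr, etc.) you just call back to soup. soup is the top-level object: it represents the entire document. You only need to call on soup once, and then use the result of that call in place of soup for the next step.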

For instance (with variable names changed to be clearer),

table = soup.find('table', border=1) 
rows = table.find_all('tr') 

for row in rows: 
    data = row.find_all("td") 
    rn = data[0].get_text() 
    sr = data[1].get_text() 
    d = data[2].get_text() 
    n = data[3].get_text() 

    print(rn + "," + sr + "," + d + ",", file=f)

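I'm not sure this print statement is the best way to do what you're trying to do here (at the very least, you should use string formatting instead of addition), but I'm leaving it as is because it's not the core issue.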
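
As one possible alternative to joining strings by hand, the sketch below goes a step beyond string formatting and uses the standard library csv module, which the answer itself does not propose. The records list is invented sample data standing in for the get_text() results of each row, and the in-memory buffer stands in for a real output file.

```python
import csv
import io

# Hypothetical cell text for two table rows (not taken from the real page).
records = [
    ["1", "Smith, John", "ok"],
    ["2", "plain", "ok"],
]

# An in-memory buffer keeps the sketch self-contained; for a real file,
# open("out.csv", "w", newline="") would be the usual choice.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

The writer quotes any field that contains a comma, so a cell such as "Smith, John" stays in one column, which plain concatenation with "," would not guarantee.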

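Also, for completeness: soup.unicode won't do anything. You're not calling a method there, and there's no assignment. I don't remember BeautifulSoup having a method named unicode in the first place, but I'm used to BS 3.0, so it may be new in 4.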
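
For what it's worth, a quick check suggests why that line is harmless. As far as I can tell, in BeautifulSoup 4 attribute access on a soup object is shorthand for find() with that tag name, so soup.unicode looks for a <unicode> tag and comes back empty. The one-paragraph document below is made up for the test.

```python
from bs4 import BeautifulSoup

# Tiny hypothetical document with a single <p> tag.
soup = BeautifulSoup("<p>hello</p>", "html.parser")

# soup.p is shorthand for soup.find("p").
paragraph = soup.p

# There is no <unicode> tag, so this behaves like soup.find("unicode").
missing = soup.unicode

print(paragraph.get_text())
print(missing)
```

The result is simply discarded in the question's script, so the line can be deleted without changing the output.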

2012-04-25 05:09:41
