Scraping a table with Python BeautifulSoup

I am trying to scrape the table from this web page, and I am not sure whether I am grabbing the right tag. This is what I have so far:

from bs4 import BeautifulSoup 
import requests 

page='http://www.airchina.com.cn/www/en/html/index/ir/traffic/' 

r=requests.get(page) 

soup=BeautifulSoup(r.text) 

test=soup.findAll('div', {'class': 'main noneBg'}) 
rows=test.findAll("td")

Is `main noneBg` the table? When I hover over that tag in the browser, it does highlight the table.


2014-04-02 jason

The table you need is inside an iframe that is loaded from a different URL.
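One way to spot such an iframe is to list the `src` attributes of the `<iframe>` tags on the outer page. The following is a minimal sketch only: `OUTER_HTML`, the `example.com` addresses, and the helper name `iframe_urls` are hypothetical stand-ins, not the real page's markup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the outer page; the real markup may differ.
OUTER_HTML = """
<html><body>
  <div class="main noneBg">
    <iframe src="/www/jsp/example_frame.jsp"></iframe>
  </div>
</body></html>
"""


def iframe_urls(html, base_url):
    """Return the absolute URL of every <iframe> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, frame["src"])
            for frame in soup.find_all("iframe", src=True)]


# Hypothetical base URL, used only to resolve the relative src.
base = "http://www.example.com/www/en/html/index/"
urls = iframe_urls(OUTER_HTML, base)
print(urls)
```

Each URL found this way can then be requested on its own, since the iframe's content is not part of the outer page's HTML.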

Here is how you can get it (note that the URL is different):

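from bs4 import BeautifulSoup
import requests

# URL of the iframe content, not of the outer page
page = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'

r = requests.get(page)

# Name the parser explicitly; bs4 warns when none is given
soup = BeautifulSoup(r.text, 'html.parser')

# Take the second <div> found inside div.mainRight, then its direct child table
div = soup.find('div', class_='mainRight').find_all('div')[1]
table = div.find('table', recursive=False)
# recursive=False limits each search to direct children
for row in table.find_all('tr', recursive=False):
    for cell in row('td', recursive=False):
        print(cell.text.strip())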

Prints:

Feb 2014 
% change vs Feb 2013 
% change vs Jan 2014 
Cumulative Feb 2014 
% cumulative change 
1.Traffic 
1.RTKs (in millions) 
1407.8 
...

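Note that you need `recursive=False` because the page contains nested tables: without it, the searches would also return the rows and cells of the inner tables.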
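The effect of `recursive=False` can be seen on a small nested table. This is an illustrative sketch: the `HTML` snippet below is made up for the example and is not taken from the Air China page.

```python
from bs4 import BeautifulSoup

# Made-up markup: an outer table whose first cell contains an inner table.
HTML = """
<table id="outer">
<tr><td>A<table><tr><td>inner</td></tr></table></td><td>B</td></tr>
<tr><td>C</td><td>D</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
outer = soup.find("table", id="outer")

# Default search descends into the inner table as well.
all_rows = outer.find_all("tr")
# recursive=False looks only at direct children of the outer table.
direct_rows = outer.find_all("tr", recursive=False)

first_row = direct_rows[0]
all_cells = first_row.find_all("td")
direct_cells = first_row.find_all("td", recursive=False)

print(len(all_rows), len(direct_rows))    # rows: with and without nesting
print(len(all_cells), len(direct_cells))  # cells of the first outer row
```

The parser matters here: `html.parser` keeps the `<tr>` elements as direct children of `<table>`, whereas a parser that inserts `<tbody>` (html5lib does) would make the direct-children search on the table return no rows.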


2014-04-02 13:14:19 alecxe

I get this error on the last line, `print cell.text`: `UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in position 3: illegal multibyte sequence`. – jason

@jason_cant_code Does calling `decode('utf-8')` on `cell.text` help? – alecxe

Sorry, I'm a beginner. What would the code look like? `cell.text.decode('utf-8').split()`? – jason
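The error above comes from the console encoding (GBK), which has no mapping for the non-breaking space `\xa0`. `cell.text` is already Unicode text, so decoding it is unlikely to help; in Python 3, `str` has no `decode` method at all. A possible workaround, sketched here with a hypothetical cell value rather than real page data, is to replace the non-breaking space before printing, or to encode with a lenient error handler:

```python
# Hypothetical cell text ending in a non-breaking space (U+00A0).
text = u"1407.8\xa0"

# Strict encoding to GBK fails, which is what print does on a GBK console.
try:
    text.encode("gbk")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Option 1: turn the non-breaking space into a normal space, then strip.
cleaned = text.replace(u"\xa0", u" ").strip()

# Option 2: keep the text as-is and let unencodable characters become "?".
fallback = text.encode("gbk", errors="replace")

print(strict_ok)
print(cleaned)
print(fallback)
```

Option 1 is usually the cleaner choice for scraped table cells, because the non-breaking space carries no data.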
