Extracting table information with BeautifulSoup (bs4)

Can anyone give me some BeautifulSoup code to extract some of the items found in the table here?

Here is my attempt:

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 home of urlopen

url = "http://biology.burke.washington.edu/conus/accounts/../recordview/record.php?ID=1ll&tabs=21100111&frms=1&res=&pglimit=A"

html = urlopen(url).read()
soup = BeautifulSoup(html, "lxml")
tables = soup.find_all("table")

However, this fails: the tables list turns out to be empty.

Sorry, I'm a BeautifulSoup noob.

Thanks!

2013-07-26 littleO

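The page at the given URL does not contain any table elements in its source.

The table is ~~generated by an iframe inside an iframe~~. The code below requests the frame's description.php URL directly instead: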

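from urllib.request import urlopen  # Python 3 home of urlopen
from bs4 import BeautifulSoup

# URL of the frame that holds the tables, not the outer record.php page
url = 'http://biology.burke.washington.edu/conus/recordview/description.php?ID=1l9l0l421l55llll&tabs=21100111&frms=1&pglimit=A&offset=&res=&srt=&sql2='

html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
tables = soup.find_all('table')
#print(tables)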
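
If you don't already know the frame's URL, one way to find it is to read the src attribute of each iframe in the outer page and resolve it against the page's URL. This is only a minimal sketch: the helper name iframe_urls, the example.com base URL, and the sample markup are hypothetical, and the real record.php page may be structured differently (for example, with frames nested more than one level deep).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_urls(html, base_url):
    """Return absolute URLs for every <iframe> in html that has a src."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for frame in soup.find_all("iframe"):
        src = frame.get("src")
        if src:  # skip iframes that have no src attribute
            urls.append(urljoin(base_url, src))
    return urls


# Hypothetical outer page; an assumption for illustration only.
sample = (
    '<html><body>'
    '<iframe src="description.php?ID=1"></iframe>'
    '<iframe></iframe>'
    '</body></html>'
)
print(iframe_urls(sample, "http://example.com/recordview/record.php"))
```

Each URL returned this way can then be fetched and parsed just like the description.php URL above.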

Solution using Selenium:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "http://biology.burke.washington.edu/conus/accounts/../recordview/record.php?ID=1ll&tabs=21100111&frms=1&res=&pglimit=A"

driver = webdriver.Firefox()
driver.get(url)
# Switch into the first iframe so page_source returns that frame's HTML
driver.switch_to.frame(driver.find_elements(By.TAG_NAME, 'iframe')[0])
soup = BeautifulSoup(driver.page_source, 'html.parser')
tables = soup.find_all('table')
#print(tables)
driver.quit()

2013-07-26 07:44:00 falsetru

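OK, thanks! I hadn't realized that. Do you see the information in the shell morphometrics box at the top? How would I extract the information inside that box? – littleO

Hmm, I'll look into Selenium. Is there a simple way to do this with BeautifulSoup alone? – littleO

@littleO, I added code that uses Selenium + bs4. – falsetru

This is my current workflow: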

from bs4 import BeautifulSoup  # the class name is case-sensitive
from urllib.request import urlopen  # Python 3 home of urlopen
url = "http://somewebpage.com"
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all('table')
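
Once find_all('table') returns something, the cell text still has to be pulled out of each table. A rough sketch of one way to do that is below. The helper name table_rows and the sample table (its "Measurement" and "Length" labels) are made up for illustration and are not taken from the actual page.

```python
from bs4 import BeautifulSoup


def table_rows(table):
    """Return a table's rows as lists of stripped cell strings."""
    rows = []
    # find_all is recursive, so rows of any nested tables are included too
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # ignore rows that contain no cells
            rows.append(cells)
    return rows


# Hypothetical table markup; an assumption for illustration only.
sample = """
<table>
  <tr><th>Measurement</th><th>Value</th></tr>
  <tr><td>Length</td><td> 42 mm </td></tr>
</table>
"""
soup = BeautifulSoup(sample, "html.parser")
for table in soup.find_all("table"):
    print(table_rows(table))
```

The same helper can be applied to each element of the tables list from the workflow above.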

2013-11-26 05:46:12 0077cc
