Scraping table data from a website

I want to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far I have this:

import urllib2 
from bs4 import BeautifulSoup 

soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read()) 

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'): 
    tds = row('td') 
    print tds[0].string, tds[1].string

But it doesn't display any data.

Any ideas?


2013-05-26 nicholas

I can't see a class 'spad' on that page. Are you sure it's correct? – scdove
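One quick way to check which table classes a page actually uses is to print the `class` attribute of every table. The sketch below is written for Python 3 (the question's code uses Python 2's `urllib2` and `print` statement) and parses a small hypothetical HTML fragment instead of the live page, whose real markup may differ. It also shows why the question's code fails: a class that isn't on the page matches nothing, so the `[0]` index raises `IndexError`.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for the real results page.
HTML = """
<table class="layout"><tr><td>menu</td></tr></table>
<table class="StandardResultsGrid"><tr><td>Clinic A</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# List the class attribute of every table to see which names exist.
# BeautifulSoup returns "class" as a list because it is multi-valued.
classes = [t.get("class") for t in soup("table")]
print(classes)

# A class that isn't on the page matches nothing; indexing [0] on this
# empty result (as the question's code does) would raise IndexError.
matches = soup("table", {"class": "spad"})
print(len(matches))
```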

First, the table's class is StandardResultsGrid, not spad.

Second, you don't need the tbody part. Just use:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):
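Calling a BeautifulSoup tag like a function is shorthand for `find_all()`, which searches all descendants, so the `<tr>` elements are found whether or not they sit inside a `<tbody>`. A minimal Python 3 sketch, assuming a hypothetical fragment with a `<tbody>` and a header row, as the answer describes for the original page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the results table; the real markup may differ.
HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><th>Name</th><th>Address</th></tr>
    <tr><td>Clinic A</td><td>1 Main St</td></tr>
    <tr><td>Clinic B</td><td>2 Oak Ave</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

# table("tr") is table.find_all("tr"): it looks through <tbody> too.
rows = table("tr")
print(len(rows))  # 3, header row included
```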

還要注意，因爲在原來的頁面標題行包含在tbody出於某種原因，你必須跳過第一行，所以

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:
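Put together, the loop might look like the Python 3 sketch below. The HTML fragment and its two columns are hypothetical; `get_text(strip=True)` is used instead of `.string` because `.string` returns `None` when a cell contains more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the header row is the first <tr>.
HTML = """
<table class="StandardResultsGrid">
  <tr><th>Name</th><th>Address</th></tr>
  <tr><td>Clinic A</td><td>1 Main St</td></tr>
  <tr><td>Clinic B</td><td>2 Oak Ave</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

records = []
for row in table("tr")[1:]:  # [1:] skips the header row
    tds = row("td")
    records.append((tds[0].get_text(strip=True), tds[1].get_text(strip=True)))

for name, address in records:
    print(name, address)
```

Skipping rows that have no `<td>` at all (`if not tds: continue`) would also drop the header without depending on its position.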

Also note that some of these cells contain tables themselves, so you will have to parse the contents of each td carefully.
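With nested tables, `table("tr")` also returns the inner tables' rows. One possible approach, sketched here in Python 3 against a hypothetical fragment, keeps only rows whose nearest enclosing `<table>` is the outer one, takes each row's own cells with `recursive=False`, and flattens any nested table in a cell with `get_text`. Whether flattening is appropriate depends on what the real nested tables contain.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second cell holds its own small table.
HTML = """
<table class="StandardResultsGrid">
  <tr><th>Name</th><th>Details</th></tr>
  <tr>
    <td>Clinic A</td>
    <td><table><tr><td>Phone</td><td>555-0100</td></tr></table></td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

# table("tr") also returns the nested table's row: 3 rows instead of 2.
all_rows = table("tr")

# Keep only rows whose nearest enclosing <table> is the outer one.
outer_rows = [tr for tr in all_rows if tr.find_parent("table") is table]

records = []
for row in outer_rows[1:]:  # skip the header row
    # recursive=False returns only this row's own cells, not nested ones.
    cells = row.find_all("td", recursive=False)
    # get_text flattens a nested table into one space-separated string.
    records.append(tuple(td.get_text(" ", strip=True) for td in cells))

print(len(all_rows), records)  # 3 [('Clinic A', 'Phone 555-0100')]
```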


2013-05-26 19:41:53 kirelagin
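The answer covers extracting the rows but not the question's second goal, an Excel document. A minimal Python 3 sketch, assuming the rows have already been collected as tuples (the sample data and the helper name `write_csv` are hypothetical), writes them as CSV with the standard `csv` module; Excel opens `.csv` files directly. A native `.xlsx` file would need a third-party library such as openpyxl.

```python
import csv
import io

# Hypothetical rows, e.g. (name, address) tuples collected by a loop like
# the one in the answer; the real columns depend on the page.
records = [("Clinic A", "1 Main St"), ("Clinic B", "2 Oak Ave")]


def write_csv(rows, header, fileobj):
    """Write a header line plus one CSV line per row to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# In-memory buffer for demonstration; for a real file use
# open("results.csv", "w", newline="", encoding="utf-8-sig"),
# where the utf-8-sig BOM helps Excel recognise the encoding.
buf = io.StringIO()
write_csv(records, ("Name", "Address"), buf)
print(buf.getvalue())
```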
