2017-05-07

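Scraping the value of a "data-" or custom attribute in an HTML element

I'm trying to build a web scraper for a class project, and I'm using Beautiful Soup.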

I want to scrape the values of these attributes:

data-bathroom-value

data-bedroom-value

from the following element:

<td class="floorplan-bed-bath" data-bathroom-value="1" data-bedroom-value="0">Studio/1 bath</td> 

Basically, I'm trying to get the number of bedrooms and the number of bathrooms.

Answers

2

You can parse your HTML with BeautifulSoup and then read the tag's attributes from its `attrs` dictionary.

DEMO

>>> html_doc = '<td class="floorplan-bed-bath" data-bathroom-value="1" data-bedroom-value="0">Studio/1 bath</td>'
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_doc, 'html.parser')
>>> attrs = soup.td.attrs
>>> attrs
{'class': ['floorplan-bed-bath'], 'data-bathroom-value': '1', 'data-bedroom-value': '0'}
>>> attrs.get('data-bedroom-value')
'0'

Thanks. This gets me closer, but I still can't figure out how to isolate the number '1'. I'll keep working on it. – goofy564
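
On isolating the number itself: one option is to index the tag like a dictionary and convert the resulting string with `int()`. This is only a minimal sketch, assuming the single-cell snippet from the question and assuming the attribute values are always whole numbers.

```python
from bs4 import BeautifulSoup

# The element from the question, used here as a stand-in for the real page.
html_doc = (
    '<td class="floorplan-bed-bath" data-bathroom-value="1" '
    'data-bedroom-value="0">Studio/1 bath</td>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# Locate the cell by its class, then read each attribute as a dictionary key.
td = soup.find("td", class_="floorplan-bed-bath")

# Attribute values come back as strings, so convert them to integers.
bathrooms = int(td["data-bathroom-value"])
# .get() returns the fallback instead of raising KeyError if the attribute is absent.
bedrooms = int(td.get("data-bedroom-value", 0))

print(bathrooms, bedrooms)
```

If a listing can carry fractional values such as half baths, `float()` would probably be the safer conversion.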

0
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Placeholder URL; replace it with the page you want to scrape.
page = urlopen("http://example.com/path/to/page")
soup = BeautifulSoup(page.read(), "html.parser")

# Print each attribute only for the cells that actually carry it.
for td in soup.find_all("td"):
    if "data-bathroom-value" in td.attrs:
        print("Bathrooms:", td["data-bathroom-value"])
    if "data-bedroom-value" in td.attrs:
        print("Bedrooms:", td["data-bedroom-value"])
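
The same loop can also be written so that Beautiful Soup does the filtering: passing `True` as an attribute value to `find_all` matches any tag that has that attribute, whatever its value. The sketch below runs offline on made-up markup. `SAMPLE_HTML` and `extract_floorplans` are hypothetical names introduced for illustration, and the integer conversion assumes whole-number values.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; a real page would be fetched first.
SAMPLE_HTML = """
<table>
  <tr><td class="floorplan-bed-bath" data-bathroom-value="1" data-bedroom-value="0">Studio/1 bath</td></tr>
  <tr><td class="floorplan-bed-bath" data-bathroom-value="2" data-bedroom-value="3">3 bed/2 bath</td></tr>
  <tr><td class="floorplan-name">No data attributes here</td></tr>
</table>
"""


def extract_floorplans(html):
    """Return one dict per <td> that carries both data attributes."""
    soup = BeautifulSoup(html, "html.parser")
    # A value of True matches any tag that has the attribute at all.
    cells = soup.find_all(
        "td",
        attrs={"data-bathroom-value": True, "data-bedroom-value": True},
    )
    return [
        {
            "label": td.get_text(strip=True),
            "bedrooms": int(td["data-bedroom-value"]),
            "bathrooms": int(td["data-bathroom-value"]),
        }
        for td in cells
    ]


floorplans = extract_floorplans(SAMPLE_HTML)
for plan in floorplans:
    print(plan)
```

Returning a list of dictionaries rather than printing inside the loop keeps the parsed values available for later steps, such as writing them to a CSV file.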