Web scraping using BeautifulSoup and urllib

I am using Python 3.6 and I am able to scrape text with BeautifulSoup. For practice, I am trying to scrape text from the Walmart website. Here is my code.

from bs4 import BeautifulSoup 
from urllib.request import urlopen 
main_page=urlopen('http://www.walmart.com/ip/Sceptre-32-Class-HD-720P-LED-TV-X322BV-SR/55427159') 
soup = BeautifulSoup(main_page,"lxml") 
title=soup.select_one("h1.prod-ProductTitle.no-margin.heading-a").get_text() 
price=soup.select_one("span.Price-group").get_text() 
highLights=soup.select_one("div.ProductPage-short-description-body").get_text() 
description=soup.select_one("div.about-desc").get_text() 
print(title,"\n",highLights,"\n",description,"\n",price)

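In the code above I extract the product title, price, highlights, and description, but I am not able to extract the description (the "About This Item" section). Instead of the description I get something else.

Please help me solve this problem.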

2017-08-30 Uzma

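This happens because the page contains two div elements with class="about-desc". select_one returns only the first match, but the one you need is the second. Here is a workaround: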

description=soup.select("div.about-desc")[1].get_text()
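To make the difference concrete, here is a minimal sketch that runs offline. The HTML string is a made-up stand-in for the product page (the real markup is not shown in this thread), and it uses the built-in "html.parser" so that lxml is not required. The length check is an added precaution, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the product page:
# two div elements share the class "about-desc".
SAMPLE_HTML = """
<div class="about-desc">Short highlights</div>
<div class="about-desc">Full description</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one returns only the first matching element.
first = soup.select_one("div.about-desc").get_text()

# select returns a list of every match, so the second one can be indexed.
matches = soup.select("div.about-desc")
second = matches[1].get_text() if len(matches) > 1 else None

print(first)
print(second)
```

Indexing with [1] depends on the page keeping this exact layout, so checking the length first avoids an IndexError if the site changes.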

Update: the site actually blocks urllib's default user agent, so you should send a browser-like User-Agent header instead.

from bs4 import BeautifulSoup 
import urllib.request
user_agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'} 
req = urllib.request.Request(url="http://www.walmart.com/ip/Sceptre-32-Class-HD-720P-LED-TV-X322BV-SR/55427159", headers=user_agent) 
main_page = urllib.request.urlopen(req) 
soup = BeautifulSoup(main_page,"lxml") 
title=soup.select_one("h1.prod-ProductTitle.no-margin.heading-a").get_text() 
price=soup.select_one("span.Price-group").get_text() 
highLights=soup.select_one("div.ProductPage-short-description-body").get_text() 
description=soup.select("div.about-desc")[1].get_text() 
print(title,"\n",highLights,"\n",description,"\n",price)
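The header handling can also be wrapped in small helpers, sketched below. The names build_request and fetch_html are hypothetical and not from the answer above; the timeout value and the choice to return None on an HTTP error are assumptions as well. Nothing is downloaded until fetch_html is called.

```python
import urllib.request
import urllib.error

# Browser-like User-Agent taken from the answer above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) "
              "Gecko/20100101 Firefox/12.0")


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url=url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Download the page body as bytes, or return None on an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # A 403 here can mean the site rejected the request.
        print("Request failed with status", err.code)
        return None


req = build_request(
    "http://www.walmart.com/ip/Sceptre-32-Class-HD-720P-LED-TV-X322BV-SR/55427159"
)
# urllib stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))
```

The bytes returned by fetch_html can be passed to BeautifulSoup in the same way as main_page in the code above.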

2017-08-30 10:46:44 chad
