2016-04-11

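I've just started learning Python and have run into this problem. I'm trying to scrape a price from Amazon and print it to the console, but I get an HTTPError when requesting the page.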

Here is my code:

import bs4
import requests

def getAmazonPrice(productUrl):
    res = requests.get(productUrl)
    res.raise_for_status()  # raises requests.exceptions.HTTPError for any 4xx/5xx response

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('#addToCart > a > h5 > div > div.a-column.a-span7.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
    return elems[0].text.strip()  # IndexError if the selector matched nothing


price = getAmazonPrice('http://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_2?ie=UTF8&qid=1460386052&sr=8-2&keywords=python+book') 
print('The price is ' + price) 

Error message:

Traceback (most recent call last):
  File "D:/Code/Python/Basic/webBrowser-Module.py", line 37, in <module>
    price = getAmazonPrice('http://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_2?ie=UTF8&qid=1460386052&sr=8-2&keywords=python+book')
  File "D:/Code/Python/Basic/webBrowser-Module.py", line 30, in getAmazonPrice
    res.raise_for_status()
  File "C:\Python33\lib\requests\models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: http://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_2?ie=UTF8&qid=1460386052&sr=8-2&keywords=python+book

Process finished with exit code 1
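The 503 comes from res.raise_for_status(), which converts any 4xx or 5xx status into requests.exceptions.HTTPError; Amazon is answering the script's request with "Service Unavailable". As a minimal sketch, assuming you would rather report the failure than let the script crash, the hypothetical helper describe_http_error catches that exception. So that it runs without network access, the demo builds a Response object by hand to mimic the 503 (the shortened URL is illustrative); in real code res would come from requests.get(productUrl).

```python
import requests
from requests.exceptions import HTTPError


def describe_http_error(res):
    """Return None if the response status is OK, else the HTTPError message.

    raise_for_status() raises HTTPError for 4xx/5xx status codes;
    this hypothetical helper turns that exception into a string.
    """
    try:
        res.raise_for_status()
    except HTTPError as err:
        return str(err)
    return None


# Offline demo: a hand-built Response that mimics the 503 in the traceback.
fake = requests.Response()
fake.status_code = 503
fake.reason = "Service Unavailable"
fake.url = "http://www.amazon.com/dp/1593275994"

print(describe_http_error(fake))
# 503 Server Error: Service Unavailable for url: http://www.amazon.com/dp/1593275994
```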

Answer

3

Pretending to be a real browser by supplying a User-Agent header will solve this particular problem:

res = requests.get(productUrl, headers={ 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36" 
}) 
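If the script makes several requests, here is a sketch under the assumption that you want the same header on all of them: a requests.Session carries default headers that are merged into every request it sends. make_session is a hypothetical helper, and the User-Agent string is the one from the answer's snippet. The demo prepares a request without sending it, so the outgoing header can be inspected offline.

```python
import requests

# Browser-like User-Agent string, copied from the answer's snippet.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36"
)


def make_session():
    """Return a Session that sends BROWSER_UA with every request (hypothetical helper)."""
    session = requests.Session()
    # Session.headers are merged into each request the session prepares.
    session.headers.update({"User-Agent": BROWSER_UA})
    return session


session = make_session()
# Prepare (but do not send) a request to see which headers would go out.
req = requests.Request("GET", "http://www.amazon.com/dp/1593275994")
prepared = session.prepare_request(req)
print(prepared.headers["User-Agent"])
```

Inside getAmazonPrice you would then call session.get(productUrl) in place of requests.get(productUrl).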

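You also need to adjust your CSS selector. For example, .header-price would give you every price on the page (in this case, both the non-Prime and the Prime price).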
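To see why a bare class selector is too broad, here is a sketch against a made-up HTML fragment. It is not Amazon's real markup: the ids price-non-prime and price-prime and both prices are invented for illustration. select() returns every match, while scoping the selector under a container and using select_one() picks a single element. The selector for the live page still has to be worked out from the page itself.

```python
import bs4

# Hypothetical, heavily simplified markup with two prices (ids and values invented).
HTML = """
<div id="price-non-prime"><span class="a-color-price header-price">$10.00</span></div>
<div id="price-prime"><span class="a-color-price header-price">$8.00</span></div>
"""

soup = bs4.BeautifulSoup(HTML, "html.parser")

# A bare class selector matches every price on the page, in document order.
all_prices = [el.text.strip() for el in soup.select(".header-price")]
print(all_prices)  # ['$10.00', '$8.00']

# Scoping the selector under one container narrows it to a single element.
prime = soup.select_one("#price-prime .header-price")
print(prime.text.strip())  # $8.00
```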

Now I get this: IndexError: list index out of range – Viktor

@Viktor Fixed, check the update. – alecxe

Thanks, it works! – Viktor
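The IndexError from the comments happens when soup.select() matches nothing: it returns an empty list, so elems[0] fails. A minimal sketch, assuming you would rather get None than an exception: the hypothetical helper first_text uses select_one(), which returns None when nothing matches. The HTML fragment and price are made up for the demo.

```python
import bs4


def first_text(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches.

    soup.select(selector)[0] raises IndexError on an empty result list;
    select_one() returns None instead, which is easier to check.
    """
    el = soup.select_one(selector)
    return el.text.strip() if el is not None else None


# Made-up fragment for illustration only.
soup = bs4.BeautifulSoup('<span class="header-price">$10.00</span>', "html.parser")
print(first_text(soup, ".header-price"))             # $10.00
print(first_text(soup, "#addToCart .header-price"))  # None
```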