I didn't get a result with https://www.amazon.com
I had to use https://www.amazon.com/ in order to receive the HTML data (productDetailsTable).
Here is my slightly modified Python 3 code.
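from bs4 import BeautifulSoup
import requests
# Ask for the page to fetch, e.g. https://www.amazon.com/
url = input("Enter a URL to fetch: ")
r = requests.get(url)
data = r.text
# Parse with the lxml parser (requires the lxml package)
soup = BeautifulSoup(data, "lxml")
# Print the text content of the page, including inline script text
print(soup.text)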
It prints the text content of the page's HTML.
You'll notice that Amazon is clever. The HTML includes a robot check:
if (true === true) {
var ue_t0 = (+ new Date()),
ue_csm = window,
ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },
ue_furl = "fls-na.amazon.com",
ue_mid = "ATVPDKIKX0DER",
ue_sid = (document.cookie.match(/session-id=([0-9-]+)/) || [])[1],
ue_sn = "opfcaptcha.amazon.com",
ue_id = 'R8D7EEN5FVS7RWC2M549';
}
Enter the characters you see below
Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
This keeps you from reading Amazon's actual page. You'll have to do more, probably still with requests, and include headers and cookie information.
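As a rough sketch of what "headers and cookie information" could look like: a requests.Session keeps cookies between calls, and its default headers can be set to resemble a browser. The header values, the helper names (make_session, looks_like_robot_check, fetch) and the marker phrases are my own assumptions for illustration, and there is no guarantee that this gets past Amazon's check.

```python
import requests

# Assumed, illustrative header values; any realistic browser values could be used.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

# Phrases copied from the robot-check output shown above.
ROBOT_CHECK_MARKERS = (
    "Enter the characters you see below",
    "make sure you're not a robot",
)


def make_session():
    """Return a Session that sends browser-like headers and stores cookies."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def looks_like_robot_check(html):
    """Return True if the text contains one of the robot-check phrases."""
    return any(marker in html for marker in ROBOT_CHECK_MARKERS)


def fetch(url, session=None):
    """Fetch url; return the HTML, or None if the robot check came back."""
    session = session or make_session()
    response = session.get(url, timeout=10)
    if looks_like_robot_check(response.text):
        return None
    return response.text
```

Calling fetch("https://www.amazon.com/") with one shared session would then return either the page HTML or None when the robot-check page is served. Reusing the same session across requests is what lets cookies such as session-id be sent back on later calls.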
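Any reason you wouldn't just use the Amazon API? – Cfreak
I'm trying to get specific product details for other products that aren't really accessible through their API but do appear on their HTML pages :( –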