刮python booking與Python對AJAX請求

我想從booking.com抓取數據和幾乎一切正在工作，但我無法得到的價格，我讀到目前爲止，這是因爲這些價格加載通過AJAX調用。這裏是我的代碼：刮python booking與Python對AJAX請求

import requests 
import re 

from bs4 import BeautifulSoup 

url = "http://www.booking.com/searchresults.pl.html" 
payload = { 
'ss':'Warszawa', 
'si':'ai,co,ci,re,di', 
'dest_type':'city', 
'dest_id':'-534433', 
'checkin_monthday':'25', 
'checkin_year_month':'2015-10', 
'checkout_monthday':'26', 
'checkout_year_month':'2015-10', 
'sb_travel_purpose':'leisure', 
'src':'index', 
'nflt':'', 
'ss_raw':'', 
'dcid':'4' 
} 

r = requests.post(url, payload) 
html = r.content 
parsed_html = BeautifulSoup(html, "html.parser") 

print parsed_html.head.find('title').text 

tables = parsed_html.find_all("table", {"class" : "sr_item_legacy"}) 

print "Found %s records." % len(tables) 

with open("requests_results.html", "w") as f: 
    f.write(r.content) 

for table in tables: 
    name = table.find("a", {"class" : "hotel_name_link url"}) 
    average = table.find("span", {"class" : "average"}) 
    price = table.find("strong", {"class" : re.compile(r".*\bprice scarcity_color\b.*")}) 
    print name.text + " " + average.text + " " + price.text

使用Developers Tools從Chrome中我注意到，該網頁發送的所有數據（包括價格）的原始響應。在處理來自這些標籤之一的響應內容之後，有價格的原始值，所以爲什麼我無法使用腳本檢索它們，如何解決它？

來源

2015-10-21 Tomasz Kowalczyk

檢查發送的標題，可能有些東西你不會模仿你的代碼 – Marged

我已經添加了所有標題，但問題仍然存在。 –

爲什麼不使用硒作爲這種類型的動態網站？ – SIslam

的第一個問題是，該網站是非法的構造：一個div在你打開表和em被關閉。所以html.parser找不到包含價格的strong標籤。這個你可以安裝和使用lxml修復：

parsed_html = BeautifulSoup(html, "lxml")

的第二個問題是在你的正則表達式。它沒有找到任何東西。它更改爲以下：

price = table.find("strong", {"class" : re.compile(r".*\bscarcity_color\b.*")})

現在你會發現價格。但是，有些條目不包含任何價格，因此您的print語句將引發錯誤。爲了解決這個問題，你可以改變你的print以下幾點：

print name.text, average.text, price.text if price else 'No price found'

，請注意您可以單獨領域使用Python逗號（,）進行打印，所以你不需要用+ " " +將它們串聯。

來源

2015-10-22 11:08:14 GHajba

哇，非常感謝我也注意到，這個網站是不合格的（但我不知道這個術語;）），並試圖首先使用'prettify（）'，但是這也導致了其他奇怪的錯誤，所以你的解決方案非常好，最適合我。 –

刮python booking與Python對AJAX請求

回答

相關問題