2016-06-13 42 views

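Unable to get hotel prices from booking.com

I am trying to scrape hotel prices from booking.com, but I cannot figure out why I get an empty list when I search for the class with beautifulsoup4. My code is given below.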

import webbrowser, requests 
from bs4 import BeautifulSoup 


res = requests.get("http://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaGyIAQGYATG4AQjIAQzYAQHoAQH4AQKoAgM&sid=c24fad210186ae699e89a0d3cab10039&dcid=4&checkin_monthday=18&checkin_year_month=2016-6&checkout_monthday=19&checkout_year_month=2016-6&class_interval=1&dest_id=-2092511&dest_type=city&group_adults=2&group_children=0&hlrd=0&label_click=undef&nflt=ht_id%3D204%3B&no_rooms=1&review_score_group=empty&room1=A%2CA&sb_price_type=total&sb_travel_purpose=business&score_min=0&src_elem=sb&ss=Kolkata%2C%20West%20Bengal%2C%20India&ss_raw=kolka&ssb=empty&order=score") 
res.status_code 
soup = BeautifulSoup(res.text,"lxml") 
name = [] 
rating = [] 

hotel_name = soup.select('.sr-hotel__name') 
hotel_price = soup.select('tr', class_='roomPrice') 
hotel_rating = soup.select('.js--hp-scorecard-scoreval') 

print hotel_price 
for i in range(0, 10): 
    name.append(hotel_name[i].contents[0]) 
    rating.append(hotel_rating[i].contents[0]) 
    #print name[i] 
    #print rating[i] 

Answer

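I had to do two things: 1. add a User-Agent header, and 2. change the selectors. The source you get when scraping is actually different from what you see when you right-click and choose View Source in your browser: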

In [7]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"} 

In [8]: url = "http://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaGyIAQGYATG4AQjIAQzYAQHoAQH4AQKoAgM&sid=c24fad210186ae699e89a0d3cab10039&dcid=4&checkin_monthday=18&checkin_year_month=2016-6&checkout_monthday=19&checkout_year_month=2016-6&class_interval=1&dest_id=-2092511&dest_type=city&group_adults=2&group_children=0&hlrd=0&label_click=undef&nflt=ht_id%3D204%3B&no_rooms=1&review_score_group=empty&room1=A%2CA&sb_price_type=total&sb_travel_purpose=business&score_min=0&src_elem=sb&ss=Kolkata%2C%20West%20Bengal%2C%20India&ss_raw=kolka&ssb=empty&order=score" 

In [9]: res = requests.get(url, headers=head) 

In [10]: soup = BeautifulSoup(res.text,"html.parser") 

In [11]: hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new") 

In [12]: for hotel in hotels: 
    ....:   name = hotel.select_one("span.sr-hotel__name").text.strip()
    ....:   print(name)
    ....:   score = hotel.select_one("span.average.js--hp-scorecard-scoreval") 
    ....:   print(score.text.strip()) 
    ....:   price = hotel.select_one("table div.sr-prc--num.sr-prc--final") 
    ....:   print(price.text.strip() if price else "Unavailable") 
    ....:  
The Oberoi Grand Kolkata 
9.0 
€ 113 
Taj Bengal 
9.0 
€ 113 
Sapphire Suites 
7.4 
Unavailable 
The Gateway Hotel EM Bypass Kolkata 
8.6 
€ 84 
The Lalit Great Eastern Kolkata 
8.6 
€ 101 
Swissôtel Kolkata 
8.5 
€ 86 
Kenilworth Hotel 
8.5 
€ 78 
The Fern Residency Kolkata 
8.4 
€ 84 
ITC Sonar Kolkata A Luxury Collection Hotel 
8.3 
€ 116 
Hyatt Regency 
8.3 
€ 63 
Treebo Platinum 
8.2 
€ 38 
The Corner Courtyard 
8.2 
€ 73 
Jameson Inn Shiraz 
8.0 
€ 58 
The Sonnet 
7.9 
€ 80 
Hotel Casa Fortuna 
7.9 
€ 56 
Pipal Tree Hotel 
7.9 
€ 77 

Also, the syntax of your selector soup.select('tr', class_='roomPrice') is incorrect; it should be soup.select('tr.roomPrice')
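
To make the difference concrete, here is a minimal sketch of the two valid ways to match a tag by class in BeautifulSoup. The HTML snippet is made up for illustration and is not booking.com's real markup; only select and find_all are taken from the library itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, for illustration only (not booking.com's real HTML).
html = """
<table>
  <tr class="roomPrice"><td>EUR 113</td></tr>
  <tr class="other"><td>ignored</td></tr>
  <tr class="roomPrice sale"><td>EUR 84</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: the class is written inside the selector string.
by_css = soup.select("tr.roomPrice")

# find_all: the class is passed through the class_ keyword argument.
by_find = soup.find_all("tr", class_="roomPrice")

css_prices = [row.get_text(strip=True) for row in by_css]
find_prices = [row.get_text(strip=True) for row in by_find]
print(css_prices)
print(find_prices)
```

Both calls match the same two rows here, including the row that carries a second class. The class_ keyword belongs to find_all, while select expects everything in the selector string.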

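However, the output above is what you get from the page when it is not sorted by score. What we need to do is use the base URL and pass the query parameters through params: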

In [20]: params = {'checkin_year_month':'2016-6', 
    ....: 'checkout_monthday':'19', 
    ....: 'checkout_year_month':'2016-6', 
    ....: 'class_interval':'1', 
    ....: 'dest_id':'-2092511', 
    ....: 'dest_type':'city', 
    ....: 'dtdisc':'0', 
    ....: 'group_adults':'2', 
    ....: 'group_children':'0', 
    ....: 'hlrd':'0', 
    ....: 'hyb_red':'0', 
    ....: 'inac':'0', 
    ....: 'label_click':'undef', 
    ....: 'nflt':'ht_id=204;', 
    ....: 'nha_red':'0', 
    ....: 'no_rooms':'1', 
    ....: 'offset':'0', 
    ....: 'order':'score', 
    ....: 'postcard':'0', 
    ....: 'redirected_from_city':'0', 
    ....: 'redirected_from_landmark':'0', 
    ....: 'redirected_from_region':'0', 
    ....: 'review_score_group':'empty', 
    ....: 'room1':'A,A', 
    ....: 'sb_price_type':'total', 
    ....: 'sb_travel_purpose':'business', 
    ....: 'score_min':'0', 
    ....: 'src_elem':'sb', 
    ....: 'ss':'Kolkata, West Bengal, India', 
    ....: 'ss_all':'0', 
    ....: 'ss_raw':'kolka', 
    ....: 'ssb':'empty', 
    ....: 'sshis':'0'} 

In [21]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"} 

In [22]: url = "http://www.booking.com/searchresults.html" 

In [23]: res = requests.get(url, params=params, headers=head) 

In [24]: soup = BeautifulSoup(res.text,"html.parser") 

In [25]: hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new") 

In [26]: for hotel in hotels: 
    ....:   name = hotel.select_one("span.sr-hotel__name").text.strip()
    ....:   print(name)
    ....:   score = hotel.select_one("span.average.js--hp-scorecard-scoreval") 
    ....:   print(score.text.strip()) 
    ....:   price = hotel.select_one("table div.sr-prc--num.sr-prc--final") 
    ....:   print(price.text.strip() if price else "Unavailable") 
    ....:  
The Oberoi Grand Kolkata 
9.0 
Unavailable 
Taj Bengal 
9.0 
Unavailable 
The Lalit Great Eastern Kolkata 
8.6 
Unavailable 
The Gateway Hotel EM Bypass Kolkata 
8.6 
Unavailable 
Swissôtel Kolkata 
8.5 
Unavailable 
Kenilworth Hotel 
8.5 
Unavailable 
The Fern Residency Kolkata 
8.4 
Unavailable 
ITC Sonar Kolkata A Luxury Collection Hotel 
8.3 
Unavailable 
Hyatt Regency 
8.3 
Unavailable 
Treebo Platinum 
8.2 
Unavailable 
The Corner Courtyard 
8.2 
Unavailable 
Monovilla Inn 
8.1 
Unavailable 
Jameson Inn Shiraz 
8.0 
Unavailable 
The Sonnet 
7.9 
Unavailable 
Hotel Casa Fortuna 
7.9 
Unavailable 

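The request with params lands on a page where the prices are hidden, which is why every price above prints as Unavailable, so we need to add some more logic. I will edit the answer shortly.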
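
Until that extra logic exists, the per-hotel extraction can at least be made tolerant of missing elements. This is only a sketch: the helper names text_or_default and parse_hotels are hypothetical, and the sample HTML merely mimics the class names used by the selectors in this answer, so it does not reveal hidden prices on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used by the selectors above;
# the real booking.com page is much larger and may differ.
SAMPLE = """
<div id="hotellist_inner">
  <div class="sr_item sr_item_new">
    <span class="sr-hotel__name"> Hotel A </span>
    <span class="average js--hp-scorecard-scoreval"> 9.0 </span>
    <table><tr><td>
      <div class="sr-prc--num sr-prc--final"> EUR 113 </div>
    </td></tr></table>
  </div>
  <div class="sr_item sr_item_new">
    <span class="sr-hotel__name"> Hotel B </span>
    <span class="average js--hp-scorecard-scoreval"> 7.4 </span>
  </div>
</div>
"""


def text_or_default(parent, selector, default="Unavailable"):
    """Return the stripped text of the first match, or default if there is none."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def parse_hotels(html):
    """Collect name, score and price for every hotel block in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for hotel in soup.select("#hotellist_inner div.sr_item.sr_item_new"):
        rows.append({
            "name": text_or_default(hotel, "span.sr-hotel__name"),
            "score": text_or_default(hotel, "span.average.js--hp-scorecard-scoreval"),
            "price": text_or_default(hotel, "table div.sr-prc--num.sr-prc--final"),
        })
    return rows


rows = parse_hotels(SAMPLE)
for row in rows:
    print(row)
```

Routing every field through the same helper means a missing name or score no longer raises AttributeError, which the loops above would do if select_one returned None for anything other than the price.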

Yup, that is the problem: the price shows as Unavailable. Please help. – sumitroy