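I had to do two things: 1. add a user agent, and 2. change the selectors, because the source you get when scraping is actually different from what you see when you right-click and choose "view source" in the browser: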
In [7]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
In [8]: url = "http://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaGyIAQGYATG4AQjIAQzYAQHoAQH4AQKoAgM&sid=c24fad210186ae699e89a0d3cab10039&dcid=4&checkin_monthday=18&checkin_year_month=2016-6&checkout_monthday=19&checkout_year_month=2016-6&class_interval=1&dest_id=-2092511&dest_type=city&group_adults=2&group_children=0&hlrd=0&label_click=undef&nflt=ht_id%3D204%3B&no_rooms=1&review_score_group=empty&room1=A%2CA&sb_price_type=total&sb_travel_purpose=business&score_min=0&src_elem=sb&ss=Kolkata%2C%20West%20Bengal%2C%20India&ss_raw=kolka&ssb=empty&order=score"
In [9]: res = requests.get(url, headers=head)
In [10]: soup = BeautifulSoup(res.text,"html.parser")
In [11]: hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")
In [12]: for hotel in hotels:
....:     name = hotel.select_one("span.sr-hotel__name").text.strip()
....:     print(name)
....:     score = hotel.select_one("span.average.js--hp-scorecard-scoreval")
....:     print(score.text.strip())
....:     price = hotel.select_one("table div.sr-prc--num.sr-prc--final")
....:     print(price.text.strip() if price else "Unavailable")
....:
The Oberoi Grand Kolkata
9.0
€ 113
Taj Bengal
9.0
€ 113
Sapphire Suites
7.4
Unavailable
The Gateway Hotel EM Bypass Kolkata
8.6
€ 84
The Lalit Great Eastern Kolkata
8.6
€ 101
Swissôtel Kolkata
8.5
€ 86
Kenilworth Hotel
8.5
€ 78
The Fern Residency Kolkata
8.4
€ 84
ITC Sonar Kolkata A Luxury Collection Hotel
8.3
€ 116
Hyatt Regency
8.3
€ 63
Treebo Platinum
8.2
€ 38
The Corner Courtyard
8.2
€ 73
Jameson Inn Shiraz
8.0
€ 58
The Sonnet
7.9
€ 80
Hotel Casa Fortuna
7.9
€ 56
Pipal Tree Hotel
7.9
€ 77
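The loop above guards only the price against a missing element; the name and score lookups would raise AttributeError if select_one returned None. A minimal sketch of one way to guard all three, assuming a hypothetical helper called text_or and a made-up HTML fragment that reuses the class names from the session (neither is part of the real page):

```python
from bs4 import BeautifulSoup


def text_or(node, selector, default="Unavailable"):
    """Return the stripped text of the first match under node, else default."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else default


# Made-up fragment: it has a name and a score but no price element.
html = """
<div id="hotellist_inner">
  <div class="sr_item sr_item_new">
    <span class="sr-hotel__name"> Hotel A </span>
    <span class="average js--hp-scorecard-scoreval">9.0</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for hotel in soup.select("#hotellist_inner div.sr_item.sr_item_new"):
    rows.append((
        text_or(hotel, "span.sr-hotel__name"),
        text_or(hotel, "span.average.js--hp-scorecard-scoreval"),
        text_or(hotel, "table div.sr-prc--num.sr-prc--final"),
    ))

print(rows)
```

With this fragment the price slot falls back to the default string instead of raising.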
Also, your call soup.select('tr', class_='roomPrice') is not valid usage: select takes a CSS selector, so it should be soup.select('tr.roomPrice').
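To illustrate the selector point: select takes a CSS selector string, while the class_ keyword belongs to find_all. A small sketch on made-up markup (the table below is invented for the example and is not the real page):

```python
from bs4 import BeautifulSoup

# Made-up markup, only to exercise the two lookups.
html = """
<table>
  <tr class="roomPrice"><td>113</td></tr>
  <tr class="other"><td>n/a</td></tr>
  <tr class="roomPrice sale"><td>84</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector form: tag name plus class.
by_css = soup.select("tr.roomPrice")

# Keyword form: this is where class_ is accepted.
by_find = soup.find_all("tr", class_="roomPrice")

prices = [row.get_text(strip=True) for row in by_css]
print(prices)
print(len(by_css) == len(by_find))
```

Both forms match a row that carries roomPrice alongside other classes.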
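The above does produce output, but if you go to the page the results are not ordered by score. What we need to do is use the base URL and pass the query as params: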
In [20]: params = {'checkin_year_month':'2016-6',
....: 'checkout_monthday':'19',
....: 'checkout_year_month':'2016-6',
....: 'class_interval':'1',
....: 'dest_id':'-2092511',
....: 'dest_type':'city',
....: 'dtdisc':'0',
....: 'group_adults':'2',
....: 'group_children':'0',
....: 'hlrd':'0',
....: 'hyb_red':'0',
....: 'inac':'0',
....: 'label_click':'undef',
....: 'nflt':'ht_id=204;',
....: 'nha_red':'0',
....: 'no_rooms':'1',
....: 'offset':'0',
....: 'order':'score',
....: 'postcard':'0',
....: 'redirected_from_city':'0',
....: 'redirected_from_landmark':'0',
....: 'redirected_from_region':'0',
....: 'review_score_group':'empty',
....: 'room1':'A,A',
....: 'sb_price_type':'total',
....: 'sb_travel_purpose':'business',
....: 'score_min':'0',
....: 'src_elem':'sb',
....: 'ss':'Kolkata, West Bengal, India',
....: 'ss_all':'0',
....: 'ss_raw':'kolka',
....: 'ssb':'empty',
....: 'sshis':'0'}
In [21]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
In [22]: url = "http://www.booking.com/searchresults.html"
In [23]: res = requests.get(url, params=params, headers=head)
In [24]: soup = BeautifulSoup(res.text,"html.parser")
In [25]: hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")
In [26]: for hotel in hotels:
....:     name = hotel.select_one("span.sr-hotel__name").text.strip()
....:     print(name)
....:     score = hotel.select_one("span.average.js--hp-scorecard-scoreval")
....:     print(score.text.strip())
....:     price = hotel.select_one("table div.sr-prc--num.sr-prc--final")
....:     print(price.text.strip() if price else "Unavailable")
....:
The Oberoi Grand Kolkata
9.0
Unavailable
Taj Bengal
9.0
Unavailable
The Lalit Great Eastern Kolkata
8.6
Unavailable
The Gateway Hotel EM Bypass Kolkata
8.6
Unavailable
Swissôtel Kolkata
8.5
Unavailable
Kenilworth Hotel
8.5
Unavailable
The Fern Residency Kolkata
8.4
Unavailable
ITC Sonar Kolkata A Luxury Collection Hotel
8.3
Unavailable
Hyatt Regency
8.3
Unavailable
Treebo Platinum
8.2
Unavailable
The Corner Courtyard
8.2
Unavailable
Monovilla Inn
8.1
Unavailable
Jameson Inn Shiraz
8.0
Unavailable
The Sonnet
7.9
Unavailable
Hotel Casa Fortuna
7.9
Unavailable
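This request ends up using the page linked here, where the prices are hidden, so we need to add some more logic. I will edit the answer shortly.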
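While debugging, it can help to check which query keys from the original URL did not make it into the params dict. A rough sketch using only the standard library; missing_query_keys is a hypothetical helper, and the url and params below are abbreviated stand-ins for the ones in the sessions above. Whether any missing key explains the hidden prices is not confirmed here.

```python
from urllib.parse import parse_qs, urlsplit


def missing_query_keys(url, params):
    """Return the query-string keys present in url but absent from params."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return sorted(set(query) - set(params))


# Abbreviated stand-ins for the full URL and params dict used earlier.
url = (
    "http://www.booking.com/searchresults.html"
    "?checkin_monthday=18&checkin_year_month=2016-6"
    "&checkout_monthday=19&checkout_year_month=2016-6&order=score"
)
params = {
    "checkin_year_month": "2016-6",
    "checkout_monthday": "19",
    "checkout_year_month": "2016-6",
    "order": "score",
}

print(missing_query_keys(url, params))
```

Running the same helper on the full URL and the full dict shows the complete list of keys that differ between the two requests.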
Yup, that is the problem: the price shows as unavailable. Please help. – sumitroy