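Scraping reviews from TripAdvisor
I am new to web scraping with Python 3. I want to scrape the reviews of all hotels in Dubai, but the problem is that I can only scrape the reviews of the hotel whose URL I put in the code. Can anyone tell me how to get the reviews of all the hotels without explicitly giving the URL of each hotel?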
import requests
from bs4 import BeautifulSoup
importurl = 'https://www.tripadvisor.com/Hotel_Review-g295424-d302778-Reviews-Roda_Al_Bustan_Dubai_Airport-Dubai_Emirate_of_Dubai.html'
r = requests.get(importurl)
soup = BeautifulSoup(r.content, "lxml")
resultsoup = soup.find_all("p", {"class" : "partial_entry"})
# print each review
for review in resultsoup:
    review_list = review.get_text()
    print(review_list)
# save the reviews to a test text file locally, one review per line
with open('testreview.txt', 'w') as fid:
    for review in resultsoup:
        review_list = review.get_text()
        fid.write(review_list + '\n')
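One possible way to avoid hard-coding each hotel URL is to fetch the city's hotel listing page first, collect the hotel links from it, and then run the review code above on each link. The following is only a minimal sketch of the link-collecting step. It assumes that hotel links on the listing page are anchors whose `href` starts with `/Hotel_Review-`, as in the URL above; the helper name `extract_hotel_urls` and the sample HTML are hypothetical, and the sketch uses only the standard library so it can be tried without network access.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.tripadvisor.com"

# Assumption: hotel links look like the URL in the question, i.e.
# href="/Hotel_Review-g<geo id>-d<hotel id>-Reviews-<name>.html"
HOTEL_HREF = re.compile(r'href="(/Hotel_Review-g\d+-d\d+-Reviews-[^"#]+\.html)')

def extract_hotel_urls(listing_html, base_url=BASE_URL):
    """Return absolute hotel review URLs found in a listing page, without duplicates."""
    seen = []
    for path in HOTEL_HREF.findall(listing_html):
        url = urljoin(base_url, path)
        if url not in seen:
            seen.append(url)
    return seen

# Hypothetical listing fragment, used only to illustrate the parsing step
sample_html = '''
<a href="/Hotel_Review-g295424-d302778-Reviews-Roda_Al_Bustan_Dubai_Airport-Dubai_Emirate_of_Dubai.html">Roda Al Bustan</a>
<a href="/Hotel_Review-g295424-d302778-Reviews-Roda_Al_Bustan_Dubai_Airport-Dubai_Emirate_of_Dubai.html#REVIEWS">reviews</a>
<a href="/Hotel_Review-g295424-d111111-Reviews-Example_Hotel-Dubai_Emirate_of_Dubai.html">Example Hotel</a>
'''
hotel_urls = extract_hotel_urls(sample_html)
for url in hotel_urls:
    print(url)
```

In a real run, `listing_html` would come from `requests.get(listing_url).text`, and each returned URL would replace `importurl` in the code above. Whether the live page still exposes links in this form needs to be checked in the browser.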
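This is not the full list of hotels, only the hotels from the first page: there are 18 pages in total.. – Andersson
@Andersson This is an example. If you can get 1 page, just use a loop to get all 18 pages. –
But the results have no page number. The URL is always 'http://www.tripadvisor.cn/Hotels-g295424-Dubai_Emirate_of_Dubai-Hotels.html' no matter which page it is: 1st or 19th ... – Andersson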
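If the listing does accept an offset in the URL, the page URLs can be generated in a loop instead of being read from the address bar. The sketch below rests on an unconfirmed assumption: that later pages are addressed by inserting an `-oa<offset>-` segment after the geo id (for example `-oa30-` for the second page), with 30 hotels per page. Neither the segment name nor the page size is confirmed in this thread, so check the link behind the "next" button in the browser first. The function name `listing_page_urls` is hypothetical.

```python
import re

def listing_page_urls(first_page_url, pages, per_page=30):
    """Build listing URLs for `pages` pages by inserting an -oa<offset>- segment.

    Assumption: the site paginates with an offset segment placed right
    after the geo id, e.g. -g295424-oa30- for the second page.
    """
    urls = [first_page_url]
    for page in range(1, pages):
        offset = page * per_page
        # Only the first "-g<digits>-" occurrence is rewritten
        urls.append(re.sub(r'(-g\d+)-', rf'\1-oa{offset}-', first_page_url, count=1))
    return urls

first = 'http://www.tripadvisor.cn/Hotels-g295424-Dubai_Emirate_of_Dubai-Hotels.html'
pages = listing_page_urls(first, pages=18)
print(pages[1])
```

Each generated URL could then be fetched with `requests.get` and parsed for hotel links. If the site loads later pages via JavaScript without changing the URL, this approach will not work and the underlying request would have to be inspected in the browser's network tab instead.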