Unable to scrape content using BeautifulSoup

-2

I am scraping a website using Python and BeautifulSoup.

I need to scrape this page:

http://www.starwoodhotels.com//sheraton/property/reviews/index.html?language=en_US&propertyID=115

On this page I have scraped the hotel address successfully, but I cannot scrape the user reviews section.

Here is my code:

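import requests
from bs4 import BeautifulSoup

hotel_link = "http://www.starwoodhotels.com//sheraton/property/reviews/index.html?language=en_US&propertyID=115"

# The original post does not show the request headers; this value is a placeholder.
header = {"User-Agent": "Mozilla/5.0"}

hotel_page_html = requests.get(hotel_link, headers=header).text
hotel_page_soup = BeautifulSoup(hotel_page_html, "html.parser")

for hotel_address in hotel_page_soup.select("div#propertyAddressContainer ul#propertyAddress"):
    print("Address: " + hotel_address.select("li")[0].text)

# This is the lookup that fails to return the rating.
print(hotel_page_soup.select("div.BVRRRatingNormalOutOf"))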

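As you can see, with the CSS selector div#propertyAddressContainer ul#propertyAddress I get the address, but I am unable to scrape the User Reviews section.

I checked the Console while the page loads, but I did not see anything showing that the user reviews are loaded through an AJAX call.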
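An empty list from select() usually means the element is not in the HTML that requests received, which is typical when a script inserts it after the page loads. A minimal sketch of that check, run against a made-up HTML snippet rather than the real page (SAMPLE_HTML, the BVRRContainer id, and count_matches are illustrative assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the address is in the
# static markup, but the reviews container is empty until a script fills it.
SAMPLE_HTML = """
<div id="propertyAddressContainer">
  <ul id="propertyAddress"><li>123 Example Street</li></ul>
</div>
<div id="BVRRContainer"></div>
"""


def count_matches(html, selector):
    """Return how many elements in the static HTML match a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


address_hits = count_matches(SAMPLE_HTML, "div#propertyAddressContainer ul#propertyAddress")
rating_hits = count_matches(SAMPLE_HTML, "div.BVRRRatingNormalOutOf")
print(address_hits, rating_hits)
```

If the count for a selector is zero on the real response text, the content is probably fetched by a separate request, and the Network tab of the browser tools (rather than the Console) is the place to look for it.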

So how can I scrape the reviews section?

2014-11-16 Umair

Duplicate? http://stackoverflow.com/a/5913539/2063058 http://stackoverflow.com/questions/2610112/beautifulsoup-and-mechanize-to-get-ajax-call-result – tiktok

I need to find out which URL fetches the reviews. I searched the page's HTML but could not find it. Can someone tell me the URL? – Umair

http://stackoverflow.com/a/5995713/2063058 – tiktok

Why are you making it so complicated?

Just do this:

soup.find("span",{"itemprop":"aggregateRating"}).text.encode('ascii','ignore').replace('\n',' ') 

Out[]: 
Rated 3.4 out of 5by 625 reviewers.

Isn't this what you need?
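On Python 3, encode() returns bytes, so calling replace('\n', ' ') with str arguments on its result raises a TypeError. A minimal sketch of the same cleanup that also runs on Python 3, assuming the goal is only to drop non-ASCII characters and flatten whitespace (clean_text and the sample string are made up for illustration):

```python
def clean_text(raw):
    """Drop non-ASCII characters and collapse whitespace runs to one space."""
    # Decode back to str so the later string operations work on Python 3.
    ascii_only = raw.encode("ascii", "ignore").decode("ascii")
    # split() with no arguments splits on any whitespace, including newlines.
    return " ".join(ascii_only.split())


# Made-up input for illustration only, not the real page text.
sample = "Rated 3.4\nout of 5\n by 625 reviewers.\u2605"
print(clean_text(sample))
```

With a soup object this would be called as clean_text(tag.text) on whatever tag find() returns.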

2014-11-17 05:29:32

In your case it would be 'hotel_page_soup.find' instead of 'soup.find' –

I will give it a try... – Umair

@Umair Did you try it? –

Working code

rev = hotel_page_soup.find(
    "span", {"itemprop": "aggregateRating"}
).text.encode('ascii', 'ignore').replace('\n', ' ')

for total_rating_score in rev.select("span"):
    print(total_rating_score.string)

2014-11-17 14:06:26 Umair

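Your answer will not even work. 'rev' is a string, so 'rev.select' cannot be called on it. This is ridiculous. You ask a question, and when someone answers it, you modify the answer slightly into something that is not even correct and post it yourself? –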
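The point made in this comment can be reproduced on a small example: .text gives back a plain string, while select() is a method of the Tag object that find() returns. A minimal sketch, assuming made-up markup (SAMPLE_HTML and the nested itemprop names are illustrative and may not match the real page):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page structure may differ.
SAMPLE_HTML = """
<span itemprop="aggregateRating">
  Rated <span itemprop="ratingValue">3.4</span> out of 5 by
  <span itemprop="reviewCount">625</span> reviewers.
</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rating_tag = soup.find("span", {"itemprop": "aggregateRating"})

# .text yields a plain string, which has no select() method.
rating_text = rating_tag.text
print(hasattr(rating_text, "select"))

# Calling select() on the Tag itself returns the nested <span> elements.
nested_values = [span.string for span in rating_tag.select("span")]
print(nested_values)
```

Keeping the Tag in one variable and deriving the cleaned text from it separately allows both uses.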
