Print all occurrences of certain document elements on a web page

I'm scraping the page https://www.zomato.com/srijata for all the "restaurant reviews" posted by the user "Sri" (not the comments on her own reviews).

import urllib2
from bs4 import BeautifulSoup
zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
with open('zomato_info.html', 'w') as f:
    f.write(zomato_info)
with open('zomato_info.html') as f:
    soup = BeautifulSoup(f)
soup.find('div', 'mtop0 rev-text').text

This prints the review of her first restaurant, i.e. "Sri reviewed Big Straw - Chew On This", as:

u'Rated&nbsp;&nbsp;This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol..'

I also tried another option:

My question is:

How do I print the next restaurant review? I've tried findNextSiblings and similar methods, but none of them seem to work.
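The core issue is that find() returns only the first match, while find_all() returns every match. A minimal sketch on hypothetical markup (the HTML string below is an assumption, loosely modelled on the structure described in this thread, not the real Zomato page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled loosely on the structure described in this
# thread; the real Zomato page is not reproduced here.
HTML = """
<div class="mtop0 rev-text"><div>Rated</div>First review text.</div>
<div class="mtop0 rev-text"><div>Rated</div>Second review text.</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() stops at the first match and returns a single Tag (or None).
first = soup.find("div", class_="rev-text")

# find_all() returns a list with every matching Tag.
all_reviews = soup.find_all("div", class_="rev-text")

for div in all_reviews:
    print(div.get_text())
```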

2014-10-01 shalini

Why save the HTML to a file and then read that file into the soup object? – 2014-10-01 12:22:02

I did this to avoid hitting the site repeatedly, as a precaution against its anti-scraping measures! – shalini 2014-10-02 05:41:56
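If the goal of the file is to avoid repeated requests, the file only helps when it is reused on later runs. A minimal caching sketch, assuming Python 3 (urllib.request in place of urllib2); the names fetch_cached and _download and the fetch parameter are hypothetical helpers for illustration:

```python
import os
import urllib.request


def _download(url):
    # Python 3 counterpart of urllib2.urlopen(url).read()
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_cached(url, cache_path, fetch=_download):
    """Return the raw page bytes for url, hitting the network at most once.

    If cache_path already exists, its contents are returned without any
    request; otherwise the page is fetched, saved to cache_path, and returned.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()
    body = fetch(url)
    with open(cache_path, "wb") as f:
        f.write(body)
    return body
```

Usage would look like BeautifulSoup(fetch_cached("https://www.zomato.com/srijata", "zomato_info.html"), "html.parser"). The fetch parameter exists only so the download step can be swapped out, for example in a test.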

First of all, you don't need to write the output to a file: pass the result of the urlopen() call directly to the BeautifulSoup constructor.

To get the reviews, iterate over all div tags with the rev-text class and take the .next_sibling of the div element inside each one:

import urllib2 
from bs4 import BeautifulSoup 

soup = BeautifulSoup(urllib2.urlopen('https://www.zomato.com/srijata')) 
for div in soup.find_all('div', class_='rev-text'): 
    print div.div.next_sibling

Prints:

This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol.. 

The ambience is good. The food quality is good. I Didn't find anything to complain. I wanted to visit the place fir a very long time and had dinner today. The meals are very good and if u want the better quality compared to other Andhra restaurants then this is the place. It's far better than nandhana. The staffs are very polite too. 

...
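A Python 3 sketch of the same approach, assuming the page still uses the markup described here (a div.rev-text whose first child div holds the "Rated" label, followed by the review text). The function name extract_reviews is hypothetical, and it takes the HTML as an argument so it can be tried without a network request:

```python
from bs4 import BeautifulSoup


def extract_reviews(html):
    """Return the text node that follows the inner <div> of each div.rev-text."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for div in soup.find_all("div", class_="rev-text"):
        inner = div.div  # first child <div>, assumed to hold the "Rated" label
        if inner is None or inner.next_sibling is None:
            continue  # skip blocks that don't match the assumed structure
        reviews.append(str(inner.next_sibling).strip())
    return reviews
```

On the live page it would be called as extract_reviews(urllib.request.urlopen("https://www.zomato.com/srijata").read()); BeautifulSoup accepts the raw bytes directly.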

2014-10-01 13:29:59 alecxe

Thanks alecxe, this works, but I'm still trying to figure out how. For example, why did you use only "rev-text" and not "mtop0 rev-text"? – shalini 2014-10-01 14:44:19

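@shalini I used the browser developer tools, inspected several reviews, and found that they all follow the 'rev-text' class pattern. There are certainly many ways to locate the reviews on the page; feel free to choose whichever works for you and whichever you consider reliable. Thanks. – alecxe 2014-10-01 14:46:58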

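alecxe, the problem is that the developer tools show class="mtop0 rev-text". So if I replace "rev-text" with "mtop0 rev-text" in your code, it prints nothing at all. According to the developer tools, "mtop0 rev-text" should work too, shouldn't it? – shalini 2014-10-01 14:59:26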
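One plausible explanation, not confirmed in this thread: in BeautifulSoup, class_ with a single name matches any tag whose class list contains that name, but a string containing a space is compared against the whole class attribute value. The developer tools show the live DOM, which may not match the raw HTML exactly (class order or extra classes). The sketch below uses hypothetical markup to show the difference, plus a CSS selector that requires both classes in any order:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, in a different order,
# plus an extra class.
HTML = '<div class="rev-text mtop0 clearfix">Review text</div>'
soup = BeautifulSoup(HTML, "html.parser")

# A single class name matches any tag whose class list contains it.
by_one_class = soup.find_all("div", class_="rev-text")

# A string with a space is matched against the whole attribute value,
# so it finds nothing here.
by_exact_string = soup.find_all("div", class_="mtop0 rev-text")

# A CSS selector requires both classes to be present, in any order.
by_selector = soup.select("div.mtop0.rev-text")
```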

You should use a for loop with find_all instead of find:

import urllib2
from bs4 import BeautifulSoup
zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
with open('zomato_info.html', 'w') as f:
    f.write(zomato_info)
with open('zomato_info.html') as f:
    soup = BeautifulSoup(f)
for div in soup.find_all('div', 'rev-text'):
    print div.text

One more question: why save the HTML to a file and then read the file into the soup object?

2014-10-01 12:10:36

Does not work: print div.text ==> AttributeError: 'NavigableString' object has no attribute 'text' – shalini 2014-10-01 12:19:49

Sorry, try this. I forgot to change find to find_all. – 2014-10-01 12:21:39

It stops after printing only the first review. – shalini 2014-10-01 12:23:22
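The AttributeError above is consistent with looping over the result of find() rather than find_all(). A single Tag is iterable over its direct children, which mix Tag objects with bare text nodes (NavigableString), and in the BeautifulSoup version used in this thread those text nodes had no .text attribute. A small sketch on hypothetical markup showing what each loop actually visits:

```python
from bs4 import BeautifulSoup

# Hypothetical two-review markup, used only to show the difference.
HTML = (
    '<div class="rev-text"><div>Rated</div>First review.</div>'
    '<div class="rev-text"><div>Rated</div>Second review.</div>'
)
soup = BeautifulSoup(HTML, "html.parser")

# Iterating over the single Tag returned by find() walks its children:
# here a <div> Tag followed by a bare text node (NavigableString).
first = soup.find("div", class_="rev-text")
child_types = [type(child).__name__ for child in first]

# Iterating over find_all() walks the matching <div class="rev-text"> tags.
texts = [div.get_text() for div in soup.find_all("div", class_="rev-text")]
```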
