2014-10-01 61 views

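So I am scraping this particular web page, https://www.zomato.com/srijata, for all the "restaurant reviews" posted by the user "Sri" (not the self-comments on her own reviews). I want to print all occurrences of certain document elements of the page: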

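import urllib2
from bs4 import BeautifulSoup

zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
open('zomato_info.html', 'w').write(zomato_info)  # save a local copy of the page
soup = BeautifulSoup(open('zomato_info.html'))
soup.find('div','mtop0 rev-text').text  # find() returns only the first match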

This prints her first restaurant review, i.e. Sri's review of "Big Straw - Chew On This", as:

u'Rated  This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol..' 

I also tried another option:

I have this problem:

How do I print the next restaurant review? I tried findNextSiblings and similar methods, but none of them seem to work.
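A minimal offline sketch (Python 3, Beautiful Soup 4) of one reason `findNextSiblings` (spelled `find_next_siblings` in bs4) can come back empty: it only looks at elements that share the same parent, whereas `find_all_next` searches everything after the element in document order. The wrapper `<div class="review">` markup is a hypothetical stand-in; the real page structure was not checked.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each review text sits inside its own wrapper element.
html = """
<div class="review"><div class="mtop0 rev-text">First review</div></div>
<div class="review"><div class="mtop0 rev-text">Second review</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("div", "rev-text")  # only the first match

# Siblings must share a parent; the other review lives under a different wrapper.
siblings = first.find_next_siblings("div", "rev-text")
print(len(siblings))  # 0

# find_all_next() walks forward through the whole document instead.
later = [div.get_text() for div in first.find_all_next("div", "rev-text")]
print(later)  # ['Second review']
```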


Why save the HTML to a file and then read that file into the soup object? – 2014-10-01 12:22:02


It's a measure I took to avoid hitting the site repeatedly, in keeping with its safeguards against scraping! – shalini 2014-10-02 05:41:56
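If the goal is to avoid repeated requests, the file round-trip can be turned into an actual cache: download once, then reuse the saved copy on later runs. A minimal Python 3 sketch; `load_html`, `_download` and the `fetch` parameter are hypothetical names, not part of any library.

```python
import os
from urllib.request import urlopen


def _download(url):
    """Default fetcher: one plain HTTP GET returning the raw bytes."""
    with urlopen(url) as response:
        return response.read()


def load_html(url, cache_path, fetch=_download):
    """Return the page bytes, hitting the network only if no cached copy exists.

    Hypothetical helper; `fetch` is any callable that takes a URL and returns bytes.
    """
    if os.path.exists(cache_path):
        # Cache hit: reuse the saved copy, no request is sent.
        with open(cache_path, "rb") as f:
            return f.read()
    data = fetch(url)
    # Cache miss: save the bytes ("wb", since Python 3 responses are bytes).
    with open(cache_path, "wb") as f:
        f.write(data)
    return data
```

Usage would look like `BeautifulSoup(load_html("https://www.zomato.com/srijata", "zomato_info.html"), "html.parser")`; deleting the file forces a fresh download.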

Answers


First of all, you don't need to write the output to a file: pass the result of the urlopen() call directly to the BeautifulSoup constructor.
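A small offline check that a response object can go straight into the constructor (Python 3, bs4): `BeautifulSoup` calls `.read()` itself on any file-like object it is given, so an HTTP response behaves like an open file. Here `io.BytesIO` stands in for the response, and the markup is made up.

```python
import io

from bs4 import BeautifulSoup

# io.BytesIO stands in for the object returned by urlopen(); both expose .read().
response = io.BytesIO(
    b'<div class="mtop0 rev-text">Review A</div>'
    b'<div class="mtop0 rev-text">Review B</div>'
)
soup = BeautifulSoup(response, "html.parser")

reviews = [div.get_text() for div in soup.find_all("div", class_="rev-text")]
print(reviews)  # ['Review A', 'Review B']
```

With a live request this becomes `BeautifulSoup(urlopen(url), "html.parser")`, where `urlopen` comes from `urllib.request` in Python 3 (`urllib2` in Python 2).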

To get the reviews, you need to iterate over all div tags with the rev-text class and take the .next_sibling of the div element nested inside each one:

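import urllib2
from bs4 import BeautifulSoup

# Pass the HTTP response straight to BeautifulSoup; no intermediate file.
soup = BeautifulSoup(urllib2.urlopen('https://www.zomato.com/srijata'))
for div in soup.find_all('div', class_='rev-text'):
    # div.div is the first nested <div>; the review text is the node right after it.
    print div.div.next_sibling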

Prints:

This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol.. 

The ambience is good. The food quality is good. I Didn't find anything to complain. I wanted to visit the place fir a very long time and had dinner today. The meals are very good and if u want the better quality compared to other Andhra restaurants then this is the place. It's far better than nandhana. The staffs are very polite too. 

... 
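To see what `div.div.next_sibling` picks out, here is an offline sketch (Python 3) against hypothetical markup modelled on the outputs in this thread: a nested `<div>` holding a "Rated" label, followed directly by the review text. The real page may nest things differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a label <div> followed by the bare review text.
html = """
<div class="mtop0 rev-text"><div class="rating">Rated</div> This is situated right in the heart of the city.</div>
<div class="mtop0 rev-text"><div class="rating">Rated</div> The ambience is good.</div>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() joins every string inside the outer <div>, label included.
whole = soup.find("div", class_="rev-text").get_text()
print(whole)  # Rated This is situated right in the heart of the city.

reviews = []
for div in soup.find_all("div", class_="rev-text"):
    # div.div is the first nested <div>; its next sibling is the text node after it.
    reviews.append(div.div.next_sibling.strip())
print(reviews)
```

Under this assumed structure, `.text` keeps the "Rated" prefix while `.next_sibling` skips it, which matches the difference between the question's output and this answer's output.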

Thanks alecxe, this works, but I am still trying to figure out how. For example, why do you use only "rev-text" rather than "mtop0 rev-text"? – shalini 2014-10-01 14:44:19


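@shalini I used the browser developer tools, inspected several reviews and saw that they all follow the 'rev-text' class pattern. There are certainly many ways to locate the reviews on the page; feel free to pick whichever works for you and whichever you consider reliable. Thanks. – alecxe 2014-10-01 14:46:58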


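Alex, the thing is that in the developer tools the element has class="mtop0 rev-text". Yet if I replace "rev-text" with "mtop0 rev-text" in your code, it prints nothing at all. According to the developer tools, "mtop0 rev-text" should work too...? – shalini 2014-10-01 14:59:26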
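The thread leaves this open. One documented Beautiful Soup behaviour that can produce an empty result: `class` is treated as a multi-valued attribute, so a single class name matches any of a tag's classes, but a string containing a space is compared against the whole attribute value, order included. Also, the developer tools show the live DOM, which scripts may have changed, while `urlopen()` returns the HTML as served. The markup below, with the classes in the opposite order, is an assumption for illustration, not what Zomato is known to serve (Python 3, bs4).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes, opposite order to the dev-tools string.
html = '<div class="rev-text mtop0">Review text</div>'
soup = BeautifulSoup(html, "html.parser")

# One class name matches if it is any of the tag's classes.
single = soup.find_all("div", class_="rev-text")
# A space-separated string must equal the full attribute value, in order.
exact = soup.find_all("div", class_="mtop0 rev-text")
# A CSS selector requires both classes, in any order.
css = soup.select("div.mtop0.rev-text")

print(len(single), len(exact), len(css))  # 1 0 1
```

Matching on a single class (as in the answer) or using a CSS selector avoids depending on the exact attribute string.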


You should use a for loop with find_all instead of find:

import urllib2
from bs4 import BeautifulSoup

zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
open('zomato_info.html', 'w').write(zomato_info)
soup = BeautifulSoup(open('zomato_info.html'))
# find_all() returns every matching <div>, not just the first one
for div in soup.find_all('div','rev-text'):
    print div.text

One more question: why save the HTML to a file and then read the file into the soup object?
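The core difference behind this answer, as a tiny offline sketch (Python 3, made-up markup): `find()` returns only the first matching tag (or `None`), while `find_all()` returns every match, so it is the one to loop over. A plain string as the second positional argument filters on CSS class.

```python
from bs4 import BeautifulSoup

html = '<div class="rev-text">First</div><div class="rev-text">Second</div>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("div", "rev-text")              # a single Tag
missing = soup.find("div", "no-such-class")       # None when nothing matches
every = soup.find_all("div", "rev-text")          # all matching Tags, in document order

print(first.text)                   # First
print(missing)                      # None
print([div.text for div in every])  # ['First', 'Second']
```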


Does not work: print div.text ==> AttributeError: 'NavigableString' object has no attribute 'text' – shalini 2014-10-01 12:19:49


Sorry, try this. I forgot to change find to find_all. – 2014-10-01 12:21:39


It stops after printing only the first review. – shalini 2014-10-01 12:23:22
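The `AttributeError: 'NavigableString' object has no attribute 'text'` reported in this thread comes from looping over the result of `find()`: that is a single `Tag`, and iterating a `Tag` yields its direct children, which can include bare `NavigableString` text nodes. In the Beautiful Soup version used here those strings had no `.text`; newer releases may behave differently. An offline sketch with made-up markup (Python 3):

```python
from bs4 import BeautifulSoup

html = '<div class="rev-text"><div>Rated</div> Review text</div>'
soup = BeautifulSoup(html, "html.parser")

# Looping over one Tag walks its children: a nested Tag, then a text node.
children = [type(child).__name__ for child in soup.find("div", "rev-text")]
print(children)  # ['Tag', 'NavigableString']

# Looping over find_all() yields the matching Tags themselves.
matches = [type(div).__name__ for div in soup.find_all("div", "rev-text")]
print(matches)  # ['Tag']
```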