2017-08-07 42 views
0

JS web scraping does not find certain elements

I want to use Dryscrape to scrape Google's hotel room price results, for example the rhs_block here: https://www.google.co.uk/search?q=The+Taj+Mahal+Palace+hotel

However, it does not seem to render the JavaScript before collecting the page, and I would like to know where I might be going wrong.

import dryscrape 
from bs4 import BeautifulSoup 

dryscrape.start_xvfb() 
session = dryscrape.Session() 

my_url = 'https://www.google.ie/search?q=The+Taj+Mahal+Palace+hotel' 
session.visit(my_url) 
response = session.body() 
soup = BeautifulSoup(response, "lxml") 

# prices = soup.find('div', {"class" : "rhs_block"}) 
prices = soup.find('div', {"class" : "lhpr-content-item"}) 

print(prices)

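I have already tested this on a simple JS-rendered page, so the approach itself works. Any pointers would be greatly appreciated, as dryscrape is fairly new to me.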

+0

Have you tried saving the response and checking what you actually got back from Google? – Dekel

+0

I added 'print soup' at the end and redirected the output with 'python js.py >> test.html'; a grep of that output returns nothing. – denski

+0

Why not save 'response' and see what is inside? – Dekel
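The suggestion above can be sketched roughly as follows: write the string returned by session.body() to a file, then check whether the class name you are looking for appears in it at all. The helper names save_response and contains_marker and the default file name response.html are made up for illustration, and the sketch assumes Python 3.

```python
import io


def save_response(html, path="response.html"):
    """Write the rendered page source to disk so it can be inspected."""
    # An explicit encoding avoids surprises with characters such as the euro sign.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


def contains_marker(path, marker="lhpr-content-item"):
    """Return True if the saved page source contains the given text."""
    with io.open(path, encoding="utf-8") as handle:
        return marker in handle.read()
```

In the question's script this would be called as save_response(response) right after session.body(). If contains_marker returns False, the element was never in the page Google sent back, so the problem lies with the request rather than with the BeautifulSoup lookup.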

Answers

1

Google does not like your User-Agent setting. After this line:

session = dryscrape.Session() 

add the line:

session.set_header("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:41.0) Gecko/20100101 Firefox/41.0") 
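Put together with the script from the question, the change might look like the sketch below. It uses only the dryscrape calls that already appear in this thread; the function name fetch_rendered_html and the constant USER_AGENT are invented for illustration, and dryscrape is imported inside the function so the module can be loaded on a machine where it is not installed.

```python
# The same desktop Firefox User-Agent string as in the line above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 5.1; rv:41.0) "
              "Gecko/20100101 Firefox/41.0")


def fetch_rendered_html(url, user_agent=USER_AGENT):
    """Return the page source after dryscrape has rendered it."""
    import dryscrape  # lazy import: requires dryscrape and xvfb to be installed

    dryscrape.start_xvfb()
    session = dryscrape.Session()
    # Set the header before visiting, so the first request already carries it.
    session.set_header("User-Agent", user_agent)
    session.visit(url)
    return session.body()
```

Calling fetch_rendered_html with the question's my_url and passing the result to BeautifulSoup(response, "lxml") leaves the rest of the script unchanged.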

The output is then:

<div class="lhpr-content-item" data-key="8"><div class="_qS"><a class="_dkf" data-dp="€176" data-pid="8" data-ved="0ahUKEwisoqOeisjVAhXCI1AKHWhQAm0QwDEIzAEoBDAW" href="https://www.google.com/travel/clk?pc=AA80Osxnd1Ycj04hDym-ZpFIn9a-iLsqE7UNxtLtnVS5khTT2PvlxyLaBSJZKt9V3zLJWmUBQJedYFG2CzsGB4Ru572oiGIF3i-UYsg1BBFNbDFPhXelW-FNo6lefLaSbCcPqO1W6rOEQT_ev6stedzfqyjT2Y7QnMNz5TGkr1zDWIfI6iQgV2l7mcMhzxHV7GKVjTjhX6KL-CT3c_9wBPpKVa1MICyikHUOf72incZ6e9TF1aMGcNKf6W91fdU__ZJOv3jByF7bkPQNOWM" onmousedown="return rwt(this,'','','','23','AFQjCNG0CN8A7n-gxtETpYwsGydozaH1Yg','','0ahUKEwisoqOeisjVAhXCI1AKHWhQAm0QwDEIzAEoBDAW','','',event)"><img alt="The Taj Mahal Palace" class="_Tjf" data-deferred="1" id="zemJWeyHEsLHwALooInoBguid_8" onload="google&amp;&amp;google.aft&amp;&amp;google.aft(this)" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAAAXNSR0IArs4c6QAAAS9JREFUOBFjTJ9v7sXwn2EmIyOjDAMF4P///08YGBnSmahhGMgdYAcBHcZEqcuQPQUyiwlZgBpsuIGTYw8xTE84zsDMxEKRuXADKTIFSTNW57AwsTIk2zcxaEiZMVx+fJRhweFGhn///4J9gKQXzPz84z3DxrMzGI7e3gTmY3WhubIng4G8AwMHKxeDqZIrg7GCM7o5cD4vhyBDsGkenI/VQJBrkAE6H1kOxAZZDANYvXzy7g4GDUkTiJefHGU493A/TD1BGquBIBfNB4YbOQDDwClxh8kxB64Hw8C///7AJYlhoKdbDAPzlzgykGIoKDMgA6yxjKyAVDbVDWTMmG/xn1RXIKtnYmQGloBA+P8fWBgjDJEVE8NGT/RM4JKWGJ1EqAGZxQQqtqlhKKwKAAB1/VzfnOVWVwAAAABJRU5ErkJggg=="/><div class="_uFf"><img alt="book action chevron" class="_hEj" onload="google&amp;&amp;google.aft&amp;&amp;google.aft(this)" src="https://www.gstatic.com/images/icons/material/system/2x/chevron_right_grey600_24dp.png"/><div class="_akf"><span><span class="_bkf"><span class="_FQr"><span class="_V0p">€176</span></span></span></span></div><div class="_zbu"><span class="_Zjf">The Taj Mahal Palace</span><span class="_aMr"><span class="_bMr"> · </span><span>Official website</span></span></div></div></a></div></div> 
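In this output the price appears both as the text of the innermost span and as the data-dp attribute of the link. A minimal sketch for pulling out just the price is shown below. It swaps in the standard-library html.parser instead of BeautifulSoup so that it runs without extra dependencies, and it assumes Python 3. The names PriceLinkParser and extract_prices are hypothetical, the sample string is a shortened copy of the output above, and Google's generated class and attribute names may change at any time.

```python
from html.parser import HTMLParser


class PriceLinkParser(HTMLParser):
    """Collect the data-dp attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            value = dict(attrs).get("data-dp")
            if value is not None:
                self.prices.append(value)


def extract_prices(html):
    """Return all data-dp values found in the given HTML, in document order."""
    parser = PriceLinkParser()
    parser.feed(html)
    return parser.prices


# Shortened copy of the element printed above.
sample = ('<div class="lhpr-content-item" data-key="8">'
          '<a class="_dkf" data-dp="€176" data-pid="8">The Taj Mahal Palace</a>'
          '</div>')
print(extract_prices(sample))  # prints ['€176']
```

With BeautifulSoup, as used in the question, the same attribute should be reachable from the matched element via prices.find('a')['data-dp'].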
+0

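Thanks, I believe this is the second question of mine on this topic that you have answered. Do you have any recommended reading for someone new to 1) Python and 2) web scraping? – denski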

+1

For Beautiful Soup, read the documentation (https://www.crummy.com/software/BeautifulSoup/) and take a look at Scrapy (https://scrapy.org/). For Python, there are some good videos on YouTube. –

+1

Or a free MOOC: https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-11 You only pay if you need the certificate. –