2017-01-07 136 views

How can I scrape a website with dynamic routing using Scrapy?

http://growthtools.io/social-media-automation-tools

When I try

scrapy shell 'http://growthtools.io/social-media-automation-tools' 

I get the following result:

2017-01-07 22:43:06 [root] DEBUG: Using default logger 
2017-01-07 22:43:06 [root] DEBUG: Using default logger 

In [1]: view(response) 


The response object does not contain the tools elements.

In [3]: response.css('.toolsList') 
Out[3]: [] 
In [5]: 'toolsList' in response.body 
Out[5]: False 

Can anyone explain how I can parse http://growthtools.io/social-media-automation-tools? Why doesn't the response object contain all of the page content?


The site uses JavaScript to render the page. You should use a headless browser such as Splash or PhantomJS to render it.

Answer


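Loading the page involves JavaScript, which a browser executes but Scrapy does not. You can work around this with scrapy-splash, which provides a middleware to use in your Scrapy project. The middleware relies on the Splash JS rendering service, which you can run through Docker.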
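As a rough illustration of wiring the middleware into a project, here is a minimal settings.py sketch based on the scrapy-splash README. It assumes a Splash instance is already listening on localhost:8050 (for example, one started with the scrapinghub/splash Docker image); the address and the priority numbers are the README defaults, not values taken from the question, so adjust them to your setup.

```python
# settings.py (sketch): enable scrapy-splash in a Scrapy project.
# Assumption: Splash is reachable at localhost:8050, e.g. started with
#   docker run -p 8050:8050 scrapinghub/splash

SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these settings in place, a spider would typically yield scrapy_splash.SplashRequest objects instead of plain scrapy.Request objects so that pages are rendered before the callback sees them.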

To just test it in the Scrapy shell, you can follow this example to run it from the shell

This worked for me:

$ scrapy shell 'http://localhost:8050/render.html?url=http://growthtools.io/social-media-automation-tools' 
In [1]: response.css('.toolsList') 
Out[1]: 
[<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>, 
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>]
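
The command above passes the target URL to render.html unescaped, which is fine for this page but can break for targets that carry their own query string. A minimal sketch of building the same kind of URL with percent-encoding follows; the helper name splash_render_url and the wait value are illustrative assumptions, and the base address assumes the same local Splash instance as in the command.

```python
from urllib.parse import urlencode

# Assumption: local Splash instance, same address as in the shell command.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url, wait=None):
    """Build a Splash render.html URL for target_url (hypothetical helper)."""
    params = {"url": target_url}
    if wait is not None:
        # 'wait' asks Splash to pause this many seconds after the page
        # loads before returning the rendered HTML.
        params["wait"] = wait
    # urlencode percent-encodes the target URL so its own '?', '&' and '/'
    # characters cannot be confused with Splash's parameters.
    return SPLASH_BASE + "/render.html?" + urlencode(params)


print(splash_render_url("http://growthtools.io/social-media-automation-tools", wait=2))
```

The printed URL can be passed to scrapy shell in place of the hand-written one.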