Scrapy and Google web scraping

I want to use Scrapy to collect Google search results and store them in MongoDB. However, I'm not getting any results... What am I missing?

It looks simple enough.

# -*- coding: utf-8 -*-
import scrapy


class GoogleSpider(scrapy.Spider):
    name = "google"
    allowed_domains = ["google.com"]
    start_urls = (
        'https://www.google.com/#q=site:www.linkedin.com%2Fpub+intext:(security+or+jsp)+and+(power+or+utility)',
    )

    def parse(self, response):
        # Each result title is expected to be an <h3> inside the #rso container
        for sel in response.xpath('//*[@id="rso"]/div/div[1]/div/h3'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print(title, link, desc)


2015-10-05 Michael Bloom

What you're missing is that the response does not contain the elements your XPath is looking for.

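This happens because Scrapy sees a different page than your browser does. When you request the URL in start_urls, Google first loads its home page and then sends an XHR request to run the search.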

Scrapy never sends that XHR call, because it is triggered by JavaScript and Scrapy does not execute JavaScript.
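The underlying detail is that everything after the # in the start URL is a URL fragment, and fragments are never sent to the server: only the path and the ?query part go into the HTTP request. The stdlib-only sketch below (the helper name what_the_server_sees is hypothetical) splits the question's start URL the way an HTTP client does, showing that the request that actually goes out is just "/" on www.google.com, i.e. the plain home page with no search query.

```python
from urllib.parse import urlsplit


def what_the_server_sees(url):
    """Hypothetical helper: return (host, request_target, fragment).

    The request target (path plus optional ?query) is what goes into the
    HTTP request line; the #fragment stays on the client side.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target, parts.fragment


url = ("https://www.google.com/#q=site:www.linkedin.com%2Fpub"
       "+intext:(security+or+jsp)+and+(power+or+utility)")
host, target, fragment = what_the_server_sees(url)
print(host, target)   # www.google.com /
print(fragment)       # the search query, readable only by the page's JavaScript
```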

To see what Scrapy actually gets for this URL, and whether it contains what you expect, use the Scrapy shell:

scrapy shell "https://www.google.com/#q=site:www.linkedin.com%2Fpub+intext:(security+or+jsp)+and+(power+or+utility)"

Then, when the prompt appears, you can see why you get no results:

>>> response.xpath('//*[@id="rso"]/div/div[1]/div/h3') 
[] 
>>>

So the XPath matches nothing: the content it targets is simply not in the response Scrapy received.
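For the search terms to reach Google at all, they have to travel in the query string rather than the fragment. A minimal sketch, assuming Google's https://www.google.com/search?q=... endpoint (the helper google_search_url is hypothetical), builds such a URL with urllib.parse.urlencode and checks that the original query can be decoded back from it:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def google_search_url(query):
    """Hypothetical helper (assumes Google's /search?q= endpoint).

    The query goes into the ?q=... query string, which is sent to the
    server, instead of the #fragment, which is not.
    """
    return "https://www.google.com/search?" + urlencode({"q": query})


query = "site:www.linkedin.com/pub intext:(security or jsp) and (power or utility)"
url = google_search_url(query)
print(url)
# The server side can decode the original query from the query string
print(parse_qs(urlsplit(url).query)["q"][0] == query)   # True
```

You could put the resulting URL in start_urls. This only guarantees that the query is sent; Google may still answer an automated client with a consent or block page, or with markup different from what your browser shows, so the //*[@id="rso"] XPath is not guaranteed to match. Check the response in scrapy shell the same way before relying on it.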


2015-10-05 11:00:59 GHajba
