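I followed the Scrapy tutorial, but none of the selected information is printed

I am new to Scrapy and am learning from the official tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html
I followed the steps on that page exactly, but when I try to print the scraped site data to the screen, nothing relevant is printed.
Here is my code: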

items.py

import scrapy

class DmozItem(scrapy.Item):
    # Fields collected for each Dmoz entry
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

domzSpider.py

import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        # Treat every <li> inside a <ul> as one listed site
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print(title, link, desc)

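In the end I found the reason why the expected information was missing from the terminal:

The directory where I ran the command scrapy crawl dmoz was not the root directory of my Scrapy project. I hope this post saves other people from the same silly mistake!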
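Scrapy identifies a project by its scrapy.cfg file, so one quick check before running scrapy crawl is whether the current directory belongs to a project at all. The following is only a minimal sketch of such a check, not Scrapy's own code: the helper name find_project_root is hypothetical, and walking up through parent directories is an assumption made for illustration.

```python
import os


def find_project_root(start_dir):
    """Return the nearest directory at or above start_dir that holds scrapy.cfg.

    Hypothetical helper for illustration; returns None if no scrapy.cfg is found.
    """
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


# Prints the project root for the current directory, or None outside a project
print(find_project_root(os.getcwd()))
```

If this prints None, the shell is probably outside the project, and changing into the directory that contains scrapy.cfg before running the crawl is worth trying.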

Source

2016-09-05 xiangang wei

The dmoz.org example in the Scrapy tutorial is a bit outdated; see my answer to a related question: http://stackoverflow.com/questions/39243009/scrapy-tutorial-example/39243432#39243432 – Granitosaurus

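Yes, thank you. I checked the site and it is true, they changed the HTML. But even after I changed my code, the terminal output is still the same; I mean, I do not see any of the expected information about the 21 sites in the terminal.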

It looks like the site's HTML has changed. Please try this:

for sel in response.xpath('//div[@id="site-list-content"]/div'):
    title = sel.xpath('./div[@class="title-and-desc"]/a/div/text()').extract()
    link = sel.xpath('./div[@class="title-and-desc"]/a/@href').extract()
    desc = sel.xpath('.//div[contains(@class, "site-descr")]/text()').extract()
    print(title, link, desc)
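Selectors like these can be checked offline against a small saved fragment before running a full crawl. The sketch below is an illustration only: it uses the standard library's xml.etree.ElementTree rather than Scrapy's selectors, the SAMPLE markup is a hypothetical fragment whose structure is inferred from the XPaths above (the real page may differ), and extract_sites is a made-up helper name. ElementTree supports only a subset of XPath and has no contains(), so the class is matched exactly here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; NOT copied from dmoz.org
SAMPLE = """
<html><body>
<div id="site-list-content">
  <div class="site-item">
    <div class="title-and-desc">
      <a href="http://example.com/book"><div class="site-title">Example Book</div></a>
      <div class="site-descr">A short description.</div>
    </div>
  </div>
</div>
</body></html>
"""


def extract_sites(markup):
    """Return (title, link, desc) tuples for each entry in the fragment."""
    root = ET.fromstring(markup.strip())
    rows = []
    # One iteration per direct child <div> of the list container
    for sel in root.findall('.//div[@id="site-list-content"]/div'):
        anchor = sel.find('./div[@class="title-and-desc"]/a')
        title = anchor.find('./div')
        # Exact class match, because ElementTree lacks contains()
        desc = sel.find('.//div[@class="site-descr"]')
        rows.append((title.text.strip(), anchor.get("href"), desc.text.strip()))
    return rows


print(extract_sites(SAMPLE))
# [('Example Book', 'http://example.com/book', 'A short description.')]
```

If the loop yields no rows on a saved copy of the real page, the container XPath is the first thing to recheck.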

Source

2016-09-05 10:37:05

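Yes, thanks @Granitosaurus. I checked the site and it is true, they changed the HTML. But even after I replaced my code with yours, the terminal output is still the same; I mean, I do not see any of the expected information about the 21 sites in the terminal.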

But if I run 'scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"' in the root directory and then use response.xpath('//div[@id="site-list-content"]/div'), I can get the information! I find that very strange!

It works in my case. Maybe you can debug it like this: from scrapy.shell import inspect_response, then def parse(self, response): inspect_response(response, self) .... See here for more details: https://blog.scrapinghub.com/2016/05/18/scrapy-tips-from-the-pros-may-2016-edition/
