2017-01-31

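No output when scraping with scrapy

I am trying to scrape commentary from the espncricinfo website using scrapy, but the output (items.csv) comes out empty. These are my files.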

cricinfo.py (spider file)

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from crictest.items import CrictestItem


class MySpider(BaseSpider):
    name = "cricinfo"
    allowed_domains = ["espncricinfo.com/"]
    start_urls = ["http://www.espncricinfo.com/champions-league-twenty20-2014/engine/match/763595.html?innings=1;view=commentary/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        rows = hxs.select('//td[@class="battingComms" and b]')
        for row in rows:
            item = CrictestItem()
            item['overnum'] = row.select('b/text()').extract()[0]
            item['overnumtext'] = row.select('b/following-sibling::text()').extract()[0]
            yield item

items.py

import scrapy


class CrictestItem(scrapy.Item):
    overnum = scrapy.Field()
    overnumtext = scrapy.Field()

Answer

The problem is your XPath.

You can try this one in the Chrome console: $x('//*[@id="commInnings"]/div[2]/div/div')

The XPath from your code, rows = hxs.select('//td[@class="battingComms" and b]'), does not give me any output in the console.
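To see offline why an XPath that matches nothing leaves items.csv empty, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, not Scrapy itself. ElementTree supports only a subset of XPath, so the `and b` condition is written as a chained `[b]` predicate. The SAMPLE markup, the `extract_rows` helper and the `commentary` class name are hypothetical; the real page's structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup. The real page is not shown in the
# question, so this only imitates the structure the spider expects.
SAMPLE = """
<html><body>
  <table>
    <tr><td class="battingComms"><b>0.1</b> Bowler to batsman, no run</td></tr>
    <tr><td class="battingComms">End of over 1</td></tr>
    <tr><td class="other"><b>0.2</b> ignored</td></tr>
  </table>
</body></html>
"""


def extract_rows(markup, xpath):
    """Return one dict per element matched by `xpath` that has a <b> child."""
    root = ET.fromstring(markup)
    items = []
    for cell in root.findall(xpath):
        bold = cell.find("b")
        if bold is None:
            continue  # defensive: skip cells without a <b> child
        items.append({
            "overnum": (bold.text or "").strip(),
            # Text that follows </b> inside the same cell.
            "overnumtext": (bold.tail or "").strip(),
        })
    return items


# The expression matches the sample, so one item is produced.
matching = extract_rows(SAMPLE, ".//td[@class='battingComms'][b]")
# The class name does not occur in the sample, so the loop body never runs.
missing = extract_rows(SAMPLE, ".//td[@class='commentary'][b]")

print(matching)
print(missing)
```

If the second case is what happens against the live page, the for loop in parse() never yields an item and the exported CSV stays empty. Checking the expression in scrapy shell or the browser console against the HTML the server actually returns is a reasonable first step.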