2015-11-05 126 views

Scrapy returns an empty JSON file

I am using Scrapy to extract data from a website. When I open the JSON results file, it is always empty. My Scrapy code is below:

from scrapy import Spider


class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["youtube.com"]
    start_urls = ["https://www.youtube.com/results?search_query=Motorcycle+Accident+Stunt+Rider+Knocks+Himself+Out+Stunt+Fail+2015"]

    def parse(self, response):
        questions = Selector(response).xpath('//a')
        for question in questions:
            item = StackItem()
            item['title'] = question.xpath(
                'a/text()').extract()
            item['url'] = question.xpath('//@href]').extract()
            yield item

Answer


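I am guessing you want to scrape the text and the href attribute of each <a> node. You only need to change your XPath expressions to get results.

Try the code below: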

item['title'] = question.xpath('./text()').extract()
item['url'] = question.xpath('./@href').extract()

Here is some of the output I got when trying these in the Scrapy shell:

In [38]: questions = Selector(response).xpath('//a') 
In [39]: for question in questions: 
      print question.xpath('./text()').extract() 
[u'Motorcycle Accident Crash During Wheelie on the Highway Crash 2015'] 
[u'STREETFIGHTERZ'] 
[] 
[u'Motorcycle Crash Compilation 2015 || Ep.#15 of October'] 
[u'Car Crash Weekly'] 
[] 
[u'Motorcycle Accident Burnout On Highway Crash 2015'] 
[u'STREETFIGHTERZ'] 
[] 
[u'Streetfighterz Ride The Murder Biz Ride 2015 Insane Motorcycle Stunts'] 
[u'STREETFIGHTERZ'] 
In [40]: for question in questions: 
      print question.xpath('./@href').extract() 
[u'/results?filters=movie&lclk=movie&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=show&lclk=show&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=short&lclk=short&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=long&lclk=long&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=4k&lclk=4k&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=hd&lclk=hd&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=cc&lclk=cc&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=creativecommons&lclk=creativecommons&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=3d&lclk=3d&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=live&lclk=live&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=purchased&lclk=purchased&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?filters=spherical&lclk=spherical&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015'] 
[u'/results?search_sort=video_date_uploaded&search_query=Motorcycle+Accident+Stunt+Rider+Knocks+Himself+Out+Stunt+Fail+2015'] 

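You are already inside the <a> node, so use ./ to select things relative to it. In the original code, 'a/text()' looks for another <a> nested inside the current <a>, and a path starting with // searches the whole document instead of the current node. The stray ] in '//@href]' also makes that expression invalid XPath.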
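
To make the "already inside the node" point concrete, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's Selector (an assumption made so the snippet runs without Scrapy installed). The PAGE markup and the extract helper are hypothetical, and ElementTree only supports a subset of XPath, so link.text and link.get("href") play the role of './text()' and './@href'.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a search results page.
PAGE = """
<html><body>
  <a href="/watch?v=1">First video</a>
  <a href="/watch?v=2">Second video</a>
  <a href="/results?filters=hd"><span>HD</span></a>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
links = root.findall(".//a")  # every <a> in the document, like //a


def extract(link):
    """Read the title and URL from the current <a> element itself."""
    return {"title": link.text, "url": link.get("href")}


# Mistake from the question: the path 'a' evaluated on an <a> looks for
# a child <a>, and none of the links contain one, so nothing matches.
nested = [link.findall("a") for link in links]

# Fix from the answer: stay on the current node and read its own data.
items = [extract(link) for link in links]

print(nested)
for item in items:
    print(item)
```

The third link has no text of its own because its label sits inside a child <span>. That is the same situation as the empty [] entries in the shell output above.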