Scrapy - encoding issue - stripping out quotes

I have this class:

import scrapy


class PitchforkItem(scrapy.Item):
    # Assumed item definition (not shown in the question); in a Scrapy
    # project this usually lives in items.py.
    artist = scrapy.Field()
    track = scrapy.Field()


class PitchforkTracks(scrapy.Spider):
    name = "pitchfork_tracks"
    allowed_domains = ["pitchfork.com"]
    start_urls = [
        "http://pitchfork.com/reviews/best/tracks/?page=1",
        "http://pitchfork.com/reviews/best/tracks/?page=2",
        "http://pitchfork.com/reviews/best/tracks/?page=3",
        "http://pitchfork.com/reviews/best/tracks/?page=4",
        "http://pitchfork.com/reviews/best/tracks/?page=5",
    ]

    def parse(self, response):
        # One selector per track block on the listing page.
        for sel in response.xpath('//div[@class="track-details"]/div[@class="row"]'):
            item = PitchforkItem()
            item['artist'] = sel.xpath('.//li/text()').extract_first()
            item['track'] = sel.xpath('.//h2[@class="title"]/text()').extract_first()
            yield item

Scraping this element:

<h2 class="title" data-reactid="...">“Colours”</h2>

works, but the result prints like this:

{'artist': u'The Avalanches', 'track': u'\u201cColours\u201d'}

Where and how do I get rid of the quotes, i.e. \u201c and \u201d?
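The \u201c and \u201d in that printout are not encoding damage. Printing a dict shows the repr() of each value, and Python 2's repr() escapes non-ASCII characters, so the stored value really contains the two curly quote characters “ and ”. A small sketch, assuming Python 3, where ascii() reproduces the escaped form that Python 2's repr() printed:

```python
# The scraped title as a Python 3 str; \u201c and \u201d are just notation
# for LEFT and RIGHT DOUBLE QUOTATION MARK.
track = "\u201cColours\u201d"

print(len(track))    # 9: each quote is a single character, not a six-character escape
print(ascii(track))  # '\u201cColours\u201d', the same escaped form Python 2 showed
print(track == "\N{LEFT DOUBLE QUOTATION MARK}Colours\N{RIGHT DOUBLE QUOTATION MARK}")  # True
```

So the task is to remove two ordinary characters from the string, not to fix an encoding.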


2016-09-30 data_garden

Have you tried http://stackoverflow.com/questions/15321138/removing-unicode-u2026-like-characters-in-a-string-in-python2-7? – Ben

@Ben If I write item['track'] = item['track'].decode('unicode_escape').encode('ascii', 'ignore') I get this traceback: UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128) –
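The traceback comes from the decode step. In Python 2, calling .decode() on a value that is already unicode first encodes it implicitly with the default ASCII codec, and that implicit encode fails on u'\u201c'. The "ignore" behaviour only needs the encode step. A minimal sketch of that approach, assuming Python 3 (where str has no .decode() at all); the helper name to_ascii is hypothetical:

```python
def to_ascii(text: str) -> str:
    """Drop every non-ASCII character from text (sketch, Python 3)."""
    # encode(..., "ignore") silently discards characters ASCII cannot represent;
    # decode() turns the resulting bytes back into a str.
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u201cColours\u201d"))        # Colours
print(to_ascii("\u201cCaf\u00e9 Bar\u201d"))  # Caf Bar
```

The second call shows the drawback: this removes every non-ASCII character, so an accented letter in a track title disappears along with the quotes.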

Inside parse(self, response), replace the track assignment with:

item['track'] = sel.xpath('.//h2[@class="title"]/text()').extract_first().strip(u'\u201c\u201d')
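One caveat with chaining .strip() directly: extract_first() returns None when the XPath matches nothing, and None has no .strip() method, so a row without a title would raise AttributeError. A None-safe variant as a sketch; clean_title and QUOTE_CHARS are hypothetical names, and the quote set is assumed to be just the two curly quotes from the question:

```python
from typing import Optional

# Characters to trim from both ends of a title (assumed set; extend it if
# other quote styles show up in the scraped data).
QUOTE_CHARS = "\u201c\u201d"


def clean_title(raw: Optional[str]) -> Optional[str]:
    """Trim whitespace and surrounding curly quotes from a scraped title."""
    if raw is None:
        # extract_first() returned None because nothing matched.
        return None
    # str.strip(chars) removes any run of the listed characters from the
    # two ends only; characters in the middle are left untouched.
    return raw.strip().strip(QUOTE_CHARS)


print(clean_title("\u201cColours\u201d"))  # Colours
print(clean_title(None))                   # None
```

In the spider this would read item['track'] = clean_title(sel.xpath('.//h2[@class="title"]/text()').extract_first()).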


2016-09-30 02:11:23
