How do I open a file stream for reading with Scrapy?

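Using Scrapy, I want to read a binary file into memory from a URL I have extracted, and then extract its contents. How can I open a file stream for reading with Scrapy?

Currently, I can find the URL on the page with a selector, for example: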

myFile = response.xpath('//a[contains(@href,".interestingfileextension")]/@href').extract()

How do I then read that file into memory so that I can look for content inside it?

Many thanks.


2016-03-25 John Smith

Issue a request for the URL and inspect the downloaded content in the callback:

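import scrapy

# Both methods belong inside a scrapy.Spider subclass.
def parse(self, response):
    url = response.xpath('//a[contains(@href,".interestingfileextension")]/@href').extract_first()
    if url:  # extract_first() returns None when nothing matches
        # urljoin resolves a relative href against the page URL
        yield scrapy.Request(response.urljoin(url), callback=self.parse_file)

def parse_file(self, response):
    # response.body holds the raw bytes of the downloaded file
    print(response.body)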
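
If a file-like object is needed rather than raw bytes, one option is to wrap `response.body` in `io.BytesIO` from the standard library. This is only a minimal sketch: the helper names `open_body_as_stream` and `read_header` are hypothetical and not part of Scrapy, and it assumes the whole file fits in memory.

```python
import io


def open_body_as_stream(body: bytes) -> io.BytesIO:
    """Wrap raw bytes (for example response.body) in an in-memory binary stream."""
    return io.BytesIO(body)


def read_header(body: bytes, size: int = 4) -> bytes:
    """Read the first `size` bytes through the stream interface."""
    stream = open_body_as_stream(body)
    return stream.read(size)


# Inside the spider, the callback could then look like this (hypothetical):
#
# def parse_file(self, response):
#     stream = open_body_as_stream(response.body)
#     magic = stream.read(4)   # e.g. check a file signature
#     stream.seek(0)           # rewind before handing the stream to a parser
```

The resulting stream supports `read()`, `seek()` and `tell()`, so it can be passed to libraries that expect an open binary file.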


2016-03-25 19:39:38 alecxe

Perfect. Thank you! Scrapy makes this really simple.
