2015-06-11

Answer

Let me try to explain this using the Scrapy sample code shown on the Scrapy website. I saved it in a file named scrapy_example.py:

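from scrapy import Spider, Item, Field

# Container for one scraped record; each Field is a key in the output dict.
class Post(Item):
    title = Field()

class BlogSpider(Spider):
    # start_urls lists the pages Scrapy downloads ("Crawled") first.
    name, start_urls = 'blogspider', ['http://blog.scrapinghub.com']

    def parse(self, response):
        # Every Post returned here is logged as "Scraped from <...>".
        return [Post(title=e.extract()) for e in response.css("h2 a::text")]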

Running it with the command scrapy runspider scrapy_example.py produces the following output:

(...) 
DEBUG: Crawled (200) <GET http://blog.scrapinghub.com> (referer: None) ['partial'] 
DEBUG: Scraped from <200 http://blog.scrapinghub.com> 
    {'title': u'Using git to manage vacations in a large distributed\xa0team'} 
DEBUG: Scraped from <200 http://blog.scrapinghub.com> 
    {'title': u'Gender Inequality Across Programming\xa0Languages'} 
(...) 

Crawled means: Scrapy has downloaded that web page.

Scraped means: Scrapy has extracted some data from that web page.
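
To make the difference between the two messages concrete, here is a minimal sketch that reads log lines like the ones above and separates downloaded pages from extracted items. The regular expressions are modeled only on the two DEBUG lines in this answer, so treat them as an assumption; other Scrapy versions may word these messages differently. The function name summarize_log is made up for this illustration.

```python
import re

# Patterns modeled on the DEBUG lines quoted in this answer (assumption:
# your Scrapy version formats "Crawled" and "Scraped" lines the same way).
CRAWLED = re.compile(r"Crawled \((\d+)\) <(\w+) ([^>]+)>")
SCRAPED = re.compile(r"Scraped from <(\d+) ([^>]+)>")


def summarize_log(lines):
    """Return (crawled_urls, scraped_counts) for an iterable of log lines."""
    crawled = []   # URLs that were downloaded
    scraped = {}   # URL -> number of items extracted from it
    for line in lines:
        match = CRAWLED.search(line)
        if match:
            crawled.append(match.group(3))
            continue
        match = SCRAPED.search(line)
        if match:
            url = match.group(2)
            scraped[url] = scraped.get(url, 0) + 1
    return crawled, scraped


sample = [
    "DEBUG: Crawled (200) <GET http://blog.scrapinghub.com> (referer: None) ['partial']",
    "DEBUG: Scraped from <200 http://blog.scrapinghub.com>",
    "DEBUG: Scraped from <200 http://blog.scrapinghub.com>",
]
crawled, scraped = summarize_log(sample)
print(crawled)
print(scraped)
```

One page can be crawled once and still yield many scraped items, which is why the sample counts one download and two items for the same URL.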

The URL is given in the script as the start_urls attribute of the spider.

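Your output must have been generated by running a spider. Search the file in which the spider is defined, and you should be able to find where the URL is defined.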
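
If you are not sure which file that is, a rough sketch like the following can help: it walks a directory and prints every line in a .py file that mentions start_urls. The helper name find_start_urls and the choice of "." as the search root are assumptions for illustration. A spider can also build its requests elsewhere (for example in a start_requests method), which this simple text search will not catch.

```python
import os


def find_start_urls(root):
    """Yield (path, line_number, text) for lines that mention start_urls."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on oddly encoded files.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, text in enumerate(handle, start=1):
                    if "start_urls" in text:
                        yield path, number, text.strip()


if __name__ == "__main__":
    # "." is a placeholder; point it at your own project directory.
    for path, number, text in find_start_urls("."):
        print(f"{path}:{number}: {text}")
```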