2014-11-14 30 views

Answers


Sure you can. =)

Here is a simple spider to get you started:

import scrapy
from goose import Goose  # python-goose; on Python 3 the goose3 fork (from goose3 import Goose) is the likely substitute


class Article(scrapy.Item):
    # The two fields extracted from each page
    title = scrapy.Field()
    text = scrapy.Field()


class MyGooseSpider(scrapy.Spider):
    name = 'goose'
    start_urls = [
        'http://blog.scrapinghub.com/2014/06/18/extracting-schema-org-microdata-using-scrapy-selectors-and-xpath/',
        'http://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/',
    ]

    def parse(self, response):
        # Hand Goose the HTML that Scrapy already downloaded, so Goose does not fetch the page again
        article = Goose().extract(raw_html=response.body)
        yield Article(title=article.title, text=article.cleaned_text)

Save this to file.py and run it with:

scrapy runspider file.py -o output.json 
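
To check what the spider produced, something like the following might help. It is only a sketch, assuming output.json came from a single run (so it holds one valid JSON array) and that each item carries the title and text fields declared on Article above. The helper names load_articles and summarize are made up for illustration and are not part of Scrapy or Goose.

```python
import json


def load_articles(path):
    """Read the JSON array written by `scrapy runspider file.py -o output.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(articles, preview_chars=80):
    """Return one (title, preview) pair per scraped item."""
    summary = []
    for item in articles:
        # "title" and "text" are the two fields declared on the Article item;
        # missing or null values are treated as empty strings.
        title = (item.get("title") or "").strip()
        # Collapse runs of whitespace so the preview fits on one line.
        text = " ".join((item.get("text") or "").split())
        summary.append((title, text[:preview_chars]))
    return summary
```

Calling summarize(load_articles("output.json")) after the run gives a quick list of titles with the first few words of each cleaned article, which is usually enough to see whether Goose picked out the right body text.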

Very nice, thanks. – yayu 2014-11-15 01:29:05
