
Scrapy pipeline loads but does not work

I have a Scrapy project that loads its pipelines but never passes items to them. Any help is appreciated.

A stripped-down version of the spider:

#imports 
class MySpider(CrawlSpider): 
    #RULES AND STUFF 

    def parse_item(self, response): 
        '''Takes HTML response and turns it into an item ready for database. I hope.
        '''
        #A LOT OF CODE
        return item

Printing the item at this point produces the expected result, and settings.py is simple enough:
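Judging by the 'Enabled item pipelines' log line further down, the ITEM_PIPELINES entry presumably looks roughly like this (the myproject.pipelines module path is an assumption, not taken from the original):

ITEM_PIPELINES = [
    'myproject.pipelines.MySpiderPipeline',   # module path assumed
    'myproject.pipelines.PipeCleaner',
    'myproject.pipelines.DBWriter',
]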

and the pipelines seem to be correct (sans imports):

class MySpiderPipeline(object): 
    def process_item(self, item, spider): 
        print 'PIPELINE: got ', item['name']
        return item

class DBWriter(object): 
    """Writes each item to a DB. I hope. 
    """ 
    def __init__(self):
        self.dbpool = adbapi.ConnectionPool('MySQLdb'
                                            , host=settings['HOST']
                                            , port=int(settings['PORT'])
                                            , user=settings['USER']
                                            , passwd=settings['PASS']
                                            , db=settings['BASE']
                                            , cursorclass=MySQLdb.cursors.DictCursor
                                            , charset='utf8'
                                            , use_unicode=True
                                            )
        print('init DBWriter')

    def process_item(self, item, spider):
        print 'DBWriter process_item'
        query = self.dbpool.runInteraction(self._insert, item)
        query.addErrback(self.handle_error)
        return item

    def _insert(self, tx, item):
        print 'DBWriter _insert'
        # A LOT OF UNRELATED CODE HERE
        return item

class PipeCleaner(object): 
    def __init__(self):
        print 'Cleaning these pipes.'

    def process_item(self, item, spider):
        print item['name'], ' is cleeeeaaaaannn!!'
        return item

When I run the spider, I get this output at startup:

Cleaning these pipes. 
init DBWriter 
2012-10-23 15:30:04-0400 [scrapy] DEBUG: Enabled item pipelines: MySpiderPipeline, PipeCleaner, DBWriter 

Unlike the __init__ methods, which do print when the crawler starts up, the process_item methods never print (or process) anything. I have my fingers crossed that I have simply forgotten something very simple.


Can you share some of the log output from when the spider actually scrapes items? –


I think I have found the problem (partly). The spider class routes data based on the HTML and sends it on to other methods. Those methods return the item, but it never makes it into the pipeline. That probably deserves a separate question. – GMBill

Answer

2012-10-23 15:30:04-0400 [scrapy] DEBUG: Enabled item pipelines: MySpiderPipeline, PipeCleaner, DBWriter 

This line shows that your pipelines are being initialized and that they are enabled correctly.

The problem is in your spider class:

class MySpider(CrawlSpider): 
    #RULES AND STUFF 

    def parse_item(self, response): 
        '''Takes HTML response and turns it into an item ready for database. I hope.
        '''
        #A LOT OF CODE
        # before returning the item, print it
        return item

I think you should print the item right before returning it from MySpider, to confirm it is actually being returned.
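Following up on the comment above: if parse_item hands the response off to helper methods, the item they build still has to be returned (or yielded) from the callback that Scrapy invoked, otherwise the engine never sees it and no pipeline is called. A minimal sketch, with _build_item standing in as a hypothetical name for those helpers:

    def parse_item(self, response):
        # _build_item is a hypothetical stand-in for the helper methods
        # that actually construct the item from the HTML.
        item = self._build_item(response)
        print 'parse_item returning:', item['name']
        # Returning the item here is what hands it to the enabled pipelines.
        return item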