2014-04-11 37 views

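I'm new to the Scrapy framework: how do I use an Item Pipeline to store multiple items in a database?

I want to store some scraped items in a database using an item pipeline.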

Spider.py

# Imports assumed for the Scrapy 0.22-era API used in this post.
from scrapy.selector import Selector
from scrapy.spider import Spider

# ExampleItem is assumed to be defined in the project's items.py.


class ExampleSpider(Spider):
    name = "Spider1"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com.com/.../rss_1.xml"]

    def parse(self, response):
        sel = Selector(response)
        Examples = sel.xpath('//item')
        items = []
        for Example in Examples:
            item = ExampleItem()
            item['link'] = Example.xpath('.//link/text()').extract()
            item['title'] = Example.xpath('.//title/text()').extract()
            links = item['link']
            titles = item['title']
            items.append(item)
        return items

pipelines.py

from scrapy import log
from twisted.enterprise import adbapi


class MySQLStorePipeline(object):

    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbargs = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            passwd=settings['MYSQL_PASSWD'],
            charset='utf8',
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
        return cls(dbpool)

    def process_item(self, item, spider):
        # run the db query in the thread pool
        query = self.dbpool.runInteraction(self._conditional_insert, item, spider)
        query.addErrback(self._handle_error, item, spider)
        # at the end, return the item whether the query succeeded or failed
        query.addBoth(lambda _: item)
        # return the deferred instead of the item. This makes the engine
        # process the next item (according to the CONCURRENT_ITEMS setting)
        # only after this operation (deferred) has finished.
        return query

    def _conditional_insert(self, tx, item, spider):
        tx.execute("select * from AnnonceGratuit where link = %s", (item['link']))
        result = tx.fetchone()
        if result:
            log.msg("Item already stored in db: %s" % item, level=log.DEBUG)
        else:
            tx.execute("""
                INSERT INTO AnnonceGratuit (link, title)
                VALUES (%s, %s)
            """, (item['link'], item['title'])
            )
            log.msg("Item stored in db: %s" % item, level=log.DEBUG)

    def _handle_error(self, failure, item, spider):
        """Handle an error that occurred during the db interaction."""
        # do nothing, just log
        log.err(failure)

I can successfully scrape the link and title of each item.

But when I try to store them, I get this error:

_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \') 

NB

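When I use the same code with a single item, it works and the item is stored in the database. But with two or more items it does not work!

Thanks in advance for your help.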

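What happens to your SQL code if item['link'] or item['title'] returns an empty value? – Talvalin

When I use a single item, I get the result in my database. –

But when I use two or more items, all the columns in my table are empty. –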

Answer


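Maybe you should check whether item['link'] or item['title'] is a list or a string. I had the same error because I was trying to store a list in MySQL; after converting the list to a string, it worked fine.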
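To make the list-versus-string point concrete: in the spider, .extract() returns a list of strings, so item['link'] and item['title'] reach the pipeline as lists. Note also that (item['link']) in the select statement is not a tuple, it is just the list itself; a one-element tuple needs a trailing comma. Below is a minimal sketch of one way to flatten the values before they are passed to tx.execute. The helper names first_text and row_params are hypothetical (they are not part of Scrapy or MySQLdb), and taking the first non-empty value is an assumption about what should be stored.

```python
def first_text(values, default=""):
    """Collapse a list of extracted strings into one stripped string.

    Hypothetical helper: returns the first non-empty entry, or `default`
    when the list is empty (for example when the XPath matched nothing).
    """
    if isinstance(values, str):
        return values.strip()
    for value in values:
        text = value.strip()
        if text:
            return text
    return default


def row_params(item):
    """Build the (link, title) parameter tuple for the INSERT statement."""
    return (first_text(item["link"]), first_text(item["title"]))


# Values shaped like the lists that .extract() produces for one RSS <item>.
item = {"link": ["http://www.example.com/a"], "title": [" First ad "]}

params = row_params(item)      # plain strings, safe to bind to %s placeholders
lookup_params = (params[0],)   # trailing comma makes a real one-element tuple
```

In _conditional_insert, the select could then be given lookup_params and the insert params, so that each %s placeholder is bound to a single string rather than to a list.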
