I'm new to the Scrapy framework: how do I use an Item Pipeline in Scrapy to store multiple items in a DB?
I want to store some items in a database using an item pipeline.
Spider.py
from scrapy import Spider

from myproject.items import ExampleItem  # module path assumed; import ExampleItem from wherever your project defines it


class ExampleSpider(Spider):
    name = "Spider1"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com.com/.../rss_1.xml"]

    def parse(self, response):
        # Each <item> element of the RSS feed becomes one ExampleItem.
        for example in response.xpath('//item'):
            item = ExampleItem()
            # extract_first() returns a single string (or None), not a list,
            # so each field maps onto exactly one SQL placeholder later on.
            item['link'] = example.xpath('.//link/text()').extract_first()
            item['title'] = example.xpath('.//title/text()').extract_first()
            yield item
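`.extract()` always returns a list of strings, even when the XPath matches only once, so fields filled with it hold lists rather than strings. If switching to `extract_first()` is not an option, the list fields can be collapsed before the item reaches the pipeline. The following is a minimal sketch under that assumption; `first_or_none` and `normalize_fields` are hypothetical helpers, not Scrapy API, and unlike `extract_first()` this version also skips whitespace-only matches.

```python
from typing import Optional, Sequence


def first_or_none(values: Sequence[str]) -> Optional[str]:
    """Return the first non-blank extracted string, stripped, or None if there is none."""
    for value in values:
        stripped = value.strip()
        if stripped:
            return stripped
    return None


def normalize_fields(raw: dict) -> dict:
    """Collapse every list-valued field (as produced by .extract()) to one string or None."""
    return {
        key: first_or_none(value) if isinstance(value, list) else value
        for key, value in raw.items()
    }


print(normalize_fields({"link": ["http://www.example.com/a"], "title": ["  First title  "]}))
# {'link': 'http://www.example.com/a', 'title': 'First title'}
print(normalize_fields({"link": [], "title": ["No link"]}))
# {'link': None, 'title': 'No link'}
```

Non-list values pass through unchanged, so the helper is safe to call on items that were already normalized.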
pipelines.py
from twisted.enterprise import adbapi


class MySQLStorePipeline(object):

    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbargs = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            passwd=settings['MYSQL_PASSWD'],
            charset='utf8',
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
        return cls(dbpool)

    def process_item(self, item, spider):
        # Run the DB query in the thread pool.
        query = self.dbpool.runInteraction(self._conditional_insert, item, spider)
        query.addErrback(self._handle_error, item, spider)
        # At the end, return the item in case of success or failure.
        query.addBoth(lambda _: item)
        # Return the deferred instead of the item. This makes the engine
        # process the next item (according to the CONCURRENT_ITEMS setting)
        # after this operation (deferred) has finished.
        return query

    def _conditional_insert(self, tx, item, spider):
        # The trailing comma makes (item['link'],) a one-element tuple;
        # (item['link']) without it is just item['link'] itself.
        tx.execute("SELECT * FROM AnnonceGratuit WHERE link = %s", (item['link'],))
        result = tx.fetchone()
        if result:
            spider.logger.debug("Item already stored in db: %s", item)
        else:
            tx.execute(
                "INSERT INTO AnnonceGratuit (link, title) VALUES (%s, %s)",
                (item['link'], item['title']),
            )
            spider.logger.debug("Item stored in db: %s", item)

    def _handle_error(self, failure, item, spider):
        """Handle an error that occurred during the DB interaction."""
        # Do nothing but log it.
        spider.logger.error("Error during DB interaction: %s", failure)
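The check-then-insert logic of `_conditional_insert` can be tried in isolation, without MySQL or Twisted. This sketch swaps in the standard-library `sqlite3` module with an in-memory database, so the placeholders are `?` instead of MySQLdb's `%s`, and it assumes each item's fields are already single strings. It illustrates the logic only; it is not a drop-in replacement for the `adbapi` pipeline. `conditional_insert` here is a hypothetical stand-alone function.

```python
import sqlite3


def conditional_insert(conn: sqlite3.Connection, item: dict) -> bool:
    """Insert the item unless its link is already stored; return True if a row was added."""
    cur = conn.cursor()
    # Note the trailing comma: (item["link"],) is a one-element tuple.
    cur.execute("SELECT 1 FROM AnnonceGratuit WHERE link = ?", (item["link"],))
    if cur.fetchone():
        return False
    cur.execute(
        "INSERT INTO AnnonceGratuit (link, title) VALUES (?, ?)",
        (item["link"], item["title"]),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnnonceGratuit (link TEXT, title TEXT)")
items = [
    {"link": "http://www.example.com/a", "title": "A"},
    {"link": "http://www.example.com/b", "title": "B"},
    {"link": "http://www.example.com/a", "title": "A again"},  # duplicate link
]
with conn:  # commits the transaction if no exception is raised
    inserted = [conditional_insert(conn, item) for item in items]

print(inserted)  # [True, True, False]
print(conn.execute("SELECT COUNT(*) FROM AnnonceGratuit").fetchone()[0])  # 2
```

Because all three items go through the same connection, the duplicate check sees the row inserted for the first item even before the commit.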
I successfully scrape the links and titles by iterating over the items.
But when I try to store them, I get this error:
_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \')
NB: when I use the same code with a single item, it works and the item is stored in the database. But with two or more items, it doesn't work!
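One plausible reason the behaviour changes between one value and several (not confirmed in the thread): with `.extract()`, `item['link']` is a list, and writing `(item['link'])` without a trailing comma passes that list itself as the parameter sequence. A one-element list happens to fill the single placeholder; a longer list does not. This sketch reproduces the parameter-count mismatch with the standard-library `sqlite3` module as a stand-in for MySQLdb. sqlite3 reports it as `ProgrammingError`, not MySQL error 1064, so it illustrates the pitfall rather than the exact error above; `bindings_ok` is a hypothetical helper.

```python
import sqlite3

# What .extract() can return when an XPath matches twice.
links = ["http://www.example.com/a", "http://www.example.com/b"]

# Parentheses alone do not create a tuple; the trailing comma does.
print(type((links)).__name__)     # list
print(type((links[0],)).__name__)  # tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnnonceGratuit (link TEXT)")


def bindings_ok(params) -> bool:
    """Return True if sqlite3 accepts params for a query with ONE placeholder."""
    try:
        conn.execute("SELECT * FROM AnnonceGratuit WHERE link = ?", params)
    except sqlite3.ProgrammingError:
        return False
    return True


print(bindings_ok((["http://www.example.com/a"])))  # True: one value, one placeholder
print(bindings_ok((links)))                         # False: two values, one placeholder
print(bindings_ok((links[0],)))                     # True: explicit one-element tuple
```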
Thanks in advance for your help.
What happens to your SQL code if `item['link']` or `item['title']` returns an empty value? – Talvalin
When I use one item, I get the result in my database. –
But when I use two or more items, all the columns in my table are empty. –
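Following up on the question about empty values: `extract_first()` yields `None` when nothing matches, and a scraped field can also be an empty string. This sketch, again using in-memory `sqlite3` as a stand-in for MySQL with an assumed schema, shows that a `NOT NULL` column rejects `None` but accepts an empty string, so a Python-side check is still needed if blank rows are unwanted. `try_insert` and `store_if_complete` are hypothetical helpers.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# NOT NULL rejects missing values (None becomes NULL), but NOT empty strings.
conn.execute("CREATE TABLE AnnonceGratuit (link TEXT NOT NULL, title TEXT)")


def try_insert(link: Optional[str], title: Optional[str]) -> str:
    """Insert one row and report what the database did with it."""
    try:
        conn.execute(
            "INSERT INTO AnnonceGratuit (link, title) VALUES (?, ?)", (link, title)
        )
    except sqlite3.IntegrityError:
        return "rejected by NOT NULL"
    return "stored"


def store_if_complete(link: Optional[str], title: Optional[str]) -> str:
    """Python-side guard: skip rows whose link is None or blank before touching the DB."""
    if not link or not link.strip():
        return "skipped"
    return try_insert(link, title)


print(try_insert(None, "no link"))                         # rejected by NOT NULL
print(try_insert("", "empty link"))                        # stored (empty string is not NULL)
print(store_if_complete("", "empty link"))                 # skipped
print(store_if_complete("http://www.example.com/a", "A"))  # stored
```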