scrapy - Item Loader - MySQL

I am starting to learn Scrapy. I want to use an Item Loader and write some data to MySQL. The code below works fine when I use TakeFirst() as the output processor in items.py. However, I need to write all entries to MySQL, not just the first one. When I use MapCompose() instead, I get the following MySQL error:

Error 1241: Operand should contain 1 column(s)

How do I need to modify my code so that all entries are written to MySQL?
test_crawlspider.py:
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from tutorial.items import TestItem
from scrapy.loader import ItemLoader


class TestCrawlSpider(CrawlSpider):
    name = "test_crawl"
    allowed_domains = ["www.immobiliare.it"]
    start_urls = [
        "http://www.immobiliare.it/Roma/case_in_vendita-Roma.html?criterio=rilevanza"
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[@class="no-decoration button next_page_act"]',)), callback="parse_start_url", follow=True),
    )

    handle_httpstatus_list = [302]

    def parse_start_url(self, response):
        l = ItemLoader(item=TestItem(), response=response)
        l.add_xpath('price', '//*/div[1]/div[1]/div[4]/strong/text()')
        l.add_xpath('rooms', '//*/div[1]/div[1]/div[7]/div[1]/span[4]/text()')
        return l.load_item()
items.py:
import scrapy
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose, Join


class TestItem(scrapy.Item):
    price = scrapy.Field(
        output_processor=TakeFirst(),
    )
    rooms = scrapy.Field(
        output_processor=TakeFirst(),
    )
pipelines.py:
import sys
import MySQLdb
import hashlib
from scrapy.http import Request
from tutorial.items import TestItem


class MySQLPipeline(object):

    def __init__(self):
        self.conn = MySQLdb.connect(user='XXX', passwd='YYY', host='localhost', db='ZZZ')
        self.cursor = self.conn.cursor()

    def process_item(self, item, test_crawl):
        print item
        try:
            self.cursor.execute("INSERT INTO test_table (price, rooms) VALUES (%s, %s)",
                                (item['price'], item['rooms']))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item
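One way to write every entry instead of only the first: keep the fields as lists (e.g. by dropping TakeFirst()), then pair the parallel lists with zip() and insert one row per pair via cursor.executemany. A sketch of that pipeline logic, written in Python 3 and using sqlite3 only so the snippet runs without a MySQL server; with MySQLdb the executemany call is the same except the placeholders are %s instead of ? (the table test_table and its columns are taken from the question):

```python
import sqlite3

# Stand-in for the loaded item: without TakeFirst(), each field is a
# list of extracted values (assumed sample data, not real scrape output).
item = {
    'price': ['250.000', '300.000', '450.000'],
    'rooms': ['3', '4', '5'],
}

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE test_table (price TEXT, rooms TEXT)")

# Pair the parallel lists and insert one row per entry. With MySQLdb:
#   cursor.executemany(
#       "INSERT INTO test_table (price, rooms) VALUES (%s, %s)", rows)
rows = zip(item['price'], item['rooms'])
cursor.executemany("INSERT INTO test_table (price, rooms) VALUES (?, ?)", rows)
conn.commit()

print(cursor.execute("SELECT COUNT(*) FROM test_table").fetchone()[0])  # 3
```

Note that zip() silently truncates to the shorter list, so this assumes the price and rooms XPaths match the same number of elements on the page.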
Thanks, I was looking for a solution to this. – kanimbla