Problem storing more than 1 item from Scrapy to MySQL

So, I have this problem that is driving me crazy: I am trying to store scraped items in MySQL through a pipeline, but I can't get it to work.

If I store only one item it works, but the second I add a second item I get this strange error.

Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 1)' at line 2

So I get the error above, and this is my pipelines.py code:

class DropToDb(object): 
    def __init__(self): 
     self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Test") 
     self.cursor = self.conn.cursor() 

    def process_item(self, item, spider): 
     try: 
      self.cursor.execute(""" 
          INSERT INTO Main (url, domain_id) 
          VALUES (%s, %s) 
        """, (item['url'], item['domain_id'])) 

      self.conn.commit() 


     except MySQLdb.Error, e: 
      print "Error %d: %s" % (e.args[0], e.args[1]) 

     return item

If I remove one item, it works fine, as shown below.

class DropToDb(object): 
    def __init__(self): 
     self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Test") 
     self.cursor = self.conn.cursor() 

    def process_item(self, item, spider): 
     try: 
      self.cursor.execute(""" 
          INSERT INTO Main (url) 
          VALUES (%s) 
        """, (item['url'])) 

      self.conn.commit() 


     except MySQLdb.Error, e: 
      print "Error %d: %s" % (e.args[0], e.args[1]) 

     return item

My Scrapy file looks like this:

if datematch: 
    item['link_title'] = ogtitle 
    item['link_description'] = response.xpath('//meta[@property="og:description"]/@content').extract() 
    item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract(), 
    yield item

There are more items above, but I only wanted to show an example.

Can someone help me get out of this?

My spider file:

import scrapy 
import MySQLdb 
from MySQLdb.cursors import SSCursor 
from scrapy.http import Request 
import re 
from Maintoo.items import MaintooSpider2Item 
from scrapy.exceptions import DropItem 
import datetime 
class Maintoospider2Spider(scrapy.Spider): 
    name = "MaintooSpider2" 

    #start_urls = readdomainsfromdb() 

    def start_requests(self): 
     for domain_id, url, id_sitemap_links in readdomainsfromdb(): 
      yield Request(
       url, 
       callback=self.parse, 
       meta={ 
        'domain_id': domain_id, 
        'id_sitemap_links': id_sitemap_links 
       }, 
       errback=self.error 
      ) 

    def error(self): 
     pass 

    def parse(self, response): 
     domainid = response.meta['domain_id'] 
     id_sitemap_links = response.meta['id_sitemap_links'] 
     #updateid(id_sitemap_links) 
     ogtitle = response.xpath('//meta[@property="og:title"]/@content').extract() 
     isporn = response.xpath('//meta[@content="RTA-5042-1996-1400-1577-RTA"]').extract() 
     datematch = re.findall(r'(content="2015|2016")', response.body, re.IGNORECASE | re.DOTALL) 
     item = MaintooSpider2Item() 
     if '/tag/' in response.url: 
      raise DropItem 
     if isporn: 
      updateporn(domainid) 
      raise DropItem 

     if datematch: 
      item['link_title'] = ogtitle 
      item['link_description'] = response.xpath('//meta[@property="og:description"]/@content').extract() 
      item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract() 
      item['link_type'] = response.xpath('//meta[@property="og:type"]/@content').extract() 
      item['link_url'] = response.xpath('//meta[@property="og:url"]/@content').extract() 
      item['link_site_name'] = response.xpath('//meta[@property="og:site_name"]/@content').extract() 
      item['link_article_tag'] = response.xpath('//meta[@property="article:tag"]/@content').extract() 
      item['link_article_section'] = response.xpath('//meta[@property="article:section"]/@content').extract() 
      item['link_article_published_time'] = response.xpath('//meta[@property="article:published_time"]/@content').extract() 
      item['link_meta_keywords'] = response.xpath('//meta[@name="keywords"]/@content').extract() 
      item['link_publisher'] = response.xpath('//meta[@property="article:publisher"]/@content').extract() 
      item['link_article_author'] = response.xpath('//meta[@property="article:author"]/@content').extract() 
      item['link_twitter_card'] = response.xpath('//meta[@name="twitter:card"]/@content').extract() 
      item['link_twitter_description'] = response.xpath('//meta[@name="twitter:description"]/@content').extract() 
      item['link_twitter_title'] = response.xpath('//meta[@name="twitter:title"]/@content').extract() 
      item['link_twitter_image'] = response.xpath('//meta[@name="twitter:image"]/@content').extract() 
      item['link_facebook_app_id'] = response.xpath('//meta[@property="fb:app_id"]/@content').extract() 
      item['link_facebook_page_admins'] = response.xpath('//meta[@property="fb:admins"]/@content').extract() 
      item['link_rss'] = response.xpath('//meta[@rel="alternate"]/@href').extract() 
      item['link_twitter_image_source'] = response.xpath('//meta[@name="twitter:image:src"]/@content').extract() 
      item['link_twitter_site'] = response.xpath('//meta[@name="twitter:site"]/@content').extract() 
      item['link_twitter_url'] = response.xpath('//meta[@name="twitter:url"]/@content').extract() 
      item['link_twitter_creator'] = response.xpath('//meta[@name="twitter:creator"]/@content').extract() 
      item['link_apple_app'] = response.xpath('//meta[@name="apple-itunes-app"]/@content').extract() 
      item['link_facebook_video'] = response.xpath('//meta[@property="og:video"]/@content').extract() 
      item['link_facebook_page_id'] = response.xpath('//meta[@name="fb:page_id"]/@content').extract() 
      item['link_id'] = response.xpath('//link[@rel="publisher"]/@href').extract() 
      item['link_image'] = response.xpath('//meta[@property="og:image"]/@content').extract() 
      item['url'] = response.url 
      item['domain_id'] = domainid 
      item['crawled_date'] = datetime.datetime.now().isoformat() 
      yield item

My new pipeline file:

class dropifdescription(object): 

    def process_item(self, item, spider): 

     # to test if only "job_id" is empty, 
     # change to: 
     # if not(item["job_id"]): 
     if not(item["link_title"]): 
      raise DropItem() 
     else: 
      return item 

class DropToDb(object): 
    def __init__(self): 
     self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Maintoo", charset="utf8", use_unicode=True) 
     self.cursor = self.conn.cursor() 

    def process_item(self, item, spider): 
     try: 
      self.cursor.execute(""" 
           INSERT INTO Main (url, domain_id, link_title) VALUES (%s, %s, %s)""", (item['url'], item['domain_id'], item['link_title'])) 

      self.conn.commit() 


     except MySQLdb.Error, e: 
      print "Error %d: %s" % (e.args[0], e.args[1]) 

     return item

My settings file:

ITEM_PIPELINES = { 
    'Maintoo.pipelines.dropifdescription': 200, 
    'Maintoo.pipelines.DropToDb': 300, 
}

Source

2016-04-25 Marketingexpert

The problem originates inside the spider.

item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract(),

See the comma (,) at the end: it turns your item['link_locale'] into a tuple, which eventually breaks your SQL query. Remove the comma.
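The effect of the trailing comma can be seen in plain Python, without Scrapy. This is a minimal sketch: the list `values` is a made-up stand-in for whatever `.extract()` would return for the page.

```python
# A trailing comma after an expression wraps the value in a one-element tuple.
values = ["en_US"]        # hypothetical stand-in for the list returned by .extract()

without_comma = values
with_comma = values,      # note the trailing comma, as in the spider line above

print(type(without_comma).__name__)  # list
print(type(with_comma).__name__)     # tuple
print(with_comma)                    # (['en_US'],)
```

So the item field ends up holding a tuple that contains a list, rather than a single string that can be bound to one %s placeholder.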

Apart from that, you should use extract_first() instead of the regular extract(), so that you get a single value rather than a list.
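In the spider this would mean writing, for example, `response.xpath('//meta[@property="og:title"]/@content').extract_first()`, which returns the first match or None. As a rough illustration of the same idea on the pipeline side, the sketch below reduces list values to a single scalar before they are passed as query parameters. The helper names `first_or_none` and `row_for_insert` are hypothetical and not part of Scrapy or MySQLdb, and a plain dict stands in for the item.

```python
def first_or_none(values):
    """Return the first element of a list/tuple, or None if it is empty.

    Non-sequence values (strings, ints) are returned unchanged. This only
    imitates what extract_first() gives you; it is not the Scrapy method.
    """
    if isinstance(values, (list, tuple)):
        return values[0] if values else None
    return values


def row_for_insert(item, columns):
    """Build the parameter tuple for cursor.execute(), one scalar per column."""
    return tuple(first_or_none(item.get(col)) for col in columns)


# Hypothetical item, shaped like the one the spider yields.
item = {
    "url": "http://example.com/post",
    "domain_id": 1,
    "link_title": ["Example title"],  # .extract() returns a list
}

params = row_for_insert(item, ("url", "domain_id", "link_title"))
print(params)  # ('http://example.com/post', 1, 'Example title')
```

The resulting tuple has exactly one plain value per %s placeholder, which is what the INSERT statement in the pipeline expects.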

Source

2016-04-25 17:21:48 alecxe

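Your answer solved my problem for 2 items, but once I add another one the problem is still the same. def process_item(self, item, spider): try: self.cursor.execute(""" INSERT INTO Main (url, domain_id, link_title) VALUES (%s, %s, %s)""", (item['url'], item['domain_id'], item['link_title'])) self.conn.commit() This is how my code looks... where is the problem now? – Marketingexpert

@BesnikHajredini can you post your complete spider (edit the question and paste it there)? Thanks. – alecxe

I have added the complete spider file so far, and the settings file... I hope you can help me with this because I am stuck :) – Marketingexpert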
