2012-07-08

Scrapy SQLite3 error?

I wrote a spider that needs to store its items in an SQLite3 database, but I get an error every time. Please help, I'm stuck! Here is the spider code:

response_fld=response.url 
text_input=hxs.select("//input[(@id or @name) and (@type = 'text')]/@id ").extract() 
pass_input=hxs.select("//input[(@id or @name) and (@type = 'password')]/@id").extract()  
file_input=hxs.select("//input[(@id or @name) and (@type = 'file')]/@id").extract() 

Output in JSON format:

{"pass_input": ["tbPassword"], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/Signup.aspx", "text_input": ["tbUsername"]} 
{"pass_input": [], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/default.aspx", "text_input": []} 
{"pass_input": ["tbPassword"], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/login.aspx", "text_input": ["tbUsername"]} 
{"pass_input": [], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/Comments.aspx?id=0", "text_input": []} 

Pipeline code:

import sqlite3 
from os import path 

class SQLiteStorePipeline(object): 

    def __init__(self): 
     self.conn = sqlite3.connect('./project.db') 
     self.cur = self.conn.cursor() 

    def process_item(self, domain, item): 
     self.cur.execute("insert into links (link) values(item['response_fld'][0]") 
     self.cur.execute("insert into inputs (input_name) values(item['text_input'][0];") 
     self.cur.execute("insert into inputs (input_name) values(item['pass_input'][0];") 
     self.cur.execute("insert into inputs (input_name) values(item['file_input'][0];") 
     self.conn.commit() 
     return item 

    def handle_error(self, e): 
     log.err(e) 

Error:

File "/home/abdallah/isa/isa/pipelines.py", line 22, in process_item 
    self.cur.execute("insert into links (link) values(item['response_fld'][0]") 
sqlite3.OperationalError: near "['response_fld']": syntax error 

Database schema:

CREATE TABLE "Targets" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "domain" TEXT); 

CREATE TABLE "Links" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "link" TEXT, "target" INT, FOREIGN KEY (target) REFERENCES Targets(id)); 

CREATE TABLE "Input_Types" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "type" TEXT); 

CREATE TABLE "Inputs" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "input_name" TEXT, "link_id" INT, "input_type" INT, FOREIGN KEY (input_type) REFERENCES Input_Types(id)); 

Answer

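This is your invalid SQL query. The Python expression sits inside the string literal, so SQLite receives it as SQL text and cannot parse it: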

self.cur.execute("insert into links (link) values(item['response_fld'][0]") 

According to the docs, you should use a placeholder and pass the value as a parameter:

self.cur.execute("insert into links (link) values(?)", (item['response_fld'][0],)) 
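
A minimal sketch of the difference, assuming an in-memory database and a cut-down `inputs` table (both chosen only for illustration; the item dict mirrors one line of the JSON output in the question):

```python
import sqlite3

# Assumption: an in-memory database with a simplified "inputs" table,
# used only to illustrate parameter binding.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE inputs (id INTEGER PRIMARY KEY AUTOINCREMENT, input_name TEXT)"
)

item = {"text_input": ["tbUsername"]}

# The Python expression inside the string is sent to SQLite verbatim,
# so SQLite tries to parse it as SQL and fails.
try:
    cur.execute("insert into inputs (input_name) values(item['text_input'][0];")
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# With a "?" placeholder, the value is evaluated in Python and bound
# separately from the SQL text.
cur.execute("insert into inputs (input_name) values (?)", (item["text_input"][0],))
conn.commit()

stored = cur.execute("SELECT input_name FROM inputs").fetchall()
```

The parameters are passed as a tuple, which is why the single value needs a trailing comma.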

self.cur.execute("insert into links (link) values(?)", (item['response_fld'][0],)) raises exceptions.TypeError: 'MySpider' object has no attribute '__getitem__' – 2012-07-08 17:34:48

@right.sowrd, that is a different problem. Your 'process_item' signature is wrong. According to the docs, it should be 'def process_item(self, item, spider)' rather than 'def process_item(self, domain, item)' – warvariuc 2012-07-08 18:01:03
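
The TypeError can be reproduced without Scrapy. This is only a sketch, assuming the framework passes the arguments positionally as (item, spider), as the documented signature suggests. `FakeSpider`, `WrongPipeline`, and `FixedPipeline` are hypothetical stand-ins, not Scrapy classes:

```python
class FakeSpider(object):
    """Hypothetical stand-in for the real spider object."""
    name = "MySpider"


class WrongPipeline(object):
    def process_item(self, domain, item):
        # Called as (item, spider): "domain" receives the item and
        # "item" receives the spider, which cannot be indexed.
        return item["response_fld"]


class FixedPipeline(object):
    def process_item(self, item, spider):
        # Parameter order matches the call, so "item" is the scraped item.
        return item["response_fld"]


scraped = {"response_fld": "http://testaspnet.vulnweb.com/login.aspx"}
spider = FakeSpider()

try:
    WrongPipeline().process_item(scraped, spider)
except TypeError as exc:
    wrong_error = str(exc)

fixed_result = FixedPipeline().process_item(scraped, spider)
```

With the swapped parameter names, the subscript is applied to the spider rather than the item, which matches the 'MySpider' object named in the traceback.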

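Thanks a lot, it works for some items, but there are still errors. First, only 'h' is stored in the database; and when trying to store the other items, I get 'exceptions.IndexError: list index out of range' – 2012-07-08 18:17:50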
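
Both symptoms look consistent with the JSON output shown in the question. `response_fld` is a plain string there, so `[0]` yields only its first character, 'h'. Several of the input lists are empty, so `[0]` on them raises IndexError. A minimal sketch under those assumptions, using a hypothetical helper `store_item` and a cut-down schema without the foreign-key columns:

```python
import sqlite3


def store_item(cur, item):
    """Hypothetical helper: insert one scraped item using bound parameters."""
    # response_fld is a string, so bind it whole instead of indexing it.
    cur.execute("insert into links (link) values (?)", (item["response_fld"],))
    # Each *_input field is a list that may be empty; loop over it
    # rather than taking element [0].
    for field in ("text_input", "pass_input", "file_input"):
        for name in item.get(field, []):
            cur.execute("insert into inputs (input_name) values (?)", (name,))


# Assumption: in-memory database and simplified tables, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE links (id INTEGER PRIMARY KEY AUTOINCREMENT, link TEXT)")
cur.execute(
    "CREATE TABLE inputs (id INTEGER PRIMARY KEY AUTOINCREMENT, input_name TEXT)"
)

items = [
    {"pass_input": ["tbPassword"], "file_input": [],
     "response_fld": "http://testaspnet.vulnweb.com/Signup.aspx",
     "text_input": ["tbUsername"]},
    {"pass_input": [], "file_input": [],
     "response_fld": "http://testaspnet.vulnweb.com/default.aspx",
     "text_input": []},
]
for item in items:
    store_item(cur, item)
conn.commit()

links = [row[0] for row in cur.execute("SELECT link FROM links ORDER BY id")]
inputs = [row[0] for row in cur.execute("SELECT input_name FROM inputs ORDER BY id")]
```

Looping over each list also stores every input name on a page, not just the first one, and inserts nothing for pages that have no matching inputs.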