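I'm trying to extract the input fields from a website's pages, together with the URL of each page that contains those inputs, and store them in a database.
The code runs fine with no errors, but the output is not what I want.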
Scrapy with SQL or SQLite: can't get the desired output
Spider code:
# Scrapy 0.x-era import paths, matching the classes used below
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
# IsaItem is the project's own item class (its import is not shown here)

class MySpider(CrawlSpider):
    name = 'isa_spider'
    allowed_domains = ['testaspnet.vulnweb.com']
    start_urls = ['http://testaspnet.vulnweb.com']
    # Follow every link on the site and pass each page to parse_item
    rules = (
        Rule(SgmlLinkExtractor(allow=('/*')), callback='parse_item'),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        item = IsaItem()
        item['response_fld'] = response.url
        # Only the first matching id of each input type is kept
        res = hxs.select("//input[(@id or @name) and (@type = 'text')]/@id").extract()
        item['text_input'] = res[0] if res else None  # None when no field is found
        res = hxs.select("//input[(@id or @name) and (@type = 'password')]/@id").extract()
        item['pass_input'] = res[0] if res else None  # None when no field is found
        res = hxs.select("//input[(@id or @name) and (@type = 'file')]/@id").extract()
        item['file_input'] = res[0] if res else None  # None when no field is found
        return item
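To make clearer what the XPath expressions in parse_item select, here is a minimal sketch that applies the same filter (an input of a given type that has an id or a name, reporting its id) using only the standard library's html.parser instead of Scrapy's selectors. The names InputCollector, collect_inputs and sample, and the element ids in the sample markup, are hypothetical and not taken from the site. Unlike parse_item, it keeps every match per type rather than only the first.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect the ids of <input> elements, grouped by input type."""

    def __init__(self, wanted_types=("text", "password", "file")):
        super().__init__()
        # One list of ids per wanted input type
        self.found = {t: [] for t in wanted_types}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        has_id_or_name = "id" in attrs or "name" in attrs
        if attrs.get("type") in self.found and has_id_or_name:
            # Mirrors the trailing /@id: an input with only a name yields nothing
            if attrs.get("id"):
                self.found[attrs["type"]].append(attrs["id"])


def collect_inputs(html):
    parser = InputCollector()
    parser.feed(html)
    return parser.found


# Hypothetical markup for illustration only
sample = """
<form>
  <input type="text" id="tbUsername">
  <input type="text" id="tbEmail">
  <input type="password" id="tbPassword">
  <input type="submit" id="btnLogin">
</form>
"""

print(collect_inputs(sample))
# {'text': ['tbUsername', 'tbEmail'], 'password': ['tbPassword'], 'file': []}
```

The second text input shows what taking res[0] discards: a page with several text fields contributes only one of them to the item.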
Pipeline code:
import sqlite3

class SQLiteStorePipeline(object):
    def __init__(self):
        # Open the SQLite database file in the project directory
        self.conn = sqlite3.connect('./project.db')
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        # One row per input type; a None value is stored as NULL
        self.cur.execute("insert into inputs (input_name) values(?)", (item['text_input'],))
        self.cur.execute("insert into inputs (input_name) values(?)", (item['pass_input'],))
        self.cur.execute("insert into inputs (input_name) values(?)", (item['file_input'],))
        # The page URL goes into a separate table
        self.cur.execute("insert into links (link) values(?)", (item['response_fld'],))
        self.conn.commit()
        return item
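One way to keep each input tied to the page it came from is to insert the link first and reuse its row id for the input rows. This is only a sketch under stated assumptions: the real schema is shown only in a picture, so the link_id column, the make_schema and store_item helpers, the in-memory database and the sample URL path and ids are all hypothetical. It also skips fields that are None instead of storing NULL rows.

```python
import sqlite3


def make_schema(conn):
    # Hypothetical schema; the actual tables may differ
    conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, link TEXT)")
    conn.execute(
        "CREATE TABLE inputs ("
        "id INTEGER PRIMARY KEY, "
        "input_name TEXT, "
        "link_id INTEGER REFERENCES links(id))"
    )


def store_item(conn, item):
    cur = conn.cursor()
    # Insert the page URL first so its row id can be attached to each input
    cur.execute("INSERT INTO links (link) VALUES (?)", (item["response_fld"],))
    link_id = cur.lastrowid
    for key in ("text_input", "pass_input", "file_input"):
        name = item.get(key)
        if name is not None:  # skip input types that were not found on the page
            cur.execute(
                "INSERT INTO inputs (input_name, link_id) VALUES (?, ?)",
                (name, link_id),
            )
    conn.commit()
    return link_id


conn = sqlite3.connect(":memory:")
make_schema(conn)

# Hypothetical item, shaped like the one parse_item returns
store_item(conn, {
    "response_fld": "http://testaspnet.vulnweb.com/login.aspx",
    "text_input": "tbUsername",
    "pass_input": "tbPassword",
    "file_input": None,
})

rows = conn.execute(
    "SELECT links.link, inputs.input_name "
    "FROM inputs JOIN links ON links.id = inputs.link_id "
    "ORDER BY inputs.id"
).fetchall()
print(rows)
```

Inside the pipeline, the same logic would sit in process_item, using self.conn in place of the conn argument.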
Database schema: picture
Desired output: picture
(Sorry, I can't insert the images directly because my reputation is below 10.)
@warwaruk "current result" [picture](https://docs.google.com/drawings/d/10ewTKAE1ryuf0-aGqysQMp2E2BkQZBi9YpxhWkaHyaA/edit) – 2012-07-11 12:19:04