I have a Scrapy project whose spider is shown below. The spider works when I run it with this command: scrapy crawl myspider

scrapyd connects to its own database (mysql.db) instead of 127.0.0.1:3306
import MySQLdb

from scrapy.spider import BaseSpider
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

from myproject.items import QuestionItem  # adjust to your project's items module


class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self):
        super(MySpider, self).__init__()
        # Fill start_urls from the `pages` table at spider construction time
        start_urls = []
        conn = MySQLdb.connect(host='127.0.0.1',
                               user='root',
                               passwd='xxxx',
                               db='myspider',
                               port=3306)
        cur = conn.cursor()
        cur.execute("SELECT * FROM pages")
        rows = cur.fetchall()
        for row in rows:
            start_urls.append(row[0])
        self.start_urls = start_urls
        conn.close()

    def parse(self, response):
        links = SgmlLinkExtractor().extract_links(response)
        for link in links:
            item = QuestionItem()
            item['url'] = link.url  # extract_links returns Link objects; store the URL string
            yield item
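The fill-start_urls-from-a-query pattern in __init__ can be exercised in isolation from Scrapy and scrapyd. Here is a minimal sketch of that pattern using Python's built-in sqlite3 as a stand-in for MySQLdb; the in-memory database, the helper name load_start_urls, and the sample URLs are assumptions for illustration only:

```python
import sqlite3


def load_start_urls(conn):
    """Fetch the first column of every row in `pages` into a list,
    mirroring what the spider's __init__ does with its MySQL cursor."""
    cur = conn.cursor()
    cur.execute("SELECT * FROM pages")
    rows = cur.fetchall()
    return [row[0] for row in rows]


# Build a throwaway in-memory database so the pattern can be run
# without a live MySQL server (illustrative stand-in only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT)")
conn.executemany("INSERT INTO pages VALUES (?)",
                 [("http://example.com/a",), ("http://example.com/b",)])

start_urls = load_start_urls(conn)
print(start_urls)  # ['http://example.com/a', 'http://example.com/b']
conn.close()
```

If this pattern returns rows locally but an empty list when the spider runs under scrapyd, the difference lies in which MySQL server or database the scrapyd process actually reaches, not in the query itself.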
I then deploy the project to scrapyd with "scrapy deploy -p mysqlproject" and schedule the spider with "curl http://localhost:6800/schedule.json -d project=mysql -d spider=myspider".

The problem is that the spider's start_urls is not populated from the database; the SQL query returns an empty result instead. I guess this is because scrapyd connects to its own mysql.db, configured by dbs_dir as described here: http://doc.scrapy.org/en/0.14/topics/scrapyd.html#dbs-dir

How can I make scrapyd connect to the MySQL server instead of mysql.db?