I want to pass the URL into the spider as a command-line argument, e.g.:

scrapy crawl test -a url="https://example.com"

Then I want to automatically take start_urls and convert it into allowed_domains, for example:

allowed_domains = ['example.com']

After that, I want to pass that value on to the MySQL pipeline, where a table is created whose name is derived from allowed_domains.

This is what I have right now:
class SeekerSpider(BaseSpider):
    name = 'seeker'

    def __init__(self, *args, **kwargs):
        urls = kwargs.pop('urls', [])
        if urls:
            self.start_urls = urls.split(',')
            self.logger.info(self.start_urls)
        # take the arg "urls" and convert it to allowed_domains
        url = "".join(urls)
        self.allowed_domains = [url.split('/')[-1]]
        super(SeekerSpider, self).__init__(*args, **kwargs)

    # I have to use "domain" here, at class level, and not inside parse_page or __init__
    domain = domain_allowed.replace(".", "_")
    # create a folder with the domain name

    def parse_page(self, response):
        ...
Basically, I need to use self.allowed_domains outside of the functions... that is my problem... the variable domain does not accept it.
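As a side note, splitting on '/' is a fragile way to get the host out of a URL. A minimal sketch of a helper that could be called from __init__, using Python 3's urllib.parse (on Python 2 the same function lives in the urlparse module); the www-stripping is an assumption about what is wanted:

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

def domain_from_url(url):
    # keep only the host part of the URL
    netloc = urlparse(url).netloc
    # drop a leading "www." if present (assumption: the bare domain is wanted)
    if netloc.startswith("www."):
        netloc = netloc[4:]
    return netloc
```

Inside __init__ this would become self.allowed_domains = [domain_from_url(u) for u in self.start_urls].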
This is part of my pipelines.py:
import datetime
import pymysql

class MySQLPipeline(object):
    def __init__(self, *args, **kwargs):
        self.connect = pymysql.connect(...)
        self.cursor = self.connect.cursor()
        # print "Input the name of the table: "  <-- commented out
        # tablename = raw_input(" ")             <-- commented out
        date = datetime.datetime.now().strftime("%y_%m_%d_%H_%M")
        self.tablename = kwargs.pop('tbl', '')
        self.newname = self.tablename + "_" + date
        print self.newname
        # find a different way to build the table name:
        # import the "allowed_domains" value, strip it,
        # and use it as the table name
For the pipeline I have done it this way... but it is not good... I want to take allowed_domains from the spider, pass it in here, and split it so I get only the bare domain name, without the .com or whatever.
Thanks in advance.
Yes, but that passes it with every request... so basically it would create the table 100 times... I need to do it outside. If I use **def __init__(self, spider)** instead... actually, I already tried that and it does not work... I cannot do spider.allowed_domains... it does not exist – Omega
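Regarding the "it would create it 100 times" concern: Scrapy pipelines have an open_spider(self, spider) hook that is called exactly once per crawl and receives the spider instance, so spider.allowed_domains is available there (assuming the spider has set it in its __init__, as in the code above). A minimal sketch, with the connection and CREATE TABLE details omitted, and the split-off of the TLD being an assumption:

```python
import datetime

class MySQLPipeline(object):
    def open_spider(self, spider):
        # called once when the spider opens, not once per item/request
        domain = spider.allowed_domains[0]   # e.g. "example.com"
        base = domain.split(".")[0]          # "example" -- drops ".com" etc.
        date = datetime.datetime.now().strftime("%y_%m_%d_%H_%M")
        self.tablename = "{}_{}".format(base, date)
        # issue CREATE TABLE IF NOT EXISTS using self.tablename here
```

This avoids any per-request work: the table name is computed a single time, then every process_item call can reuse self.tablename.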