
I want to pass arguments into the spider to use as the url, i.e. run Scrapy with arguments. For example:

scrapy crawl test -a url="https://example.com" 

After that, I want to take the start_urls and automatically convert them into domain_allowed. For example:

domain_allowed = ['example.com'] 

After that, I want to pass just that word to the MySQL pipeline, where a table is created from domain_allowed using only that single word.
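For the URL-to-domain step, a minimal sketch using the standard library (urlparse is the Python 2 module name, matching the Python 2 code below; in Python 3 it lives in urllib.parse):

    from urlparse import urlparse  # Python 2; use urllib.parse in Python 3

    url = "https://example.com"
    domain_allowed = [urlparse(url).netloc]  # ['example.com']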

This is what I have right now:

class SeekerSpider(BaseSpider): 
    name = 'seeker' 

    def __init__(self, *args, **kwargs): 
        urls = kwargs.pop('urls', []) 
        if urls: 
            self.start_urls = urls.split(',') 
        self.logger.info(self.start_urls) 

        # take the arg "urls" and convert it to allowed_domains 
        url = "".join(urls) 
        self.allowed_domains = [url.split('/')[-1]] 

        super(SeekerSpider, self).__init__(*args, **kwargs) 

    # I have to use "domain" here, not inside parse_page or __init__ 
    domain = domain_allowed.replace(".", "_")  # <-- this is what fails 
    # create a folder with the domain name 

    def parse_page(self, response): 
        ... 

Basically I need to use self.allowed_domains outside of the functions... that's my problem... the variable isn't picked up there.
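One way around this (a sketch, not the original poster's code): class-level statements run when the class is defined, before __init__ has created self.allowed_domains, which is why the assignment above fails. Computing everything in __init__ and storing it on self avoids that, assuming the urls argument is always supplied:

    import os
    import scrapy

    class SeekerSpider(scrapy.Spider):  # scrapy.Spider is the modern BaseSpider
        name = 'seeker'

        def __init__(self, *args, **kwargs):
            urls = kwargs.pop('urls', '')
            super(SeekerSpider, self).__init__(*args, **kwargs)
            self.start_urls = urls.split(',') if urls else []
            # derive allowed_domains from the urls argument
            self.allowed_domains = [u.split('/')[-1] for u in self.start_urls]
            # everything derived from allowed_domains is computed here, once
            self.domain = self.allowed_domains[0].replace('.', '_')
            if not os.path.isdir(self.domain):
                os.mkdir(self.domain)  # create the folder named after the domain

        def parse_page(self, response):
            # self.domain is visible from any method
            self.logger.info(self.domain)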

Here is part of my pipelines.py:

import datetime 
import pymysql 

class MySQLPipeline(object): 
    def __init__(self, *args, **kwargs): 
        self.connect = pymysql.connect(...) 
        self.cursor = self.connect.cursor() 
        # print "Input the name of the table: "  <-- commented out 
        # tablename = raw_input(" ")             <-- commented out 
        date = datetime.datetime.now().strftime("%y_%m_%d_%H_%M") 
        self.tablename = kwargs.pop('tbl', '') 
        self.newname = self.tablename + "_" + date 
        print self.newname 
        # create a different way to build the table name: 
        # import the "allowed_domain", strip it, 
        # and use it as the table name 

In the pipeline I've done it this way... but it's not good... I want to take the allowed_domain from the spider, pass it here, and break it apart to get only the bare domain name, without the .com/.whatever part.
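What that splitting could look like, assuming a single-entry allowed_domains like ['example.com'] (a naive split on dots; multi-part suffixes such as .co.uk would need a library like tldextract):

    import datetime

    domain = 'example.com'                # taken from spider.allowed_domains[0]
    basename = domain.split('.')[0]       # 'example' -- the .com part is dropped
    date = datetime.datetime.now().strftime("%y_%m_%d_%H_%M")
    tablename = basename + "_" + date     # e.g. 'example_17_09_28_12_00'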

Thanks in advance.

Answer


Sorry about the formatting, I'm on my phone...

I would use the spider object in the process_item function:

    def process_item(self, item, spider):
        spider.allowed_domains.replace('.', '_')
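Spelled out (and noting that allowed_domains is a list, so it needs indexing before replace), that suggestion would look roughly like this:

    class MySQLPipeline(object):
        def process_item(self, item, spider):
            # allowed_domains is a list, so take the first entry
            table = spider.allowed_domains[0].replace('.', '_')
            # ... use `table` for the INSERT ...
            return item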


Yes, but it goes through that on every single request... so basically it would create it 100 times.... I need to do it outside... what if I do **def __init__(self, spider)** instead... actually, I already tried that and it doesn't work... I can't do spider.allowed_domain... it doesn't exist – Omega
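One note that may help here: Scrapy pipelines have an open_spider(self, spider) hook that runs exactly once when the spider opens, so the table name can be built there without repeating the work on every item. A minimal sketch (the CREATE TABLE line is only indicative):

    import datetime

    class MySQLPipeline(object):
        def open_spider(self, spider):
            # runs once per spider, not once per item
            date = datetime.datetime.now().strftime("%y_%m_%d_%H_%M")
            basename = spider.allowed_domains[0].split('.')[0]
            self.tablename = basename + "_" + date
            # create the table here, a single time, e.g.:
            # self.cursor.execute("CREATE TABLE IF NOT EXISTS ...")

        def process_item(self, item, spider):
            # insert the item into self.tablename
            return item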