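I am new to Scrapy. I have a very basic question, but I cannot find a solution. My code calls a class that contains the scraping details and the parse function from another class: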
import os
from boto import log
from scrapy.utils.project import get_project_settings
import scrapy
from scrapy.crawler import CrawlerProcess, Crawler
from scrapy.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.settings import Settings
from scrapy.utils import reactor
from testing.items import testingItem
from testing.spiders.MySpider1 import Spider1
from scrapy.contrib.spiders import CrawlSpider, Rule
from multiprocessing import Pool
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
class MySpider(CrawlSpider):
    name = "MySpider"
    a=Spider1()
    a.parse()
********* The code above is in a separate file *********
import scrapy
from testing.items import testingItem
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.crawler import CrawlerProcess
from multiprocessing import Process, Queue
class Spider1():
    def parse(self, response):
        allowed_domains = ["dmoz.org"]
        start_urls = [ "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
                       "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
                     ]
        for sel in response.xpath('//ul/li'):
            item = testingItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item
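The problem is that I want to crawl the websites listed above using the parse function shown above. I cannot call that function and run the crawl without errors.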
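If you are getting an error, why not share it with us? (Edit your question and add the error message to it.) – GHajba
TypeError: parse() takes exactly 2 arguments (1 given). This is my error. – Jijo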
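The TypeError quoted above comes from calling `a.parse()` with no arguments, while `parse(self, response)` also requires a response. In Scrapy, the framework normally calls a spider's `parse` itself and passes in the downloaded response, so one possible approach is to let the spider's own `parse` receive that response and forward it to the helper class. The following is only a minimal sketch in plain Python, without Scrapy: `FakeResponse` and `reproduce_error` are hypothetical names introduced for illustration, and `MySpider` is a plain class here rather than a real `CrawlSpider`.

```python
class FakeResponse:
    """Hypothetical stand-in for the response object Scrapy passes to parse()."""

    def __init__(self, rows):
        self.rows = rows  # list of (title, link) pairs, assumed for this sketch


class Spider1:
    """Helper that holds the parsing logic, as in the question."""

    def parse(self, response):
        # `response` is required; calling parse() without it raises TypeError.
        for title, link in response.rows:
            yield {"title": title, "link": link}


class MySpider:
    """Plain class standing in for the spider; it delegates to Spider1."""

    name = "MySpider"

    def parse(self, response):
        # Forward the response received from the framework to the helper.
        return Spider1().parse(response)


def reproduce_error():
    """Call parse() the way the question does and return the error raised."""
    try:
        Spider1().parse()  # no response supplied, as in `a.parse()`
    except TypeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(type(reproduce_error()).__name__)
    response = FakeResponse(
        [("Books", "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/")]
    )
    print(list(MySpider().parse(response)))
```

In a real project, `allowed_domains` and `start_urls` would presumably be declared as class attributes on the spider rather than inside `parse`, so that Scrapy can read them before the first request is made.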