Is there any way to control the crawler so that it does not crawl anything outside the original domains given in the start_urls list? I tried what is shown below, but it won't work for me :( (see: scrapy: prevent the spider from crawling links in the facebook site)
import os
from scrapy.selector import Selector
from scrapy.contrib.exporter import CsvItemExporter
from scrapy.item import Item, Field
from scrapy.settings import Settings
from scrapy.settings import default_settings
from selenium import webdriver
from urlparse import urlparse
import csv
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy import log
# intended to limit the crawl depth to 3
default_settings.DEPTH_LIMIT = 3

# intended to enable my custom downloader middleware and disable redirect handling
DOWNLOADER_MIDDLEWARES = {
    'grimes2.middlewares.CustomDownloaderMiddleware': 543,
    'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': None
}
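What I was expecting to be able to do is something like the following minimal sketch (MySpider, example.com and parse_item are just placeholder names, not my real project): declare the site in allowed_domains and restrict the link extractor to that same domain so the crawl stays on the original site.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class MySpider(CrawlSpider):
    name = 'myspider'
    # OffsiteMiddleware should drop requests to any other domain
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    # follow links, but only those that stay on the allowed domain
    rules = (
        Rule(SgmlLinkExtractor(allow_domains=['example.com']),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # just log each crawled URL for now
        self.log('crawled %s' % response.url)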
Can someone help me? Thanks.