Here is my test project tree. Why am I not getting any results from my Scrapy crawler?
├── test11
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       ├── basic.py
│       └── easy.py
└── scrapy.cfg
In the items.py file I have:
from scrapy.item import Item, Field
class Test11Item(Item):
    name = Field()
    price = Field()
In the easy.py file I have:
import scrapy
import urlparse
from scrapy.loader import ItemLoader
from scrapy.loader.processors import MapCompose, Join
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from test11.items import Test11Item
class EasySpider(CrawlSpider):
    name = 'easy'
    allowed_domains = ['web']
    start_urls = ['https://www.amazon.cn/b?ie=UTF8&node=2127529051']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//*[@id="pagnNextLink"]')),
        Rule(LinkExtractor(restrict_xpaths='//*[contains(@class,"s-access-detail-page")]'),
             callback='parse_item'),
    )

    def parse_item(self, response):
        l = ItemLoader(item=Test11Item(), response=response)
        l.add_xpath('name', '//*[@id="productTitle"]/text()', MapCompose(unicode.strip))
        l.add_xpath('price', '//*[@id="priceblock_ourprice"]/text()',
                    MapCompose(lambda i: i.replace(',', ''), float), re='[,.0-9]+')
        return l.load_item()
In the basic.py file I have:
import scrapy
import urlparse
from scrapy.loader import ItemLoader
from scrapy.loader.processors import MapCompose, Join
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from test11.items import Test11Item
class BasicSpider(scrapy.Spider):
    name = 'basic'
    allowed_domains = ['web']
    start_urls = ['https://www.amazon.cn/b?ie=UTF8&node=2127529051']

    def parse(self, response):
        l = ItemLoader(item=Test11Item(), response=response)
        l.add_xpath('name', '//*[@id="productTitle"]/text()', MapCompose(unicode.strip))
        l.add_xpath('price', '//*[@id="priceblock_ourprice"]/text()',
                    MapCompose(lambda i: i.replace(',', ''), float), re='[,.0-9]+')
        return l.load_item()
When I run the basic spider (scrapy crawl basic), I get the results I want. But when I run the easy spider (scrapy crawl easy), I get no results at all!
What am I missing here?
Please post the output... – wind85
P.S. By the way, you don't need two spiders; EasySpider alone should be enough. BasicSpider is just an example from the book used to build up to EasySpider. Thanks for reading :) – neverlastn
@neverlastn Yes, I know. I was just comparing the two files to figure out why BasicSpider works while EasySpider doesn't (even though allowed_domains is set to 'web' in both files, which is why I didn't realize I should change that part to the domain actually being crawled; I thought it was a generic keyword covering all domains). Good book [I got it from a friend, and I'm still on chapter 3] :) – XO39
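For reference, the fix hinted at in the last comment is to point allowed_domains at the site actually being crawled rather than the placeholder 'web'. Below is a minimal sketch of what easy.py could look like after that change; the domain value 'amazon.cn' is inferred from start_urls and is an assumption, not something stated in the post.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class EasySpider(CrawlSpider):
    name = 'easy'
    # With allowed_domains = ['web'], Scrapy's offsite filter treats every link
    # extracted by the rules as off-site and drops it, so parse_item never runs.
    # The start URL itself is still downloaded, which is why BasicSpider (which
    # parses that response directly) returns items while EasySpider does not.
    allowed_domains = ['amazon.cn']  # assumed from start_urls
    start_urls = ['https://www.amazon.cn/b?ie=UTF8&node=2127529051']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//*[@id="pagnNextLink"]')),
        Rule(LinkExtractor(restrict_xpaths='//*[contains(@class,"s-access-detail-page")]'),
             callback='parse_item'),
    )

    def parse_item(self, response):
        # Same ItemLoader-based extraction as in the original easy.py above.
        pass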