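Scrapy - Crawled (200) and referer: None
I'm trying to learn how to use Scrapy and Python, but I'm no expert... far from it. I always end up with an empty file after crawling this page: product of c-discount, and I don't understand why.
Here is my code: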
import scrapy
from cdiscount_test.items import CdiscountTestItem

f = open('items.csv', 'w').close()


class CdiscountsellersspiderSpider(scrapy.Spider):
    name = 'CDiscountSellersSpider'
    allowed_domains = ['cdiscount.com']
    start_urls = ['http://www.cdiscount.com/mpv-8732-SATENCO.html']

    def parse(self, response):
        items = CdiscountTestItem()
        name = response.xpath('//div[@class="shtName"]/div[@class="shtOver"]/h1[@itemprop="name"]/text()').extract()
        country = response.xpath('//div[@class="shtName"]/span[@class="shTopCExp"]/text()').extract()
        items['name_seller'] = ''.join(name).strip()
        items['country_seller'] = ''.join(country).strip()
        pass
And this is the output I get in the cmd window:
2017-06-20 18:01:50 [scrapy.core.engine] INFO: Spider opened
2017-06-20 18:01:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-06-20 18:01:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-06-20 18:01:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.cdiscount.com/robots.txt> (referer: None)
2017-06-20 18:01:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.cdiscount.com/mpv-8732-SATENCO.html> (referer: None)
2017-06-20 18:01:51 [scrapy.core.engine] INFO: Closing spider (finished)
Can anyone help me?
Thanks a lot!
OK, my bad... thanks! I was looking for something complicated when a simple yield was all it needed...
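For anyone landing here with the same empty file: the log shows the page was fetched (200), but `parse` ends with `pass`, so the item it builds is never handed back to Scrapy. In the spider above, the change would be replacing that final `pass` with `yield items`. The sketch below is only an illustration of the difference in plain Python so it runs without Scrapy; the function names, the `FAKE_PAGE` dict, and its values are made up for the example and are not data from the real page.

```python
# Hypothetical stand-in for the values the XPath queries would return.
FAKE_PAGE = {"name": ["  SATENCO "], "country": [" France "]}


def build_item(page):
    # Same join/strip logic as the spider's parse() method.
    return {
        "name_seller": "".join(page["name"]).strip(),
        "country_seller": "".join(page["country"]).strip(),
    }


def parse_without_yield(page):
    item = build_item(page)
    pass  # the item is built and then dropped; the function returns None


def parse_with_yield(page):
    item = build_item(page)
    yield item  # the caller now receives the item


dropped = parse_without_yield(FAKE_PAGE)
scraped = list(parse_with_yield(FAKE_PAGE))
print("without yield:", dropped)
print("with yield:", scraped)
```

With the `yield` in place, Scrapy has something to pass to its feed export, which is why the output file stops being empty.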
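Do you know about `.re()`? I've seen something like it used to keep only part of the extracted text, but I don't know which arguments go inside the ().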
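On the `.re()` question: the argument is a regular expression, and when the pattern contains a capture group, only the captured part of each match is returned as a list of strings (there is also `.re_first()` for just the first result). In the spider it would be called on a selector, along the lines of `response.xpath('...').re(r'...')`. The sketch below uses the standard-library `re.findall`, which behaves similarly for a single capture group, so it runs without Scrapy. The sample text and the pattern are assumptions for illustration, not the real content of the page.

```python
import re

# Hypothetical extracted text; the real page content may look different.
text = "Country: France (ships within 48h)"

# One capture group: only the word after "Country:" is kept.
pattern = r"Country:\s*(\w+)"

kept = re.findall(pattern, text)
first = kept[0] if kept else None  # roughly what .re_first() would give
print(kept, first)
```

The pattern to pass depends entirely on the text being extracted, so it is worth testing it against a real sample in `scrapy shell` first.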