2017-07-05

Scrapy start_urls error

I am new to Scrapy and am trying to scrape data from pages in the range (1, 70000000). The code I am using is:

import scrapy, json, re 
from blackberry.items import BlackberryItem 
class BlackSpider(scrapy.Spider): 
    name = 'datas' 
    start_urls = [ 
       'https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482' %page for page in xrange(1, 10000000), 
       'https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482'%y for y in xrange(10000000, 20000000), 
       'https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482'%a for a in xrange(20000000, 30000000), 
       'https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482'%b for b in xrange(40000000, 50000000), 
       'https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482'%c for c in xrange(50000000, 60000000), 
       'https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482'%d for d in xrange(60000000, 70000000) 
       ] 

But I get this error:

"y is not defined" 

Answers


One possible solution is to build each range in its own list comprehension and concatenate the results:

import scrapy 
import json 
import re 
from blackberry.items import BlackberryItem 
class BlackSpider(scrapy.Spider): 
    name = 'datas' 
    start_urls = ['https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482' % page for page in xrange(10000000, 20000000)] 
    start_urls += ['https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482' % page for page in xrange(20000000, 30000000)] 
    start_urls += ['https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482' % page for page in xrange(30000000, 40000000)] 
    start_urls += ['https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482' % page for page in xrange(40000000, 50000000)] 
    start_urls += ['https://appworld.blackberry.com/cas/content/%s?countryid=100&lang=en&callback=_content_2360&_=1499177414482' % page for page in xrange(50000000, 60000000)] 
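
For context on why the list in the question fails, here is a small sketch. As far as I can tell, Python 2 parses the comma after `xrange(...)` as part of the loop's iterable, so the first loop iterates over a tuple that already refers to `y` before any `for y in ...` has run (Python 3 rejects the unparenthesized form as a syntax error). The example below writes that tuple out explicitly so it runs on Python 3; the shortened `TEMPLATE` and the tiny ranges are assumptions made for readability, not the real values.

```python
# Shortened stand-in for the URL in the question (assumption, for readability).
TEMPLATE = 'https://appworld.blackberry.com/cas/content/%s'


def build_broken():
    # Mirrors how the question's list is grouped: the first iterable is a
    # tuple whose second element uses y before any loop has bound it.
    return [TEMPLATE % page
            for page in (range(1, 3), TEMPLATE % y)
            for y in range(3, 5)]


def build_fixed():
    # One comprehension per range, concatenated, as in the answer above.
    return ([TEMPLATE % page for page in range(1, 3)] +
            [TEMPLATE % page for page in range(3, 5)])


try:
    build_broken()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

fixed_urls = build_fixed()
```

Keeping each comprehension in its own brackets avoids the problem because every loop variable is then bound before it is used.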

It shows a memory error. – emon


There is not enough memory to hold that many URLs in one list. You need to start with a smaller range. –
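
Besides shrinking the range, one way to sidestep the memory error might be to generate the URLs lazily instead of storing them all in `start_urls`. The sketch below is only an illustration: `iter_urls` is a hypothetical helper name, and the commented-out spider method assumes a Scrapy version in which `start_requests()` can be overridden to yield `scrapy.Request` objects. It is written for Python 3, where `range` is lazy in the same way `xrange` is in Python 2.

```python
import itertools

URL_TEMPLATE = ('https://appworld.blackberry.com/cas/content/%s'
                '?countryid=100&lang=en&callback=_content_2360&_=1499177414482')


def iter_urls(start, stop):
    """Yield one URL per content ID in [start, stop) without building a list."""
    for page in range(start, stop):
        yield URL_TEMPLATE % page


# Inside the spider, the generator could replace start_urls (assumption:
# start_requests() is available in the Scrapy version in use):
#
#     def start_requests(self):
#         for url in iter_urls(1, 70000000):
#             yield scrapy.Request(url, callback=self.parse)

# Only the URLs actually consumed are ever created.
first_three = list(itertools.islice(iter_urls(1, 70000000), 3))
```

Because the generator produces each URL only when it is requested, memory use stays roughly constant regardless of how large the ID range is.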