
Scrapy throws "Missing scheme in request url". Here is my code:

import scrapy 
from scrapy.http import Request 

class lyricsFetch(scrapy.Spider): 
    name = "lyricsFetch" 
    allowed_domains = ["metrolyrics.com"] 

    print "\nEnter the name of the ARTIST of the song for which you want the lyrics for. Minimise the spelling mistakes, if possible." 
    artist_name = raw_input('>') 

    print "\nNow comes the main part. Enter the NAME of the song itself now. Again, try not to have any spelling mistakes." 
    song_name = raw_input('>') 

    artist_name = artist_name.replace(" ", "_") 
    song_name = song_name.replace(" ", "_") 
    first_letter = artist_name[0] 
    print artist_name 
    print song_name 

    start_urls = ["www.lyricsmode.com/lyrics/" + first_letter + "/" + artist_name + "/" + song_name + ".html"] 

    print "\nParsing this link\t " + str(start_urls) 

    def start_requests(self): 
        yield Request("www.lyricsmode.com/feed.xml") 

    def parse(self, response): 
        lyrics = response.xpath('//p[@id="lyrics_text"]/text()').extract() 

        with open("lyrics.txt", 'wb') as lyr: 
            lyr.write(str(lyrics)) 

        #yield lyrics 

        print lyrics 

I get the correct output when I use the scrapy shell, but whenever I try to run the script with scrapy crawl I get a ValueError. What am I doing wrong? I have gone through this site and others without any result. I also tried issuing the request the way another question here suggested, but it still doesn't work. Any help?

My traceback:

Enter the name of the ARTIST of the song for which you want the lyrics for. Minimise the spelling mistakes, if possible. 
>bullet for my valentine 

Now comes the main part. Enter the NAME of the song itself now. Again, try not to have any spelling mistakes. 
>your betrayal 
bullet_for_my_valentine 
your_betrayal 

Parsing this link  ['www.lyricsmode.com/lyrics/b/bullet_for_my_valentine/your_betrayal.html'] 
2016-01-24 19:58:25 [scrapy] INFO: Scrapy 1.0.3 started (bot: lyricsFetch) 
2016-01-24 19:58:25 [scrapy] INFO: Optional features available: ssl, http11 
2016-01-24 19:58:25 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'lyricsFetch.spiders', 'SPIDER_MODULES': ['lyricsFetch.spiders'], 'BOT_NAME': 'lyricsFetch'} 
2016-01-24 19:58:27 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState 
2016-01-24 19:58:28 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats 
2016-01-24 19:58:28 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2016-01-24 19:58:28 [scrapy] INFO: Enabled item pipelines: 
2016-01-24 19:58:28 [scrapy] INFO: Spider opened 
2016-01-24 19:58:28 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2016-01-24 19:58:28 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023 
2016-01-24 19:58:28 [scrapy] ERROR: Error while obtaining start requests 
Traceback (most recent call last): 
    File "C:\Users\Nishank\Miniconda2\lib\site-packages\scrapy\core\engine.py", line 110, in _next_request 
    request = next(slot.start_requests) 
    File "C:\Users\Nishank\Desktop\SNU\Python\lyricsFetch\lyricsFetch\spiders\lyricsFetch.py", line 26, in start_requests 
    yield Request("www.lyricsmode.com/feed.xml") 
    File "C:\Users\Nishank\Miniconda2\lib\site-packages\scrapy\http\request\__init__.py", line 24, in __init__ 
    self._set_url(url) 
    File "C:\Users\Nishank\Miniconda2\lib\site-packages\scrapy\http\request\__init__.py", line 59, in _set_url 
    raise ValueError('Missing scheme in request url: %s' % self._url) 
ValueError: Missing scheme in request url: www.lyricsmode.com/feed.xml 
2016-01-24 19:58:28 [scrapy] INFO: Closing spider (finished) 
2016-01-24 19:58:28 [scrapy] INFO: Dumping Scrapy stats: 
{'finish_reason': 'finished', 
'finish_time': datetime.datetime(2016, 1, 24, 14, 28, 28, 231000), 
'log_count/DEBUG': 1, 
'log_count/ERROR': 1, 
'log_count/INFO': 7, 
'start_time': datetime.datetime(2016, 1, 24, 14, 28, 28, 215000)} 
2016-01-24 19:58:28 [scrapy] INFO: Spider closed (finished) 

Add the scheme to your URLs: http:// or https://. By the way, is that really your complete, working code? – tintin


Yes, that is the whole code now. Where do I add the "http://" part? – starship9

Answer


As @tintin said, you are missing the http scheme in the URLs. Scrapy needs fully qualified URLs in order to process the requests.

As far as I can see, you are missing the scheme in:

start_urls = ["www.lyricsmode.com/lyrics/ ... 

yield Request("www.lyricsmode.com/feed.xml") 
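Prepending the scheme to both strings is enough to make them valid request URLs, for example:

start_urls = ["http://www.lyricsmode.com/lyrics/" + first_letter + "/" + artist_name + "/" + song_name + ".html"] 

yield Request("http://www.lyricsmode.com/feed.xml") 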

In case you are parsing URLs out of the HTML content, you should use urljoin to make sure you get a fully qualified URL, for example:

next_url = response.urljoin(href) 
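For instance, a minimal sketch of a parse callback that follows relative links (the //a[@class="next"] selector is purely illustrative, not taken from the question; Request is the same class the question already imports):

    def parse(self, response): 
        # href may be relative, e.g. "/lyrics/b/band/song.html"; 
        # response.urljoin resolves it against response.url, so the 
        # resulting URL always carries the http:// scheme. 
        for href in response.xpath('//a[@class="next"]/@href').extract(): 
            next_url = response.urljoin(href) 
            yield Request(next_url, callback=self.parse) 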

Adding http:// to start_urls doesn't work. I generate a URL based on the input the user provides, and the generated URL matches the one on the actual site exactly. I don't see where I'm going wrong! – starship9


@starship9 I'm assuming the indentation in your code is actually set up correctly. You are overriding the 'start_requests' method, so the 'start_urls' value is not used at all. – Rolando
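Putting the two fixes together, a minimal sketch of a working spider might look like the following. Assumptions: the start_requests() override is simply dropped so that start_urls is used, and allowed_domains is switched to lyricsmode.com, since the original value metrolyrics.com does not match the URLs being fetched:

import scrapy 

class lyricsFetch(scrapy.Spider): 
    name = "lyricsFetch" 
    # Assumption: match the domain actually being crawled. 
    allowed_domains = ["lyricsmode.com"] 

    # Prompts run once, at class-definition time, as in the question. 
    artist_name = raw_input('Artist name >').replace(" ", "_") 
    song_name = raw_input('Song name >').replace(" ", "_") 

    # Fully qualified URL, including the http:// scheme. 
    start_urls = ["http://www.lyricsmode.com/lyrics/" + artist_name[0] 
                  + "/" + artist_name + "/" + song_name + ".html"] 

    # No start_requests() override, so Scrapy builds the initial 
    # request from start_urls itself. 
    def parse(self, response): 
        lyrics = response.xpath('//p[@id="lyrics_text"]/text()').extract() 
        with open("lyrics.txt", 'wb') as lyr: 
            lyr.write(str(lyrics)) 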


Working now, thanks! – starship9