How to run Scrapy/Portia on an Azure Web App
I am trying to run Scrapy or Portia on a Microsoft Azure Web App. I installed Scrapy by first creating a virtual environment:
D:\Python27\Scripts\virtualenv.exe D:\home\Python
and then installing Scrapy into it:
D:\home\Python\Scripts\pip install Scrapy
The installation seemed to work, but running a spider produces the following output:
D:\home\Python\Scripts\tutorial>d:\home\python\scripts\scrapy.exe crawl example
2015-09-13 23:09:31 [scrapy] INFO: Scrapy 1.0.3 started (bot: tutorial)
2015-09-13 23:09:31 [scrapy] INFO: Optional features available: ssl, http11
2015-09-13 23:09:31 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2015-09-13 23:09:34 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2015-09-13 23:09:35 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
  File "D:\home\Python\lib\site-packages\scrapy\cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "D:\home\Python\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "D:\home\Python\lib\site-packages\scrapy\crawler.py", line 153, in crawl
    d = crawler.crawl(*args, **kwargs)
  File "D:\home\Python\lib\site-packages\twisted\internet\defer.py", line 1274, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "D:\home\Python\lib\site-packages\twisted\internet\defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "D:\home\Python\lib\site-packages\scrapy\crawler.py", line 71, in crawl
    self.engine = self._create_engine()
  File "D:\home\Python\lib\site-packages\scrapy\crawler.py", line 83, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "D:\home\Python\lib\site-packages\scrapy\core\engine.py", line 66, in __init__
    self.downloader = downloader_cls(crawler)
  File "D:\home\Python\lib\site-packages\scrapy\core\downloader\__init__.py", line 65, in __init__
    self.handlers = DownloadHandlers(crawler)
  File "D:\home\Python\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 23, in __init__
    cls = load_object(clspath)
  File "D:\home\Python\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
    mod = import_module(module)
  File "D:\Python27\Lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "D:\home\Python\lib\site-packages\scrapy\core\downloader\handlers\s3.py", line 6, in <module>
    from .http import HTTPDownloadHandler
  File "D:\home\Python\lib\site-packages\scrapy\core\downloader\handlers\http.py", line 5, in <module>
    from .http11 import HTTP11DownloadHandler as HTTPDownloadHandler
  File "D:\home\Python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 15, in <module>
    from scrapy.xlib.tx import Agent, ProxyAgent, ResponseDone, \
  File "D:\home\Python\lib\site-packages\scrapy\xlib\tx\__init__.py", line 3, in <module>
    from twisted.web import client
  File "D:\home\Python\lib\site-packages\twisted\web\client.py", line 42, in <module>
    from twisted.internet.endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint
  File "D:\home\Python\lib\site-packages\twisted\internet\endpoints.py", line 34, in <module>
    from twisted.internet.stdio import StandardIO, PipeAddress
  File "D:\home\Python\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
    from twisted.internet import _win32stdio
  File "D:\home\Python\lib\site-packages\twisted\internet\_win32stdio.py", line 7, in <module>
    import win32api
exceptions.ImportError: No module named win32api
2015-09-13 23:09:35 [twisted] CRITICAL:
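The documentation at http://doc.scrapy.org/en/latest/intro/install.html says I have to install pywin32. I don't know how to download or install it from the command line, since I am working inside a Web App environment.
Is it even possible to run Scrapy or Portia on an Azure Web App, or do I have to use a full-fledged virtual machine on Azure?
Thanks!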
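For reference, the failure in the traceback is an ImportError raised when Twisted tries to import win32api. The following is a minimal diagnostic sketch, not a fix: it only reports which modules fail to import in the interpreter that runs it. The helper name missing_modules and the list of module names checked are my own choices for illustration.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Same error class as in the traceback above
            missing.append(name)
    return missing


if __name__ == "__main__":
    # win32api is provided by pywin32; twisted and scrapy are listed
    # only to confirm that the rest of the install is importable.
    print(missing_modules(["win32api", "twisted", "scrapy"]))
```

Running it with D:\home\Python\Scripts\python.exe would check the same virtual environment that scrapy.exe uses.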
Note that you can run your spiders from Scrapy Cloud (http://scrapinghub.com); there is a free plan (disclaimer: I work there). You can then use the API or a direct dump to get your data.