2013-03-10 18 views
4

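How to import Django models in the Scrapy pipelines.py file

I am trying to import a model from a Django application in my pipelines.py so that I can save data with the Django ORM. I created the Scrapy project scrapy_project inside the first Django application involved, "app1" (is that a good choice, by the way?). I added these lines to my Scrapy settings file: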

def setup_django_env(path): 
    import imp, os 
    from django.core.management import setup_environ 

    f, filename, desc = imp.find_module('settings', [path]) 
    project = imp.load_module('settings', f, filename, desc) 

    setup_environ(project) 

current_dir = os.path.abspath(os.path.dirname(os.path.dirname(__file__))) 
setup_django_env(os.path.join(current_dir, '../../d_project1')) 

When I try to import the models of my Django application app1, I get this error message:

Traceback (most recent call last): 
    File "/usr/local/bin/scrapy", line 4, in <module> 
    execute() 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 122, in execute 
    _run_print_help(parser, _run_command, cmd, args, opts) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 76, in  _run_print_help 
    func(*a, **kw) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 129, in  _run_command 
    cmd.run(args, opts) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in  run 
    spider = self.crawler.spiders.create(spname, **opts.spargs) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler 
    self._crawler.configure() 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 41, in configure 
    self.engine = ExecutionEngine(self, self._spider_closed) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 63, in  __init__ 
    self.scraper = Scraper(crawler) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in  __init__ 
    self.itemproc = itemproc_cls.from_crawler(crawler) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in  from_crawler 
    return cls.from_settings(crawler.settings, crawler) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in  from_settings 
    mwcls = load_object(clspath) 
    File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 39, in  load_object 
    raise ImportError, "Error loading object '%s': %s" % (path, e) 
ImportError: Error loading object 'scrapy_project.pipelines.storage.storage': No module   named dydict.models 

Why can't Scrapy access the Django application's models (given that app1 is in INSTALLED_APPS)?

+0

You only need to get the path right so that you can access the models – catherine 2013-03-10 11:15:08

+0

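What do you mean? Are you talking about setup_django_env? If so, the only path it accepts is the Django project directory, where the settings file lives. In my pipelines I should import the models like this, I guess: from app1 import models. Right? – smarber 2013-03-10 12:08:26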

+0

from app1.models import ...... – catherine 2013-03-10 12:09:39

Answers

0

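Try:

from ..models import MyModel

or

from ...models import MyModel

Each dot stands for one level in the package hierarchy: a single dot is the current package, and every additional dot moves one package up.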

+0

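This is really strange, I don't get it. Why is it so hard! It doesn't work; the only thing I can import is the models file in the Django project directory. I even created the Scrapy project at the same level as the Django application, but I still cannot import my app's models. – smarber 2013-03-13 20:53:56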

+0

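Because you put the Scrapy project inside the dydict app – catherine 2013-03-13 23:57:26

+0

I even created a Scrapy project at the same hierarchical level as the Django application, and I still cannot import my app's models (same error :'(). – smarber 2013-03-14 09:17:08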

0

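In the pipelines you do not import Django models directly; you use Scrapy items that are bound to Django models. You have to set up the Django settings inside the Scrapy settings file, not afterwards.

To use Django models in a Scrapy project you have to use DjangoItem: https://github.com/scrapy-plugins/scrapy-djangoitem (it must be importable from your PYTHONPATH).

The file structure I recommend is: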

Projects 
|-DjangoScrapy 
    |-DjangoProject 
    |  |-Djangoproject 
    |  |-DjangoAPP 
    |-ScrapyProject 
      |-ScrapyProject 
       |-Spiders 

Then, in your Scrapy project, you have to add the path of the Django project to the Python path:

# Setting up django's project full path.
import sys 
sys.path.insert(0, '/home/PycharmProject/scrap/DjangoProject') 

# Setting up django's settings module name. 
import os 
os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings' 
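
On Django 1.7 and later, setting the path and the settings module is usually not enough: django.setup() also has to run before any model is imported. The following is only a sketch of how the two lines above could be wrapped together with that call. The helper name configure_django is hypothetical and is not part of Scrapy or Django; the path and module name passed to it are whatever your own layout uses.

```python
import os
import sys


def configure_django(project_path, settings_module):
    """Expose a Django project to the current process (hypothetical helper).

    Returns True if Django was importable and initialised, False otherwise.
    """
    # Make the directory that contains the Django project package importable.
    if project_path not in sys.path:
        sys.path.insert(0, project_path)

    # Tell Django which settings module to load.
    os.environ['DJANGO_SETTINGS_MODULE'] = settings_module

    try:
        import django
    except ImportError:
        # Django is not installed in this environment; nothing to initialise.
        return False

    # django.setup() exists on Django 1.7 and later and must run before
    # any model is imported; older versions do not have it.
    if hasattr(django, 'setup'):
        django.setup()
    return True
```

In the Scrapy settings file this would be called once, near the top, for example as configure_django('/home/PycharmProject/scrap/DjangoProject', 'DjangoProject.settings').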

Then in your items.py you can bind your Django models to Scrapy items:

items.py:

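from DjangoProject.models import Person, Job
from scrapy_djangoitem import DjangoItem

# Each item class wraps one Django model; its fields come from the model.
# The item classes get their own names so they do not shadow the models.
class PersonItem(DjangoItem):
    django_model = Person

class JobItem(DjangoItem):
    django_model = Job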

Then, in the pipeline, use the .save() method of the object after the spider has yielded it.

spider.py:

import scrapy
from mybot.items import PersonItem

# scrapy.Spider replaces the older scrapy.spider.BaseSpider.
class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = ['http://www.dmoz.org/World/Espa%C3%B1ol/Artes/Artesan%C3%ADa/']

    def parse(self, response):
        # do stuff
        return PersonItem(name='zartch')

pipelines.py

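from myapp.models import Person

class MybotPipeline(object):
    def process_item(self, item, spider):
        # get_or_create returns an (object, created) tuple; a row is
        # inserted only if no Person with this name exists yet.
        Person.objects.get_or_create(name=item['name'])
        # Return the item so that later pipeline stages still receive it.
        return item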
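
The pipeline above goes through the model manager. For comparison, here is a minimal sketch of the .save() variant mentioned earlier, assuming the item is a DjangoItem from scrapy-djangoitem, which provides a save() method that writes the bound model instance. The class name SaveItemPipeline is hypothetical.

```python
class SaveItemPipeline(object):
    """Persist every item by calling its own save() method (sketch)."""

    def process_item(self, item, spider):
        # Assumption: item is a DjangoItem, so save() stores the
        # underlying Django model instance in the database.
        item.save()
        # Hand the item on to the next pipeline stage.
        return item
```

Unlike get_or_create, this saves a new row for every item, so it fits best when duplicates are filtered elsewhere.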

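I have a repository with minimal working code (you only need to set the path to your Django project in the Scrapy settings): https://github.com/Zartch/Scrapy-Django-Minimal

In https://github.com/Zartch/Scrapy-Django-Minimal/blob/master/mybot/mybot/settings.py you have to change the path of my Django project to the path of your own DjangoProject: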

sys.path.insert(0, '/home/zartch/PycharmProjects/Scrapy-Django-Minimal/myweb')
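
To see why this single path entry matters, the following self-contained sketch reproduces the "No module named ..." situation without Django or Scrapy. It builds a throwaway package in a temporary directory (the names demo_app and GREETING are made up for the illustration), shows that the import fails while the parent directory is missing from sys.path, and that it succeeds once the directory is inserted.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway project: <root>/demo_app/models.py (hypothetical names).
root = tempfile.mkdtemp()
app_dir = os.path.join(root, 'demo_app')
os.mkdir(app_dir)
open(os.path.join(app_dir, '__init__.py'), 'w').close()
with open(os.path.join(app_dir, 'models.py'), 'w') as fh:
    fh.write("GREETING = 'hello from demo_app.models'\n")

# Before the project root is on sys.path, the import fails.
try:
    importlib.import_module('demo_app.models')
    found_before = True
except ImportError:
    found_before = False

# After inserting the root, the very same import succeeds.
sys.path.insert(0, root)
importlib.invalidate_caches()
models = importlib.import_module('demo_app.models')

print(found_before)
print(models.GREETING)
```

The directory to insert is the one that contains the package, not the package directory itself, which is why the path above points at the Django project folder rather than at an individual app.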