2012-12-11

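Django Haystack: update index faster

I've been using Django Haystack for a while now, and it's great! I have a fairly heavy site whose data needs to be updated from time to time (every 15 to 30 minutes).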

Running python manage.py update_index takes a lot of time. Is there a way to speed it up? Or, if possible, to update only the data that has changed?

I'm currently using Django Haystack 1.2.7 with Solr as the backend, and Django 1.4.

Thanks!


Edit:

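Yes, I've already read that part of the documentation, but what I really need is a way to speed up indexing, perhaps by updating only the recent data instead of everything. I found get_updated_field, but I don't know how to use it. The documentation only explains what it is for; it doesn't show a real example.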


Edit 2:

start = DateTimeField(model_attr='start', null=True, faceted=True, --HERE?--) 

Edit 3:

OK, I've implemented the solution below, but when I tried rebuild_index (45,000 records) it nearly crashed my computer. After waiting 10 minutes, this error appeared:

  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/rebuild_index.py", line 16, in handle
    call_command('update_index', **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 150, in call_command
    return klass.execute(*args, **defaults)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 193, in handle
    return super(Command, self).handle(*apps, **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 304, in handle
    app_output = self.handle_app(app, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 229, in handle_app
    do_update(index, qs, start, end, total, self.verbosity)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 109, in do_update
    index.backend.update(index, current_qs)
  File "/usr/local/lib/python2.7/dist-packages/haystack/backends/solr_backend.py", line 73, in update
    self.conn.add(docs, commit=commit, boost=index.get_field_weights())
  File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 686, in add
    m = ET.tostring(message, encoding='utf-8')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1127, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 821, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 915, in _serialize_xml
    write("<" + tag)
MemoryError

Have you tried any of the suggestions in the Best Practices documentation? http://django-haystack.readthedocs.org/en/latest/best_practices.html#ref-best-practices – Spacedman


I don't use the Solr backend, so I can't help you there, sorry. –

Answer


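get_updated_field should return a string containing the name of the model attribute that holds the date the model was last updated (haystack docs). A DateField or DateTimeField with auto_now=True is ideal for that, since Django sets it to the current time on every save() (Django docs).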
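The key point is that get_updated_field returns the *name* of an attribute, not a timestamp. The plain-Python sketch below illustrates that contract with hypothetical stand-in classes (FakeProfile, FakeProfileIndex) and a hypothetical helper last_modified; it is not Haystack code, just a way to see how a name string gets turned into the value that is read off each object.

```python
from datetime import datetime


class FakeProfile(object):
    """Hypothetical stand-in for a model instance such as UserProfile."""

    def __init__(self, name, updated):
        self.name = name
        self.updated = updated


class FakeProfileIndex(object):
    """Hypothetical stand-in for a SearchIndex subclass."""

    def get_updated_field(self):
        # Return the *name* of the attribute, not its value.
        return "updated"


def last_modified(index, obj):
    """Read the timestamp stored in the attribute named by get_updated_field()."""
    return getattr(obj, index.get_updated_field())


profile = FakeProfile("alice", datetime(2012, 12, 11, 9, 30))
print(last_modified(FakeProfileIndex(), profile))  # 2012-12-11 09:30:00
```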

For example, my UserProfile model has a field named updated:

models.py

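from django.contrib.auth.models import User
from django.db import models

class UserProfile(models.Model):
    user = models.ForeignKey(User)
    # lots of other fields snipped
    # auto_now=True: Django sets this to the current time on every save()
    updated = models.DateTimeField(auto_now=True)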

search_indexes.py

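from haystack.indexes import CharField, SearchIndex

from .models import UserProfile

class UserProfileIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    user = CharField(model_attr='user')
    user_fullname = CharField(model_attr='user__get_full_name')

    def get_model(self):
        return UserProfile

    def get_updated_field(self):
        # Name of the model attribute holding the last-modified timestamp
        return "updated"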

Then, when I run ./manage.py update_index --age=10, it indexes only the user profiles updated in the last 10 hours.
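To make the --age behaviour concrete, here is a plain-Python sketch of the selection it performs. It assumes, as I understand Haystack's update_index, that the age in hours becomes a cutoff of "now minus that many hours" and is applied as a `<field>__gte` filter on the field named by get_updated_field. The helpers age_lookup and changed_since are hypothetical and only model the idea; Haystack's exact internals differ between versions.

```python
from datetime import datetime, timedelta


def age_lookup(updated_field, age_hours, now):
    """ORM-style filter kwargs for rows changed in the last age_hours hours."""
    cutoff = now - timedelta(hours=age_hours)
    return {"%s__gte" % updated_field: cutoff}


def changed_since(records, updated_field, age_hours, now):
    """In-memory equivalent of Model.objects.filter(**age_lookup(...))."""
    cutoff = now - timedelta(hours=age_hours)
    return [r for r in records if r[updated_field] >= cutoff]


now = datetime(2012, 12, 11, 12, 0)
print(age_lookup("updated", 10, now))
# {'updated__gte': datetime.datetime(2012, 12, 11, 2, 0)}

records = [
    {"pk": 1, "updated": datetime(2012, 12, 11, 11, 0)},  # 1 hour old
    {"pk": 2, "updated": datetime(2012, 12, 10, 8, 0)},   # 28 hours old
]
print([r["pk"] for r in changed_since(records, "updated", 10, now)])  # [1]
```

Because the filter works in whole hours, if update_index runs every 15 to 30 minutes, an --age of 1 still covers each interval with some overlap, so a record saved just before one run is not missed by the next.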


Should I add auto_now=True in search_indexes.py? I gave an example in the question above. Also, where should I implement get_updated_field? Thanks for the answer!! – dark4p


auto_now goes on the Model in models.py, and the get_updated_field method goes in the SearchIndex class. –


I've added an example. –