2015-07-11 96 views
django - speeding up creating model objects

I have several files that are being parsed and loaded into a Django 1.7.7 database. Here is the gist of it:

# models.py
from django.db import models

class Bookstore(models.Model):
    name = models.CharField(max_length=20)

    def __unicode__(self):
        return self.name

class Book(models.Model):
    store = models.ForeignKey(Bookstore)
    title = models.CharField(max_length=20)

    def __unicode__(self):
        return str(self.store)

# the code for writing to the db (management/commands/parse.py):
from django.core.management.base import BaseCommand
from myapp.models import Bookstore, Book  # app name taken from the traceback below

class Command(BaseCommand):
    def handle(self, *args, **options):
        for i in range(100):
            bs = Bookstore.objects.create(name='x')
            for j in range(10):
                print 'creating...'
                Book.objects.create(title='hi', store=bs)

The problem is that the actual content is large, and it takes 50 minutes to load the files into the database. How can I speed this up?

I tried to parallelize it with this code (I'm using a Postgres database, which has thread-safe writes):

from multiprocessing import Pool
from functools import partial

def create_books(store):
    for j in range(100):
        print 'creating...'
        Book.objects.create(title='hi', store=store)

class Command(BaseCommand):
    def handle(self, *args, **options):
        stores = []
        for i in range(2):
            stores.append(Bookstore.objects.create(name='x'))
        pool = Pool(processes=2)
        func = partial(create_books)
        data = pool.map(func, stores)
        pool.close()
        pool.join()

I get this error:

Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "python2.7/site-packages/django/core/management/__init__.py", line 385, in execute_from_command_line
    utility.execute()
  File "python2.7/site-packages/django/core/management/__init__.py", line 377, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "python2.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "python2.7/site-packages/django/core/management/base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "~django_sample_parallel_create/myapp/myapp/management/commands/parse.py", line 20, in handle
    data = pool.map(func, stores)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
django.db.utils.DatabaseError: error with no message from the libpq
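This libpq error typically means the forked Pool workers inherited the parent process's already-open Postgres connection, so several processes end up talking over one shared socket. The usual remedy is to have each worker open its own connection, for example by calling django.db.connection.close() at the start of the worker so Django reconnects lazily. A stdlib-only sketch of that per-worker-resource pattern, with sqlite3 standing in for Postgres (run_demo and the worker names are illustrative, not from the original code):

```python
import sqlite3
from multiprocessing import Pool

_conn = None  # populated in each worker, after the fork


def init_worker():
    # Each worker opens its own connection instead of inheriting the
    # parent's; with a real DB driver this avoids multiple processes
    # sharing one socket.
    global _conn
    _conn = sqlite3.connect(":memory:")


def double(n):
    # Uses this worker's private connection.
    return _conn.execute("SELECT ? * 2", (n,)).fetchone()[0]


def run_demo():
    pool = Pool(processes=2, initializer=init_worker)
    try:
        return pool.map(double, [1, 2, 3])
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_demo())  # [2, 4, 6]
```

The key point is the `initializer` argument: it runs once in every worker process, so any connection it opens is created after the fork rather than copied from the parent.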

I also tried bulk_create:

class Command(BaseCommand):
    def handle(self, *args, **options):
        key = 1
        for i in range(100):
            bs = Bookstore.objects.create(name='x')
            books = []
            for j in range(100):
                books.append(Book.objects.create(pk=key, title='hi', store=bs))
                key += 1
            Book.objects.bulk_create(books)

which fails with:

django.db.utils.IntegrityError: duplicate key value violates unique constraint "myapp_book_pkey" 
DETAIL: Key (id)=(1) already exists. 

I tried deleting all the data first to make sure the keys wouldn't collide. I also tried resetting the Postgres sequence. It still fails, and it looks as if all the objects have already been created.

Answer


Try replacing

books.append(Book.objects.create(pk=key, title='hi', store=bs))

with

books.append(Book(title='hi', store=bs))

Book.objects.create() saves each object immediately with its own INSERT, so the subsequent bulk_create() tries to insert the same rows a second time, which is what triggers the duplicate-key error. Building unsaved Book instances and letting bulk_create() issue a single query fixes the error and removes the per-row overhead. There is also no need to assign pk by hand; the database sequence fills it in.

It is working. Based on my simulated sample, bulk_create sped things up by a factor of 20! – max
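For context on why this is so much faster: objects.create() in a loop issues one INSERT (and one round trip) per row, while bulk_create batches all rows into far fewer statements. A minimal sqlite3 sketch of the same row-by-row vs. batched trade-off (sqlite3 stands in for the Postgres setup above; the table name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")

rows = [("hi",) for _ in range(1000)]

# Row-by-row: one INSERT per object, like Book.objects.create() in a loop.
for row in rows:
    conn.execute("INSERT INTO book (title) VALUES (?)", row)

# Batched: one executemany call, analogous to Book.objects.bulk_create().
conn.executemany("INSERT INTO book (title) VALUES (?)", rows)

total = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(total)  # 2000
```

Both approaches insert the same 1000 rows each; the batched call simply amortizes the per-statement overhead, which is where the roughly 20x speedup in the comment above comes from.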