2014-02-26 24 views
0

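Django: sqlite encoding of filenames

I am writing a command (run via manage.py importfiles) that imports a given directory structure from the real file system into the file storage I wrote myself in Django.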

def _handle_directory(self, directory_path, directory):
    for root, subFolders, files in os.walk(directory_path):
        for filename in files:
            path = os.path.join(root, filename)
            with open(path, 'r') as f:
                file_wrapper = FileWrapper(f)
                self.cnt_files += 1
                new_file = File(directory=directory, filename=filename,
                                file=file_wrapper, uploader=self.uploader)
                new_file.save()

The full model can be found on GitHub; the full command is currently available on gist.github.com.

In case you don't want to look at the model: the attribute file of my File class is a FileField.

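Copying the files seems to work, thanks to pajton. However, I now get a new exception, and I think there is an encoding problem with sqlite, but I don't know how to fix it. The value of sys.getfilesystemencoding() is mbcs.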

Traceback (most recent call last): 
    File ".\manage.py", line 10, in <module> 
    execute_from_command_line(sys.argv) 
    File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 399, in execute_from_command_line 
    utility.execute() 
    File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 392, in execute 
    self.fetch_command(subcommand).run_from_argv(self.argv) 
    File "C:\Python27\lib\site-packages\django\core\management\base.py", line 242, in run_from_argv 
    self.execute(*args, **options.__dict__) 
    File "C:\Python27\lib\site-packages\django\core\management\base.py", line 285, in execute 
    output = self.handle(*args, **options) 
    File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 63, in handle 
    self._handle_directory(args[0], root) 
    File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 75, in _handle_directory 
    new_file.save() 
    File "D:\Development\github\Palco\engine\filestorage\models.py", line 155, in save 
    super(File, self).save(*args, **kwargs) 
    File "C:\Python27\lib\site-packages\django\db\models\base.py", line 545, in save 
    force_update=force_update, update_fields=update_fields) 
    File "C:\Python27\lib\site-packages\django\db\models\base.py", line 573, in save_base 
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields) 
    File "C:\Python27\lib\site-packages\django\db\models\base.py", line 635, in _save_table 
    forced_update) 
    File "C:\Python27\lib\site-packages\django\db\models\base.py", line 679, in _do_update 
    return filtered._update(values) > 0 
    File "C:\Python27\lib\site-packages\django\db\models\query.py", line 507, in _update 
    return query.get_compiler(self.db).execute_sql(None) 
    File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 976, in execute_sql 
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type) 
    File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 782, in execute_sql 
    cursor.execute(sql, params) 
    File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 69, in execute 
    return super(CursorDebugWrapper, self).execute(sql, params) 
    File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute 
    return self.cursor.execute(sql, params) 
    File "C:\Python27\lib\site-packages\django\db\utils.py", line 99, in __exit__ 
    six.reraise(dj_exc_type, dj_exc_value, traceback) 
    File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute 
    return self.cursor.execute(sql, params) 
    File "C:\Python27\lib\site-packages\django\db\backends\sqlite3\base.py", line 450, in execute 
    return Database.Cursor.execute(self, query, params) 
django.db.utils.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

I replaced filename in several ways, but it always failed. I also tried values like 'foo' and u'foo' :( as well as different combinations of .encode(), .decode() and unidecode.

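I'm sure this is a problem with filename. I print the current value of the filename, and the exception occurs whenever the filename contains non-ASCII characters.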
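One thing that may be worth checking here is the type of the names coming out of os.walk: it returns names in the same type as the path it was given, so a byte-string path yields byte-string filenames. The sketch below is only an illustration, written for Python 3 (where os.fsencode exists); walk_text_filenames is a hypothetical helper, not part of the command above. Under Python 2.7 the analogous approach would be to pass a unicode path to os.walk.

```python
import os
import sys
import tempfile


def walk_text_filenames(directory_path):
    """Yield (root, filename) pairs as text, even for a byte-string path.

    Hypothetical helper for illustration only.
    """
    encoding = sys.getfilesystemencoding()
    for root, _subfolders, files in os.walk(directory_path):
        if isinstance(root, bytes):
            root = root.decode(encoding)
        for filename in files:
            # os.walk returns names in the same type as the path it was given.
            if isinstance(filename, bytes):
                filename = filename.decode(encoding)
            yield root, filename


# Demonstration on a throwaway directory, walked via a byte-string path.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "report.pdf"), "wb").close()
names = [name for _root, name in walk_text_filenames(os.fsencode(demo_dir))]
```

With text filenames, the values handed to the model no longer depend on how the database driver interprets raw bytes.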

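Update 1: I followed pajton's suggestion and logged the SQL queries; the result is shown below. (The first line is the output of printing the filename.) D:\Temp\prak-gdv-abgabe is the argument to my command.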

Eigene L÷sung.pdf 
(0.000) QUERY = u'BEGIN' - PARAMS =(); args=None 
(0.000) QUERY = u'INSERT INTO "filestorage_file" ("directory_id", "filename", "file", "size", "content_type", "uploader_id", "datetime", "sha512") VALUES (%s, %s, %s, %s, %s, %s, %s, %s)' - PARAMS = (164, u'Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None); args=[164, 'Eigene L\xc3\xb6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None]
(0.000) QUERY = u'BEGIN' - PARAMS =(); args=None 
(0.000) QUERY = u'UPDATE "filestorage_file" SET "directory_id" = %s, "filename" = %s, "file" = %s, "size" = NULL, "content_type" = %s, "uploader_id" = %s, "datetime" = %s, "sha512" = NULL WHERE "filestorage_file"."id" = %s ' - PARAMS = (164, u'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\ufffdsung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', u'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156); args=(164, 'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', 'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156)
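The two args values in the log differ in how the "ö" is encoded: the INSERT carries the UTF-8 bytes \xc3\xb6, while the UPDATE carries the single byte \xf6. The following sketch reproduces that difference; it assumes cp1252 as a stand-in for the Windows 'mbcs' codec, which is only a guess about the code page on this machine.

```python
# Text value of the filename, as shown in PARAMS of the INSERT statement.
name = u"Eigene L\xf6sung.pdf"

utf8_bytes = name.encode("utf-8")    # matches args of the INSERT
ansi_bytes = name.encode("cp1252")   # assumption: stand-in for 'mbcs'


def decodes_as_utf8(raw):
    """Return True if the byte string is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


insert_ok = decodes_as_utf8(utf8_bytes)
update_ok = decodes_as_utf8(ansi_bytes)

# Decoding the single-byte form leniently yields the replacement character
# that shows up in PARAMS of the UPDATE statement.
lenient = ansi_bytes.decode("utf-8", "replace")
```

If this reading is right, the byte string in the UPDATE is not valid UTF-8, which would fit the complaint about 8-bit bytestrings.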

Update 2 (2014-02-27 11:10 UTC): The encoding of my sqlite database is UTF-8, verified with PRAGMA encoding;.
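For reference, the same check can be run from Python with the standard sqlite3 module. This is a minimal sketch; database_encoding is a hypothetical helper name, and the in-memory database only stands in for the real database file.

```python
import sqlite3


def database_encoding(path):
    """Return the text encoding reported by the SQLite database at `path`."""
    connection = sqlite3.connect(path)
    try:
        return connection.execute("PRAGMA encoding;").fetchone()[0]
    finally:
        connection.close()


# ":memory:" is used here instead of the project's actual database file.
encoding = database_encoding(":memory:")
```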

I checked the records in my database.

Id | filename                                        | sha512    | size
 1 | D:\Temp\prak-gdv-abgabe\Liesmich.html           | ffeb8c3d5 | 5927
 2 | D:\Temp\prak-gdv-abgabe\Liesmich.md             | d206d241f | 407
 3 | D:\Temp\prak-gdv-abgabe\Liesmich.txt            | d206d241f | 407
 4 | D:\Temp\prak-gdv-abgabe\Linux\GDV_Praktikum.bin | 5fc5749ee | 166925
 5 | Eigene Lösung.pdf                               |           |

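This is very interesting: the failing entry (ID 5) has the expected filename, but neither the sha512 nor the size value is set. The other entries have the expected sha512 and size values, but not the expected filename. It seems that the custom save() method of my File class is part of my problem, but I don't understand why these strange things happen.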

+1

Can you post the failing SQL? Check this question (second answer) to see how to log the SQL statements: http://stackoverflow.com/questions/2314920/django-show-log-orm-sql-calls-from-python-shell – pajton

+0

Thanks for your suggestion. I added the information you asked for. I also added the filesystem encoding ('mbcs') to my question. – tjati

+0

I updated my question with more information that might help to understand what is happening here. – tjati

Answer

0

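Well, I found a... solution. I just improved the custom .save() method of my File model. Instead of triggering up to three additional saves, it now triggers only one. And, this is the important change, it only updates the three fields that I check in the custom save method. My save method now looks like this: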

# Method of the File model; needs `import hashlib` and `import mimetypes`.
def save(self, *args, **kwargs):
    # First save, so that the file is stored before it is inspected.
    super(File, self).save(*args, **kwargs)
    do_update = False
    if not self.content_type:
        self.content_type = mimetypes.guess_type(self.file.name)[0]
        do_update = True
    if not self.sha512:
        self.sha512 = hashlib.sha512(self.file.read()).hexdigest()
        do_update = True
    if not self.size:
        self.size = self.file.size
        do_update = True

    if do_update:
        # Second save: write only the three derived fields.
        super(File, self).save(update_fields=['content_type', 'sha512', 'size'], *args, **kwargs)

Now the files are imported as expected!
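As a side note, the three derived values (content type, SHA-512 and size) can be computed with the standard library alone, independent of Django. The sketch below is only an illustration under that assumption; describe_file and its chunk_size parameter are hypothetical names and not part of the model above.

```python
import hashlib
import mimetypes
import os
import tempfile


def describe_file(path, chunk_size=64 * 1024):
    """Return (content_type, sha512_hex, size) for the file at `path`."""
    digest = hashlib.sha512()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    content_type = mimetypes.guess_type(path)[0]
    return content_type, digest.hexdigest(), size


# Demonstration with a small throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "example.pdf")
with open(demo_path, "wb") as handle:
    handle.write(b"%PDF-1.4 demo")
content_type, sha512_hex, size = describe_file(demo_path)
```

Computing the values this way before saving might make a single save() call sufficient, though whether that fits the storage backend used here is not something the thread confirms.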

+1

Glad you made it! I would just make the 'super(File, self).save()' call once, with only one such call; that should work too. – pajton

+0

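I need the first 'super call' because, if the file is new, I have to save it before I can run the file operations ('guess_type' and 'sha512'). I know that multiple 'save()' calls are pretty ugly, but in this case they seem to be required. Admittedly, my solution is not a real solution, but it improves the code quality and I no longer have this problem. It works, and that's fine ;) – tjati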

+1

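That's a bit odd, because the file should be available whether or not you have saved it... but maybe it needs to be accessed in a different way. Anyway, if you intend to keep the logic this way, you could also take a look at signals. This update logic sounds like a good place to use the post_save signal. – pajton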