Can I speed up large dataset operations in SQLite/Python?

I have a dataset of 1-5 billion "box" objects stored in an SQLite database file, in the format:

[x1,y1,z1,x2,y2,z2,box_id]

Currently I process it with a Python script that does something like this:

import sqlite3 as lite 

box_data = lite.connect('boxes.db') 
cur = box_data.cursor() 
editor_cursor = box_data.cursor() 

cur.execute("SELECT * FROM boxes") 
while True: 
    row = cur.fetchone() 
    if row is None: 
        break 

    row_id = row[6] 

    x1_normalized = int(round(row[0]/smallest_box_size)) 
    y1_normalized = int(round(row[1]/smallest_box_size)) 
    z1_normalized = int(round(row[2]/smallest_box_size)) 

    x2_normalized = int(round(row[3]/smallest_box_size)) 
    y2_normalized = int(round(row[4]/smallest_box_size)) 
    z2_normalized = int(round(row[5]/smallest_box_size)) 

    editor_cursor.execute("UPDATE boxes SET x1=?,y1=?,z1=?,x2=?,y2=?,z2=? WHERE id=?",(x1_normalized,y1_normalized,z1_normalized,x2_normalized,y2_normalized,z2_normalized,row_id))

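where smallest_box_size is just some float. This is a simple normalization task: each box coordinate must be converted from its "physical" size to a normalized integer coordinate.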

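Currently the process takes on the order of hours, and I would like to cut that run time. Is it possible to speed this up within my current Python/SQLite setup?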

Suggestions on how to implement this in another, faster database program would also be helpful :)

Source

2014-01-29 HotDogCannon

This won't help here, but in future you can iterate directly over the cursor, as in 'for row in cur:', instead of using 'fetchone()' in an infinite loop.

Let SQLite do all the work instead:

editor_cursor.execute(""" 
    UPDATE boxes SET x1=CAST(x1/:smallest_box_size as INTEGER), 
        y1=CAST(y1/:smallest_box_size as INTEGER), 
        z1=CAST(z1/:smallest_box_size as INTEGER), 
        x2=CAST(x2/:smallest_box_size as INTEGER), 
        y2=CAST(y2/:smallest_box_size as INTEGER), 
        z2=CAST(z2/:smallest_box_size as INTEGER)""", 
    {'smallest_box_size': smallest_box_size})

In other words, SQLite is perfectly capable of normalizing all the rows for you, without piping them through Python.
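To make the single-statement approach concrete, here is a minimal end-to-end sketch. It assumes a hypothetical in-memory table standing in for boxes.db, with untyped coordinate columns and a box_id primary key (the real schema is not shown in the question), and a helper name, normalize_boxes, invented for illustration. The CAST expression is the one from the answer; the `with conn:` block is added so the change is committed.

```python
import sqlite3


def normalize_boxes(conn, smallest_box_size):
    """Rescale every coordinate in place with one UPDATE (hypothetical helper)."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            """
            UPDATE boxes SET
                x1 = CAST(x1 / :size AS INTEGER),
                y1 = CAST(y1 / :size AS INTEGER),
                z1 = CAST(z1 / :size AS INTEGER),
                x2 = CAST(x2 / :size AS INTEGER),
                y2 = CAST(y2 / :size AS INTEGER),
                z2 = CAST(z2 / :size AS INTEGER)
            """,
            {"size": smallest_box_size},
        )


# Assumed schema: untyped columns, so stored values keep the type they are given.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boxes (x1, y1, z1, x2, y2, z2, box_id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO boxes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 1),
        (3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 2),
    ],
)

normalize_boxes(conn, 0.5)
rows = conn.execute("SELECT * FROM boxes ORDER BY box_id").fetchall()
print(rows)
```

The whole table is rewritten inside SQLite in one transaction, so no row ever crosses into Python and back, which is where the original loop likely spends most of its time.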

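Note that CAST to INTEGER truncates a REAL value toward zero; it does not round to the nearest integer. If you need the round-to-nearest behaviour of the Python version, apply SQLite's round() before the cast, e.g. CAST(round(x1/:smallest_box_size) AS INTEGER).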
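A quick way to check how CAST treats fractional values is to run it on a few literals. The sketch below uses an in-memory database and two helper names (cast_only, round_then_cast) that are made up for this illustration; it compares a bare CAST with round() followed by CAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def cast_only(value):
    """Return CAST(value AS INTEGER) as computed by SQLite."""
    return conn.execute("SELECT CAST(? AS INTEGER)", (value,)).fetchone()[0]


def round_then_cast(value):
    """Return CAST(round(value) AS INTEGER) as computed by SQLite."""
    return conn.execute(
        "SELECT CAST(round(?) AS INTEGER)", (value,)
    ).fetchone()[0]


for v in (2.7, -2.7):
    print(v, cast_only(v), round_then_cast(v))
```

A bare CAST drops the fractional part (2.7 becomes 2, -2.7 becomes -2), while round() followed by CAST gives the nearest integer (3 and -3). If exact .5 ties matter for your data, it is worth checking them separately, since SQLite's round() and Python's round() may not resolve ties the same way.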

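For future reference: you can loop over the cursor directly to iterate through the result set. There is no need to call .fetchone() for every row: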

cur.execute("SELECT * FROM boxes") 
for row in cur: 
    # The loop terminates automatically when the rows are exhausted.
    pass  # handle each row here

This is implemented very efficiently: results are only loaded as needed while you iterate.
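As a small runnable illustration of cursor iteration, the sketch below again assumes a hypothetical in-memory boxes table with a box_id primary key. Each row is a tuple, so it can be unpacked straight into named variables in the for statement instead of being indexed as row[0] through row[6].

```python
import sqlite3

# Assumed stand-in for boxes.db; the real schema is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boxes (x1, y1, z1, x2, y2, z2, box_id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO boxes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(0, 0, 0, 1, 1, 1, 1), (1, 1, 1, 2, 2, 2, 2)],
)

cur = conn.cursor()
cur.execute("SELECT * FROM boxes ORDER BY box_id")

seen_ids = []
# The cursor is its own iterator: rows are fetched one at a time on demand.
for x1, y1, z1, x2, y2, z2, box_id in cur:
    seen_ids.append(box_id)

print(seen_ids)
```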

Source

2014-01-29 11:25:57
