Python slow on fetchone, hangs on fetchall

2013-08-20
6

I'm writing a script that runs a SELECT query against a database and parses through ~33,000 records. Unfortunately, I'm running into problems at the cursor.fetchone()/cursor.fetchall() stage of things.

I first tried iterating through the cursor one record at a time, like so:

# Run through every record, extract the kanji, then query for FK and weight 
printStatus("Starting weight calculations") 
while True: 
    # Get the next row in the cursor 
    row = cursor.fetchone() 
    if row == None: 
        break 

    # TODO: Determine if there's any kanji in row[2] 

    weight = float(row[3] + row[4]) / 2 
    printStatus("Weight: " + str(weight)) 

Based on the output of printStatus (it prints a timestamp plus whatever string is passed to it), the script took roughly one second to process each row. This led me to believe the query was being re-run on every iteration of the loop (with a LIMIT 1 or something), since the same query takes about a second in SQLiteStudio *and* returns all 33,000 rows. I calculated that, at that rate, it would take around 7 hours to work through all 33,000 records.
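(For reference, one way to confirm where the time is going is to time the fetch calls directly; a minimal sketch, reusing the cursor from above:)

import time 

start = time.time() 
row = cursor.fetchone() 
print("fetchone took %.3f seconds" % (time.time() - start)) 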

Instead of sitting through that, I tried using cursor.fetchall() instead:

results = cursor.fetchall() 

# Run through every record, extract the kanji, then query for FK and weight 
printStatus("Starting weight calculations") 
for row in results: 
    # TODO: Determine if there's any kanji in row[2] 

    weight = float(row[3] + row[4]) / 2 
    printStatus("Weight: " + str(weight)) 

Unfortunately, the Python executable locked up at 25% CPU and ~6MB of memory when it reached the cursor.fetchall() line. I left the script running for about 10 minutes, but nothing happened.

Are ~33,000 returned rows (roughly 5MB of data) too much for Python to grab at once? Am I stuck iterating one at a time? Or is there something I can do to speed things up?
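(As an aside on the memory question: the DB-API also offers 'fetchmany', which fetches rows in batches instead of one at a time or all at once; a sketch along the lines of the loops above:)

# Pull rows in batches: a middle ground between fetchone() and fetchall(). 
while True: 
    rows = cursor.fetchmany(1000) 
    if not rows: 
        break 
    for row in rows: 
        weight = float(row[3] + row[4]) / 2 
        printStatus("Weight: " + str(weight)) 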

EDIT: Here's some console output:

12:56:26.019: Adding new column 'weight' and related index to r_ele 
12:56:26.019: Querying database 
12:56:28.079: Starting weight calculations 
12:56:28.079: Weight: 1.0 
12:56:28.079: Weight: 0.5 
12:56:28.080: Weight: 0.5 
12:56:28.338: Weight: 1.0 
12:56:28.339: Weight: 3.0 
12:56:28.843: Weight: 1.5 
12:56:28.844: Weight: 1.0 
12:56:28.844: Weight: 0.5 
12:56:28.844: Weight: 0.5 
12:56:28.845: Weight: 0.5 
12:56:29.351: Weight: 0.5 
12:56:29.855: Weight: 0.5 
12:56:29.856: Weight: 1.0 
12:56:30.371: Weight: 0.5 
12:56:30.885: Weight: 0.5 
12:56:31.146: Weight: 0.5 
12:56:31.650: Weight: 1.0 
12:56:32.432: Weight: 0.5 
12:56:32.951: Weight: 0.5 
12:56:32.951: Weight: 0.5 
12:56:32.952: Weight: 1.0 
12:56:33.454: Weight: 0.5 
12:56:33.455: Weight: 0.5 
12:56:33.455: Weight: 1.0 
12:56:33.716: Weight: 0.5 
12:56:33.716: Weight: 1.0 

And here's the query plan for the SQL query:

0 0 0 SCAN TABLE r_ele AS re USING COVERING INDEX r_ele_fk (~500000 rows) 
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1 
1 0 0 SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows) 
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 2 
2 0 0 SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows) 
2 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 3 
3 0 0 SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows) 
3 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 4 
4 0 0 SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows) 
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 5 
5 0 0 SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows) 
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 6 
6 0 0 SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows) 
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 7 
7 0 0 SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows) 
7 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 8 
8 0 0 SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows) 
8 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 9 
9 0 0 SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows) 
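
(The plan above came from SQLiteStudio, but the same output can be dumped from Python as well; a sketch, where 'query' stands in for the original SELECT statement, which isn't shown here:)

# 'query' holds the original SELECT statement (omitted from the post). 
for plan_row in cursor.execute("EXPLAIN QUERY PLAN " + query): 
    print(plan_row) 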
+7

Have you tried iterating over the cursor directly: 'for row in cursor: ...'? –
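(A sketch of what that looks like with the loop from the question:)

# The cursor is iterable; rows are fetched lazily, no explicit fetchone() needed. 
for row in cursor: 
    weight = float(row[3] + row[4]) / 2 
    printStatus("Weight: " + str(weight)) 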

+0

'fetchone' (or iterating) doesn't cause the query to be re-run each time. The 'cursor' object usually doesn't even know what query it ran. So whatever your problem is, it's not that. – abarnert

+0

Also, as a side note: use 'if row is None:' instead of 'if row == None:'. In most cases it doesn't really make a difference, but it's more idiomatic (it's also a bit faster, and in the rare cases where it does make a difference, it's what you want). – abarnert
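(Applied to the loop above, that's just:)

row = cursor.fetchone() 
if row is None:  # identity check, preferred over '== None' 
    break 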

Answers

3

SQLite computes result records on the fly:

//...snip (it wasn't the culprit)... 

As the EXPLAIN query plan output from SQLiteStudio shows, the result records are computed on the fly. fetchone is slow because it has to execute all of the correlated subqueries for every record in r_ele. fetchall is even slower, because it takes just as long as if you had executed fetchone for all the records.

SQLite 3.7.13 estimates that all the lookups on the value column would be horribly slow, and therefore creates a temporary index for this query. You should create a permanent index so that it can also be used by SQLite 3.6.21:

CREATE INDEX idx_k_ele_value ON k_ele(value); 
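(If it's more convenient, the index can also be created from the Python script itself; a sketch, where the database file name is an assumption:)

import sqlite3 

conn = sqlite3.connect("dictionary.db")  # assumed file name 
# IF NOT EXISTS makes the script safe to re-run. 
conn.execute("CREATE INDEX IF NOT EXISTS idx_k_ele_value ON k_ele(value)") 
conn.commit() 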

If that doesn't help, update to a Python build with a newer SQLite version, or use another database library with a newer built-in SQLite version, such as APSW.
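(To check which SQLite version a given Python build ships with:)

import sqlite3 
print(sqlite3.sqlite_version)  # version of the SQLite library bundled with Python 
# With APSW, apsw.sqlitelibversion() reports the same information. 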

+0

This wasn't a Python problem at all, it was a lack of a proper index. After creating the above index, Python chewed through all 33,000 records in about 2.5 seconds, and the execution time in SQLiteStudio went from hovering around 1.75 seconds down to 0.0006 seconds. – IAmKale