2010-06-17 77 views
2

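Fetching n records at a time from a temporary table

I have a temporary table with about 1 million entries. The temp table stores the result of a larger query. I want to process these records 1000 at a time, for example. What is the best way to set up the queries so that I get the first 1000 rows, then the next 1000, and so on? The rows are not inherently ordered, but the temp table has just one column, an ID, so I can order by it if necessary. I was thinking of creating the temp table with an extra column that numbers all the rows, something like this: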

CREATE TEMP TABLE tmptmp AS 
SELECT ##autonumber somehow##, id 
FROM .... --complicated query 

Then I could do:

SELECT * FROM tmptmp WHERE autonumber>=0 AND autonumber < 1000 

and so on. How would I actually accomplish this? Or is there a better way? I'm using Python and PostgreSQL.

+0

If you use 'pygresql' or 'psycopg2' you can specify ... – ChristopheD 2010-06-17 21:37:23

+0

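There may be a better approach if you are creating a temp table with a million rows, but it is hard to say without knowing the tables and the result you are trying to achieve... – 2010-06-17 21:38:48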

+0

Try using a Postgres cursor and limit start, size. I can elaborate if you'd like. – 2010-06-17 21:40:10

Answers

4

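Use a cursor and fetch the rows you need. OFFSET ... LIMIT becomes slow when you have a lot of records, because the server still has to walk past every skipped row; a cursor will do a much better job.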

http://www.postgresql.org/docs/8.4/interactive/sql-fetch.html

+0

From Python I just need 'cur.fetchmany(1000)' instead of 'cur.fetchall()', heh. – Claudiu 2010-06-17 21:53:38

+1

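+1 Yes, that is certainly the better solution (I will delete mine shortly). With a DB-API 2.0 compliant database interface in Python, you would indeed use the standard '.execute(sql)' followed by a series of '.fetchmany(1000)' calls until the cursor is fully consumed. – ChristopheD 2010-06-17 21:55:20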
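
The execute-then-fetchmany pattern from the comments above can be sketched as follows. This is a minimal illustration, not the poster's code: the helper name 'iter_batches' is hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL only so that the sketch runs without a server. With psycopg2 the loop would look the same; note that a default psycopg2 cursor transfers the whole result set to the client on execute, so a named cursor ('conn.cursor(name=...)') is the usual way to keep the rows on the server between fetches.

```python
import sqlite3


def iter_batches(cursor, batch_size=1000):
    """Yield lists of rows from an already executed DB-API cursor."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # an empty list means the cursor is exhausted
            break
        yield rows


# Stand-in data: sqlite3 replaces PostgreSQL here (an assumption made
# purely so the example is self-contained).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE tmptmp (id INTEGER)")
conn.executemany("INSERT INTO tmptmp (id) VALUES (?)",
                 [(i,) for i in range(2500)])

cur = conn.cursor()
cur.execute("SELECT id FROM tmptmp ORDER BY id")
batch_sizes = [len(batch) for batch in iter_batches(cur, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
conn.close()
```

Because the generator only ever holds one batch, memory use stays bounded by the batch size rather than by the size of the temp table.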

1

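Maybe you could use something like this (it is what we use when batch-updating a table of 20 million rows and do not want to hog replication).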

import psycopg2 
from datetime import datetime 

firstid = 0 
splitsize = 50 # Size of each batch 


# Complicated query 
query_complex = """ 
    CREATE TEMP TABLE tmptmp AS 
    SELECT * FROM schema.massive_table 
""" 
# Query to be run at intervals (BETWEEN is inclusive on both ends) 
query = """ 
    SELECT * FROM tmptmp WHERE id BETWEEN %(startid)s AND %(endid)s 
""" 

conn = psycopg2.connect("dbname=database_name user=postgres") 
curs = conn.cursor() 
# Run complicated query 
curs.execute(query_complex) 
# Get highest id 
curs.execute("SELECT max(id) FROM tmptmp") 
maxid = curs.fetchone()[0] 
print("Max id: %s" % maxid) 

# maxid + 1 so that a maxid falling exactly on a batch boundary is still covered 
for startid in range(firstid, maxid + 1, splitsize): 
    endid = startid + splitsize - 1 
    print("%s: Running query on range %s to %s" % (datetime.now(), startid, endid)) 
    curs.execute(query, {'startid': startid, 'endid': endid}) 
    # Integer division, as in the original Python 2 run. The last batch can 
    # report more than 100% because endid may run past maxid. 
    percent = float((endid * 100) // maxid) 
    print("%s: Affected rows: %s. Total completed: %s%%" % (datetime.now(), curs.rowcount, percent)) 

print("Done.") 
curs.close() 
conn.close() 

This gives the following output:

Max id: 308 
2010-06-18 11:59:11.271000: Running query on range 0 to 49 
2010-06-18 11:59:11.271000: Affected rows: 49. Total completed: 15.0% 
2010-06-18 11:59:11.271000: Running query on range 50 to 99 
2010-06-18 11:59:11.271000: Affected rows: 50. Total completed: 32.0% 
2010-06-18 11:59:11.271000: Running query on range 100 to 149 
2010-06-18 11:59:11.271000: Affected rows: 50. Total completed: 48.0% 
2010-06-18 11:59:11.271000: Running query on range 150 to 199 
2010-06-18 11:59:11.271000: Affected rows: 49. Total completed: 64.0% 
2010-06-18 11:59:11.271000: Running query on range 200 to 249 
2010-06-18 11:59:11.271000: Affected rows: 42. Total completed: 80.0% 
2010-06-18 11:59:11.271000: Running query on range 250 to 299 
2010-06-18 11:59:11.318000: Affected rows: 3. Total completed: 97.0% 
2010-06-18 11:59:11.318000: Running query on range 300 to 349 
2010-06-18 11:59:11.318000: Affected rows: 1. Total completed: 113.0% 
Done. 

//John
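
The id-range arithmetic used by the script above can be isolated into a small helper, which makes it easier to check by hand. This is only a sketch under stated assumptions: the names 'id_ranges' and 'percent_done' are hypothetical, and unlike the script it clamps the last range to max_id, so the progress figure ends at 100% rather than overshooting as in the 113% line of the output.

```python
def id_ranges(first_id, max_id, batch_size):
    """Yield inclusive (start, end) id pairs covering first_id..max_id."""
    # max_id + 1 keeps max_id itself in range when it sits on a boundary.
    for start in range(first_id, max_id + 1, batch_size):
        # Clamp the final range so it never runs past max_id.
        yield start, min(start + batch_size - 1, max_id)


def percent_done(end_id, first_id, max_id):
    """Share of the id span first_id..max_id covered once end_id is done."""
    return 100.0 * (end_id - first_id + 1) / (max_id - first_id + 1)


# Same numbers as the run shown in the answer: ids 0..308 in batches of 50.
ranges = list(id_ranges(0, 308, 50))
print(ranges[0], ranges[-1])  # (0, 49) (300, 308)
print(percent_done(ranges[-1][1], 0, 308))  # 100.0
```

Each pair can be passed directly as the 'startid' and 'endid' parameters of the BETWEEN query. Since ids may have gaps, the number of rows per range can still vary, as the "Affected rows" figures in the output show.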