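Batching stored procedures in PostgreSQL

I need to save a large number of entities to a database. Saving one entity involves adding rows to several tables, where the key auto-generated by inserting a row into one table is used when inserting a row into another. This logic led me to create and use a stored procedure. Calling the stored procedure separately for each entity (i.e. via statement.execute(...)) works fine, except that there are billions of entities to save. So I tried to do it in batches. With batching, however, executing the batch throws org.postgresql.util.PSQLException with the message 'A result was returned when none was expected.'

My stored procedure looks like this: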
    CREATE OR REPLACE FUNCTION insertSentence(warcinfoID varchar, recordID varchar, sentence varchar,
        sent_timestamp bigint, sect_ids smallint[]) RETURNS void AS $$
    DECLARE
        warcinfoIdId integer := 0;
        recordIdId integer := 0;
        sentId integer := 0;
        id integer := 0;
    BEGIN
        SELECT warcinfo_id_id INTO warcinfoIdId FROM warcinfo_id WHERE warcinfo_id_value = warcinfoID;
        IF NOT FOUND THEN
            INSERT INTO warcinfo_id (warcinfo_id_value) VALUES (warcinfoID)
                RETURNING warcinfo_id_id INTO STRICT warcinfoIdId;
        END IF;
        SELECT record_id_id INTO recordIdId FROM record_id WHERE record_id_value = recordID;
        IF NOT FOUND THEN
            INSERT INTO record_id (record_id_value) VALUES (recordID)
                RETURNING record_id_id INTO STRICT recordIdId;
        END IF;
        LOOP
            SELECT sent_id INTO sentId FROM sentence_text
                WHERE md5(sent_text) = md5(sentence) AND sent_text = sentence;
            EXIT WHEN FOUND;
            BEGIN
                INSERT INTO sentence_text (sent_text) VALUES (sentence) RETURNING sent_id INTO STRICT sentId;
            EXCEPTION WHEN unique_violation THEN
                sentId := 0;
            END;
        END LOOP;
        INSERT INTO sentence_occurrence (warcinfo_id, record_id, sent_id, timestamp, sect_ids)
            VALUES (warcinfoIdId, recordIdId, sentId, TO_TIMESTAMP(sent_timestamp), sect_ids)
            RETURNING entry_id INTO STRICT id;
    END;
    $$ LANGUAGE plpgsql;
And the Scala code looks like this:
    def partition2DB(iterator: Iterator[(String, String, String, Long, Array[Int])]): Unit = {
      Class.forName(driver)
      val conn = DriverManager.getConnection(connectionString)
      try {
        val statement = conn.createStatement()
        var i = 0
        iterator.foreach(r => {
          i += 1
          statement.addBatch(
            "select insertSentence('%s', '%s', '%s', %d, '{%s}');".format(
              r._1, r._2, r._3.replaceAll("'", "''"), r._4, r._5.mkString(","))
          )
          if (i % 1000 == 0) statement.executeBatch()
        })
        if (i % 1000 != 0) statement.executeBatch()
      } catch {
        case e: SQLException => println("exception caught: " + e.getNextException());
      } finally {
        conn.close
      }
    }
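Oddly, even though statement.executeBatch() throws an exception, the entities added to the batch before the exception are saved. So this workaround makes things work: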
    def partition2DB(iterator: Iterator[(String, String, String, Long, Array[Int])]): Unit = {
      Class.forName(driver)
      val conn = DriverManager.getConnection(connectionString)
      try {
        var statement = conn.createStatement()
        var i = 0
        iterator.foreach(r => {
          i += 1
          statement.addBatch(
            "select insertSentence('%s', '%s', '%s', %d, '{%s}');".format(
              r._1, r._2, r._3.replaceAll("'", "''"), r._4, r._5.mkString(","))
          )
          if (i % 1000 == 0) {
            i = 0
            try {
              statement.executeBatch()
            } catch {
              case e: SQLException => statement = conn.createStatement()
            }
          }
        })
        if (i % 1000 != 0) {
          try {
            statement.executeBatch()
          } catch {
            case e: SQLException => statement = conn.createStatement()
          }
        }
      } catch {
        case e: SQLException => println("exception caught: " + e.getNextException());
      } finally {
        conn.close
      }
    }
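However, I would rather not rely on undocumented PostgreSQL behavior, as I currently do. I see that other people have run into this problem as well: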
- https://www.postgresql.org/message-id/[email protected]
- http://grokbase.com/t/postgresql/pgsql-jdbc/113g9ygydb/problem-with-executebatch-and-a-result-was-returned-when-none-was-expected
Can someone suggest a solution?
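One possible way around the exception, offered as a sketch rather than a verified fix: the batch fails because each `select insertSentence(...)` hands a row back to the driver, even though the function returns void. Wrapping the call in an anonymous `DO` block with `PERFORM` discards that row on the server, so the statement should report no result. The names `DoBatch`, `quote`, `doCall`, `saveAll` and the dollar-quote tag `$batch_do$` are made up for illustration, and the connection is assumed to be opened and closed by the caller as in the question.

```scala
import java.sql.Connection

object DoBatch {
  type Row = (String, String, String, Long, Array[Int])

  // Hypothetical dollar-quote tag; any tag that cannot occur in the data would do.
  private val Tag = "$batch_do$"

  // Quote a value as a SQL string literal by doubling single quotes.
  // Assumption: standard_conforming_strings is on, so backslashes are literal.
  def quote(s: String): String = "'" + s.replace("'", "''") + "'"

  // Build one DO statement that calls insertSentence and discards its result.
  def doCall(r: Row): String = {
    val call = "PERFORM insertSentence(%s, %s, %s, %d, '{%s}');".format(
      quote(r._1), quote(r._2), quote(r._3), r._4, r._5.mkString(","))
    // The DO body ends at the first occurrence of the tag, so refuse such rows.
    require(!call.contains(Tag), "row text collides with the dollar-quote tag")
    "DO " + Tag + " BEGIN " + call + " END " + Tag
  }

  // Send the rows in batches of batchSize statements.
  def saveAll(conn: Connection, rows: Iterator[Row], batchSize: Int = 1000): Unit = {
    val statement = conn.createStatement()
    try {
      rows.grouped(batchSize).foreach { group =>
        group.foreach(r => statement.addBatch(doCall(r)))
        statement.executeBatch() // also clears the batch for the next group
      }
    } finally {
      statement.close()
    }
  }
}
```

`DO` cannot take bind parameters, so the values are still spliced into the SQL text here, exactly as in the question's code. The quoting assumes `standard_conforming_strings` is on, which has been the default since PostgreSQL 9.1.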
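Good job. You would get a much bigger improvement if the insert operated on sets of inputs rather than being called one at a time, but this should already be an improvement. Ideally, you would use PgJDBC's CopyManager to load a temporary table and then process the temporary table.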
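The staging approach described in this comment could look roughly like the sketch below. It streams rows into a temporary table through PgJDBC's `CopyManager` (obtained via `PGConnection.getCopyAPI()`), then processes the table with a single server-side statement. The object name `CopyStaging`, the table `sentence_staging`, its column names and the chunk size are all hypothetical. Note one deliberate simplification: the processing step still calls the existing `insertSentence` once per staged row inside a `DO` block, which saves the client round trips but is not the fully set-based rewrite the comment has in mind.

```scala
import java.io.StringReader
import java.sql.Connection
import org.postgresql.PGConnection

object CopyStaging {
  type Row = (String, String, String, Long, Array[Int])

  // Escape one field for COPY's text format (backslash must be handled first).
  def copyField(s: String): String =
    s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")

  // One tab-separated, newline-terminated COPY line per row.
  def copyLine(r: Row): String =
    Seq(copyField(r._1), copyField(r._2), copyField(r._3),
      r._4.toString, r._5.mkString("{", ",", "}")).mkString("", "\t", "\n")

  def load(conn: Connection, rows: Iterator[Row], chunk: Int = 10000): Unit = {
    conn.setAutoCommit(false) // ON COMMIT DROP needs an open transaction
    val st = conn.createStatement()
    try {
      // Hypothetical staging table mirroring the function's parameters.
      st.execute(
        "CREATE TEMP TABLE sentence_staging (warcinfo_value varchar, record_value varchar, " +
          "sentence varchar, sent_timestamp bigint, sect_ids smallint[]) ON COMMIT DROP")
      val copy = conn.unwrap(classOf[PGConnection]).getCopyAPI()
      rows.grouped(chunk).foreach { group =>
        copy.copyIn("COPY sentence_staging FROM STDIN",
          new StringReader(group.map(copyLine).mkString))
      }
      // Process the staged rows on the server; PERFORM discards the void results.
      st.execute(
        "DO $$ BEGIN PERFORM insertSentence(warcinfo_value, record_value, sentence, " +
          "sent_timestamp, sect_ids) FROM sentence_staging; END $$")
      conn.commit()
    } catch {
      case e: Exception =>
        conn.rollback()
        throw e
    } finally {
      st.close()
    }
  }
}
```

A fully set-based version would replace the `DO` step with `INSERT ... SELECT` statements that fill `warcinfo_id`, `record_id` and `sentence_text` with the missing values and then join the staging table against them to populate `sentence_occurrence`. Whether that is safe under concurrent writers depends on the unique constraints on those tables, which the question does not show.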