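I'm using large random numbers as keys (they come in from another system). Inserts and updates into fairly small tables (a few million rows) take much longer than I think is reasonable. Why are MySQL InnoDB inserts so slow?
I've distilled a very simple test to illustrate this. I made the test table as simple as possible; my real code doesn't have such a simple layout and has relations, extra indexes and so on. However, the simpler setup shows the same performance.
Here are the results: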
creating the MyISAM table took 0.000 seconds
creating 1024000 rows of test data took 1.243 seconds
inserting the test data took 6.335 seconds
selecting 1023742 rows of test data took 1.435 seconds
fetching 1023742 batches of test data took 0.037 seconds
dropping the table took 0.089 seconds
creating the InnoDB table took 0.276 seconds
creating 1024000 rows of test data took 1.165 seconds
inserting the test data took 3433.268 seconds
selecting 1023748 rows of test data took 4.220 seconds
fetching 1023748 batches of test data took 0.037 seconds
dropping the table took 0.288 seconds
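Inserting 1M rows into MyISAM took 6 seconds; into InnoDB it took 3433 seconds!
What am I doing wrong? What is misconfigured? (MySQL is a normal Ubuntu install using the defaults.)
Here is the test code: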
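To put the two insert timings in per-row terms, here is the arithmetic on the figures above as a small Python sketch. It uses the 1,024,000 generated rows as the row count, not the slightly smaller number of distinct keys that ended up in the table; `rows_per_second` is just an illustrative helper.

```python
def rows_per_second(rows, seconds):
    """Average insert throughput over the whole run."""
    return rows / float(seconds)

ROWS = 1024000          # rows generated by the test script
MYISAM_S = 6.335        # "inserting the test data" time for MyISAM
INNODB_S = 3433.268     # "inserting the test data" time for InnoDB

print("MyISAM: %.0f rows/s" % rows_per_second(ROWS, MYISAM_S))   # MyISAM: 161642 rows/s
print("InnoDB: %.0f rows/s" % rows_per_second(ROWS, INNODB_S))   # InnoDB: 298 rows/s
print("InnoDB is %.0fx slower" % (INNODB_S / MYISAM_S))          # InnoDB is 542x slower
```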
from __future__ import print_function  # lets the script run under Python 2 and Python 3
import sys, time, random
import MySQLdb  # on Python 3 this module is provided by the mysqlclient package

# usage: python script db_username db_password database_name
db = MySQLdb.connect(host="127.0.0.1", port=3306, user=sys.argv[1], passwd=sys.argv[2], db=sys.argv[3]).cursor()

def test(engine):
    start = time.time()  # fine for this purpose
    db.execute("""
        CREATE TEMPORARY TABLE Testing123 (
            k INTEGER PRIMARY KEY NOT NULL,
            v VARCHAR(255) NOT NULL
        ) ENGINE=%s;""" % engine)
    duration = time.time() - start
    print("creating the %s table took %0.3f seconds" % (engine, duration))

    start = time.time()
    # 1 million rows in 100 chunks of 10K; each chunk is a flat list k0, v0, k1, v1, ...
    data = [[(str(random.getrandbits(48)) if a & 1 else int(random.getrandbits(31)))
             for a in range(10 * 1024 * 2)] for b in range(100)]
    duration = time.time() - start
    print("creating %d rows of test data took %0.3f seconds" % (sum(len(rows) // 2 for rows in data), duration))

    # one multi-row REPLACE statement with 10K (k,v) placeholder pairs
    sql = "REPLACE INTO Testing123 (k,v) VALUES %s;" % ("(%s,%s)," * (10 * 1024))[:-1]
    start = time.time()
    for rows in data:
        db.execute(sql, rows)
    duration = time.time() - start
    print("inserting the test data took %0.3f seconds" % duration)

    # execute the query; MySQLdb's execute() returns the number of rows
    start = time.time()
    query = db.execute("SELECT k,v FROM Testing123;")
    duration = time.time() - start
    print("selecting %d rows of test data took %0.3f seconds" % (query, duration))

    # get the rows in chunks of 10K
    rows = 0
    start = time.time()
    while query:
        batch = min(query, 10 * 1024)
        query -= batch
        rows += len(db.fetchmany(batch))
    duration = time.time() - start
    print("fetching %d batches of test data took %0.3f seconds" % (rows, duration))

    # drop the table
    start = time.time()
    db.execute("DROP TABLE Testing123;")
    duration = time.time() - start
    print("dropping the table took %0.3f seconds" % duration)

test("MyISAM")
test("InnoDB")
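The insert loop binds one flat list per batch (`k0, v0, k1, v1, ...`) against a single multi-row `REPLACE` statement. Below is a minimal sketch of how that statement is assembled, using a hypothetical helper `build_replace_sql` and a tiny batch size so the result is readable. The `join` form is equivalent to the script's `("(%s,%s),"*n)[:-1]` slicing.

```python
def build_replace_sql(n_rows):
    """Multi-row REPLACE with n_rows (k, v) placeholder pairs, as in the test script."""
    placeholders = ",".join(["(%s,%s)"] * n_rows)
    return "REPLACE INTO Testing123 (k,v) VALUES %s;" % placeholders

def placeholders_match(sql, params):
    """Each %s placeholder must be matched by exactly one value in the flat list."""
    return sql.count("%s") == len(params)

sql = build_replace_sql(3)
params = [17, "a", 4, "b", 99, "c"]   # flat k, v, k, v, ... layout used by the script
print(sql)   # REPLACE INTO Testing123 (k,v) VALUES (%s,%s),(%s,%s),(%s,%s);
print(placeholders_match(sql, params))   # True
```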
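Interesting to see your benchmarks! MyISAM: creating a table with an auto-increment key and then adding an index on the random key field is roughly as fast as creating the table with the random field indexed from the start; all in under 8 seconds. InnoDB: inserting with an auto-increment primary key takes 54 seconds, and creating the index on the random field afterwards takes 214 seconds. Slow, but *much* faster than inserting with random keys. – Will 2012-04-05 13:21:49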
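The two-step approach in the comment above can be written out as DDL. This is a sketch under assumptions: the surrogate column name `id`, the index name `k_idx`, and the helper `surrogate_key_ddl` are hypothetical, and these exact statements were not benchmarked here. Once `k` is no longer the primary key, `REPLACE` only deduplicates on `k` if a `UNIQUE` index on it exists at insert time, so adding the index after the load changes the script's semantics.

```python
def surrogate_key_ddl(table, engine="InnoDB"):
    """Hypothetical two-step schema: auto-increment PK first, index on k after loading."""
    create = ("CREATE TABLE %s ("
              "id INTEGER AUTO_INCREMENT PRIMARY KEY, "  # new rows always append at the end
              "k INTEGER NOT NULL, "                     # the random key from the other system
              "v VARCHAR(255) NOT NULL"
              ") ENGINE=%s;" % (table, engine))
    # run this after the bulk load, so the index is built once instead of row by row
    add_index = "ALTER TABLE %s ADD INDEX k_idx (k);" % table
    return create, add_index

create, add_index = surrogate_key_ddl("Testing123")
print(create)
print(add_index)   # ALTER TABLE Testing123 ADD INDEX k_idx (k);
```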
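Paul, a general question about the performance benefits of sequential keys: does it matter whether there are gaps between the keys, as long as they are still ascending? For example: 1, 5, 10, 500, 1234, 7800, etc. I've read a lot about the benefits of sequential keys, but I'm not sure whether "sequential" means in ascending order (possibly with gaps) or with no gaps at all. I'm curious because this relates to a multi-server key-generation system I'm working on, which I discuss in StackOverflow question #6338956. Thanks. – YeB 2013-02-01 01:50:32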
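A toy model helps with the gaps question. The sketch below treats the table as a single key-ordered Python list (a big simplification of InnoDB's B-tree pages; `rows_shifted` is a hypothetical helper) and counts how many existing rows must move to make room for each new key. Keys that only ever increase always land at the end, with or without gaps.

```python
import bisect

def rows_shifted(keys):
    """Insert keys one by one into a key-ordered list; count rows that had to move."""
    table = []
    shifted = 0
    for k in keys:
        pos = bisect.bisect_left(table, k)
        shifted += len(table) - pos   # every row after the insertion point moves over
        table.insert(pos, k)
    return shifted

print(rows_shifted([1, 5, 10, 500, 1234, 7800]))   # 0: ascending with gaps, always appended
print(rows_shifted([1, 10, 5]))                    # 1: inserting 5 moves the row for 10
```

In this model only the order of arrival matters, not whether the keys are dense.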
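The reason random-key inserts are so slow is that InnoDB stores rows in primary-key order (the primary key is a clustered index, and the row data lives in its leaf pages), rather than keeping a big unordered pool of row data with a separate primary-key index. This means that if you insert (only) a record with id = 1 and another with id = 10, the data for both rows is stored side by side. If you then insert a record with id = 5, InnoDB has to move the data for id = 10 to fit the whole id = 5 record in between. Do this many times and a lot of data gets moved many times over. With random keys, there is not much you can do about it. – 2013-04-06 16:52:06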
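A slightly closer model uses fixed-size pages. In the sketch below (hypothetical `simulate` helper; 4 rows per page is an arbitrary stand-in for InnoDB's 16 KB pages), an insert into a full page splits it in half and moves rows, except when a key is appended past the end of the last page, which just starts a new page. This loosely mirrors how ascending inserts avoid mid-page splits, but it is not InnoDB's actual split algorithm.

```python
import bisect
import random

def simulate(keys, capacity=4):
    """Insert distinct keys into key-ordered pages; return (mid_page_splits, pages_used)."""
    pages = []    # each page is a sorted list; all keys in pages[i] < all keys in pages[i+1]
    splits = 0
    for k in keys:
        if not pages:
            pages.append([k])
            continue
        firsts = [p[0] for p in pages]
        i = max(bisect.bisect_right(firsts, k) - 1, 0)   # page whose range should hold k
        page = pages[i]
        bisect.insort(page, k)
        if len(page) > capacity:
            if page[-1] == k and i == len(pages) - 1:
                pages.append([page.pop()])       # append past the end: start a fresh page
            else:
                mid = len(page) // 2             # full page in the middle: split it in half
                pages[i:i + 1] = [page[:mid], page[mid:]]
                splits += 1
    return splits, len(pages)

print(simulate(list(range(1000))))                               # (0, 250)
print(simulate(random.Random(42).sample(range(10**6), 1000)))    # many splits, more pages
```

Sequential keys fill every page with no splits; random keys cause repeated splits and leave pages partly empty, so the same rows need more pages.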