2013-03-24 15 views
1

在試圖優化模仿樹結構的程序的速度(「樹」存儲在DICT中,笛卡爾座標x,y座標對作爲關鍵字)時,我發現將它們的唯一地址作爲元組存儲在字典中而不是Strings,導致運行時間大大加快。爲什麼python字典中的字符串鍵比元組寫入/讀取要慢?

我的問題是,如果Python針對字典和哈希中的字符串鍵進行了優化,爲什麼在這個示例中使用元組更快?在完成完全相同的任務時,字符串鍵似乎花費了60%的時間。我在例子中忽略了一些簡單的東西嗎?

我引用這個線程爲基礎,爲我的問題(以及其他人做出同樣的斷言字符串是更快):Is it always faster to use string as key in a dict?

下面是我用測試方法的代碼和時間他們:

import time 

def writeTuples(): 
    k = {} 
    for x in range(0,500): 
     for y in range(0,x): 
      k[(x,y)] = "%s,%s"%(x,y) 
    return k 

def readTuples(k): 
    failures = 0 
    for x in range(0,500): 
     for y in range(0,x): 
      if k.get((x,y)) is not None: pass 
      else: failures += 1 
    return failures 

def writeStrings(): 
    k = {} 
    for x in range(0,500): 
     for y in range(0,x): 
      k["%s,%s"%(x,y)] = "%s,%s"%(x,y) 
    return k 

def readStrings(k): 
    failures = 0 
    for x in range(0,500): 
     for y in range(0,x): 
      if k.get("%s,%s"%(x,y)) is not None: pass 
      else: failures += 1 
    return failures 

def calcTuples(): 
    clockTimesWrite = [] 
    clockTimesRead = [] 
    failCounter = 0 
    trials = 100 

    st = time.clock() 
    for x in range(0,trials): 
     startLoop = time.clock() 
     k = writeTuples() 
     writeTime = time.clock() 
     failCounter += readTuples(k) 
     readTime = time.clock() 
     clockTimesWrite.append(writeTime-startLoop) 
     clockTimesRead.append(readTime-writeTime) 

    et = time.clock() 

    print("The average time to loop with tuple keys is %f, and had %i total failed records"%((et-st)/trials,failCounter)) 
    print("The average write time is %f, and average read time is %f"%(sum(clockTimesWrite)/trials,sum(clockTimesRead)/trials)) 
    return None 

def calcStrings(): 
    clockTimesWrite = [] 
    clockTimesRead = [] 
    failCounter = 0 
    trials = 100 

    st = time.clock() 
    for x in range(0,trials): 
     startLoop = time.clock() 
     k = writeStrings() 
     writeTime = time.clock() 
     failCounter += readStrings(k) 
     readTime = time.clock() 
     clockTimesWrite.append(writeTime-startLoop) 
     clockTimesRead.append(readTime-writeTime) 

    et = time.clock() 
    print("The average time to loop with string keys is %f, and had %i total failed records"%((et-st)/trials,failCounter)) 
    print("The average write time is %f, and average read time is %f"%(sum(clockTimesWrite)/trials,sum(clockTimesRead)/trials)) 
    return None 

calcTuples() 
calcStrings() 

謝謝!

+0

「readTuples」和「readStrings」的縮進是否正確? – 2013-03-24 18:20:17

+0

嘗試使用'dis'和''profile'或'timeit'檢查每個函數。添加從這些測試中獲得的更多信息,我們將更容易爲您提供幫助。 :) – 2013-03-24 18:33:18

回答

4

測試的權重不夠(因此時間差異)。您在writeStrings循環中撥打format的電話的次數是您的writeTuples循環中的兩倍,並且您在readStrings中撥打的號碼無限多。要成爲一個更公平的測試,你將需要確保:

  • 兩個寫回路只打一個電話,到%每內環
  • readStringsreadTuples既能使1或0調用%每內環。
+0

謝謝肖恩。我做了你推薦的改變,當我在READ函數中調用相同數量的調用,並且在兩個內部寫入循環中調用相同的數字時 - 我發現字符串版本的速度提高了大約30%。我沒想到格式化功能會在它上面添加很多時間。 – Thomas 2013-03-24 18:46:19

0

我會說速度的差異是由於訪問者密鑰的字符串格式。

在writeTuples你有這樣一行:

 k[(x,y)] = ... 

這將創建一個新的記錄,並給它分配的值(X,Y),傳遞給k的訪問之前。

在writeStrings你有這樣一行:

 k["%s,%s"%(x,y)] = ... 

這確實都是一樣的計算中writeTuples而且還具有解析字符串「%S%S」的開銷(這可能在編譯完成時間,我不確定),但它也必須從數字中創建一個新的字符串(例如「12,15」)。我相信這會增加運行時間。

+0

謝謝克里斯。是的,真實程序中的縮進是正確的。把它放進去我把它輸入錯了。抱歉。 – Thomas 2013-03-24 18:47:29

+0

容易做到。我調整了我的答案,以更好地反映「固定」問題。 – 2013-03-27 13:05:39

0

正如其他人所說,字符串格式是問題。

這裏有一個快速版本,預計算所有的弦......

我的機器上,將字符串寫入比寫快的元組約27%。寫/讀速度快大約22%。

我剛剛重新格式化爲&簡化了您的工作。如果邏輯有點不同,你可以計算讀取與寫入的差異。

import timeit 

samples = [] 
for x in range(0,360): 
    for y in range(0,x): 
     i = (x,y) 
     samples.append((i, "%s,%s"%i)) 


def write_tuples(): 
    k = {} 
    for pair in samples: 
     k[pair[0]] = True 
    return k 

def write_strings(): 
    k = {} 
    for pair in samples: 
     k[pair[1]] = True 
    return k 


def read_tuples(k): 
    failures = 0 
    for pair in samples: 
     if k.get(pair[0]) is not None: pass 
     else: failures += 1 
    return failures 

def read_strings(k): 
    failures = 0 
    for pair in samples: 
     if k.get(pair[1]) is not None: pass 
     else: failures += 1 
    return failures 


stmt_t1 = """k = write_tuples()""" 
stmt_t2 = """k = write_strings()""" 
stmt_t3 = """k = write_tuples() 
read_tuples(k)""" 
stmt_t4 = """k = write_strings() 
read_strings(k)""" 


t1 = timeit.Timer(stmt=stmt_t1, setup = "from __main__ import samples, read_strings, write_strings, read_tuples, write_tuples") 
t2 = timeit.Timer(stmt=stmt_t2, setup = "from __main__ import samples, read_strings, write_strings, read_tuples, write_tuples") 
t3 = timeit.Timer(stmt=stmt_t3, setup = "from __main__ import samples, read_strings, write_strings, read_tuples, write_tuples") 
t4 = timeit.Timer(stmt=stmt_t4, setup = "from __main__ import samples, read_strings, write_strings, read_tuples, write_tuples") 

print "write tuples  : %s" % t1.timeit(100) 
print "write strings  : %s" % t2.timeit(100) 
print "write/read tuples : %s" % t3.timeit(100) 
print "write/read strings : %s" % t4.timeit(100) 
0

我跑你的代碼的Core i5 1.8GHz的機器上,併產生以下結果

  • 0.0767520.085863元組字符串循環
  • 0.0494460.050731
  • 閱讀0.0272990.035125

所以元組似乎是勝利的,但是你在寫函數中做了兩次字符串轉換。更改writeStrings

def writeStrings(): 
    k = {} 
    for x in range(0,360): 
     for y in range(0,x): 
      s = "%s,%s"%(x,y) 
      k[s] = s 
    return k 
  • 0.1016890.092957元組字符串循環
  • 0.0649330.044578
  • 閱讀0.0367480.048371

注意到的第一件事是結果有相當多的變化,所以你可以想要將trials=100更改爲更大的東西,回想一下python的timeit將會默認爲10000。我做了trials=5000

  • 0.0819440.067829元組字符串循環
  • 0.0522640.032866
  • 閱讀0.0296730.034957

因此字符串的版本速度更快,但是作爲已經在其他帖子中指出,這不是字典查找,它是損害字符串轉換。

相關問題