2015-05-11 52 views
2

Below is my Python code, a double for loop that I would like to optimize:

myList = ['A','B','C','D','E',...]  # all elements are strings
myDict = [D1,D2]  # consider a list of 2 dicts
# D1 = {'A':0.1,'B':0.5,'C':0.01,...}
# D2 = {'A':0.4,'B':0.11,'C':0.21,...}

myNewDict = {} 
for words in myList:
    NewList = []
    for dicts in myDict:
        tmps = dicts[words]
        NewList.append(tmps)
    myNewDict[words] = (min(NewList), max(NewList))
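I am using Python 3.4 64-bit, and I would like suggestions on how to improve the performance of this code, either with Python built-in functions or with any better approach that reduces the computation time. Thank you in advance for your suggestions and ideas.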

+3

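It would be easier to give you advice if you posted a complete working example with `myList` and `myDict` filled in with actual values. In the toy example they don't need to have 100 and 60 entries respectively; a few entries are enough. That way we can see what the actual data looks like. –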

+3

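...although knowing the actual _sizes_ would be good too, because many optimizations that make sense for, say, a small list of huge dicts used once make no sense for, say, a small list of small dicts used billions of times. – abarnert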

+0

Dear Michaael and Abarnet, thank you. I have improved the example. –

Answers

0

This looked like a neat exercise where I could finally use tee, so I wrote this:

from itertools import tee 
words = ['A','B','C'] 
dicts = [{'A': 0.1, 'B': 0.5, 'C': 0.01}, 
     {'A': 0.4, 'B': 0.11, 'C': 0.21}] 
newdict = {word: (min(minfeed), max(maxfeed)) 
      for word in words 
      for minfeed, maxfeed in [tee(d[word] for d in dicts)]} 

Output:

{'B': (0.11, 0.5), 'C': (0.01, 0.21), 'A': (0.1, 0.4)} 
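
For readers who have not used `itertools.tee`, here is a minimal sketch of what the comprehension above does for a single word. The sample values are made up for illustration. Note that `min` exhausts the first iterator before `max` starts, so `tee` has to buffer every value in the meantime.

```python
from itertools import tee

values = (x for x in [0.4, 0.1, 0.21])  # a one-shot generator (sample data)
minfeed, maxfeed = tee(values)          # two independent iterators over it

lowest = min(minfeed)    # consumes minfeed; tee buffers the items for maxfeed
highest = max(maxfeed)   # reads the buffered items
print(lowest, highest)
```

Without `tee`, the second call would see an already exhausted generator and `max` would raise `ValueError`.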

Edit: I got curious and tried a few more versions, then ran a speed test on a large amount of random data: 100,000 words and 100 dicts. Results first:

5.418 seconds Chin 
4.364 seconds Stefan 
3.460 seconds Stefan2 
3.471 seconds Stefan3 

Code:

from itertools import tee 
def Stefan(words, dicts):
    return {word: (min(minfeed), max(maxfeed))
            for word in words
            for minfeed, maxfeed in [tee(d[word] for d in dicts)]}

def Stefan2(words, dicts):
    out = {}
    for word in words:
        values = [d[word] for d in dicts]
        out[word] = min(values), max(values)
    return out

def Stefan3(words, dicts):
    return {word: (min(values), max(values))
            for word in words
            for values in [[d[word] for d in dicts]]}

def Chin(myList, myDict):
    myNewDict = {}
    for words in myList:
        NewList = []
        for dicts in myDict:
            tmps = dicts[words]
            NewList.append(tmps)
        myNewDict[words] = (min(NewList), max(NewList))
    return myNewDict

from random import sample, randrange, random 
from string import ascii_letters 
from time import time 

WORDS = 100000 
DICTS = 100 

def random_word():
    # Build a random word of 2 to 9 distinct ASCII letters.
    return ''.join(sample(ascii_letters, randrange(2, 10)))
words = [random_word() for _ in range(WORDS)] 
dicts = [{w: random() for w in words} for _ in range(DICTS)] 

prev = None 
for func in Chin, Stefan, Stefan2, Stefan3:
    t0 = time()
    result = func(words, dicts)
    print('%6.3f seconds' % (time() - t0), func.__name__)
    if prev and result != prev:
        print('fail')
    prev = result
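
A single timed run like the one above can be noisy. As a hedged variation (not part of the original test), the sketch below times each call several times with `time.perf_counter` and keeps the fastest run. The helper name `best_time` and the `repeats` parameter are made up for illustration, and `min_max` just repeats the `Stefan2` logic so the block runs on its own.

```python
from time import perf_counter

def best_time(func, *args, repeats=3):
    """Return the fastest of `repeats` wall-clock timings of func(*args)."""
    timings = []
    for _ in range(repeats):
        t0 = perf_counter()
        func(*args)
        timings.append(perf_counter() - t0)
    return min(timings)

def min_max(words, dicts):
    # Same logic as Stefan2, repeated so this block is self-contained.
    out = {}
    for word in words:
        values = [d[word] for d in dicts]
        out[word] = min(values), max(values)
    return out

# Tiny sample data; swap in the large random data to get meaningful timings.
words = ['A', 'B', 'C']
dicts = [{'A': 0.1, 'B': 0.5, 'C': 0.01},
         {'A': 0.4, 'B': 0.11, 'C': 0.21}]
print('%6.3f seconds' % best_time(min_max, words, dicts), min_max.__name__)
```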
+0

Hi Stefan Pochmann, thank you for your suggestion. I will definitely try it. Thanks. –

+0

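@ChinLim I got curious myself and did some more; see the updated answer. For the data I tried, it is a bit faster than yours, but that depends on the data. If we knew more about your data, there might be better solutions (possibly even a genuinely different approach, not just a "syntax variant"). –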
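
To illustrate what a structurally different approach could look like, here is one possible sketch. It is not from the thread: the helper name `min_max_columns` is made up, and it assumes that every dict contains every word and that there are at least two words. It pulls all values out of each dict in one `operator.itemgetter` call and then transposes the rows with `zip`. Whether it is actually faster depends on the data, so measure before adopting it.

```python
from operator import itemgetter

def min_max_columns(words, dicts):
    # Hypothetical helper, not from the thread. Assumes every dict contains
    # every word and len(words) >= 2 (itemgetter with a single key returns
    # a bare value instead of a tuple).
    getter = itemgetter(*words)
    rows = [getter(d) for d in dicts]   # one tuple of values per dict
    columns = zip(*rows)                # one tuple of values per word
    return {word: (min(col), max(col)) for word, col in zip(words, columns)}

words = ['A', 'B', 'C']
dicts = [{'A': 0.1, 'B': 0.5, 'C': 0.01},
         {'A': 0.4, 'B': 0.11, 'C': 0.21}]
print(min_max_columns(words, dicts))
```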