Improving an anagram search

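My question is whether this code can be improved so that the words in my predefined word list can be searched against the whole word_list.txt file faster. I have been told there is a way to do this in a single pass over the file by putting all 14 words into an appropriate data structure.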

word_list = ['serve','rival','lovely','caveat','devote',\ 
     'irving','livery','selves','latvian','saviour',\ 
     'observe','octavian','dovetail','Levantine'] 

def sorted_word(word): 
    """Return the letters of the word in sorted order."""
    list_chars = list(word) 
    list_chars.sort() 
    word_sort = ''.join(list_chars) 
    return word_sort 

print("Please wait for a few moment...") 
print() 

# Create an empty dictionary to store each word and its anagrams
dictionary = {} 
for words in word_list: 
    value = [] #Create an empty list for values for the key 
    individual_word_string = words.lower() 

    for word in open('word_list.txt'):
        word1 = word.strip().lower()  # Used for comparing

        # When the sorted words match, update the dictionary
        if sorted_word(individual_word_string) == sorted_word(word1):
            if word1[0] == 'v':
                value.append(word.strip())  # Keep the original word from word_list.txt
                tempDict = {individual_word_string: value}
                dictionary.update(tempDict)

#Print dictionary 
for key,value in dictionary.items(): 
    print("{:<10} = {:<}".format(key,value))

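Because of the new-user restrictions, I can't post an image of my results. By the way, the result should print the anagrams that begin with the letter v. I'd appreciate any help improving this code.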

Source

2012-10-19 Shadowill

The word sorting can be done simply as `word = ''.join(sorted(word))` rather than as a separate function. – DhruvPathak

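You may want to start by swapping the two loops: have the outer loop iterate over the words in the file and the inner loop compare each one against your word list.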

I see, that would make more sense too. Thanks. – Shadowill
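
The loop swap suggested in the comments might look like the sketch below. It assumes the same kind of input as the question (one word per line) and keeps the question's "starts with v" filter. The helper names `signature` and `find_anagrams` are made up for illustration, and the function accepts any iterable of lines so it can be tried without the file.

```python
def signature(word):
    """Return the word's letters, lowercased and sorted, as a string."""
    return ''.join(sorted(word.lower()))


def find_anagrams(targets, lines, first_letter='v'):
    """Scan `lines` once and collect anagrams of each target word.

    Only candidates starting with `first_letter` are kept, matching the
    filter in the question.
    """
    # Map each signature to the target words that share it.
    by_signature = {}
    for target in targets:
        by_signature.setdefault(signature(target), []).append(target.lower())

    results = {target.lower(): [] for target in targets}
    for line in lines:
        candidate = line.strip()
        if not candidate.lower().startswith(first_letter):
            continue
        # One dictionary lookup replaces the comparison against every target.
        for target in by_signature.get(signature(candidate), []):
            results[target].append(candidate)
    return results


if __name__ == '__main__':
    # Hypothetical sample data standing in for word_list.txt.
    word_list = ['serve', 'rival', 'lovely']
    sample_lines = ['verse\n', 'Viral\n', 'volley\n', 'sever\n', 'apple\n']
    for key, value in find_anagrams(word_list, sample_lines).items():
        print("{:<10} = {}".format(key, value))
```

With the real file, the call would be something like `with open('word_list.txt') as f: find_anagrams(word_list, f)`, so the file is read exactly once regardless of how many target words there are.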

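If you have enough memory, you can try storing the values in a dictionary and then doing hash lookups on it (quite fast). The nice thing about this is that you can reuse it later (building the dictionary is slow, but lookups are fast). If you have an incredibly large dataset, you may want to use map reduce; disco-project is a nice Python/Erlang framework that I would suggest.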

word_list = ['serve','rival','lovely','caveat','devote',\ 
     'irving','livery','selves','latvian','saviour',\ 
     'observe','octavian','dovetail','Levantine'] 

print("Please wait for a few moment...") 
print() 

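anagrams = {}

# Build the index in a single pass over the file
with open('word_list.txt') as f:
    for word in f:
        word = word.strip().lower()  # Normalize for comparing
        key = tuple(sorted(word))
        anagrams[key] = anagrams.get(key, []) + [word]

for word in word_list:
    # .get avoids a KeyError when a word has no match in the file
    print("%s -> %s" % (word.lower(), anagrams.get(tuple(sorted(word.lower())), [])))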
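
The "slow to build, fast to look up, reusable later" point can be made concrete by saving the index to disk. The following is a rough sketch and not part of the original answer: the function names and the cache file name `anagrams.pickle` are assumptions, and `collections.defaultdict` is used here to group the words.

```python
import os
import pickle
from collections import defaultdict


def build_index(lines):
    """Group every word in `lines` by its sorted, lowercased letters."""
    index = defaultdict(list)
    for line in lines:
        word = line.strip().lower()
        if word:  # skip blank lines
            index[''.join(sorted(word))].append(word)
    return dict(index)


def load_or_build_index(word_file, cache_file='anagrams.pickle'):
    """Load a cached index if one exists, otherwise build and save it."""
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    with open(word_file) as f:
        index = build_index(f)
    with open(cache_file, 'wb') as f:
        pickle.dump(index, f)
    return index


def lookup(index, word):
    """Return all indexed words that are anagrams of `word`."""
    return index.get(''.join(sorted(word.lower())), [])
```

Note that this sketch never invalidates the cache, so the pickle file would need to be deleted by hand whenever the word file changes.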

Source

2012-10-19 10:48:16 luke14free
