Is there a better algorithm for Exercise 9.3 in "Think Python: How to Think Like a Computer Scientist"?

Exercise 9.3 in this book asks the reader to find the combination of 5 forbidden letters that excludes the smallest number of words in this file.

Below is my solution to the first part, which I believe is correct:

# if the word contains any letter in letters, return True,
# otherwise return False
def contain(word, letters):
    for letter in letters:
        if letter in word:
            return True
    return False

# return the number of words that contain any letter in letters
def ncont(words, letters):
    count = 0
    for word in words:
        if contain(word, letters):
            count += 1
    return count

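For the problem above, however, I can only think of a brute-force algorithm that tries every possible combination. There are exactly 26!/(5!·21!) = 65780 combinations. Here is the implementation: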

def get_lset(nlt, alphabet, cur_set):
    global min_n, min_set
    # once enough letters have been chosen, score this combination
    if nlt <= 0:
        cur_n = ncont(words, ''.join(cur_set))
        if min_n == -1 or cur_n < min_n:
            min_n = cur_n
            min_set = cur_set.copy()
        print(''.join(cur_set), cur_n, ' *->', min_n, ''.join(min_set))
    # otherwise pick the next letter recursively
    else:
        cur_set.append(None)
        for i in range(len(alphabet)):
            cur_set[-1] = alphabet[i]
            # only later letters remain available, so each set is visited once
            get_lset(nlt-1, alphabet[i+1:], cur_set)
        cur_set.pop()

and then I call the function above like this:

import string

if __name__ == '__main__':
    min_n = -1
    min_set = []
    with open('words.txt', 'r') as fin:
        words = [line.strip() for line in fin]
    get_lset(5, list(string.ascii_lowercase), [])
    print(min_set, min_n)

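But this approach is very slow. What I would like to know is whether there is a better algorithm for this problem. Any suggestions would be great!

Source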

2014-03-19 zbtong

First, let's make it more concise:

def contain(word, letters):
    return any(letter in word for letter in letters)

def ncont(words, letters):
    return sum(contain(word, letters) for word in words)

As rewritten, the current algorithm has an average complexity of:

O(len(letters) * len(a_word) * len(words))
  -------------+------------   ----+----
    contain(word, letters)     ncont(words, letters)

We can reduce this by using sets:

def contain(word, letters): 
    return not set(letters).isdisjoint(set(word))

Which simplifies to:

O(min(len(letters), len(a_word)) * len(words))
  ---------------+--------------   ----+----
      contain(word, letters)       ncont(words, letters)

according to https://wiki.python.org/moin/TimeComplexity

As for the second part, the algorithm is easier to understand with itertools:

import itertools
import string

def minimum_letter_set(words, n):
    # every n-letter combination of the lowercase alphabet
    attempts = itertools.combinations(string.ascii_lowercase, n)
    return min(attempts, key=lambda attempt: ncont(words, attempt))

However, we can do better:

def minimum_letter_set(words, n):
    # build a lookup table from each letter to the set of words it features in
    by_letter = {
        letter: {
            word
            for word in words
            if letter in word
        }
        for letter in string.ascii_lowercase
    }

    # allowing us to define a function that finds words that match multiple letters
    def matching_words(letters):
        return set.union(*(by_letter[l] for l in letters))

    # find all n-letter combinations
    attempts = itertools.combinations(string.ascii_lowercase, n)

    # and return the one that matches the fewest words
    return min(attempts, key=lambda a: len(matching_words(a)))

I don't believe this has any lower algorithmic complexity, but it certainly saves the work of repeatedly filtering the word list.

Source
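As a small sanity check of the lookup-table idea, the sketch below compares the two ways of counting on a toy word list. The names `count_via_lookup` and `counts_agree` and the word list are made up for illustration and are not part of the answer. One assumption is worth stating: because the lookup table stores sets of words, the two counts only agree when the word list contains no duplicates.

```python
import itertools
import string


def ncont(words, letters):
    # number of words containing at least one of the letters (scans the list)
    return sum(any(l in word for l in letters) for word in words)


def count_via_lookup(by_letter, letters):
    # same count, obtained by uniting precomputed per-letter word sets
    return len(set().union(*(by_letter[l] for l in letters)))


def counts_agree(words, n):
    # check every n-letter combination; assumes `words` has no duplicates
    by_letter = {l: {w for w in words if l in w} for l in string.ascii_lowercase}
    return all(
        ncont(words, combo) == count_via_lookup(by_letter, combo)
        for combo in itertools.combinations(string.ascii_lowercase, n)
    )


# toy data, purely illustrative
toy_words = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
print(counts_agree(toy_words, 2))
```

If the real word file did contain repeated lines, deduplicating it first (or storing word indices instead of the words themselves) would keep the two counts consistent.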

2014-03-19 12:53:18 Eric

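Thank you very much. This is a great example of how powerful Python's conciseness is, and the function ends up faster than the original implementation. Do you have any suggestions for the second part, finding the set of 5 letters that excludes the fewest words? I don't think the brute-force algorithm is good enough. – zbtong

@zbtong: See my update – Eric

This is great. A functional programming style seems to be a good way to make programs more readable. Thank you very much. – zbtong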

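Here is my idea:

First compute exclude[l], which maps each letter l to the set of words excluded by that letter.

Compute the union of the five smallest of these 26 sets. This gives you a fair "temporary minimum result".

Then, instead of using itertools.combinations to explore all 5-letter combinations, write your own algorithm to do it, computing the union of the "exclude" sets as you go. In this algorithm, if for the first i letters (i < 5) the union of the "exclude" sets is already larger than the "temporary minimum result", there is no need to consider the remaining letters at all. Whenever you find a five-letter combination that is better than the current "temporary minimum result", update it.

Source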
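The pruning idea above can be sketched roughly as follows. This is one possible reading of the description, not the answerer's own code: the function names `build_exclude` and `pruned_minimum_letter_set` are hypothetical, and sorting the letters by how many words they exclude is an extra assumption added so that good bounds tend to be found early.

```python
import string


def build_exclude(words):
    # exclude[l] = set of words containing letter l
    return {
        letter: {word for word in words if letter in word}
        for letter in string.ascii_lowercase
    }


def pruned_minimum_letter_set(words, n=5):
    exclude = build_exclude(words)
    # assumption: visit rarely used letters first
    alphabet = sorted(string.ascii_lowercase, key=lambda l: len(exclude[l]))

    # "temporary minimum result": union of the n smallest exclude sets
    best_letters = tuple(alphabet[:n])
    best_count = len(set().union(*(exclude[l] for l in best_letters)))

    def search(start, chosen, excluded):
        nonlocal best_letters, best_count
        if len(chosen) == n:
            if len(excluded) < best_count:
                best_letters, best_count = tuple(chosen), len(excluded)
            return
        # stop early enough that the combination can still be completed
        for i in range(start, len(alphabet) - (n - len(chosen)) + 1):
            new_excluded = excluded | exclude[alphabet[i]]
            # the union only grows as letters are added, so a prefix that is
            # already no better than the best result cannot lead to a better one
            if len(new_excluded) >= best_count:
                continue
            chosen.append(alphabet[i])
            search(i + 1, chosen, new_excluded)
            chosen.pop()

    search(0, [], set())
    return best_letters, best_count
```

Calling `pruned_minimum_letter_set(words, 5)` returns a tuple of letters together with the number of distinct words they exclude. How much the pruning helps depends on the word list, and in the worst case it still visits every combination.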

2014-03-19 14:58:50 MatthieuW

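Good idea! This is exactly what I was asking for. Thank you very much. – zbtong

I'm glad you like it. If you implement it, please give some feedback. – MatthieuW

OK, I'll implement it as soon as possible! – zbtong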
