2012-07-12 77 views

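Need help understanding this Python Viterbi algorithm

I'm trying to convert a Python implementation of the Viterbi algorithm, found in this Stack Overflow answer, into Ruby. The full script, with my comments added, is at the bottom of this question.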

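Unfortunately I know very little Python, so the translation is proving harder than I'd like. Still, I've made some progress. The one line that is completely melting my brain is this: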

prob_k, k = max((probs[j] * word_prob(text[j:i]), j) for j in range(max(0, i - max_word_length), i)) 

Can someone explain what it is doing?

Here is the full Python script:

import re 
from itertools import groupby 

# text will be a compound word such as 'wickedweather'. 
def viterbi_segment(text): 
    probs, lasts = [1.0], [0] 

    # Iterate over the letters in the compound,
    # e.g. [w, ickedweather], [wi, ckedweather], and so on.
    for i in range(1, len(text) + 1):
        # I've no idea what this line is doing and I can't figure out how to split it up.
        prob_k, k = max((probs[j] * word_prob(text[j:i]), j) for j in range(max(0, i - max_word_length), i))
        # Append values to arrays.
        probs.append(prob_k)
        lasts.append(k)

    words = [] 
    i = len(text) 
    while 0 < i: 
        words.append(text[lasts[i]:i])
        i = lasts[i]
    words.reverse() 
    return words, probs[-1] 

# Calc the probability of a word based on occurrences in the dictionary. 
def word_prob(word): 
    # dictionary.get(key) will return the value for the specified key. 
    # In this case, that's the number of occurrences of the word in the
    # dictionary. The second argument is a default value to return if 
    # the word is not found. 
    return dictionary.get(word, 0)/total 

# This ensures we only deal with full words rather than each
# individual letter. Basically, it normalizes the words.
def words(text): 
    return re.findall('[a-z]+', text.lower()) 

# This gets us a hash where the keys are words and the values are the 
# number of occurrences in the dictionary.
dictionary = dict((w, len(list(ws))) 
    # /usr/share/dict/words is a file of newline-delimited words.
    for w, ws in groupby(sorted(words(open('/usr/share/dict/words').read())))) 

# Assign the length of the longest word in the dictionary. 
max_word_length = max(map(len, dictionary)) 

# Assign the total number of words in the dictionary. It's a float 
# because we're going to divide by it later on. 
total = float(sum(dictionary.values())) 

# Run the algo over a file of newline delimited compound words. 
compounds = words(open('compounds.txt').read()) 
for comp in compounds: 
    print("%s: %s" % (comp, viterbi_segment(comp)))

Answers


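What you are looking at is a list comprehension (strictly speaking, a generator expression, since it is passed straight to max without square brackets).

The expanded version looks like this: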

all_probs = [] 

for j in range(max(0, i - max_word_length), i): 
    all_probs.append((probs[j] * word_prob(text[j:i]), j)) 

prob_k, k = max(all_probs) 
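
The part that may still be unclear is how max chooses among (probability, j) tuples. Python compares tuples element by element, so the tuple with the highest probability wins, and a tie is broken in favour of the larger j. The sketch below runs one step of the loop with toy values; the numbers and the toy_word_prob table are invented purely for illustration and are not from the original script.

```python
# Toy values, invented for illustration only.
text = "ab"
i = 2
max_word_length = 2
probs = [1.0, 0.5]  # assumed best probabilities for text[:0] and text[:1]
toy_word_prob = {"ab": 0.2, "b": 0.6}  # hypothetical word probabilities


def word_prob(word):
    # Unknown words get probability 0.0, mirroring dictionary.get(word, 0).
    return toy_word_prob.get(word, 0.0)


# One (probability, start index) pair per candidate last word text[j:i].
candidates = [(probs[j] * word_prob(text[j:i]), j)
              for j in range(max(0, i - max_word_length), i)]

# Tuples compare on their first element first, then on the second,
# so max() picks the pair with the highest probability.
prob_k, k = max(candidates)
print(candidates)
print(prob_k, k)
```

Here j=0 tries "ab" as the last word and j=1 tries "b" preceded by the best split of "a". The second candidate has the higher product, so k ends up as 1. For a Ruby port, Array#max on an array of [probability, j] pairs should behave similarly, since Ruby also compares arrays element by element.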

I hope that helps explain it. If not, feel free to edit your question and point out the statements you don't understand.
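
To see the expanded loop in context, here is a minimal self-contained sketch of the same function that runs without /usr/share/dict/words. The toy_counts table is a made-up stand-in for the real dictionary, so the resulting probability is only meaningful for this toy data.

```python
# Hypothetical word counts; a real run would build these from a word list.
toy_counts = {"wicked": 3, "weather": 5, "we": 4, "at": 4, "her": 2}
total = float(sum(toy_counts.values()))
max_word_length = max(map(len, toy_counts))


def word_prob(word):
    # Relative frequency of the word; 0.0 if it is not in the toy table.
    return toy_counts.get(word, 0) / total


def viterbi_segment(text):
    # probs[i] is the best probability of any segmentation of text[:i];
    # lasts[i] is where the last word of that segmentation starts.
    probs, lasts = [1.0], [0]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_word_length), i):
            candidates.append((probs[j] * word_prob(text[j:i]), j))
        prob_k, k = max(candidates)
        probs.append(prob_k)
        lasts.append(k)

    # Walk backwards through lasts to recover the chosen words.
    words = []
    i = len(text)
    while i > 0:
        words.append(text[lasts[i]:i])
        i = lasts[i]
    words.reverse()
    return words, probs[-1]


segments, prob = viterbi_segment("wickedweather")
print(segments, prob)
```

With this toy table, the split into "wicked" and "weather" beats the alternative "wicked", "we", "at", "her", because multiplying more word probabilities together gives a smaller product.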