2014-11-02 59 views

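Index error when comparing strings - Python

I'm having some trouble with a piece of Python code. I have a large text file named "big.txt". My code iterates over it, puts each word into a list, and then loops over that list again to strip out any character that is not a letter of the alphabet. I also have a function called worddistance that checks how similar two words are and returns a score. The function adds 1 to a counter whenever it notices a difference, so the lower the score, the more similar the words. I have another function called autocorrect. I want to pass a misspelled word to this function and have it print a 'Did you mean...' sentence listing the words that score low on worddistance.
Oddly, I keep getting this error:

"IndexError: string index out of range"

I'm at a loss as to what is going on!

My code is below.

Thanks in advance for your replies,
Samuel Norton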

f = open("big.txt", "r")

words = list()

temp_words = list()
for line in f:
    for word in line.split():
        temp_words.append(word.lower())

allowed_characters = 'abcdefghijklmnopqrstuvwxyz'
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
list(set(words)).sort()

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1]
        for char in range(0, len(word2) + 1) :
            if word2[char] != new_word1[char]:
                counter += 1
            else:
                continue
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(0, len(word1) + 1):
            if word1[char] != word2[char]:
                counter += 1
            else:
                continue
    return counter

def autocorrect(word):
    word.lower()
    if word in words:
        print("The spelling is correct.")
        return
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ", end = ' ')

    if len(suggestions) == 1:
        print(suggestions[0])
        return

    else:
        for i in range(0, len(suggestions)):
            if i == len(suggestons) - 1:
                print("or " + suggestions[i] + "?")
                return
            print(suggestions[i] + ", ", end="")
            return

On which line do you get this error? – user3378649 2014-11-02 20:40:32

Answers

0

In worddistance(), it looks like for char in range(0, len(word1) + 1): should be:

for char in range(len(word1)): 

And for char in range(0, len(word2) + 1) : should be:

for char in range(len(word2)): 
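
The off-by-one can be reproduced in isolation. The sketch below is only an illustration: the words `short` and `longer` and the helper `count_mismatches` are made up for this example and do not appear in the question's code.

```python
# Hypothetical inputs chosen only to illustrate the off-by-one.
short = "cat"
longer = "cats"

def count_mismatches(a, b, extra):
    """Count positions where a and b differ, scanning len(a) + extra indices."""
    mismatches = 0
    for i in range(len(a) + extra):
        if a[i] != b[i]:
            mismatches += 1
    return mismatches

# range(len(short)) yields 0, 1, 2: every index is valid for "cat".
ok = count_mismatches(short, longer, 0)

# range(len(short) + 1) also yields 3, and short[3] does not exist.
try:
    count_mismatches(short, longer, 1)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(ok, error_message)
```

The valid indices of a string s run from 0 to len(s) - 1, which is exactly what range(len(s)) produces.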

By the way, list(set(words)).sort() sorts a temporary list and then discards it, which is probably not what you want. It should be:

words = sorted(set(words)) 
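
A small sketch of the difference, using an invented word list (the data and the names `result` and `unique_sorted` are assumptions for illustration only):

```python
words = ["pear", "apple", "pear", "fig"]  # illustrative data

# list.sort() sorts in place and returns None, so the sorted
# temporary list is discarded and `words` is left untouched.
result = list(set(words)).sort()

# sorted() returns a new list that can be bound to a name.
unique_sorted = sorted(set(words))

print(result, words, unique_sorted)
```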
0

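As mentioned in the other comments, you should use range(len(word1)).

Besides that:

- You should handle the case where word1 and word2 have the same length (len(word2) == len(word1)).
- You should also watch your naming. In the second branch of the worddistance function,

if word1[char] != word2[char]:

should compare against new_word2:

if word1[char] != new_word2[char]:

- In autocorrect, you should assign the lowercased result back to word: word = word.lower()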
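
The last point follows from strings being immutable in Python: str.lower() returns a new string and leaves the original alone. A minimal sketch with an invented input (the name `unchanged` is only for illustration):

```python
word = "HeLLo"  # illustrative input

word.lower()          # the lowercased copy is computed and thrown away
unchanged = word      # still "HeLLo"

word = word.lower()   # rebinding the name keeps the lowercased copy

print(unchanged, word)
```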

words = []
for item in temp_words:  # temp_words comes from the file-reading loop in the question
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
    words.append(temp_new_word)
words = sorted(set(words))  # keep the sorted, de-duplicated list

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2)]
        for char in range(len(word2)):
            if word2[char] != new_word1[char]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1)]
        for char in range(len(word1)):
            if word1[char] != new_word2[char]:  # This was a problem: it compared against word2
                counter += 1
    else:  # len(word2) == len(word1); you missed this case
        for char in range(len(word1)):
            if word1[char] != word2[char]:
                counter += 1
    return counter

def autocorrect(word):
    word = word.lower()  # This was a problem: the result of lower() was not assigned
    if word in words:
        print("The spelling is correct.")
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            # print(diff)  # uncomment to inspect each distance while debugging
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ")

        if len(suggestions) == 1:
            print(suggestions[0])

        else:
            for i in range(len(suggestions)):
                if i == len(suggestions) - 1:  # was misspelled as suggestons
                    print("or " + suggestions[i] + "?")
                else:
                    print(suggestions[i] + ", ")
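
The printing loop can also be replaced by building the sentence in one go. The following is only a sketch of that idea: `format_suggestions` is a hypothetical helper, not part of the question's code, and the wording for an empty list is an assumption.

```python
def format_suggestions(suggestions):
    """Build a 'Did you mean' line from a list of candidate words."""
    if not suggestions:
        return "No suggestions found."  # assumed wording for the empty case
    if len(suggestions) == 1:
        return "Did you mean: " + suggestions[0] + "?"
    # Join all but the last word with commas, then attach the last with "or".
    head = ", ".join(suggestions[:-1])
    return "Did you mean: " + head + " or " + suggestions[-1] + "?"

print(format_suggestions(["cat", "cut", "cot"]))
```

Returning a string instead of printing inside the loop also makes the empty-list case explicit, which the index-based loop silently skips.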

Next time, try to use Python built-ins: enumerate instead of for i in range(len(list)) followed by list[i], len instead of a hand-maintained counter, and so on.
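
As a rough illustration of the enumerate suggestion, the two loops below collect the same (index, item) pairs. The list contents and the names `indexed` and `enumerated` are invented for this sketch.

```python
suggestions = ["cat", "cot", "cut"]  # illustrative data

# Index-based loop, in the style used in the question.
indexed = []
for i in range(len(suggestions)):
    indexed.append((i, suggestions[i]))

# enumerate yields the same (index, item) pairs without manual indexing.
enumerated = []
for i, item in enumerate(suggestions):
    enumerated.append((i, item))

print(indexed == enumerated)
```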

For example, your distance function could be written like this, or even more simply.

def distance(word1, word2):
    counter = max(len(word1), len(word2)) - min(len(word1), len(word2))
    if len(word1) > len(word2):
        counter += len([x for x, z in zip(list(word2), list(word1[:len(word2) + 1])) if x != z])
    elif len(word2) > len(word1):
        counter += len([x for x, z in zip(list(word1), list(word2[:len(word1) + 1])) if x != z])
    else:
        counter += len([x for x, z in zip(list(word1), list(word2)) if x != z])
    return counter
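
One possible "even simpler" form relies on zip stopping at the shorter input, so the three branches collapse into one. This is a sketch under that assumption; `simple_distance` is a name chosen here for illustration and is not from the thread.

```python
def simple_distance(word1, word2):
    """Length difference plus mismatches over the shared positions."""
    # zip stops at the shorter word, so no slicing or branching is needed.
    mismatches = sum(1 for a, b in zip(word1, word2) if a != b)
    return abs(len(word1) - len(word2)) + mismatches

print(simple_distance("cat", "cats"), simple_distance("cat", "cut"))
```

Note that this compares characters position by position, so it is not an edit distance: a letter missing at the start shifts every later position, and simple_distance("at", "cat") comes out as 3.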