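Spell checker for Python

I'm fairly new to Python and NLTK. I'm working on an application that performs spell checking (replacing an incorrectly spelled word with the correctly spelled one). I'm currently using the Enchant library through PyEnchant, together with NLTK, on Python 2.7. The code below is the class that handles the correction/replacement.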

import enchant
from nltk.metrics import edit_distance

class SpellingReplacer(object):
    def __init__(self, dict_name='en_GB', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        self.max_dist = max_dist  # was hard-coded to 2, ignoring the argument

    def replace(self, word):
        # correctly spelled words are returned unchanged
        if self.spell_dict.check(word):
            return word
        suggestions = self.spell_dict.suggest(word)

        # accept the top suggestion only if it is close enough to the input
        if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
            return suggestions[0]
        else:
            return word

I have written a function that takes a list of words, calls replace on each word, and returns a list of those words spelled correctly.

def spell_check(word_list):
    replacer = SpellingReplacer()  # build the dictionary once, not once per word
    checked_list = []
    for item in word_list:
        r = replacer.replace(item)
        checked_list.append(r)
    return checked_list

>>> word_list = ['car', 'colour']
>>> spell_check(word_list)
['car', 'colour']

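I don't really like this approach because it isn't very accurate, and I'm looking for a better way to check spelling and replace words. I also need something that can pick up spelling mistakes like "caaaar". Are there better ways to perform spell checking? If so, what are they? How does Google do it? Their spelling suggester is very good. Any suggestions?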


2012-12-18 Mike Barnes

I recommend starting by carefully reading this post by Peter Norvig. (I had to do something similar and found it extremely useful.)

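The following function in particular contains the ideas you now need to make your spell checker more sophisticated: splitting, deleting, transposing, replacing, and inserting letters in the irregular words to "correct" them.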

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # every way of cutting the word into a head and a tail
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

Note: The above is one snippet from Norvig's spelling corrector.
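To show where a function like edits1 fits, here is a minimal sketch of the candidate-selection step, loosely following the idea in Norvig's post. The tiny WORD_COUNTS table and the names known and correct are made up for this illustration; a real corrector would count words in a large text corpus.

```python
import string

# Hypothetical toy frequency table; a real corrector would count
# the words of a large corpus instead.
WORD_COUNTS = {'car': 50, 'care': 20, 'cat': 30, 'the': 500, 'message': 10}

alphabet = string.ascii_lowercase


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the frequency table."""
    return set(w for w in words if w in WORD_COUNTS)


def correct(word):
    """Return the most frequent known word within one edit, else the word."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))


print(correct('hte'))      # a transposition away from a known word
print(correct('mesage'))   # one missing letter
print(correct('caaaar'))   # more than one edit from any known word
```

With only one round of edits, an input such as 'caaaar' is too far from 'car' to be corrected and comes back unchanged, which is why it needs separate handling.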

And the good news is that you can incrementally add to and keep improving your spell checker.
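As one example of such an incremental addition, the sketch below handles stretched inputs like "caaaar" by squeezing runs of repeated letters before giving up on a word. This step is not taken from Norvig's post. KNOWN_WORDS is a hypothetical stand-in for a real dictionary lookup, and the function names are invented for the illustration.

```python
import re

# Hypothetical stand-in for a real dictionary lookup such as the
# check() method of an enchant dictionary.
KNOWN_WORDS = {'car', 'book', 'good', 'message'}


def squeeze_candidates(word):
    """Yield the word with each run of repeated letters cut to 2, then to 1."""
    yield re.sub(r'(.)\1+', r'\1\1', word)   # 'caaaar' -> 'caar'
    yield re.sub(r'(.)\1+', r'\1', word)     # 'caaaar' -> 'car'


def fix_repeats(word, is_known=KNOWN_WORDS.__contains__):
    """Return the first squeezed form that is a known word, else the word."""
    if is_known(word):
        return word
    for candidate in squeeze_candidates(word):
        if is_known(candidate):
            return candidate
    return word


print(fix_repeats('caaaar'))
print(fix_repeats('goooood'))
```

Trying the two-letter form first keeps legitimate doubles such as the "oo" in "good". A word that mixes genuine doubles with stretched letters would still need the edit-based correction afterwards.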

Hope that helps.


2012-12-18 17:13:45

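Spell corrector:

You need to save a word-list corpus to your desktop; if you store it somewhere else, change the path in the code. I have also added a small GUI using Tkinter. Note that this only tackles non-word errors!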

try:
    from Tkinter import Tk, Label, Entry, Button   # Python 2
except ImportError:
    from tkinter import Tk, Label, Entry, Button   # Python 3


def min_edit_dist(word1, word2):
    len_1 = len(word1)
    len_2 = len(word2)
    # x[i][j] holds the edit distance between word1[:i] and word2[:j]
    x = [[0] * (len_2 + 1) for _ in range(len_1 + 1)]
    # base cases: distance to or from the empty string
    for i in range(len_1 + 1):
        x[i][0] = i
    for j in range(len_2 + 1):
        x[0][j] = j
    for i in range(1, len_1 + 1):
        for j in range(1, len_2 + 1):
            if word1[i-1] == word2[j-1]:
                x[i][j] = x[i-1][j-1]
            else:
                x[i][j] = min(x[i][j-1], x[i-1][j], x[i-1][j-1]) + 1
    # the last cell is the distance between the full words
    return x[len_1][len_2]


def retrieve_text():
    word1 = app_entry.get()
    # change this path if Dictionary.txt is stored somewhere else
    path = r"C:\Documents and Settings\Owner\Desktop\Dictionary.txt"
    with open(path, 'r') as ffile:
        # strip the trailing newline so it is not counted as an edit
        words = [line.strip() for line in ffile]
    print("Suggestions coming right up count till 10")
    for word in words:
        if min_edit_dist(word1, word) <= 2:
            print(word)


if __name__ == "__main__":
    app_win = Tk()
    app_win.title("spell")
    app_label = Label(app_win, text="Enter the incorrect word")
    app_label.pack()
    app_entry = Entry(app_win)
    app_entry.pack()
    app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
    app_button.pack()
    # Initialize GUI loop
    app_win.mainloop()
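The same idea can be tried without the GUI or a file on disk. The sketch below is a variation on the code above, not a copy of it: it computes the edit distance with two rows instead of a full matrix and returns the matches sorted by distance rather than printing them in file order. The suggest function, its max_dist and limit parameters, and the in-memory word list are all assumptions made for the illustration.

```python
def min_edit_dist(word1, word2):
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    previous = list(range(len(word2) + 1))
    for i, ch1 in enumerate(word1, 1):
        current = [i]
        for j, ch2 in enumerate(word2, 1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def suggest(word, dictionary_words, max_dist=2, limit=5):
    """Return up to `limit` dictionary words within `max_dist`, closest first."""
    scored = []
    for entry in dictionary_words:
        entry = entry.strip()   # drop the trailing newline left by readlines()
        if not entry:
            continue
        dist = min_edit_dist(word, entry)
        if dist <= max_dist:
            scored.append((dist, entry))
    scored.sort()               # by distance, then alphabetically
    return [entry for dist, entry in scored[:limit]]


# Hypothetical in-memory word list standing in for Dictionary.txt.
words = ['car\n', 'cart\n', 'care\n', 'message\n', 'the\n']
print(suggest('carr', words))
```

Sorting by distance puts the closest matches first, and iterating over the list directly avoids depending on a fixed number of lines in the dictionary file.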


2013-11-20 09:46:55

You can use the autocorrect lib to spell check in Python.
Example usage:

from autocorrect import spell 

print spell('caaaar') 
print spell(u'mussage') 
print spell(u'survice') 
print spell(u'hte')

Result:

caesar 
message 
service 
the


2018-01-16 11:48:34 Rakesh
