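Performing stemming outputs gibberish/concatenated words
I'm experimenting with NLTK, the Python library for natural language processing.
My problem: I'm trying to perform stemming, i.e. reducing words to their normalized form, but it isn't producing correct words. Am I using the stemming class correctly? And how can I get the results I'm after?
I want to normalize the following words: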
words = ["forgot","forgotten","there's","myself","remuneration"]
...to this:
words = ["forgot","forgot","there","myself","remunerate"]
My code:
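from nltk import stem

# Assumption: a Porter stemmer, which is consistent with the output shown below.
stemmer = stem.PorterStemmer()

words = ["forgot","forgotten","there's","myself","remuneration"]
for word in words:
    print(stemmer.stem(word))

#output is:
#forgot forgotten there' myself remuner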
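A minimal sketch of what may be going on, under stated assumptions: a stemmer strips suffixes by rule and never checks that the result is a real word, whereas the output I want needs a lookup of known word forms. This is not NLTK's Porter algorithm. The `SUFFIXES` list, the `LEMMAS` table, and the names `toy_stem` and `toy_lookup` are all hypothetical, written by hand for this illustration.

```python
# Toy illustration only: NOT the Porter algorithm and NOT part of NLTK.
# SUFFIXES and LEMMAS are hypothetical, hand-written for this example.

SUFFIXES = ["ation", "s"]  # checked in order; the first match is stripped

LEMMAS = {
    "forgotten": "forgot",
    "there's": "there",
    "remuneration": "remunerate",
}


def toy_stem(word):
    """Strip the first matching suffix without checking the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def toy_lookup(word):
    """Return the listed normal form of a word, or the word itself if unlisted."""
    return LEMMAS.get(word, word)


words = ["forgot", "forgotten", "there's", "myself", "remuneration"]
stemmed = [toy_stem(w) for w in words]
looked_up = [toy_lookup(w) for w in words]

print(stemmed)    # ['forgot', 'forgotten', "there'", 'myself', 'remuner']
print(looked_up)  # ['forgot', 'forgot', 'there', 'myself', 'remunerate']
```

The rule-based version reproduces the fragments `there'` and `remuner` from my output, while only the table-based version gives the forms I listed as the goal. A real solution would presumably need a dictionary-backed step rather than a hand-written table.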