Python AttributeError: 'tuple' object has no attribute 'lower'


I am trying to clean a document by removing stopwords, POS tagging, and stemming. Here is my code:

def cleanDoc(doc): 
    stopset = set(stopwords.words('english')) 
    stemmer = nltk.PorterStemmer() 
    #Remove punctuation, convert to lower case and split into separate words 
    tokens = re.findall(r"<a.*?/a>|<[^\>]*>|[\w'@#]+", doc.lower() ,flags = re.UNICODE | re.LOCALE) 
    #Remove stopwords and words of 2 characters or fewer 
    clean = [token for token in tokens if token not in stopset and len(token) > 2] 
    #POS Tagging 
    pos = nltk.pos_tag(clean) 
    #Stemming 
    final = [stemmer.stem(word) for word in pos] 
    return final

I get this error:

Traceback (most recent call last): 
    File "C:\Users\USer\Desktop\tutorial\main.py", line 38, in <module> 
    final = cleanDoc(doc) 
    File "C:\Users\USer\Desktop\tutorial\main.py", line 30, in cleanDoc 
    final = [stemmer.stem(word) for word in pos] 
    File "C:\Python27\lib\site-packages\nltk\stem\porter.py", line 556, in stem 
    stem = self.stem_word(word.lower(), 0, len(word) - 1) 
AttributeError: 'tuple' object has no attribute 'lower'


2013-04-17 user245398

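Have you tried any debugging to find out why `word` is a `tuple` rather than a string? Or looked up the documentation for `nltk.pos_tag()` to see whether it returns something other than a list of strings? – millimoose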
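A minimal sketch of the kind of check this comment suggests. The `pos` list here is hard-coded sample data shaped like the (word, tag) pairs that `nltk.pos_tag()` returns; it is an assumption for illustration, not real tagger output, so the snippet runs without NLTK installed.

```python
# Hard-coded sample shaped like nltk.pos_tag() output (illustrative assumption,
# not produced by the real tagger).
pos = [("running", "VBG"), ("dogs", "NNS")]

first = pos[0]

# Inspect what each element actually is before passing it to the stemmer.
print(type(first).__name__)          # the element is a tuple, not a str
print(hasattr(first, "lower"))       # tuples have no .lower() method
print(hasattr(first[0], "lower"))    # the word inside the tuple is a str

# Reproduce the failure in isolation: calling .lower() on the tuple itself.
try:
    first.lower()
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Printing the type of one element like this points straight at the line in `cleanDoc` that hands whole tuples to `stemmer.stem`.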

In this line:

pos = nltk.pos_tag(clean)

nltk.pos_tag() returns a list of (word, tag) tuples, not a list of strings. Use this to get the words:

pos = nltk.pos_tag(clean) 
final = [stemmer.stem(tagged_word[0]) for tagged_word in pos]


2013-04-17 13:32:08 RichieHindle

nltk.pos_tag returns a list of tuples, not a list of strings. Perhaps you want

final = [stemmer.stem(word) for word, _ in pos]
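As a rough illustration, the indexing form and the unpacking form from the answers select the same words, and unpacking also makes it easy to keep each tag next to its stem. In this sketch `toy_stem` is a hypothetical stand-in for `stemmer.stem`, and `pos` is hard-coded sample data rather than real `nltk.pos_tag()` output, so it runs without NLTK.

```python
def toy_stem(word):
    """Hypothetical stand-in for stemmer.stem: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word

# Hard-coded sample shaped like nltk.pos_tag() output (illustrative assumption).
pos = [("dogs", "NNS"), ("bark", "VBP")]

# Indexing into each tuple and unpacking each tuple pick out the same words.
by_index = [toy_stem(tagged_word[0]) for tagged_word in pos]
by_unpack = [toy_stem(word) for word, _ in pos]

# If the tags are still needed later, keep them alongside the stems.
stem_with_tag = [(toy_stem(word), tag) for word, tag in pos]

print(by_index)
print(by_unpack)
print(stem_with_tag)
```

In the original `cleanDoc`, replacing `toy_stem` with `stemmer.stem` should give the same structure; whether to keep the tags depends on what the later processing needs.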


2013-04-17 13:31:12
