Python NLTK: How to lemmatize text including English verbs?
I want to lemmatize this text, but only the nouns are being lemmatized. I need the verbs lemmatized as well.
>>> import nltk, re, string
>>> from nltk.stem import WordNetLemmatizer
>>> from urllib import urlopen
>>> url="https://raw.githubusercontent.com/evandrix/nltk_data/master/corpora/europarl_raw/english/ep-00-01-17.en"
>>> raw = urlopen(url).read()
>>> raw ="".join(l for l in raw if l not in string.punctuation)
>>> tokens=nltk.word_tokenize(raw)
>>> lemmatizer = WordNetLemmatizer()
>>> lem = [lemmatizer.lemmatize(t) for t in tokens]
>>> lem[:20]
['Resumption', 'of', 'the', 'session', 'I', 'declare', 'resumed', 'the', 'session', 'of', 'the', 'European', 'Parliament', 'adjourned', 'on', 'Friday', '17', 'December', '1999', 'and']
Here a verb such as 'resumed' should become 'resume'. Can you tell me what I should do to lemmatize the whole text, not just the nouns?
Please fix your code indentation!
I don't know how to; this is my first time asking a question.
Just paste the code, select it, and then click the "{}" button.
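On the question itself: `WordNetLemmatizer.lemmatize` takes an optional `pos` argument that defaults to noun, which would explain why 'resumed' comes back unchanged. One possible approach, sketched here rather than offered as the definitive fix, is to tag each token with `nltk.pos_tag` and pass a matching WordNet part of speech to the lemmatizer. The helper names `penn_to_wordnet` and `lemmatize_tokens` are hypothetical, and the sketch assumes the WordNet corpus and a POS tagger model have already been fetched with `nltk.download` (the exact resource names vary between NLTK versions).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter ('n', 'v', 'a', 'r').

    Hypothetical helper for illustration; not part of NLTK.
    """
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # everything else is treated as a noun, lemmatize()'s default


def lemmatize_tokens(tokens):
    """Lemmatize each token using the part of speech assigned by nltk.pos_tag."""
    # Imported here so the mapping helper above can be used without NLTK installed.
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tagged = nltk.pos_tag(tokens)  # list of (token, Penn Treebank tag) pairs
    return [lemmatizer.lemmatize(tok, pos=penn_to_wordnet(tag))
            for tok, tag in tagged]
```

With the `tokens` list from the question, `lemmatize_tokens(tokens)[:20]` could replace the list comprehension that builds `lem`. Tokens the tagger labels as verbs, such as 'resumed', are then looked up as verbs, so the result depends on the tagger's accuracy.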