Python NLTK: How to lemmatize text, including English verbs?

I want to lemmatize this text, but only the nouns are lemmatized; I need the verbs to be lemmatized as well.

>>> # Python 2 session; under Python 3, urlopen lives in urllib.request and read() returns bytes that must be decoded
>>> import nltk, re, string
>>> from nltk.stem import WordNetLemmatizer
>>> from urllib import urlopen
>>> url = "https://raw.githubusercontent.com/evandrix/nltk_data/master/corpora/europarl_raw/english/ep-00-01-17.en"
>>> raw = urlopen(url).read()
>>> # Strip punctuation character by character before tokenizing
>>> raw = "".join(l for l in raw if l not in string.punctuation)
>>> tokens = nltk.word_tokenize(raw)
>>> lemmatizer = WordNetLemmatizer()
>>> # With no pos argument, lemmatize() treats every token as a noun
>>> lem = [lemmatizer.lemmatize(t) for t in tokens]
>>> lem[:20]
['Resumption', 'of', 'the', 'session', 'I', 'declare', 'resumed', 'the', 'session', 'of', 'the', 'European', 'Parliament', 'adjourned', 'on', 'Friday', '17', 'December', '1999', 'and']

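Here a verb such as "resumed" is left unchanged, although it should become "resume". Can you tell me what I should do to lemmatize the whole text, not just the nouns?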

Asked by M.A.Hassan, 2014-06-15

Please fix your code formatting!

I don't know how; this is my first time asking a question here.

Just paste the code, select it, and click the "{}" button.

Use the pos parameter of WordNetLemmatizer:

>>> from nltk.stem import WordNetLemmatizer 
>>> from nltk import pos_tag 
>>> wnl = WordNetLemmatizer() 
>>> wnl.lemmatize('resumed') 
'resumed' 
>>> wnl.lemmatize('resumed', pos='v') 
u'resume'

Here is a complete example that uses the pos_tag function:

>>> from nltk import word_tokenize, pos_tag 
>>> from nltk.stem import WordNetLemmatizer 
>>> wnl = WordNetLemmatizer() 
>>> txt = """Resumption of the session I declare resumed the session of the European Parliament adjourned on Friday 17 December 1999 , and I would like once again to wish you a happy new year in the hope that you enjoyed a pleasant festive period .""" 
>>> [wnl.lemmatize(i,j[0].lower()) if j[0].lower() in ['a','n','v'] else wnl.lemmatize(i) for i,j in pos_tag(word_tokenize(txt))] 
['Resumption', 'of', 'the', 'session', 'I', 'declare', u'resume', 'the', 'session', 'of', 'the', 'European', 'Parliament', u'adjourn', 'on', 'Friday', '17', 'December', '1999', ',', 'and', 'I', 'would', 'like', 'once', 'again', 'to', 'wish', 'you', 'a', 'happy', 'new', 'year', 'in', 'the', 'hope', 'that', 'you', u'enjoy', 'a', 'pleasant', 'festive', 'period', '.']
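
The one-liner above tests the lower-cased first letter of each Penn Treebank tag, so adjective tags such as JJ (first letter "j") and adverb tags such as RB never match and fall back to the default noun lookup. A minimal sketch of a more explicit mapping follows. The helper names penn_to_wordnet and lemmatize_tagged are hypothetical and not part of NLTK; the letters 'a', 'n', 'v' and 'r' are the values of wordnet.ADJ, wordnet.NOUN, wordnet.VERB and wordnet.ADV. The lemmatizer is passed in as a callable so the sketch itself does not need NLTK data to run.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if there is no match."""
    if tag.startswith('J'):
        return 'a'   # JJ, JJR, JJS -> adjective
    if tag.startswith('V'):
        return 'v'   # VB, VBD, VBG, VBN, VBP, VBZ -> verb
    if tag.startswith('N'):
        return 'n'   # NN, NNS, NNP, NNPS -> noun
    if tag.startswith('RB'):
        return 'r'   # RB, RBR, RBS -> adverb (RP, the particle tag, is excluded)
    return None


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs with a callable shaped like WordNetLemmatizer.lemmatize."""
    lemmas = []
    for word, tag in tagged_tokens:
        pos = penn_to_wordnet(tag)
        # Fall back to the lemmatizer's default POS when the tag has no WordNet equivalent.
        lemmas.append(lemmatize(word, pos) if pos else lemmatize(word))
    return lemmas
```

With NLTK installed and the tokenizer, tagger and WordNet data downloaded, this could be called as lemmatize_tagged(pos_tag(word_tokenize(txt)), WordNetLemmatizer().lemmatize).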

Answered by alvas, 2014-06-15 21:30:55

Thank you for your help.

Why is the if statement needed in the list comprehension?

Because in some cases the POS tag does not fall into any of the WordNet POS categories, e.g. pronouns ("I", "PRP") or determiners ("the", "DT") – alvas
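
To make the reply above concrete, the sketch below applies the same first-letter test as the answer's list comprehension to a few sample Penn Treebank tags. The tag list is hand-picked for illustration rather than produced by pos_tag, and the names WORDNET_LETTERS and passes_filter are hypothetical.

```python
WORDNET_LETTERS = ['a', 'n', 'v']  # the letters checked in the answer's list comprehension


def passes_filter(tag):
    """Return True when the tag's lower-cased first letter is in the answer's list."""
    return tag[0].lower() in WORDNET_LETTERS


# Noun and verb tags pass; pronoun, determiner and preposition tags do not.
sample_tags = ['NN', 'VBD', 'PRP', 'DT', 'IN', 'JJ']
for tag in sample_tags:
    print(tag, passes_filter(tag))
```

Tags that fail the test would otherwise hand the lemmatizer a letter such as "p" or "d" that names no WordNet part of speech, which is why those tokens go through the plain lemmatize(i) call instead. JJ fails as well, so adjectives also take the default path.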
