>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3
To me, all three imports work the same way, but just to confirm: do they provide anything different? Why does the NLTK library appear to have different lemmatizers?
No, they are not different; they are all the same.
>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3
>>> lm1 == lm2
True
>>> lm2 == lm3
True
>>> lm1 == lm3
True
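Following erip's correction, the reason this happens is:
The class (WordNetLemmatizer) was originally written in nltk.stem.wordnet, so you can do from nltk.stem.wordnet import WordNetLemmatizer as lm3
It is also imported in NLTK's top-level __init__.py file, so you can do from nltk import WordNetLemmatizer as lm2
And it is also imported in the __init__.py of the nltk.stem module, so you can do from nltk.stem import WordNetLemmatizer as lm1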
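To make the re-export idea concrete without needing NLTK installed, here is a minimal sketch that builds a hypothetical in-memory package layout (the names toypkg and ToyLemmatizer are made up for illustration and are not part of NLTK). It suggests why all three import paths end up pointing at one class object, and why the class still reports the module where it was defined.

```python
import types

# Hypothetical stand-in for nltk/stem/wordnet.py: the class is defined here.
wordnet_mod = types.ModuleType("toypkg.stem.wordnet")
exec("class ToyLemmatizer(object):\n    pass\n", wordnet_mod.__dict__)

# Hypothetical stand-in for nltk/stem/__init__.py: it re-exports the class.
stem_pkg = types.ModuleType("toypkg.stem")
stem_pkg.ToyLemmatizer = wordnet_mod.ToyLemmatizer

# Hypothetical stand-in for nltk/__init__.py: it re-exports what stem exposes.
top_pkg = types.ModuleType("toypkg")
top_pkg.ToyLemmatizer = stem_pkg.ToyLemmatizer

lm1 = stem_pkg.ToyLemmatizer
lm2 = top_pkg.ToyLemmatizer
lm3 = wordnet_mod.ToyLemmatizer

# "is" checks object identity, which is stricter than "==".
same_object = lm1 is lm2 is lm3
# __module__ records where the class was defined, not where it was imported from.
defining_module = lm1.__module__

print(same_object)
print(defining_module)
```

With the real library, the analogous check would be lm1 is lm2 is lm3 and lm1.__module__, which should name nltk.stem.wordnet if the layout described above holds for your installed version.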
Answer: they are all the same.
inspect is a useful tool for checking whether objects are the same:
>>> import inspect
>>> from nltk.stem import WordNetLemmatizer as wnl1
>>> from nltk.stem.wordnet import WordNetLemmatizer as wnl2
>>> inspect.getfile(wnl1)
'/Library/Python/2.7/site-packages/nltk/stem/wordnet.pyc'
# They come from the same file:
>>> inspect.getfile(wnl1) == inspect.getfile(wnl2)
True
>>> print(inspect.getdoc(wnl1))
WordNet Lemmatizer
Lemmatize using WordNet's built-in morphy function.
Returns the input word unchanged if it cannot be found in WordNet.
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> print(wnl.lemmatize('dogs'))
dog
>>> print(wnl.lemmatize('churches'))
church
>>> print(wnl.lemmatize('aardwolves'))
aardwolf
>>> print(wnl.lemmatize('abaci'))
abacus
>>> print(wnl.lemmatize('hardrock'))
hardrock
You can look at the source code too:
>>> print(inspect.getsource(wnl1))
class WordNetLemmatizer(object):
    """
    WordNet Lemmatizer
    Lemmatize using WordNet's built-in morphy function.
    Returns the input word unchanged if it cannot be found in WordNet.
        >>> from nltk.stem import WordNetLemmatizer
        >>> wnl = WordNetLemmatizer()
        >>> print(wnl.lemmatize('dogs'))
        dog
        >>> print(wnl.lemmatize('churches'))
        church
        >>> print(wnl.lemmatize('aardwolves'))
        aardwolf
        >>> print(wnl.lemmatize('abaci'))
        abacus
        >>> print(wnl.lemmatize('hardrock'))
        hardrock
    """
    def __init__(self):
        pass
    def lemmatize(self, word, pos=NOUN):
        lemmas = wordnet._morphy(word, pos)
        return min(lemmas, key=len) if lemmas else word
    def __repr__(self):
        return '<WordNetLemmatizer>'
# They have the same source code too:
>>> print(inspect.getsource(wnl1) == inspect.getsource(wnl2))
True
The import structure for WordNetLemmatizer in NLTK is as follows:
\nltk
    __init__.py
    \stem
        __init__.py
        wordnet.py  # This is where WordNetLemmatizer code resides.
Starting from the lowest level, WordNetLemmatizer resides in nltk/stem/wordnet.py (https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L15), so you can do:
from nltk.stem.wordnet import WordNetLemmatizer
In nltk/stem/__init__.py, we see at https://github.com/nltk/nltk/blob/develop/nltk/stem/__init__.py#L30 that it imports the class above, which gives nltk.stem access to WordNetLemmatizer, so you can do:
from nltk.stem import WordNetLemmatizer
In nltk/__init__.py, we see:
from nltk.stem import *
This lets the top-level nltk package access everything that nltk.stem has access to. So at the top-level nltk, we can do:
from nltk import WordNetLemmatizer
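As a rough illustration of what a star import does, the sketch below registers a hypothetical module named toy_stem_demo (not part of NLTK) and star-imports it into a fresh namespace. It assumes the module defines no __all__, in which case Python copies every name that does not start with an underscore; whether nltk.stem defines __all__ is not something this sketch checks.

```python
import sys
import types

# Hypothetical module standing in for a package such as nltk.stem.
toy_stem = types.ModuleType("toy_stem_demo")
exec(
    "class ToyLemmatizer(object):\n"
    "    pass\n"
    "_private_helper = object()\n",
    toy_stem.__dict__,
)
# Register it so the import statement below can find it.
sys.modules["toy_stem_demo"] = toy_stem

# Stand-in for the top-level package namespace running "from ... import *".
top_namespace = {}
exec("from toy_stem_demo import *", top_namespace)

# The public class is copied by reference: same object, new binding.
reexported = top_namespace["ToyLemmatizer"] is toy_stem.ToyLemmatizer
# Names with a leading underscore are skipped when __all__ is absent.
private_copied = "_private_helper" in top_namespace

print(reexported)
print(private_copied)
```

The star import only creates new bindings to existing objects, which is why the class reachable from the top level is the very same class defined further down the package tree.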
One thing to note, though: in NLTK, objects or modules that share a name do not always refer to the same object. For example:
>>> from nltk.corpus import wordnet as wn1
>>> from nltk.corpus.reader import wordnet as wn2
>>> wn1 == wn2
False
>>> wn1.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn2.synsets('dog')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'synsets'
The first, wn1, is a LazyCorpusLoader object that opens the WordNet files in nltk_data and gives you access to the synsets: https://github.com/nltk/nltk/blob/develop/nltk/corpus/__init__.py#L246
The second, wn2, is the wordnet.py file itself, which resides at nltk/corpus/reader/wordnet.py: https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py
It gets even trickier:
>>> from nltk.corpus import wordnet as wn1
>>> from nltk.corpus.reader import wordnet as wn2
>>> from nltk.stem import wordnet as wn3
>>> wn3 == wn1
False
>>> wn3 == wn2
False
In the case of wn3, it refers to the file nltk/stem/wordnet.py that contains WordNetLemmatizer, which has nothing to do with the WordNet corpus object or the corpus reader.
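When two imports share a name, a quick way to see what each one really is would be to ask for its kind. The helper below, describe, is a hypothetical name introduced here for illustration; it uses the standard-library datetime module as a stand-in, because datetime (the module) and datetime.datetime (the class) show the same sort of name clash without requiring NLTK.

```python
import inspect
import datetime as dt_module
from datetime import datetime as dt_class


def describe(obj):
    """Return a short label saying whether obj is a module, a class, or an instance."""
    if inspect.ismodule(obj):
        return "module " + obj.__name__
    if inspect.isclass(obj):
        return "class " + obj.__module__ + "." + obj.__name__
    return "instance of " + type(obj).__module__ + "." + type(obj).__name__


# Same short name "datetime", two different kinds of object.
print(describe(dt_module))
print(describe(dt_class))
print(dt_module is dt_class)
```

Applied to wn1, wn2 and wn3, the same helper should report an instance for the corpus loader and two different modules for the others, assuming the layout described above.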
Your last point is incorrect. NLTK uses `__init__.py` to hide this. It has nothing to do with the efficiency of the language's import mechanism. See [here](https://github.com/nltk/nltk/blob/develop/nltk/__init__.py#L137), [here](https://github.com/nltk/nltk/blob/develop/nltk/stem/__init__.py#L30) and [here](https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L15). – erip
Thanks @erip, I have updated the answer. – harshil9968