Different results in StanfordNERTagger in Python 3.5 - Stanford NER 2015-12-09

I am trying to run an example sentence:

from nltk.tag import StanfordNERTagger 
_model_filename = r'D:/standford/stanford-ner-2015-12-09/classifiers/english.all.3class.distsim.crf.ser.gz' 

_path_to_jar = r'D:/standford/stanford-ner-2015-12-09/stanford-ner.jar' 

st = StanfordNERTagger(model_filename=_model_filename, path_to_jar=_path_to_jar) 

st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

My output in Python is as follows:

[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]

However, based on this reference, I expected NY to be tagged as LOCATION as well.

I tried another example:

st.tag('Ali is living in London.'.split())

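The result is as follows, which is correct:

[('Ali', 'PERSON'), ('is', 'O'), ('living', 'O'), ('in', 'O'), ('London', 'LOCATION')]

Do you know why it did not tag NY as a location in the first sentence?

I am using Visual Studio 2015 with Python 3.5 and Stanford NER 2015-12-09.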

Source

2016-07-22 Amir

Add a full stop to your sentence. No model is perfect =) – alvas

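The Stanford NER tool is trained on properly formatted news text, so punctuation matters. From the docs:

Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data.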
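To make "labels sequences of words" concrete, here is a minimal sketch that merges consecutive tokens sharing the same tag into one entity. The helper name `group_entities` is hypothetical (it is not part of NLTK or Stanford NER), and the input is simply the list-of-pairs format that `st.tag()` returns in this thread.

```python
from itertools import groupby


def group_entities(tagged, outside='O'):
    """Merge runs of consecutive tokens with the same non-'O' tag.

    `tagged` is a list of (token, tag) pairs such as StanfordNERTagger.tag()
    returns. This helper is a hypothetical illustration, not an NLTK API.
    """
    entities = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag != outside:
            entities.append((' '.join(token for token, _ in run), tag))
    return entities


# Tagged output copied from the session in this answer.
tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'),
          ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'),
          ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'),
          ('in', 'O'), ('NY', 'LOCATION'), ('.', 'O')]

print(group_entities(tagged))
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
```

One caveat: the 3-class model emits plain tags without begin/inside markers, so two different entities of the same class that sit directly next to each other would be merged by this approach.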

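From the CoNLL 2003 doc:

The English data is a collection of news wire articles from the Reuters Corpus. The annotation has been done by people of the University of Antwerp. Because of copyright reasons we only make available the annotations. In order to build the complete data sets you will need access to the Reuters Corpus. It can be obtained for research purposes without any charge from NIST.

By adding a full stop to the example sentence, you should get the output you want, but still, no model is perfect =)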

$ export STANFORDTOOLSDIR=$HOME
$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/stanford-ner.jar
$ export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/classifiers
$ python3
Python 3.5.2 (default, Jul 5 2016, 12:43:10) 
[GCC 5.4.0 20160609] on linux 
Type "help", "copyright", "credits" or "license" for more information. 
>>> from nltk.tag import StanfordNERTagger 
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> sent = 'Rami Eid is studying at Stony Brook University in NY .'.split() 
>>> st.tag(sent) 
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION'), ('.', 'O')] 
>>> sent = 'Rami Eid is studying at Stony Brook University in NY'.split() 
>>> st.tag(sent) 
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]

Source

2016-07-22 09:32:50 alvas

Thanks for the explanation, bro. – Amir

I'm glad the answer helped. – alvas
