2016-02-23 12 views
4

CoreNLP server does not return entity mentions

I downloaded the CoreNLP server from here and set it up following these instructions. When I include entitymentions as an annotator:

wget --post-data 'Mark Ronson played a concert in New York.' 'localhost:9000/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos,entitymentions", "outputFormat": "json"}' 

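the returned JSON is as shown below. Although an ner value is added to every token, there is no list of mentions.

Any idea why?

(It is worth noting that corenlp.run does not seem to return them either; the highlighting there appears to be the result of post-processing.)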

{ 
    "sentences": [ 
        { 
            "index": 0, 
            "parse": "SENTENCE_SKIPPED_OR_UNPARSABLE", 
            "tokens": [ 
                { 
                    "index": 1, 
                    "word": "Mark", 
                    "originalText": "Mark", 
                    "lemma": "Mark", 
                    "characterOffsetBegin": 0, 
                    "characterOffsetEnd": 4, 
                    "pos": "NNP", 
                    "ner": "PERSON" 
                }, 
                { 
                    "index": 2, 
                    "word": "Ronson", 
                    "originalText": "Ronson", 
                    "lemma": "Ronson", 
                    "characterOffsetBegin": 5, 
                    "characterOffsetEnd": 11, 
                    "pos": "NNP", 
                    "ner": "PERSON" 
                }, 
                { 
                    "index": 3, 
                    "word": "played", 
                    "originalText": "played", 
                    "lemma": "play", 
                    "characterOffsetBegin": 12, 
                    "characterOffsetEnd": 18, 
                    "pos": "VBD", 
                    "ner": "O" 
                }, 
                { 
                    "index": 4, 
                    "word": "a", 
                    "originalText": "a", 
                    "lemma": "a", 
                    "characterOffsetBegin": 19, 
                    "characterOffsetEnd": 20, 
                    "pos": "DT", 
                    "ner": "O" 
                }, 
                { 
                    "index": 5, 
                    "word": "concert", 
                    "originalText": "concert", 
                    "lemma": "concert", 
                    "characterOffsetBegin": 21, 
                    "characterOffsetEnd": 28, 
                    "pos": "NN", 
                    "ner": "O" 
                }, 
                { 
                    "index": 6, 
                    "word": "in", 
                    "originalText": "in", 
                    "lemma": "in", 
                    "characterOffsetBegin": 29, 
                    "characterOffsetEnd": 31, 
                    "pos": "IN", 
                    "ner": "O" 
                }, 
                { 
                    "index": 7, 
                    "word": "New", 
                    "originalText": "New", 
                    "lemma": "New", 
                    "characterOffsetBegin": 32, 
                    "characterOffsetEnd": 35, 
                    "pos": "NNP", 
                    "ner": "LOCATION" 
                }, 
                { 
                    "index": 8, 
                    "word": "York.", 
                    "originalText": "York.", 
                    "lemma": "York.", 
                    "characterOffsetBegin": 36, 
                    "characterOffsetEnd": 41, 
                    "pos": "NNP", 
                    "ner": "LOCATION" 
                } 
            ] 
        } 
    ] 
} 

Answer

3

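For better or worse, we do not currently write entity mentions to any of our output formats. The suggested workaround is to post-process the data the same way the entity mentions annotator does: a consecutive span of tokens with the same NER tag is treated as one entity mention. I believe all of the annotations on the entity mention object are also attached to its component tokens.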
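As a rough illustration of that workaround, here is a minimal Python sketch that groups consecutive tokens sharing the same `ner` value in one sentence of the JSON above. The function name `entity_mentions` and the keys of the returned dictionaries are assumptions made for this example, not part of CoreNLP's output, and the real annotator may treat some cases differently (for example, two distinct entities of the same type that sit next to each other would be merged here).

```python
from itertools import groupby


def entity_mentions(sentence, outside_tag="O"):
    """Group consecutive tokens with the same NER tag into mentions.

    `sentence` is one element of the "sentences" list in the server's
    JSON response. The keys of the returned dicts are illustrative only.
    """
    mentions = []
    # groupby only merges *adjacent* tokens with an equal "ner" value.
    for tag, group in groupby(sentence["tokens"], key=lambda t: t["ner"]):
        if tag == outside_tag:
            continue  # "O" marks tokens outside any entity
        tokens = list(group)
        mentions.append({
            "text": " ".join(t["word"] for t in tokens),
            "ner": tag,
            "characterOffsetBegin": tokens[0]["characterOffsetBegin"],
            "characterOffsetEnd": tokens[-1]["characterOffsetEnd"],
        })
    return mentions


if __name__ == "__main__":
    import json
    import sys

    # Usage (hypothetical file name): python mentions.py < response.json
    response = json.load(sys.stdin)
    for sentence in response["sentences"]:
        for mention in entity_mentions(sentence):
            print(mention)
```

Applied to the response in the question, this should produce one PERSON mention covering "Mark Ronson" and one LOCATION mention covering "New York." (the trailing period is kept because `tokenize.whitespace` leaves it attached to the token).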
