Get the original sentence from CoreNLP

I am going through my data and want to split it into sentences. I am using pycorenlp.

from pycorenlp import StanfordCoreNLP

# Connect to a CoreNLP server that is already running locally
nlp = StanfordCoreNLP('http://localhost:9000')
# text holds the raw input string
output = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit',
    'outputFormat': 'json'
})
for tempsentence in output['sentences']:
    # store important sentences ...
    pass

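Now I store some sentences that are important for my application. Some of them contain quotation marks (") or parentheses, and it looks like CoreNLP changes these sentences. If I remember correctly, parentheses are converted to -LRB- and -RRB-.

Is there a way to get the original sentence back from CoreNLP? I need to run another CoreNLP pass later on, and if the quotation marks are gone by then, my data no longer looks original and the second CoreNLP run does not seem to recognize some quotations.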

Source

2017-04-16 Chris

Download and install the stanza library: https://github.com/stanfordnlp/stanza
The returned result will contain the original tokens.

For example:

from stanza.nlp.corenlp import CoreNLPClient

client = CoreNLPClient(server='http://localhost:9000', default_annotators=['ssplit', 'tokenize'])
result = client.annotate("...")
for sentence in result.sentences:
    for token in sentence.tokens:
        # word is the normalized form (e.g. -LRB-), originalText is the raw input
        print(token.word + "\t" + token.originalText)
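If you would rather stay with the JSON output from pycorenlp, the same information should be available there. As far as I know, each token in the JSON carries 'originalText', 'after', 'characterOffsetBegin' and 'characterOffsetEnd', so the untouched sentence can be sliced out of the input string or glued back together from the tokens. The sketch below is only an illustration: the function names (original_sentences, rebuild_from_tokens, make_token) are made up for this example, and sample_output is a hand-built dict that mimics the shape of the server response rather than real CoreNLP output. It also assumes the offsets index into the same string that was sent to the server.

```python
def original_sentences(text, output):
    """Return each sentence as the exact substring of `text` it came from.

    Assumes every token has 'characterOffsetBegin' and 'characterOffsetEnd'.
    """
    sentences = []
    for sentence in output["sentences"]:
        tokens = sentence["tokens"]
        if not tokens:
            continue
        start = tokens[0]["characterOffsetBegin"]
        end = tokens[-1]["characterOffsetEnd"]
        sentences.append(text[start:end])
    return sentences


def rebuild_from_tokens(sentence):
    """Alternative: join 'originalText' with the whitespace stored in 'after'."""
    tokens = sentence["tokens"]
    parts = []
    for i, tok in enumerate(tokens):
        parts.append(tok["originalText"])
        if i < len(tokens) - 1:
            # Skip the trailing whitespace after the last token of the sentence
            parts.append(tok["after"])
    return "".join(parts)


def make_token(word, original, begin, end, after):
    """Hypothetical helper that builds one token dict for the sample below."""
    return {
        "word": word,
        "originalText": original,
        "characterOffsetBegin": begin,
        "characterOffsetEnd": end,
        "after": after,
    }


# Hand-built stand-in for the server response (not real CoreNLP output)
sample_text = "Hi (you). Bye."
sample_output = {
    "sentences": [
        {"tokens": [
            make_token("Hi", "Hi", 0, 2, " "),
            make_token("-LRB-", "(", 3, 4, ""),
            make_token("you", "you", 4, 7, ""),
            make_token("-RRB-", ")", 7, 8, ""),
            make_token(".", ".", 8, 9, " "),
        ]},
        {"tokens": [
            make_token("Bye", "Bye", 10, 13, ""),
            make_token(".", ".", 13, 14, ""),
        ]},
    ]
}

print(original_sentences(sample_text, sample_output))  # ['Hi (you).', 'Bye.']
```

With real data you would pass the same text you gave to nlp.annotate and the output dict it returned. Slicing by offsets keeps quotes and brackets exactly as they were typed, so the stored sentences can be fed into a second CoreNLP run unchanged.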

Source

2017-04-17 02:08:57 StanfordNLPHelp

Very nice, thank you very much - awesome, by the way! – Chris
