Stanford CoreNLP is very slow

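I am working on a natural language processing project on Windows. Whenever I run Stanford CoreNLP from the command prompt, it takes about 14-15 seconds to generate the XML output for a given input text file. I think this is because the library takes quite a long time to load. Could someone explain what the problem is and how I can solve it? This delay is a major issue for my project.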

Source

2012-06-27 agarwav

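Stanford CoreNLP uses large model parameter files for its various components. Yes, they take a long time to load. What you want to do is start the program only once and then feed it a lot of text.

How you do that depends on what you are doing:

You can pass -filelist to the command-line version to process a whole batch of files in a single run.
You can keep a StanfordCoreNLP object running, send files to it, and get the output back through the API.
Depending on which NLP processing you need, you can also speed up startup by not loading models you do not use. See the "annotators" property.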
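
As a rough illustration of the first and third options, the sketch below writes a file list (one input path per line) and assembles a single command line that processes every listed file with a reduced annotator set. The class name `CoreNlpBatchCommand`, the input names `doc1.txt` and `doc2.txt`, and the `"*"` classpath (which assumes the CoreNLP jars sit in the working directory) are placeholders for illustration, not part of the original answer. The options `-annotators`, `-filelist` and `-outputFormat` belong to the CoreNLP command-line tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CoreNlpBatchCommand {

    /** Writes one input path per line, the layout expected for a -filelist file. */
    public static Path writeFileList(List<Path> inputs, Path target) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path p : inputs) {
            lines.add(p.toString());
        }
        return Files.write(target, lines);
    }

    /** Builds one java invocation, so the models are loaded once for all files. */
    public static List<String> buildCommand(String classpath, String annotators, Path fileList) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("edu.stanford.nlp.pipeline.StanfordCoreNLP");
        cmd.add("-annotators");
        cmd.add(annotators);            // list only the annotators you need
        cmd.add("-filelist");
        cmd.add(fileList.toString());
        cmd.add("-outputFormat");
        cmd.add("xml");
        return cmd;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input files; replace with your own documents.
        Path list = Files.createTempFile("corenlp-files", ".txt");
        writeFileList(List.of(Path.of("doc1.txt"), Path.of("doc2.txt")), list);

        List<String> cmd = buildCommand("*", "tokenize,ssplit,pos", list);
        System.out.println(String.join(" ", cmd));

        // To actually launch it (assumes the CoreNLP jars are on the classpath):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

With this approach the 14-15 second load cost is paid once per batch rather than once per file, and a shorter annotator list should reduce the load time itself.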

Update 2016: there is now more information on this documentation page: Understanding memory and time usage

Source

2012-06-27 05:43:42

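"You can keep a StanfordCoreNLP object running, send files to it, and get the output back through the API." This seems to be exactly the solution for my problem. Could you tell me how to do that? – agarwav

If you are writing Java code, you can do this with the [Java API](http://stanfordnlp.github.io/CoreNLP/api.html). Otherwise, your best option is to use the [StanfordCoreNLPServer](http://stanfordnlp.github.io/CoreNLP/corenlp-server.html). Several packages for other programming languages now use this server, and you can also use it like a command, for example with StanfordCoreNLPClient. –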
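
For the server route, a minimal sketch of a client using only the JDK's `java.net.http` package (Java 11+) might look like the following. It assumes a StanfordCoreNLPServer is already running at `http://localhost:9000` and that it accepts a URL-encoded POST body. The class name `CoreNlpServerRequest`, the base URL, and the chosen annotators are illustrative assumptions. The models stay loaded inside the server process, so each request avoids the startup cost.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CoreNlpServerRequest {

    /** Builds the request URI; the annotator list travels in the "properties" query parameter. */
    public static URI buildUri(String baseUrl, String annotators) {
        String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"xml\"}";
        return URI.create(baseUrl + "/?properties="
                + URLEncoder.encode(props, StandardCharsets.UTF_8));
    }

    /** POSTs the text to the server and returns the response body as a string. */
    public static String annotate(HttpClient client, URI uri, String text)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                // Assumption: the server URL-decodes form-encoded request bodies.
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        URLEncoder.encode(text, StandardCharsets.UTF_8)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed server address; adjust host and port to your setup.
        URI uri = buildUri("http://localhost:9000", "tokenize,ssplit,pos");
        System.out.println("Request URI: " + uri);
        if (args.length == 0) {
            return; // no text given: only show the URI, do not contact the server
        }
        HttpClient client = HttpClient.newHttpClient();
        for (String text : args) {
            System.out.println(annotate(client, uri, text)); // one request per argument
        }
    }
}
```

Run without arguments, the program only prints the request URI. Pass one or more texts as arguments to send them to a running server.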

To learn how to use the API, check the sample code "NERDemo.java" in the CoreNLP download folder.

Source

2012-12-02 21:27:32 ArisRe82

Where is that file? I cannot find it in the latest version of CoreNLP. –

The file is part of the Stanford classifier and is not included in CoreNLP. – ArisRe82

Christopher is right. Here is one solution:

import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class SentimentAnalyzer {
    private StanfordCoreNLP pipeline;

    // Loads all models. This is the slow step, so call it only once.
    public void initializeCoreNLP() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    // T is a placeholder for whatever result type you choose to return.
    public T getSentiment(String text) {
        ...
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation); // reuses the already loaded models
        ...
        return ...
    }

    public static void main(String[] argv) {
        SentimentAnalyzer sentimentAnalyzer = new SentimentAnalyzer();
        sentimentAnalyzer.initializeCoreNLP(); // run this only once
        T t = sentimentAnalyzer.getSentiment("put text here..."); // run this multiple times
    }
}

Source

2014-06-01 19:17:11 faustineinsun
