Stanford NLP currency normalization not working as expected

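I am using Stanford CoreNLP for extraction. Below is the sentence from which I am trying to extract the amount together with its currency symbol:

5 March 2015 Kering Issue of €500,000,000 0.875 per cent

The data I need to extract is €500,000,000 0.875.

By default, CoreNLP gives the sentence as

5 March 2015 Kering Issue of $ 500,000,000 0.875 per cent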
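The question's output shows the € being turned into $. CoreNLP's PTBTokenizer documentation describes normalizeCurrency as a deliberately lossy mapping of common currency characters to $, # or "cents", because the old Penn Treebank WSJ text contains nothing else (in particular, no euro sign). Below is a simplified, hypothetical sketch of that kind of mapping in plain Java, only to show why the € disappears; the class name and the exact set of characters mapped to $ are assumptions, not CoreNLP's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical sketch of a lossy currency mapping in the spirit of
// PTBTokenizer's normalizeCurrency=true. NOT CoreNLP's implementation; the
// exact set of characters mapped to "$" is an assumption.
class CurrencyMappingSketch {

    // Generic currency sign, yen sign, and the Unicode currency-symbols block (includes the euro sign).
    private static final Pattern OTHER_CURRENCY =
            Pattern.compile("[\\u00A4\\u00A5\\u20A0-\\u20CF]");

    static String normalizeCurrency(String token) {
        String s = token.replace("\u00A2", "cents"); // cent sign -> "cents"
        s = s.replace("\u00A3", "#");                // pound sign -> "#", as in the old PTB
        // Every other matched currency character becomes "$"
        return OTHER_CURRENCY.matcher(s).replaceAll(Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        System.out.println(normalizeCurrency("\u20AC500,000,000")); // prints $500,000,000
        System.out.println(normalizeCurrency("\u00A35"));           // prints #5
    }
}
```

With the option set to false, the tokenizer is supposed to skip this step and leave the original symbol in the token.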

So I wrote:

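// Tokenizer factory that keeps currency symbols (e.g. €) instead of normalizing them to $
public static readonly TokenizerFactory TokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(),
      "normalizeCurrency=false");
// Sentence splitter over the raw text, using the factory above for tokenization
DocumentPreprocessor docPre = new DocumentPreprocessor(new java.io.StringReader(textChunk));
docPre.setTokenizerFactory(TokenizerFactory);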

Now the sentence correctly comes out as

5 March 2015 Kering Issue of €500,000,000 0.875 per cent

but when I do

props.put("annotators", "tokenize, cleanxml, ssplit, pos, lemma, ner, regexner"); 
props.setProperty("ner.useSUTime", "0"); 
_pipeline = new StanfordCoreNLP(props); 
Annotation document = new Annotation(text); 
_pipeline.annotate(document);

where text = 5 March 2015 Kering Issue of €500,000,000 0.875 per cent

I get the output:

<token id="9"> 
    <word>$</word> 
    <lemma></lemma> 
    <CharacterOffsetBegin>48</CharacterOffsetBegin> 
    <CharacterOffsetEnd>49</CharacterOffsetEnd> 
    <POS>CD</POS> 
    <NER>MONEY</NER> 
    <NormalizedNER>$5.000000000875E9</NormalizedNER> 
</token>
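The E9 suffix in NormalizedNER is scientific notation: 5.000000000875E9 is 5,000,000,000.875, not 500,000,000, so the digits of 500,000,000 and 0.875 appear to have been run together into a single number. Java's Double.toString prints values of 10^7 and above in this notation, which is presumably (an assumption) how the value got this format. The hypothetical helper below strips the leading currency character and rewrites the number without an exponent using BigDecimal.toPlainString().

```java
import java.math.BigDecimal;

// Hypothetical helper: turn a NormalizedNER MONEY string such as
// "$5.000000000875E9" into a plain decimal string without an exponent.
class NormalizedMoney {

    static String toPlain(String normalizedNer) {
        // Assumption: the value is exactly one currency character followed by a number.
        String number = normalizedNer.substring(1);
        return new BigDecimal(number).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(Double.toString(5000000000.875)); // prints 5.000000000875E9
        System.out.println(toPlain("$5.000000000875E9"));     // prints 5000000000.875
    }
}
```

Reformatting does not fix the merged value itself; it only makes it easier to see what the normalizer produced.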

So I added the line props.put("tokenize.options", "normalizeCurrency=false"); but the output is still the same: $5.000000000875E9
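Whatever the normalizer does with the symbol, each token keeps CharacterOffsetBegin/CharacterOffsetEnd into the original text, so one possible workaround is to cut the entity's surface text out of the input using those offsets. The sketch below uses a hypothetical Tok class with offsets computed by hand for the sample sentence (the 48/49 offsets in the output above come from a different, longer input); in CoreNLP the values would come from each CoreLabel's beginPosition() and endPosition().

```java
import java.util.Arrays;
import java.util.List;

// Sketch: recover the original surface text of a multi-token entity from
// character offsets, so a "$" produced by normalization is replaced by the
// symbol that actually appears in the input.
class OriginalSpan {

    // Hypothetical stand-in for a CoreLabel: normalized word plus offsets.
    static final class Tok {
        final String word;
        final int begin; // like CharacterOffsetBegin
        final int end;   // like CharacterOffsetEnd (exclusive)

        Tok(String word, int begin, int end) {
            this.word = word;
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns the input text from the first token's start to the last token's end.
    static String surfaceText(String text, List<Tok> entityTokens) {
        int begin = entityTokens.get(0).begin;
        int end = entityTokens.get(entityTokens.size() - 1).end;
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String text = "5 March 2015 Kering Issue of \u20AC500,000,000 0.875 per cent";
        int euro = text.indexOf('\u20AC');
        // Offsets for the sample sentence, computed here rather than taken from CoreNLP.
        List<Tok> money = Arrays.asList(
                new Tok("$", euro, euro + 1),
                new Tok("500,000,000", euro + 1, euro + 12),
                new Tok("0.875", euro + 13, euro + 18));
        // Prints the euro sign followed by "500,000,000 0.875"
        System.out.println(surfaceText(text, money));
    }
}
```

Because the span is taken from the input rather than from the tokens' words, it keeps the €, the thousands separators and the original spacing.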

Can anybody please help me? Thanks.

2017-03-23 Madhu

When I run this code, the currency symbol is not changed to "$":

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class TokenizeOptionsExample {

    public static void main(String[] args) {
        Annotation document = new Annotation("5 March 2015 Kering Issue of €500,000,000 0.875 per cent");
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        // Passed through to the tokenizer: keep currency symbols as they appear in the text
        props.setProperty("tokenize.options", "normalizeCurrency=false");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        // Print every token produced by the pipeline
        for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) {
            System.out.println(token);
        }
    }
}
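The answer uses only tokenize,ssplit. As a sketch, the question's full annotator list can be combined with the tokenizer option in a single Properties object. A TokenizerFactory attached to a separate DocumentPreprocessor only affects that preprocessor, not the tokenizer StanfordCoreNLP builds from its own properties, so the option has to go into the properties used to create the pipeline. Here ner.useSUTime is written as false rather than 0; the class and method names are hypothetical, and this sketch does not establish whether the NER normalizer then keeps the € in NormalizedNER.

```java
import java.util.Properties;

// Sketch: the question's pipeline settings collected in one place.
// Assumption: these are passed to new StanfordCoreNLP(props); that call needs
// the CoreNLP jar and is therefore only shown as a comment.
class PipelineProps {

    static Properties build() {
        Properties props = new Properties();
        // setProperty keeps keys and values typed as String
        props.setProperty("annotators", "tokenize, cleanxml, ssplit, pos, lemma, ner, regexner");
        props.setProperty("ner.useSUTime", "false");
        // Options for the pipeline's own tokenizer belong in these same properties
        props.setProperty("tokenize.options", "normalizeCurrency=false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println(props.getProperty("tokenize.options")); // prints normalizeCurrency=false
    }
}
```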

2017-03-27 06:48:03 StanfordNLPHelp

Thanks for your help. I had already added props.setProperty("tokenize.options", "normalizeCurrency=false"); but the output still has no currency symbol: <token id="9"> 48 49 CD MONEY 5.000000000875E9 – Madhu

Could you please tell me how to write the loop for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) in C#? I don't know Java very well and I am using C# for the extraction. – Madhu
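The Java enhanced for loop is shorthand for walking the list's iterator. As a sketch of what it stands for, here are three equivalent forms; the index-based one uses only size() and get(i), which may be the easiest to transcribe when calling the Java classes from C# (an assumption about your C# setup). A List<String> stands in for the List<CoreLabel> that document.get(CoreAnnotations.TokensAnnotation.class) returns, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch: three equivalent ways to walk a java.util.List.
// A List<String> stands in for the List<CoreLabel> of tokens.
class LoopForms {

    // Enhanced for loop, as used in the answer.
    static List<String> enhancedFor(List<String> tokens) {
        List<String> seen = new ArrayList<>();
        for (String token : tokens) {
            seen.add(token);
        }
        return seen;
    }

    // Explicit Iterator: what the enhanced for loop does for a List.
    static List<String> explicitIterator(List<String> tokens) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = tokens.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    // Index-based loop using only size() and get(i).
    static List<String> indexBased(List<String> tokens) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            seen.add(tokens.get(i));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("5", "March", "2015", "Kering");
        System.out.println(enhancedFor(tokens));                                  // prints [5, March, 2015, Kering]
        System.out.println(explicitIterator(tokens).equals(enhancedFor(tokens))); // prints true
        System.out.println(indexBased(tokens).equals(enhancedFor(tokens)));       // prints true
    }
}
```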
