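Recognizing dates of the form \d\d-\d\d-\d\d with RegexNER

I am using Stanford RegexNER in my pipeline. I want strings of the form [0-9][0-9]-[0-9][0-9]-[0-9][0-9] (e.g. 27-02-16) to be recognized as dates; ner currently tags them as NUMBER. So I defined a regular expression in a mapping file and supplied it to regexner. But RegexNER does not label these strings as dates, and the NER tag of these tokens is still NUMBER. Here is the mapping file: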

[0-9]{2}-[0-9]{2}-[0-9]{2} date NUMBER

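I have made sure the columns are tab-separated. I tried several versions of the regex, such as \d\d-\d\d-\d\d and [0-9][0-9]-[0-9][0-9]-[0-9][0-9], but none of them worked. Any pointers on where I might be going wrong? I am using Stanford CoreNLP 3.7. Here is the Java code I am running.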

Properties PROPS = new Properties();
PROPS.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
StanfordCoreNLP PIPELINE = new StanfordCoreNLP(PROPS);
PIPELINE.addAnnotator(
    new RegexNERAnnotator("/home/jyoti/workspace-jee/QA_Rest/src/main/resources/Gazetter.txt"));

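I investigated further and found that the regex fails to match only when the string consists entirely of digits. I tried prefixing it with a letter and it worked (i.e. a\d\d-\d\d-\d\d matches a14-07-12).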


2017-04-14 user7568303

@stanfordnlphelp, any pointers please. – user7568303

How are you running this? Your original rule works for me.

I issued this command:

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,regexner -regexner.mapping date-rules.txt -file date-example.txt -outputFormat text
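The contents of date-rules.txt and date-example.txt are not shown above. As a rough sketch for reproducing the run, the Java below writes both files, assuming the rules file holds the rule from the question and the example file holds one sentence containing 27-02-16. The class name DateRuleFiles, the sample sentence, and the helper names are made up for illustration. It also checks the pattern column against a single token with plain java.util.regex, which is only a sanity check on the expression itself and says nothing about how CoreNLP tokenizes the text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.regex.Pattern;

class DateRuleFiles {
    // Rule from the question: pattern, new NER label, overwritable label.
    // The three columns are joined with explicit tab characters.
    static final String RULE =
        String.join("\t", "[0-9]{2}-[0-9]{2}-[0-9]{2}", "date", "NUMBER");

    // Assumed sample text; the real input file is not shown in the thread.
    static final String SAMPLE = "The meeting was held on 27-02-16 in Delhi.";

    // Writes date-rules.txt and date-example.txt into the given directory.
    static void writeFiles(Path dir) throws IOException {
        Files.write(dir.resolve("date-rules.txt"),
            Collections.singletonList(RULE), StandardCharsets.UTF_8);
        Files.write(dir.resolve("date-example.txt"),
            Collections.singletonList(SAMPLE), StandardCharsets.UTF_8);
    }

    // Whole-string Java regex match of the rule's pattern column against one token.
    static boolean patternMatches(String token) {
        String pattern = RULE.split("\t")[0];
        return Pattern.matches(pattern, token);
    }

    public static void main(String[] args) throws IOException {
        writeFiles(Paths.get("."));
        System.out.println(patternMatches("27-02-16"));
    }
}
```

Building the rule with String.join("\t", ...) avoids the common problem of an editor silently turning tabs into spaces in the mapping file.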


2017-04-16 03:54:17 StanfordNLPHelp

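I am running it through the Java API. I have added the code in an edit. – user7568303

Before building the pipeline, you need to remove the "addAnnotator" line and add a line that says PROPS.setProperty("regexner.mapping", "/path/to/rules-file.txt") ... regexner is the annotator that applies the rules, and regexner.mapping is the property that sets which rules file to use – StanfordNLPHelp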
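For anyone landing here later, the change described in that comment might look roughly like the sketch below. It uses only java.util.Properties so it runs without CoreNLP on the classpath. The class name RegexNerConfig and the method buildProps are hypothetical, and /path/to/rules-file.txt is the placeholder from the comment, not a real path.

```java
import java.util.Properties;

class RegexNerConfig {
    // Builds the pipeline properties; mappingPath points at the tab-separated rules file.
    static Properties buildProps(String mappingPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
        // Tell the regexner annotator which rules file to load.
        props.setProperty("regexner.mapping", mappingPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps("/path/to/rules-file.txt");
        // With CoreNLP on the classpath, the pipeline would then be built from
        // these properties, with no addAnnotator call afterwards:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println(props.getProperty("regexner.mapping"));
    }
}
```

The point is that the mapping is set on the properties before the pipeline is constructed, so the regexner already listed in "annotators" picks up the rules file.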
