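My way of navigating the output is not to parse the string, but to define a class and deserialize the output into it. You can then work with the resulting object natively.
The output shown in the question was produced with a pipeline option called "prettyPrint". I changed it to "jsonPrint" to get JSON output instead. I could then take that output and generate classes from it (Visual Studio can generate classes from JSON through its Paste Special option, and there are online tools such as http://json2csharp.com/). The generated classes look like this: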
using System.Collections.Generic;

public class BasicDependency
{
    public string dep { get; set; }
    public int governor { get; set; }
    public string governorGloss { get; set; }
    public int dependent { get; set; }
    public string dependentGloss { get; set; }
}

public class EnhancedDependency
{
    public string dep { get; set; }
    public int governor { get; set; }
    public string governorGloss { get; set; }
    public int dependent { get; set; }
    public string dependentGloss { get; set; }
}

public class EnhancedPlusPlusDependency
{
    public string dep { get; set; }
    public int governor { get; set; }
    public string governorGloss { get; set; }
    public int dependent { get; set; }
    public string dependentGloss { get; set; }
}

public class Token
{
    public int index { get; set; }
    public string word { get; set; }
    public string originalText { get; set; }
    public string lemma { get; set; }
    public int characterOffsetBegin { get; set; }
    public int characterOffsetEnd { get; set; }
    public string pos { get; set; }
    public string ner { get; set; }
    public string speaker { get; set; }
    public string before { get; set; }
    public string after { get; set; }
    public string normalizedNER { get; set; }
}

public class Sentence
{
    public int index { get; set; }
    public string parse { get; set; }
    public List<BasicDependency> basicDependencies { get; set; }
    public List<EnhancedDependency> enhancedDependencies { get; set; }
    public List<EnhancedPlusPlusDependency> enhancedPlusPlusDependencies { get; set; }
    public List<Token> tokens { get; set; }
}

public class RootObject
{
    public List<Sentence> sentences { get; set; }
}
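*Note: unfortunately, this technique does not work well for coref annotations; the JSON does not convert to classes correctly. I am working on that now. The model above was built from output produced with the annotators "tokenize, ssplit, pos, lemma, ner, parse".
My code, only slightly changed from the sample code, looks like this (note "pipeline.jsonPrint"):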
using System;
using System.IO;
using java.io;
using java.util;
using edu.stanford.nlp.pipeline;

public static string LanguageAnalysis(string sourceText)
{
    string json = "";

    // Path to the folder with models extracted from stanford-corenlp-3.7.0-models.jar
    var jarRoot = @"..\..\..\..\packages\Stanford.NLP.CoreNLP.3.7.0.1\";

    // Annotation pipeline configuration
    var props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    props.setProperty("ner.useSUTime", "0");

    // We should change current directory, so StanfordCoreNLP could find all the model files automatically
    var curDir = Environment.CurrentDirectory;
    Directory.SetCurrentDirectory(jarRoot);
    var pipeline = new StanfordCoreNLP(props);
    Directory.SetCurrentDirectory(curDir);

    // Annotation
    var annotation = new Annotation(sourceText);
    pipeline.annotate(annotation);

    // Result - JSON Print (instead of prettyPrint in the sample code)
    using (var stream = new ByteArrayOutputStream())
    {
        pipeline.jsonPrint(annotation, new PrintWriter(stream));
        json = stream.toString();
        stream.close();
    }

    return json;
}
This seems to deserialize nicely with code like this:
using Newtonsoft.Json;
string sourceText = "My text document to parse.";
string json = Analysis.LanguageAnalysis(sourceText);
RootObject document = JsonConvert.DeserializeObject<RootObject>(json);
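Once the object is deserialized, walking it is ordinary C#. The following is only a minimal sketch: the classes are trimmed copies of the generated ones so the snippet compiles on its own, the JSON literal is a hand-written stand-in rather than real pipeline output, and the class name Demo is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Trimmed copies of the generated classes (only the fields used below).
public class Token
{
    public int index { get; set; }
    public string word { get; set; }
    public string pos { get; set; }
}

public class Sentence
{
    public int index { get; set; }
    public string parse { get; set; }
    public List<Token> tokens { get; set; }
}

public class RootObject
{
    public List<Sentence> sentences { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        // Hand-written stand-in for the pipeline output, NOT real CoreNLP output.
        string json = @"{""sentences"":[{""index"":0,""parse"":""(ROOT (NP (NN text)))"",
            ""tokens"":[{""index"":1,""word"":""text"",""pos"":""NN""}]}]}";

        RootObject document = JsonConvert.DeserializeObject<RootObject>(json);

        // Print each sentence's parse string, then one line per token.
        foreach (Sentence sentence in document.sentences)
        {
            Console.WriteLine("Sentence " + sentence.index + ": " + sentence.parse);
            foreach (Token token in sentence.tokens)
            {
                Console.WriteLine("  " + token.index + " " + token.word + "/" + token.pos);
            }
        }
    }
}
```

Properties that are missing from the JSON are simply left at their default values by Json.NET, which is why the trimmed classes still deserialize.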
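Now that I am working with the resulting object, I realize that my answer does not actually answer the question! The parser output is still delivered in the same format, as a single string called "parse". I am now joining @user1700890 in looking for a way to parse it!
Looking further, I found this question, which appears to be the same and has an answer using PHP: [PHP and NLP: nested parentheses (parser output) to array?](https://stackoverflow.com/questions/7917161/PHP-和NLP-嵌套括號解析器輸出到陣列)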
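For the "parse" string itself, a small recursive reader may be enough in C#. This is a rough sketch, not something taken from CoreNLP: ParseNode, ParseTreeReader, Read and Leaves are names invented here, and it assumes the string is a plain bracketed tree such as "(ROOT (NP (DT The) (NN cat)))" in which parentheses only ever mark structure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: Label is a tag such as "NP" or, for a leaf, the word itself.
public class ParseNode
{
    public string Label { get; set; }
    public List<ParseNode> Children { get; } = new List<ParseNode>();
    public bool IsLeaf => Children.Count == 0;
}

public static class ParseTreeReader
{
    // Reads one bracketed tree; extra whitespace and line breaks are ignored.
    public static ParseNode Read(string text)
    {
        int pos = 0;
        ParseNode root = ReadNode(text, ref pos);
        SkipWhitespace(text, ref pos);
        if (pos != text.Length)
            throw new FormatException("Unexpected trailing text at position " + pos);
        return root;
    }

    // Collects the leaf labels left to right; any node without children counts as a word.
    public static IEnumerable<string> Leaves(ParseNode node)
    {
        if (node.IsLeaf)
        {
            yield return node.Label;
            yield break;
        }
        foreach (ParseNode child in node.Children)
            foreach (string word in Leaves(child))
                yield return word;
    }

    private static ParseNode ReadNode(string text, ref int pos)
    {
        SkipWhitespace(text, ref pos);
        if (pos >= text.Length)
            throw new FormatException("Unexpected end of input");

        // A bare symbol (no opening parenthesis) is a leaf.
        if (text[pos] != '(')
            return new ParseNode { Label = ReadSymbol(text, ref pos) };

        pos++; // consume '('
        SkipWhitespace(text, ref pos);
        var node = new ParseNode { Label = ReadSymbol(text, ref pos) };
        while (true)
        {
            SkipWhitespace(text, ref pos);
            if (pos >= text.Length)
                throw new FormatException("Missing ')'");
            if (text[pos] == ')')
            {
                pos++; // consume ')'
                return node;
            }
            node.Children.Add(ReadNode(text, ref pos));
        }
    }

    // Reads characters up to the next whitespace or parenthesis.
    private static string ReadSymbol(string text, ref int pos)
    {
        int start = pos;
        while (pos < text.Length && !char.IsWhiteSpace(text[pos])
               && text[pos] != '(' && text[pos] != ')')
            pos++;
        return text.Substring(start, pos - start);
    }

    private static void SkipWhitespace(string text, ref int pos)
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos]))
            pos++;
    }
}
```

With the deserialized object from the answer, usage would be something like ParseTreeReader.Read(document.sentences[0].parse), after which the tree can be walked through Children or flattened with Leaves.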