2013-07-03 52 views
3

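Create a .conll file as output of the Stanford Parser

I want to use the Stanford Parser to create a .conll file for further processing. So far, I have managed to parse the test sentences with this command: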

stanford-parser-full-2013-06-20/lexparser.sh stanford-parser-full-2013-06-20/data/testsent.txt > output.txt 

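Instead of a txt file, I would like the output as a .conll file. I am fairly sure this is possible, since the documentation mentions it (see here). Can I modify my command somehow, or do I have to write Java code?

Thanks for your help!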

Answers

8

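If you want the dependencies printed in CoNLL-X (CoNLL 2006) format, try these two commands. The first writes Penn Treebank-style parse trees to testsent.tree; the second converts those trees to dependencies and prints them in CoNLL-X format: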

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz stanford-parser-full-2013-06-20/data/testsent.txt >testsent.tree 

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile testsent.tree -conllx 

Here is the output for the first test sentence:

1   Scores         _  NNS  NNS  _  4   nsubj         _  _
2   of             _  IN   IN   _  0   erased        _  _
3   properties     _  NNS  NNS  _  1   prep_of       _  _
4   are            _  VBP  VBP  _  0   root          _  _
5   under          _  IN   IN   _  0   erased        _  _
6   extreme        _  JJ   JJ   _  8   amod          _  _
7   fire           _  NN   NN   _  8   nn            _  _
8   threat         _  NN   NN   _  4   prep_under    _  _
9   as             _  IN   IN   _  13  mark          _  _
10  a              _  DT   DT   _  12  det           _  _
11  huge           _  JJ   JJ   _  12  amod          _  _
12  blaze          _  NN   NN   _  15  xsubj         _  _
13  continues      _  VBZ  VBZ  _  4   advcl         _  _
14  to             _  TO   TO   _  15  aux           _  _
15  advance        _  VB   VB   _  13  xcomp         _  _
16  through        _  IN   IN   _  0   erased        _  _
17  Sydney         _  NNP  NNP  _  20  poss          _  _
18  's             _  POS  POS  _  0   erased        _  _
19  north-western  _  JJ   JJ   _  20  amod          _  _
20  suburbs        _  NNS  NNS  _  15  prep_through  _  _
21  .              _  .    .    _  4   punct         _  _
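For further processing, it may help to know that each CoNLL-X row has ten columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD and PDEPREL. In the output above, only ID, FORM, the two tag columns, HEAD and DEPREL are filled in. The following is a minimal sketch of reading such rows using only the Java standard library. The class name `ConllxReader` and its `Token` type are hypothetical names chosen for illustration and are not part of the Stanford Parser. The sketch assumes columns are separated by tabs or runs of spaces and that word forms contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Stanford Parser API.
public class ConllxReader {

    /** One CoNLL-X row, keeping only the columns that are filled in above. */
    public static final class Token {
        public final int id;        // column 1: ID
        public final String form;   // column 2: FORM
        public final String posTag; // column 5: POSTAG
        public final int head;      // column 7: HEAD (0 is used for root and erased rows above)
        public final String depRel; // column 8: DEPREL

        Token(int id, String form, String posTag, int head, String depRel) {
            this.id = id;
            this.form = form;
            this.posTag = posTag;
            this.head = head;
            this.depRel = depRel;
        }
    }

    /** Parses the rows of one sentence; blank lines are skipped. */
    public static List<Token> parseSentence(String block) {
        List<Token> tokens = new ArrayList<>();
        for (String rawLine : block.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Assumption: whitespace-separated columns, no spaces inside a word form.
            String[] cols = line.split("\\s+");
            if (cols.length != 10) {
                throw new IllegalArgumentException(
                        "Expected 10 columns but found " + cols.length + ": " + line);
            }
            tokens.add(new Token(
                    Integer.parseInt(cols[0]), cols[1], cols[4],
                    Integer.parseInt(cols[6]), cols[7]));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // First four rows of the sample output above, tab-separated.
        String sample =
                "1\tScores\t_\tNNS\tNNS\t_\t4\tnsubj\t_\t_\n"
              + "2\tof\t_\tIN\tIN\t_\t0\terased\t_\t_\n"
              + "3\tproperties\t_\tNNS\tNNS\t_\t1\tprep_of\t_\t_\n"
              + "4\tare\t_\tVBP\tVBP\t_\t0\troot\t_\t_\n";
        for (Token t : parseSentence(sample)) {
            System.out.println(t.id + " " + t.form + " -> head " + t.head + " (" + t.depRel + ")");
        }
    }
}
```

Since both `root` and `erased` rows carry head 0 in this output, code that looks for the sentence root would probably need to check the DEPREL column rather than rely on HEAD alone.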
+0

That's perfect! Thanks – Rattlesnake

+0

Can you do this for German? –

3

I don't know whether you can do this from the command line, but here is a Java version:

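// Assumes lp is a loaded LexicalizedParser and gsf is a GrammaticalStructureFactory,
// e.g. obtained from new PennTreebankLanguagePack().grammaticalStructureFactory().
// DocumentPreprocessor(String) reads the file at that path; wrapping the name in a
// StringReader would tokenize the file name itself rather than the file's contents.
for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
    Tree parse = lp.apply(sentence);

    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    // Fourth argument (conllx) = true prints CoNLL-X format; fifth (extraSep) = false.
    GrammaticalStructure.printDependencies(gs, gs.typedDependencies(), parse, true, false);
}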
+0

This works for me as well, thanks – Rattlesnake