Stanford Dependency Parser - how to get spans?

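I'm doing dependency parsing with the Stanford library in Java. Is there a way to get the indices back into my original string? I tried calling the getSpan() method, but it returns null for every token: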

LexicalizedParser lp = LexicalizedParser.loadModel(
     "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz", 
     "-maxLength", "80", "-retainTmpSubcategories"); 
TreebankLanguagePack tlp = new PennTreebankLanguagePack(); 
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory(); 
Tree parse = lp.apply(text); 
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse); 
Collection<TypedDependency> tdl = gs.typedDependenciesCollapsedTree(); 
for(TypedDependency td:tdl) 
{ 
     td.gov().getSpan(); // it's null! 
     td.dep().getSpan(); // it's null! 
}

Any ideas?


2013-04-16 Zsolt

I ended up writing my own helper functions to get the spans relative to my original string:

public HashMap<Integer, TokenSpan> getTokenSpans(String text, Tree parse) 
{ 
    List<String> tokens = new ArrayList<String>(); 
    traverse(tokens, parse, parse.getChildrenAsList()); 
    return extractTokenSpans(text, tokens); 
} 

private void traverse(List<String> tokens, Tree parse, List<Tree> children) 
{ 
    if(children == null) 
     return; 
    for(Tree child:children) 
    { 
     if(child.isLeaf()) 
     { 
      tokens.add(child.value()); 
     } 
     traverse(tokens, parse, child.getChildrenAsList());   
    } 
} 

private HashMap<Integer, TokenSpan> extractTokenSpans(String text, List<String> tokens) 
{ 
    HashMap<Integer, TokenSpan> result = new HashMap<Integer, TokenSpan>(); 
    int spanStart, spanEnd; 

    int actCharIndex = 0; 
    int actTokenIndex = 0; 
    char actChar; 
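    while(actCharIndex < text.length()) 
    { 
     actChar = text.charAt(actCharIndex); 
     if(Character.isWhitespace(actChar)) 
     { 
      // skip whitespace between tokens 
      actCharIndex++; 
     } 
     else 
     { 
      if(actTokenIndex >= tokens.size()) 
      { 
       throw new IllegalStateException("Text continues at offset " + actCharIndex + " but no tokens are left"); 
      } 
      spanStart = actCharIndex; 
      String actToken = tokens.get(actTokenIndex); 
      int tokenCharIndex = 0; 
      // consume the token character by character without running past the end of the text 
      while(tokenCharIndex < actToken.length() && actCharIndex < text.length() 
        && text.charAt(actCharIndex) == actToken.charAt(tokenCharIndex)) 
      { 
       tokenCharIndex++; 
       actCharIndex++; 
      } 

      if(tokenCharIndex != actToken.length()) 
      { 
       // the token does not match the text here; fail instead of looping forever 
       throw new IllegalStateException("Token '" + actToken + "' does not match the text at offset " + spanStart); 
      } 
      actTokenIndex++; 
      spanEnd = actCharIndex; 
      // the index is incremented before the put, so the map keys start at 1 
      result.put(actTokenIndex, new TokenSpan(spanStart, spanEnd)); 
     } 
    } 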
    return result; 
}

Then I call

getTokenSpans(originalString, parse)

This gives me a map from each token's index to its span in the original string. It's not an elegant solution, but at least it works.
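For anyone who wants to try the alignment idea without the parser on the classpath, here is a minimal standalone sketch. It is not the code above and not part of the Stanford API: `TokenAligner` and `align` are names made up for illustration, and it uses `String.indexOf` from a moving cursor instead of the character-by-character loop. It assumes every token appears verbatim in the text, which will not hold if the tokenizer rewrites tokens (for example, turning brackets into `-LRB-`/`-RRB-`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Stanford library.
class TokenAligner {

    /**
     * Returns one {start, end} pair (end exclusive) per token, found by
     * scanning the text from left to right. Assumes each token occurs
     * verbatim in the text, in order.
     */
    static List<int[]> align(String text, List<String> tokens) {
        List<int[]> spans = new ArrayList<>();
        int cursor = 0; // everything before this offset is already consumed
        for (String token : tokens) {
            int start = text.indexOf(token, cursor);
            if (start < 0) {
                throw new IllegalArgumentException(
                    "Token '" + token + "' not found at or after offset " + cursor);
            }
            int end = start + token.length();
            spans.add(new int[] {start, end});
            cursor = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "The cat  sat.";
        List<String> tokens = Arrays.asList("The", "cat", "sat", ".");
        for (int[] span : align(text, tokens)) {
            System.out.println(span[0] + "-" + span[1] + " "
                + text.substring(span[0], span[1]));
        }
    }
}
```

Note that this list is 0-based, whereas the map above uses keys starting at 1. Unlike the loop above, `indexOf` silently skips any text it cannot match, so it is more forgiving but also less strict about catching misalignment.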


2013-04-25 07:38:02 Zsolt

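Even though you've already answered your own question and this is an old thread: I stumbled upon the same problem today, but with the (Stanford) LexicalizedParser rather than the dependency parser. I haven't tested it with the dependency parser, but the following solved my problem in the lexParser scenario: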

List<Word> wl = tree.yieldWords(); 
int begin = wl.get(0).beginPosition(); 
int end = wl.get(wl.size()-1).endPosition(); 
Span sp = new Span(begin, end);

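Here the span holds the character indices of the (sub)tree. (If you go all the way down to the terminals, I suppose the same should work at the token level.)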
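To illustrate the idea in isolation: the span of a (sub)tree runs from the begin offset of its first leaf to the end offset of its last leaf. The sketch below uses a made-up `Node` class rather than the Stanford `Tree`, so it runs without the library; with the real parser the leaf offsets would come from `beginPosition()` and `endPosition()` as in the snippet above. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a parse tree; not the Stanford Tree class.
class SubtreeSpan {

    static class Node {
        final String label;
        final int begin;          // character offset, only meaningful on leaves
        final int end;            // exclusive end offset, only meaningful on leaves
        final List<Node> children;

        // Leaf node carrying its offsets in the original string.
        Node(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
            this.children = Collections.emptyList();
        }

        // Inner node; its span is derived from its leaves.
        Node(String label, Node... children) {
            this.label = label;
            this.begin = -1;
            this.end = -1;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Follows the leftmost child down to a leaf.
    static Node firstLeaf(Node node) {
        while (!node.isLeaf()) {
            node = node.children.get(0);
        }
        return node;
    }

    // Follows the rightmost child down to a leaf.
    static Node lastLeaf(Node node) {
        while (!node.isLeaf()) {
            node = node.children.get(node.children.size() - 1);
        }
        return node;
    }

    // Returns {begin, end} for the whole subtree rooted at node.
    static int[] span(Node node) {
        return new int[] {firstLeaf(node).begin, lastLeaf(node).end};
    }
}
```

Calling `span` on a leaf returns that token's own offsets, which is the token-level case mentioned above.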

Hope this helps others who run into the same problem!


2016-11-23 13:34:19 Igor
