2017-03-16 114 views
6

Dependency parse tree in spaCy

I have the sentence "John saw a flashy hat at the store". How can I represent it as a tree like the one shown below?

(S
  (NP (NNP John))
  (VP
    (VBD saw)
    (NP (DT a) (JJ flashy) (NN hat))
    (PP (IN at) (NP (DT the) (NN store)))))

I got the following script from here:

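import spacy
from nltk import Tree

# The 'en' shortcut worked in older spaCy releases; spaCy 3 removed it,
# so load an installed model by its full name instead.
en_nlp = spacy.load('en_core_web_sm')

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    # A token with children becomes a subtree labelled with its own text;
    # a token without children becomes a plain string leaf.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()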

This script gives me the output below, but I am looking for the bracketed (NLTK) tree format shown above.

        saw
    ____|_______________
   |        |           at
   |        |           |
   |       hat        store
   |     ___|____       |
  John  a      flashy  the

Answers

3

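Text representations aside, what you are trying to achieve is to get a constituency tree out of a dependency graph. Your example of the desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).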

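While the conversion from constituency trees to dependency graphs is more or less an automated task (see, for example, http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf), the other direction is not. There has been work on this: check the PAD project, https://github.com/ikekonglp/PAD, and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf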
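
To make the "automated" direction concrete, here is a minimal sketch of head-rule conversion from a constituency tree to dependency arcs, applied to the tree from the question. The `HEAD_RULES` table, the nested-tuple encoding, and the function names are assumptions made up for this illustration; they are not the rules from the linked paper, which handles far more labels and special cases.

```python
# Toy head table (an assumption for illustration, not the linked paper's rules):
# for each phrase label, the child labels to try as head, in priority order.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD"],
    "NP": ["NN", "NNP"],
    "PP": ["IN"],
}

# The constituency tree from the question as nested tuples:
# (label, child, ...) for a phrase, (tag, word) for a leaf.
TREE = (
    "S",
    ("NP", ("NNP", "John")),
    ("VP",
     ("VBD", "saw"),
     ("NP", ("DT", "a"), ("JJ", "flashy"), ("NN", "hat")),
     ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "store")))),
)


def is_leaf(node):
    """A leaf is a (tag, word) pair whose second item is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def to_dependencies(node, arcs):
    """Return the head word of `node`, appending (head, dependent) pairs to `arcs`."""
    if is_leaf(node):
        return node[1]
    label, children = node[0], node[1:]
    # Convert every child first; each call returns that child's head word.
    heads = [to_dependencies(child, arcs) for child in children]
    # Pick the head child using the table; fall back to the first child.
    head_index = 0
    for wanted in HEAD_RULES.get(label, []):
        matches = [i for i, child in enumerate(children) if child[0] == wanted]
        if matches:
            head_index = matches[0]
            break
    # Every non-head child's head word depends on the head child's head word.
    for i, word in enumerate(heads):
        if i != head_index:
            arcs.append((heads[head_index], word))
    return heads[head_index]


arcs = []
root = to_dependencies(TREE, arcs)
print("root:", root)
for head, dependent in arcs:
    print(head, "->", dependent)
```

With this toy table the arcs match the tree printed in the question ("saw" governs "John", "hat" and "at"). Going the other way means inventing the phrase labels and bracketing that the arcs do not record, which is why that direction needs a dedicated algorithm such as the one in PAD.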

You may also want to reconsider whether you really need a constituency parse; here is a good argument: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e

3

To re-create an NLTK-style tree from a SpaCy dependency parse, try using the draw method from nltk.tree instead of pretty_print:

import spacy
from nltk.tree import Tree

# Older spaCy releases accepted the shortcut "en"; spaCy 3 needs the full model name.
spacy_nlp = spacy.load("en_core_web_sm")

def nltk_spacy_tree(sent):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)

    def token_format(token):
        # Label each node as word_TAG_dependency
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                        [to_nltk_tree(child) for child in node.children])
        else:
            return token_format(node)

    tree = [to_nltk_tree(sent.root) for sent in doc.sents]
    # The first item in the list is the tree for the first sentence
    tree[0].draw()
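
The recursion in to_nltk_tree does not depend on SpaCy itself. As a rough sketch, the same idea can be tried on a hand-written parse: the `TOKENS` list and `to_bracketed` below are hypothetical names for this illustration, and the head indices are copied by hand from the tree printed in the question rather than produced by a parser.

```python
# Hand-written parse of the example sentence: (word, index of its head).
# -1 marks the root. These heads mirror the tree shown in the question.
TOKENS = [
    ("John", 1), ("saw", -1), ("a", 4), ("flashy", 4),
    ("hat", 1), ("at", 1), ("the", 7), ("store", 5),
]


def to_bracketed(tokens):
    """Render a head-indexed dependency parse as a bracketed string."""
    # Group token indices under their head, keeping sentence order.
    children = {i: [] for i in range(len(tokens))}
    root = None
    for i, (_, head) in enumerate(tokens):
        if head < 0:
            root = i
        else:
            children[head].append(i)

    def render(i):
        word = tokens[i][0]
        if not children[i]:
            # A token without dependents is a bare leaf.
            return word
        # A token with dependents becomes "(word dependent ...)".
        return "(" + " ".join([word] + [render(c) for c in children[i]]) + ")"

    return render(root)


bracketed = to_bracketed(TOKENS)
print(bracketed)  # (saw John (hat a flashy) (at (store the)))
```

The resulting string can be passed to nltk.Tree.fromstring if a Tree object is needed. Note that every node label is still a word, not a phrase category such as NP or VP, so this is a dependency tree in bracketed form, not a constituency parse.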

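Note that because SpaCy currently only supports dependency parsing and tagging at the word and noun-phrase level, SpaCy trees won't be as deeply structured as the ones you would get from, for instance, the Stanford parser, which you can also visualize as a tree: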

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
# Newer NLTK releases mark this interface as deprecated in favour of
# nltk.parse.corenlp.CoreNLPParser.
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()

Now, if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") will produce this image, and nltk_stanford_tree("John saw a flashy hat at the store.") will produce this one.