2015-08-22 43 views

Stanford Universal Dependencies on Python, NLTK

Is there any way to get Universal Dependencies using Python or NLTK? So far I can only produce the parse tree.

Example:

Input sentence:

My dog also likes eating sausage. 

Output:

Universal dependencies 

nmod:poss(dog-2, My-1) 
nsubj(likes-4, dog-2) 
advmod(likes-4, also-3) 
root(ROOT-0, likes-4) 
xcomp(likes-4, eating-5) 
dobj(eating-5, sausage-6) 

See https://pypi.python.org/pypi/PyStanfordDependencies/ and http://stackoverflow.com/a/29614388/1118542 - PyStanfordDependencies can now do Universal Dependencies. – dmcc

Answer


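Wordseer's stanford-corenlp-python fork is a good start, as it works with the recent CoreNLP release (3.5.2). However, it gives you raw output, which you need to transform manually. For example, given that you have the wrapper running: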

>>> import json, jsonrpclib 
>>> from pprint import pprint 
>>> 
>>> server = jsonrpclib.Server("http://localhost:8080") 
>>> 
>>> pprint(json.loads(server.parse('John loves Mary.'))) # doctest: +SKIP 
{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'0', u'loves', u'2'], 
            [u'nsubj', 
            u'loves', 
            u'2', 
            u'John', 
            u'1'], 
            [u'dobj', u'loves', u'2', u'Mary', u'3'], 
            [u'punct', u'loves', u'2', u'.', u'4']], 
       u'parsetree': [], 
       u'text': u'John loves Mary.', 
       u'words': [[u'John', 
          {u'CharacterOffsetBegin': u'0', 
           u'CharacterOffsetEnd': u'4', 
           u'Lemma': u'John', 
           u'PartOfSpeech': u'NNP'}], 
          [u'loves', 
          {u'CharacterOffsetBegin': u'5', 
           u'CharacterOffsetEnd': u'10', 
           u'Lemma': u'love', 
           u'PartOfSpeech': u'VBZ'}], 
          [u'Mary', 
          {u'CharacterOffsetBegin': u'11', 
           u'CharacterOffsetEnd': u'15', 
           u'Lemma': u'Mary', 
           u'PartOfSpeech': u'NNP'}], 
          [u'.', 
          {u'CharacterOffsetBegin': u'15', 
           u'CharacterOffsetEnd': u'16', 
           u'Lemma': u'.', 
           u'PartOfSpeech': u'.'}]]}]} 
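
To get from this raw output to the `rel(head-index, word-index)` notation used in the question, a minimal sketch along these lines may help. It assumes each entry of `dependencies` has the shape `[relation, head, head_index, word, word_index]`, as in the response above. The list is hard-coded so the snippet runs without a CoreNLP server, and `format_dependencies` is a hypothetical helper name, not part of the wrapper.

```python
# Hard-coded copy of the 'dependencies' list from the response above,
# so the sketch runs without a CoreNLP server.
dependencies = [
    ['root', 'ROOT', '0', 'loves', '2'],
    ['nsubj', 'loves', '2', 'John', '1'],
    ['dobj', 'loves', '2', 'Mary', '3'],
    ['punct', 'loves', '2', '.', '4'],
]


def format_dependencies(dependencies):
    """Render [rel, head, head_index, word, word_index] entries as rel(head-i, word-j).

    Hypothetical helper: the entry layout is assumed from the sample response.
    """
    return [
        '{}({}-{}, {}-{})'.format(rel, head, head_index, word, word_index)
        for rel, head, head_index, word, word_index in dependencies
    ]


for line in format_dependencies(dependencies):
    print(line)
```

With a live server, the same function could presumably be applied to each element of `json.loads(server.parse(...))['sentences']` by passing its `'dependencies'` list.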

If you want to use the dependency parser, you can reuse NLTK's DependencyGraph with a bit of effort:

>>> import jsonrpclib, json 
>>> from nltk.parse import DependencyGraph 
>>> 
>>> server = jsonrpclib.Server("http://localhost:8080") 
>>> parses = json.loads(
...     server.parse(
...         'John loves Mary. '
...         'I saw a man with a telescope. '
...         'Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.'
...     )
... )['sentences']
>>> 
>>> def transform(sentence):
...     for rel, _, head, word, n in sentence['dependencies']:
...         n = int(n)
...
...         word_info = sentence['words'][n - 1][1]
...         tag = word_info['PartOfSpeech']
...         lemma = word_info['Lemma']
...         if rel == 'root':
...             # NLTK expects that the root relation is labelled as ROOT!
...             rel = 'ROOT'
...
...         # Hack: Return values we don't know as '_'.
...         #       Also, consider tag and ctag to be equal.
...         # n is used to sort words as they appear in the sentence.
...         yield n, '_', word, lemma, tag, tag, '_', head, rel, '_', '_'
...
>>> dgs = [
...     DependencyGraph(
...         ' '.join(items)  # NLTK expects an iterable of strings...
...         for n, *items in sorted(transform(parse))
...     )
...     for parse in parses
... ]
>>> 
>>> # Play around with the information we've got. 
>>> 
>>> pprint(list(dgs[0].triples())) 
[(('loves', 'VBZ'), 'nsubj', ('John', 'NNP')),
 (('loves', 'VBZ'), 'dobj', ('Mary', 'NNP')),
 (('loves', 'VBZ'), 'punct', ('.', '.'))]
>>> 
>>> print(dgs[1].tree()) 
(saw I (man a (with (telescope a))) .) 
>>> 
>>> print(dgs[2].to_conll(4)) # doctest: +NORMALIZE_WHITESPACE 
Ballmer  NNP  4  nsubj 
has   VBZ  4  aux 
been  VBN  4  cop 
vocal  JJ  0  ROOT 
in   IN  4  prep 
the   DT  8  det 
past  JJ  8  amod 
warning  NN  5  pobj 
that  WDT  13  dobj 
Linux  NNP  13  nsubj 
is   VBZ  13  cop 
a   DT  13  det 
threat  NN  8  rcmod 
to   TO  13  prep 
Microsoft NNP  14  pobj 
.   .  4  punct 
<BLANKLINE> 
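
To see what is actually handed to DependencyGraph, the following sketch rebuilds the 10-column CoNLL-style lines for "John loves Mary." from a hard-coded, trimmed copy of the first response, so no server or NLTK install is needed. `to_conll_lines` is a hypothetical name. The column layout mirrors the `transform` generator above, and the first column is left as `'_'` on the assumption that NLTK falls back to line order for word indices, which is why the rows are sorted by position first.

```python
# Trimmed, hard-coded copy of one parsed sentence from the first example.
sentence = {
    'dependencies': [
        ['root', 'ROOT', '0', 'loves', '2'],
        ['nsubj', 'loves', '2', 'John', '1'],
        ['dobj', 'loves', '2', 'Mary', '3'],
        ['punct', 'loves', '2', '.', '4'],
    ],
    'words': [
        ['John', {'Lemma': 'John', 'PartOfSpeech': 'NNP'}],
        ['loves', {'Lemma': 'love', 'PartOfSpeech': 'VBZ'}],
        ['Mary', {'Lemma': 'Mary', 'PartOfSpeech': 'NNP'}],
        ['.', {'Lemma': '.', 'PartOfSpeech': '.'}],
    ],
}


def to_conll_lines(sentence):
    """Return one space-separated 10-column line per word, in sentence order."""
    rows = []
    for rel, _, head, word, n in sentence['dependencies']:
        n = int(n)  # 1-based position of the dependent word
        info = sentence['words'][n - 1][1]
        tag = info['PartOfSpeech']
        if rel == 'root':
            rel = 'ROOT'  # same relabelling as in transform() above
        cells = ['_', word, info['Lemma'], tag, tag, '_', head, rel, '_', '_']
        rows.append((n, ' '.join(cells)))
    # Sort by word position so that line order matches sentence order.
    return [line for _, line in sorted(rows)]


for line in to_conll_lines(sentence):
    print(line)
```

Each printed line corresponds to one word; the seventh column holds the index of its head, with 0 marking the root.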

Setting up CoreNLP is not that hard; see http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html for more details.
