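Extracting relations using NLTK
This is a follow-up of my question. I am using nltk to parse out persons, organizations, and their relationships. Using this example, I was able to create chunks of persons and organizations; however, I get an error at the nltk.sem.extract_rels call: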
AttributeError: 'Tree' object has no attribute 'text'
Here is the complete code:
import nltk
import re

# billgatesbio from http://www.reuters.com/finance/stocks/officerProfile?symbol=MSFT.O&officerId=28066
with open('billgatesbio.txt', 'r') as f:
    sample = f.read()

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.batch_ne_chunk(tagged_sentences)

# tried plain ne_chunk instead of batch_ne_chunk as given in the book
#chunked_sentences = [nltk.ne_chunk(sentence) for sentence in tagged_sentences]

# pattern to find <person> served as <title> in <org>
IN = re.compile(r'.+\s+as\s+')

for doc in chunked_sentences:
    for rel in nltk.sem.extract_rels('ORG', 'PERSON', doc, corpus='ieer', pattern=IN):
        print nltk.sem.show_raw_rtuple(rel)
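This example is very similar to the one given in the book, but that example uses prepared 'parsed docs', which seem to come out of nowhere, and I don't know where to find their object type. I also searched through the git repository. Any help is appreciated.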
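For what it's worth, the traceback seems to come from how extract_rels handles corpus='ieer'. As far as I can tell from the NLTK source, in that mode it reads doc.text and doc.headline, which the book's parsed docs (from nltk.corpus.ieer.parsed_docs) provide and a bare Tree from the NE chunker does not. Below is a minimal sketch of one possible workaround, assuming that reading is right: wrap each chunked sentence in a small object that carries those two attributes. ChunkedDoc and wrap_chunked are hypothetical names, not part of NLTK.

```python
class ChunkedDoc(object):
    """Hypothetical stand-in for an IEER-style document.

    Assumption: with corpus='ieer', extract_rels only needs the
    attributes `text` and `headline` on the object it is given.
    """

    def __init__(self, text, headline=None):
        # `text` is meant to hold the chunk tree of one sentence.
        self.text = text
        # An empty headline contributes no entity pairs.
        self.headline = headline if headline is not None else []


def wrap_chunked(chunked_sentences):
    """Wrap every chunk tree so that it exposes .text and .headline."""
    return [ChunkedDoc(tree) for tree in chunked_sentences]


# Intended use with the code above (not executed here):
# for doc in wrap_chunked(chunked_sentences):
#     for rel in nltk.sem.extract_rels('ORG', 'PERSON', doc,
#                                      corpus='ieer', pattern=IN):
#         ...
```

Another option that may work is passing corpus='ace' instead, which, as far as I can tell, accepts the chunk tree directly without any wrapper.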
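My ultimate goal is to extract persons, organizations, and titles (with dates) for a number of companies, and then create network maps of persons and organizations.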
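To make that goal concrete, here is a rough sketch of the kind of structure I have in mind, using only the standard library. The function name build_network and the triples are placeholders for illustration, not real extraction output.

```python
from collections import defaultdict


def build_network(triples):
    """Index (person, title, organization) triples in both directions."""
    person_to_orgs = defaultdict(set)
    org_to_people = defaultdict(set)
    for person, title, org in triples:
        person_to_orgs[person].add((org, title))
        org_to_people[org].add(person)
    return person_to_orgs, org_to_people


# Placeholder triples, standing in for whatever the extraction step yields.
triples = [
    ("Person A", "Chairman", "Org X"),
    ("Person A", "Director", "Org Y"),
    ("Person B", "Director", "Org X"),
]

person_to_orgs, org_to_people = build_network(triples)
print(sorted(org_to_people["Org X"]))
```

Two people who appear under the same organization would then be connected in the map.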
Did you figure out a solution? I would like to see what you came up with, since I am getting exactly the same problem. – user3314418