2016-01-05 211 views

How to print out only the tags in Python

If I have a string such as this:

text = "They refuse to permit us." 

txt = nltk.word_tokenize(text) 

With this, if I print the POS tags using nltk.pos_tag(txt), I get

[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]

How can I print out just this:

[ 'PRP', 'VBP', 'TO', 'VB', 'PRP' ]

Answers

1

You got a list of tuples. Iterate over it and take the second element of each tuple.

>>> tagged = nltk.pos_tag(txt) 
>>> tags = [ e[1] for e in tagged] 
>>> tags 
['PRP', 'VBP', 'TO', 'VB', 'PRP'] 
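
The same idea can be written with tuple unpacking, which names both halves of each pair. The sketch below uses a hard-coded list as a stand-in for the output of nltk.pos_tag(txt), so it runs without NLTK. The punctuation filter is an assumption about what the question wants, since its desired output has no '.' tag.

```python
# Hard-coded stand-in for nltk.pos_tag(txt); an assumption so this runs without NLTK.
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
          ("permit", "VB"), ("us", "PRP"), (".", ".")]

# Unpack each (token, tag) pair and keep only the tag.
tags = [tag for _token, tag in tagged]

# Optional: skip tokens with no letters or digits (pure punctuation).
word_tags = [tag for token, tag in tagged
             if any(ch.isalnum() for ch in token)]

print(tags)
print(word_tags)
```

The filter checks the token rather than the tag because some tags, such as PRP$, contain non-letter characters themselves.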
1

Take a look at Unpacking a list/tuple of pairs into two lists/tuples:

>>> from nltk import pos_tag, word_tokenize 
>>> text = "They refuse to permit us." 
>>> tagged_text = pos_tag(word_tokenize(text)) 
>>> tokens, pos = zip(*tagged_text) 
>>> pos 
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.') 
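
One caveat with the zip(*...) idiom: if the tagged list is empty, unpacking into two names raises ValueError. Below is a minimal sketch of a guard. The helper name split_tagged is hypothetical, and the hard-coded list stands in for the output of pos_tag so the example runs without NLTK.

```python
def split_tagged(tagged):
    """Hypothetical helper: split a list of (token, tag) pairs into two tuples.

    Returns two empty tuples for an empty list, where a bare
    `tokens, pos = zip(*tagged)` would raise ValueError.
    """
    if not tagged:
        return (), ()
    tokens, pos = zip(*tagged)
    return tokens, pos


# Stand-in for pos_tag(word_tokenize(text)); an assumption, not real tagger output.
tagged_text = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
               ("permit", "VB"), ("us", "PRP"), (".", ".")]

tokens, pos = split_tagged(tagged_text)
print(pos)
print(split_tagged([]))
```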

At some point you may find that POS tagging is slow. In that case, create the tagger once and reuse it (see Slow performance of POS tagging. Can I do some kind of pre-warming?):

>>> from nltk import word_tokenize 
>>> from nltk.tag import PerceptronTagger 
>>> tagger = PerceptronTagger() 
>>> text = "They refuse to permit us." 
>>> tagged_text = tagger.tag(word_tokenize(text)) 
>>> tokens, pos = zip(*tagged_text) 
>>> pos 
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.') 
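
To show why this helps, here is a rough sketch of reusing one tagger object across several sentences. The function name tags_for_sentences is hypothetical. StubTagger is only a stand-in so the example runs without NLTK; the sketch assumes the real object would be a PerceptronTagger, whose tag(tokens) method returns (token, tag) pairs.

```python
def tags_for_sentences(tagger, tokenized_sentences):
    """Return one list of tags per sentence, reusing a single tagger object.

    `tagger` is assumed to be any object with a tag(tokens) method that
    returns (token, tag) pairs, such as a PerceptronTagger instance.
    """
    return [[tag for _token, tag in tagger.tag(tokens)]
            for tokens in tokenized_sentences]


class StubTagger:
    """Stand-in so the sketch runs without NLTK; not a real tagger."""

    def tag(self, tokens):
        lookup = {"They": "PRP", "refuse": "VBP", "to": "TO",
                  "permit": "VB", "us": "PRP", ".": "."}
        return [(token, lookup.get(token, "NN")) for token in tokens]


tagger = StubTagger()  # with NLTK installed: tagger = PerceptronTagger()
sentences = [["They", "refuse", "to", "permit", "us", "."],
             ["They", "refuse", "."]]
all_tags = tags_for_sentences(tagger, sentences)
print(all_tags)
```

The tagger is built once, outside the loop, so any loading cost is paid a single time rather than once per sentence.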
0

You can iterate like this:

print([x[1] for x in nltk.pos_tag(txt)])