Python: How to count POS tags from a sentence?

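I have the code from this link. It returns POS tags and the number of times they occur. How would I change the code so that, instead of entering a tag, I enter a sentence and get back each word together with the different POS tags it has according to a corpus (the Brown corpus in this case)?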

import nltk

def findtags(tag_prefix, tagged_text):
    # Condition on the tag and count the words that occur with it
    cfd = nltk.ConditionalFreqDist((tag, word) for (word, tag) in tagged_text
                                   if tag.startswith(tag_prefix))
    # Keep the five most frequent words for each matching tag
    return dict((tag, [word for word, _ in cfd[tag].most_common(5)])
                for tag in cfd.conditions())

tagdictNNS = findtags('NNS', nltk.corpus.brown.tagged_words())

for tag in sorted(tagdictNNS):
    print(tag, tagdictNNS[tag])

# Number of words kept for each tag
new = {}
for k, v in tagdictNNS.items():
    new[k] = len(v)

print(new)

2014-01-06 Helena

There is an example in the documentation (near the bottom of the page) that may be relevant:

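nltk.tag defines several taggers, which take a list of tokens (typically a sentence), assign a tag to each token, and return the resulting list of tagged tokens. Most taggers are built automatically from a training corpus. For example, the UnigramTagger tags each word w by checking which tag was most frequent for w in the training corpus: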

from nltk.corpus import brown 
from nltk.tag import UnigramTagger 
tagger = UnigramTagger(brown.tagged_sents(categories='news')[:500]) 
sent = ['Mitchell', 'decried', 'the', 'high', 'rate', 'of', 'unemployment'] 
for word, tag in tagger.tag(sent): 
    print(word, '->', tag)

which gives:

Mitchell -> NP 
decried -> None 
the -> AT 
high -> JJ 
rate -> NN 
of -> IN 
unemployment -> None
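
To make the idea concrete, here is a minimal pure-Python sketch of what a unigram tagger does. It is not NLTK's implementation; the names train_unigram and tag_sentence, the default parameter, and the toy training data are all hypothetical. It also shows why words that never appeared in the training data, such as "unemployment" in the output above, come back as None.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_sentence(model, tokens, default=None):
    """Tag tokens by plain lookup; unseen words get `default` (None here)."""
    return [(tok, model.get(tok, default)) for tok in tokens]

# Hypothetical toy data in the same shape as brown.tagged_sents()
train = [
    [('the', 'AT'), ('high', 'JJ'), ('rate', 'NN')],
    [('the', 'AT'), ('rate', 'NN'), ('of', 'IN'), ('rate', 'VB')],
]
model = train_unigram(train)
tagged = tag_sentence(model, ['the', 'rate', 'of', 'unemployment'])
for word, tag in tagged:
    print(word, '->', tag)
```

Passing a fallback such as default='NN' is one simple way to avoid None for unseen words. NLTK's own taggers accept a backoff tagger for the same purpose.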

2014-01-06 23:29:48 jonrsharpe

If the text is English, you can try this:

>>> from nltk.tag import pos_tag 
>>> from nltk.tokenize import word_tokenize 
>>> sent = "This is a foo bar sentence." 
>>> pos_tag(word_tokenize(sent)) 
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'NN'), ('bar', 'NN'), ('sentence', 'NN'), ('.', '.')] 
>>> from collections import Counter 
>>> Counter([j for i,j in pos_tag(word_tokenize(sent))]) 
Counter({'NN': 3, 'DT': 2, 'VBZ': 1, '.': 1})

NLTK has word tokenization (nltk.tokenize.word_tokenize) and a built-in POS tagger (nltk.tag.pos_tag) that uses Penn Treebank tags. You can then simply feed the list of POS tags from the tagged sentence into a Counter.

If you want to group all punctuation under a single PUNCT tag, you can try this:

>>> import string 
>>> Counter([k if k not in string.punctuation else "PUNCT" for k in [j for i,j in pos_tag(word_tokenize(sent))]]) 
Counter({'NN': 3, 'DT': 2, 'VBZ': 1, 'PUNCT': 1})
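
The question also asks for the different tags each word of a sentence takes in a corpus. A minimal sketch of one way to do that, assuming the tagged text is an iterable of (word, tag) pairs like the one brown.tagged_words() returns. The function names build_tag_index and tags_for_sentence, the lowercasing, the whitespace split, and the toy data are assumptions made for illustration:

```python
from collections import Counter, defaultdict

def build_tag_index(tagged_words):
    """Count, for every (lowercased) word, how often each tag occurs with it."""
    index = defaultdict(Counter)
    for word, tag in tagged_words:
        index[word.lower()][tag] += 1
    return index

def tags_for_sentence(sentence, index):
    """Return (word, {tag: count}) pairs; words not in the index get {}."""
    # Splitting on whitespace is a simplification; a real tokenizer
    # such as word_tokenize would handle punctuation better.
    return [(w, dict(index.get(w.lower(), {}))) for w in sentence.split()]

# Hypothetical toy data standing in for a tagged corpus
toy = [('The', 'AT'), ('rate', 'NN'), ('rate', 'VB'), ('rate', 'NN'), ('of', 'IN')]
index = build_tag_index(toy)
result = tags_for_sentence("the rate of unemployment", index)
for word, tags in result:
    print(word, tags)
```

With NLTK installed and the Brown corpus downloaded, build_tag_index(nltk.corpus.brown.tagged_words()) should build the same kind of index over the whole corpus.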

2014-01-07 00:10:49 alvas

>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize

>>> sent = "This is a foo bar sentence."
>>> text = pos_tag(word_tokenize(sent))
>>> print(text)

>>> from collections import Counter
>>> count = Counter([j for i, j in pos_tag(word_tokenize(sent))])
>>> print(count)

2015-12-17 15:49:04 hamad

Thanks for the edit, I'm new here. – hamad

Some comments would be helpful. – zero323
