Python - Difference between tagged_sents and tagged_words in NLTK corpora

What is the difference between tagged_sents and tagged_words in nltk?

They both seem to be lists of (word, tag) tuples. If you call type() on them, both are

nltk.collections.LazySubsequence

Source

2017-05-28 thatMeow

From the docs:

Corpus reader functions are named based on the type of information they return. 
Some common examples, and their return types, are: 
- words(): list of str 
- sents(): list of (list of str) 
- paras(): list of (list of (list of str)) 
- tagged_words(): list of (str,str) tuple 
- tagged_sents(): list of (list of (str,str)) 
- tagged_paras(): list of (list of (list of (str,str))) 
- chunked_sents(): list of (Tree w/ (str,str) leaves) 
- parsed_sents(): list of (Tree with str leaves) 
- parsed_paras(): list of (list of (Tree with str leaves)) 
- xml(): A single xml ElementTree 
- raw(): unprocessed corpus contents 


>>> from nltk.corpus import brown 

>>> brown.tagged_words() 
[(u'The', u'AT'), (u'Fulton', u'NP-TL'), ...] 

>>> len(brown.tagged_words()) # no. of words in the corpus. 
1161192 


>>> len(brown.tagged_sents()) # no. of sentences in the corpus. 
57340 

>>> # Loop through the sentences and sum the number of words in each one. 
>>> sum(len(sent) for sent in brown.tagged_sents()) # same total as len(brown.tagged_words()). 
1161192
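
So the two calls return the same (word, tag) tuples and differ only in nesting: tagged_sents() groups them by sentence, while tagged_words() is one flat list. Here is a minimal sketch of that relationship. It uses a small hand-made list (`toy_tagged_sents`, a hypothetical stand-in, not real corpus output) so it runs without downloading the Brown corpus; the tags are only illustrative.

```python
from itertools import chain

# Hypothetical toy data shaped like the return value of tagged_sents():
# a list of sentences, each sentence a list of (word, tag) tuples.
toy_tagged_sents = [
    [("The", "AT"), ("jury", "NN"), ("said", "VBD")],
    [("It", "PPS"), ("recommended", "VBD"), ("changes", "NNS"), (".", ".")],
]

# Flattening one level of nesting gives the shape of tagged_words():
# a single list of (word, tag) tuples with sentence boundaries dropped.
toy_tagged_words = list(chain.from_iterable(toy_tagged_sents))

print(len(toy_tagged_sents))   # number of sentences
print(len(toy_tagged_words))   # number of tokens across all sentences
print(toy_tagged_sents[0])     # first element is a whole sentence (a list)
print(toy_tagged_words[0])     # first element is a single (word, tag) tuple
```

Use tagged_sents() when sentence boundaries matter (for example, when training a tagger) and tagged_words() when you only need token-level counts.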

Source

2017-05-29 00:07:33 alvas
