Creating a vocabulary dictionary for text mining
I have the following code:
train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
"We can see the shining sun, the bright sun.")
Now I am trying to compute the term frequencies like this:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
Next I want to print the vocabulary, so I did:
vectorizer.fit_transform(train_set)
print vectorizer.vocabulary
Now the output I get is None, although I expected something like:
{'blue': 0, 'sun': 1, 'bright': 2, 'sky': 3}
Any idea what is going wrong here?
Possible duplicate of [CountVectorizer does not print vocabulary](http://stackoverflow.com/questions/28894756/countvectorizer-does-not-print-vocabulary)
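A likely cause, offered as a sketch and not a definitive diagnosis: in scikit-learn, `vectorizer.vocabulary` (no trailing underscore) is the constructor parameter, which stays `None` unless you pass a vocabulary yourself, while the mapping learned during fitting is stored in `vectorizer.vocabulary_`. The example below reuses `train_set` from the question and uses Python 3 `print()` syntax. Passing `stop_words="english"` is an assumption made here to match the expected output, which omits "the" and "is". Note also that indices are assigned in alphabetical order of the terms, so they will not match the numbering shown in the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")

# Assumption: drop English stop words so that "the" and "is" are excluded,
# as in the expected output from the question.
vectorizer = CountVectorizer(stop_words="english")
vectorizer.fit(train_set)

# The constructor parameter is untouched by fitting and is still None.
print(vectorizer.vocabulary)

# The fitted term -> column index mapping lives in the underscored attribute.
vocabulary = vectorizer.vocabulary_
print(sorted(vocabulary.items()))
```

Without `stop_words="english"`, the fitted vocabulary would also contain "the" and "is", since the default tokenizer keeps every token of two or more characters.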