CountVectorizer only returns zeros

2017-03-06 67 views 4 likes

I want to extract features from a given document, using a predefined set of features (a fixed vocabulary).

from sklearn.feature_extraction.text import CountVectorizer 
features = ['a', 'b', 'c'] 
doc = ['a', 'c'] 

vectoriser = CountVectorizer() 
vectoriser.vocabulary = features 
vectoriser.fit_transform(doc)

But the output is a 2×3 matrix filled with zeros, instead of:

desired_output = [[1, 0, 0] 
        [0, 0, 1]]

Any help would be appreciated.

2017-03-06 Immortalz

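Does `doc` represent data from different samples, or different features of the same sample? If the former, this is not what CountVectorizer is meant for. You could use a [One-hot encoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) instead.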
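
For reference, the one-hot suggestion in the comment above could look roughly like the sketch below. It assumes each entry of `doc` is a single categorical value for one sample; the `samples` name and the use of the `categories` argument to pin the column order are illustrative choices, not something stated in the thread.

```python
from sklearn.preprocessing import OneHotEncoder

features = ['a', 'b', 'c']
# Assumption: one categorical column, one row per sample.
samples = [['a'], ['c']]

# Passing the feature list as `categories` fixes the column order to a, b, c.
encoder = OneHotEncoder(categories=[features])
encoded = encoder.fit_transform(samples).toarray()

print(encoded.tolist())  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Note that the encoder returns floats by default, whereas CountVectorizer returns integer counts.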

Answers

This happens because CountVectorizer's default token pattern discards any word that is only one character long. You can change the token pattern to fix this:

from sklearn.feature_extraction.text import CountVectorizer 
features = ['a', 'b', 'c'] 
doc = ['a', 'c'] 

vectoriser = CountVectorizer(vocabulary=features, token_pattern=r"\b\w+\b") 

vectoriser.fit_transform(doc)
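
To see the difference side by side, here is a small sketch that runs the same data through the default pattern and through the modified one. The variable names `default_counts` and `fixed_counts` are only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

features = ['a', 'b', 'c']
doc = ['a', 'c']

# Default token_pattern requires at least two word characters,
# so 'a' and 'c' are never tokenised and every count stays zero.
default_counts = CountVectorizer(vocabulary=features).fit_transform(doc).toarray()

# Allowing single-character tokens lets the fixed vocabulary match.
fixed_counts = CountVectorizer(
    vocabulary=features, token_pattern=r"\b\w+\b"
).fit_transform(doc).toarray()

print(default_counts.tolist())  # [[0, 0, 0], [0, 0, 0]]
print(fixed_counts.tolist())    # [[1, 0, 0], [0, 0, 1]]
```

`fit_transform` returns a sparse matrix, so `.toarray()` is used here only to make the result easy to print.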

2017-03-06 20:23:02 Kewl

Does `token_pattern=r"\b\w+\b"` match any token (of any length)? – Immortalz

It will capture words with one or more characters, provided they are in the vocabulary. – Kewl
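
A quick way to check this is to inspect the tokens each pattern produces. The sketch below uses a made-up sample string and `build_analyzer()` to list the tokens, then counts a second made-up string against the fixed vocabulary to show that out-of-vocabulary tokens such as `bc` are ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer

text = "a bc def"  # hypothetical sample text

# Tokens produced by the default pattern (two or more word characters).
default_tokens = CountVectorizer().build_analyzer()(text)

# Tokens produced by the modified pattern (one or more word characters).
any_length_tokens = CountVectorizer(token_pattern=r"\b\w+\b").build_analyzer()(text)

print(default_tokens)     # ['bc', 'def']
print(any_length_tokens)  # ['a', 'bc', 'def']

# With a fixed vocabulary, only tokens listed in it are counted.
vectoriser = CountVectorizer(vocabulary=['a', 'b', 'c'], token_pattern=r"\b\w+\b")
counts = vectoriser.fit_transform(["a bc c c"]).toarray()

print(counts.tolist())  # [[1, 0, 2]]
```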
