Converting a string to a numeric vector with CountVectorizer: single letters give an empty vocabulary
import re
from sklearn.feature_extraction.text import CountVectorizer

### Clean the string
def names_to_words(names):
    print('a')
    # Replace every non-letter with a space, lowercase, and split on whitespace
    words = re.sub("[^a-zA-Z]", " ", names).lower().split()
    print('b')
    return words

### Vectorization
def Vectorizer():
    vectorizer = CountVectorizer(
        analyzer = "word",
        tokenizer = None,
        preprocessor = None,
        stop_words = None,
        max_features = 5000)
    return vectorizer

### Test a string
s = 'abc...'
r = names_to_words(s)
# Each element of r is treated as a separate document
feature = Vectorizer().fit_transform(r).toarray()
However, when I encounter a list like this:
['g', 'o', 'm', 'd']
I get this error:
ValueError: empty vocabulary; perhaps the documents only contain stop words
There seems to be a problem with single-letter strings like these. What should I do? Thanks.
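A likely cause is worth illustrating: with analyzer="word", CountVectorizer's default token_pattern is r"(?u)\b\w\w+\b", which only matches tokens of two or more word characters. The sketch below uses only the standard re module to mimic that tokenization step on the list from the question; the names DEFAULT_TOKEN_PATTERN, docs and tokens_per_doc are made up for illustration, and this is an approximation of what the vectorizer does, not its actual code path.

```python
import re

# Default token_pattern of scikit-learn's CountVectorizer for analyzer="word"
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

# The single-letter "documents" from the question
docs = ['g', 'o', 'm', 'd']

# Tokens this pattern extracts from each document
tokens_per_doc = [re.findall(DEFAULT_TOKEN_PATTERN, doc) for doc in docs]
print(tokens_per_doc)  # every inner list is empty, so no vocabulary can be built

# A two-letter document, by contrast, does yield a token
print(re.findall(DEFAULT_TOKEN_PATTERN, "go"))
```

Since no document produces a token, the vocabulary ends up empty, which matches the ValueError shown above.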
So what do you want to do? Include these single-letter words in your vocabulary?
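If the goal is indeed to keep the single-letter words, one option is to override token_pattern, since the default pattern r"(?u)\b\w\w+\b" drops tokens shorter than two characters. The following is a minimal sketch under that assumption; the variable names docs, vectorizer, features and vocab are illustrative, and the other arguments from the question are left at their defaults.

```python
from sklearn.feature_extraction.text import CountVectorizer

# The single-letter "documents" from the question
docs = ['g', 'o', 'm', 'd']

# r"(?u)\b\w+\b" also accepts one-character tokens
vectorizer = CountVectorizer(
    analyzer="word",
    token_pattern=r"(?u)\b\w+\b",
    max_features=5000)

# One row per document, one column per vocabulary term
features = vectorizer.fit_transform(docs).toarray()

vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(features)
```

Note that fit_transform treats each list element as its own document, so this produces one row per letter. If the whole cleaned name should be a single document instead, passing something like [" ".join(r)] may be closer to what is intended.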