I am using the method below to extract n-grams from a pandas DataFrame, and I would like to get the frequency of each n-gram term with sklearn:
import sklearn.feature_extraction.text

def extractNGrams(df, ngram_size, min_freq):
    """Extract n-grams from the 'Text' column of a DataFrame.

    Keyword arguments:
    df -- the pandas dataframe containing the sentences
    ngram_size -- defining the n for ngrams
    min_freq -- the minimum frequency for the ngram to be part of the set
    """
    vect = sklearn.feature_extraction.text.CountVectorizer(
        ngram_range=(ngram_size, ngram_size), min_df=min_freq)
    lstSentences = df['Text'].values.tolist()
    # Rows are sentences, columns are n-grams, entries are counts.
    X_train_counts = vect.fit_transform(lstSentences)
    # On scikit-learn older than 1.0, use vect.get_feature_names() instead.
    vocab = vect.get_feature_names_out()
    #print (vocab)
    print(X_train_counts.shape)
    return vocab
How can I get the frequency of each n-gram term?
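In your code, the vocab variable defines the mapping between your terms and the feature indices, for example {"word1": 0, "word2": 1}. The frequencies you need are given by the non-zero entries of the variable X_train_counts. That is, if the value in the first column is 2, then "word1" appears twice. Does this help? – geompalik
@geompalik Got it!! That helps!! Thanks!! – Bonson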
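The suggestion in the comment above can be sketched as follows. This is a minimal illustration, not the asker's final code: the helper name `ngram_frequencies` and the sample sentences are made up for the example. It sums each column of the count matrix and pairs the totals with the terms through the fitted vectorizer's `vocabulary_` mapping (term to column index).

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def ngram_frequencies(df, ngram_size, min_freq):
    """Return {ngram: total count across all rows of df['Text']}.

    Hypothetical helper written for this example.
    """
    vect = CountVectorizer(ngram_range=(ngram_size, ngram_size),
                           min_df=min_freq)
    counts = vect.fit_transform(df['Text'].values.tolist())
    # Summing over axis 0 collapses the rows (sentences), leaving one
    # total per column (n-gram). np.asarray(...).ravel() flattens the
    # 1 x n_features result into a plain 1-D array.
    totals = np.asarray(counts.sum(axis=0)).ravel()
    # vocabulary_ maps each n-gram string to its column index.
    return {term: int(totals[idx]) for term, idx in vect.vocabulary_.items()}


# Made-up sample data, for illustration only.
df = pd.DataFrame({'Text': ['the cat sat', 'the cat ran', 'a dog ran']})
freqs = ngram_frequencies(df, ngram_size=2, min_freq=1)
for term, count in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(term, count)
```

Note that `min_df` filters on document frequency (the number of sentences containing the n-gram), not on the total count returned here, so the two numbers can differ when an n-gram repeats within one sentence.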