Filtering NLTK bigram frequencies (Python 3, NLTK)
I have code like this:
df1 = df[['term']]
df2 = df1.to_string()
words = nltk.word_tokenize(df2)
bgs = nltk.bigrams(words)
fdist = nltk.FreqDist(bgs)
How can I now filter fdist so that it contains only the bigrams that occur more than 2 times?
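This is what I did for my own purposes (not the most direct approach, but I thought I would add my two cents): put the data into a new DataFrame and filter inside the DataFrame.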
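# Flatten each (bigram, count) pair into a ["w1 w2", count] row
frequencies = [[" ".join(k), v] for k, v in fdist.items()]
frame = pd.DataFrame(frequencies, columns=['Bigrams', 'Frequency'])
# Keep rows at or above the cutoff (10 here; use > 2 for the question's case)
removal = frame[frame['Frequency'] >= 10]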
Try filtering like this:
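# Option 1: walk the (bigram, count) pairs, most frequent first
for obj in fdist.most_common():
    if obj[1] > 2:
        print(obj)

# Option 2: iterate over the keys and look up each count
for obj in fdist:
    if fdist[obj] > 2:
        print(obj, fdist[obj])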
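If you want the filtered result as a new frequency object rather than printed output, a dict comprehension is one way to do it. The sketch below is only an illustration under a few assumptions: it uses collections.Counter as a stand-in, since nltk.FreqDist subclasses Counter in NLTK 3 and so the same comprehension should work on the fdist from the question. The helper name filter_by_count and the sample text are hypothetical, not from the question.

```python
from collections import Counter


def filter_by_count(freqs, min_count):
    """Return a new Counter holding only items seen more than min_count times.

    Hypothetical helper: `freqs` can be any Counter-like mapping,
    which should include an nltk.FreqDist.
    """
    return Counter({item: n for item, n in freqs.items() if n > min_count})


# Hypothetical sample tokens; in the question they come from nltk.word_tokenize.
words = "a b a b a b c d c d e f".split()

# Adjacent pairs, the same shape of data that nltk.bigrams(words) yields.
bgs = list(zip(words, words[1:]))

# Stand-in for nltk.FreqDist(bgs).
fdist = Counter(bgs)

# Keep only bigrams that occur more than 2 times.
frequent = filter_by_count(fdist, 2)

for bigram, count in frequent.most_common():
    print(bigram, count)
```

Because the result is again a Counter, methods such as most_common() stay available on it, which is handy if you want to rank the surviving bigrams afterwards.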