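Removing stop words from a file
I want to remove stop words from the Data column in my file. I filtered the rows down to the lines where the end user is speaking, but the stop words are not filtered out by usertext.apply(lambda x: [word for word in x if word not in stop_words]). What am I doing wrong?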
import pandas as pd
from stop_words import get_stop_words
df = pd.read_csv("F:/textclustering/data/cleandata.csv", encoding="iso-8859-1")
usertext = df[df.Role.str.contains("End-user",na=False)][['Data','chatid']]
stop_words = get_stop_words('dutch')
clean = usertext.apply(lambda x: [word for word in x if word not in stop_words])
print(clean)
First, can you 1) print 'stop_words' and 2) try 'clean = usertext.apply(lambda x: [])' to see whether it removes all words? (Just as a test.)
Data [] chatid [] dtype: object ['aan', 'al', 'alles', 'als', 'altijd', 'andere', 'ben', 'bij', 'daar', 'dan', 'dat', 'de', 'der', 'deze', 'die', 'dit', 'doch', 'doen', 'door', 'een', 'eens', 'en', 'er', 'ge', 'geen', 'geweest', 'haar', 'had', 'heb', 'hebben', 'heeft', 'het', 'hier', 'hij', 'hoe', 'hun', 'iemand', 'iets', 'ik', 'in', 'is', 'ja', 'je', 'kan', 'kon', 'kunnen', 'maar', 'me', 'meer', 'men', 'met', 'mij', 'mijn', 'moet', 'na', 'naar', 'niet', 'niets', 'nog', 'nu', 'of', 'om', 'omdat', ...] this is – DataNewB
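A likely cause, going by the output above: DataFrame.apply hands the lambda a whole column at a time, and iterating over a column yields entire cell values (a full chat line, or a chatid), never individual words. So `word not in stop_words` compares complete sentences against the stop word list and nothing matches. A minimal sketch of a per-cell approach follows. It assumes the Data cells are plain strings with whitespace-separated words; the helper name `remove_stop_words` and the small sample list are made up for illustration and are not part of the stop_words package.

```python
def remove_stop_words(text, stop_words):
    """Return `text` without the words listed in `stop_words`.

    Hypothetical helper: splits on whitespace and compares case-insensitively.
    Non-string cells (for example NaN from an empty CSV field) are returned
    unchanged.
    """
    if not isinstance(text, str):
        return text
    stop = {w.lower() for w in stop_words}
    kept = [word for word in text.split() if word.lower() not in stop]
    return " ".join(kept)


# Illustrative subset only; the real list would come from get_stop_words('dutch').
sample_stop_words = ["ik", "heb", "een", "mijn"]

cleaned = remove_stop_words("Ik heb een vraag over mijn factuur", sample_stop_words)
print(cleaned)
```

With the names from the question, this would be applied to the Data column only rather than to the whole frame, for example `usertext['Data'].apply(lambda s: remove_stop_words(s, stop_words))`. Punctuation stuck to a word (such as "een,") would not match under this simple split, so a real tokenizer may be needed depending on the data.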