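How to use word_tokenize on a data frame
I have recently started using the nltk module for text analysis, and I am stuck at one point. I want to use word_tokenize on a dataframe to obtain all the words used in a particular row of the dataframe.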
data example:
text
1. This is a very good site. I will recommend it to others.
2. Can you please give me a call at 9983938428. have issues with the listings.
3. good work! keep it up
4. not a very helpful site in finding home decor.
expected output:
1. 'This','is','a','very','good','site','.','I','will','recommend','it','to','others','.'
2. 'Can','you','please','give','me','a','call','at','9983938428','.','have','issues','with','the','listings'
3. 'good','work','!','keep','it','up'
4. 'not','a','very','helpful','site','in','finding','home','decor'
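Basically, I want to separate out all the words and find the length of each text in the dataframe.
I know word_tokenize works on a single string, but how do I apply it to the entire dataframe?
Please help!
Thanks in advance...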
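Your question is missing the input data, your code, and your expected output. Can you flesh it out? Thanks – EdChum
@EdChum: I have edited the question. Hopefully it now has the required information. – eclairs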
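One possible approach, sketched under a few assumptions: the sample rows above live in a pandas DataFrame column named `text`, and the tokenizer is applied to each row with `Series.apply`. The helper names `add_token_columns` and `tokenize_with_nltk` and the output column names `tokens` and `n_tokens` are made up for illustration. `word_tokenize` also needs the NLTK Punkt tokenizer data, which is fetched with `nltk.download` (the resource name depends on the NLTK version).

```python
import pandas as pd

# Sample data from the question; the column name "text" is an assumption.
df = pd.DataFrame({
    "text": [
        "This is a very good site. I will recommend it to others.",
        "Can you please give me a call at 9983938428. have issues with the listings.",
        "good work! keep it up",
        "not a very helpful site in finding home decor.",
    ]
})


def add_token_columns(frame, column, tokenizer):
    """Return a copy of `frame` with a token list and a token count per row.

    `tokenizer` is any callable that maps a string to a list of tokens.
    """
    out = frame.copy()
    # Series.apply calls the tokenizer once per row of the column.
    out["tokens"] = out[column].apply(tokenizer)
    # The length of each token list is the number of tokens in that row.
    out["n_tokens"] = out["tokens"].apply(len)
    return out


def tokenize_with_nltk(frame, column="text"):
    """Apply nltk's word_tokenize to every row of `column`.

    nltk is imported lazily so the helper above works without it.
    Requires the Punkt tokenizer data to be downloaded beforehand.
    """
    from nltk.tokenize import word_tokenize
    return add_token_columns(frame, column, word_tokenize)
```

Calling `tokenize_with_nltk(df)` would return the original frame plus the two new columns. Passing the tokenizer in as an argument keeps the row-wise logic separate from NLTK, so a different tokenizer (for example `str.split`) can be swapped in for comparison.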