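I want to read a text file (foo1.txt), remove all the stop words defined by NLTK, and write the result to another file (foo2.txt). The code is as follows.
Required import: from nltk.corpus import stopwords
Removing the stop words in NLTK: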
def stop_words_removal():
    with open("foo1.txt") as f:
        reading_file_line = f.readlines() #entire content, return list
        #print reading_file_line #list
        reading_file_info = [item.rstrip('\n') for item in reading_file_line]
        #print reading_file_info #List and strip \n
        #print ' '.join(reading_file_info)
        '''-----------------------------------------'''
        #Filtering & converting to lower letter
        for i in reading_file_info:
            words_filtered = [e.lower() for e in i.split() if len(e) >= 4]
            print words_filtered
        '''-----------------------------------------'''
        '''removing the stop words from the file'''
        word_list = words_filtered[:]
        #print word_list
        for word in words_filtered:
            if word in nltk.corpus.stopwords.words('english'):
                print word
                print word_list.remove(word)
        '''-----------------------------------------'''
        '''write the output in a file'''
        z = ' '.join(words_filtered)
        out_file = open("foo2.txt", "w")
        out_file.write(z)
        out_file.close()
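The problem is that the second part of the code, "removing the stop words from the file", does not work. Any suggestions would be greatly appreciated. Thanks.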
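One likely cause, judging only from the code above: the stop words are removed from the copy `word_list`, while the output string is built from `words_filtered`, which is never modified. The sketch below reproduces that effect in isolation. The small `stop_words` set is a hypothetical stand-in for NLTK's English list, used so the snippet runs without the corpus installed.

```python
# Hypothetical stand-in for nltk.corpus.stopwords.words('english')
stop_words = {"this", "there"}

words_filtered = ["love", "this", "there"]
word_list = words_filtered[:]  # independent copy of the list

for word in words_filtered:
    if word in stop_words:
        # list.remove() mutates word_list and returns None,
        # so printing its result shows None, not the list
        word_list.remove(word)

joined_original = " ".join(words_filtered)  # what the code above writes out
joined_copy = " ".join(word_list)           # what was presumably intended
```

Joining `word_list` instead of `words_filtered` would write the filtered words, though only for whichever line was processed last.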
Example Input File:
'I a Love this car there', 'positive',
'This a view is amazing there', 'positive',
'He is my best friend there', 'negative'
Example Output:
['love', "car',", "'positive',"]
['view', "amazing',", "'positive',"]
['best', "friend',", "'negative'"]
I tried what is suggested in this link, but none of it worked.
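Are you sure this is the output you want? Do you need the punctuation? – elyase
@elyase Thanks for your reply. Actually I don't need the square brackets, but I do need each line to be clearly separated. The code you posted below works only for the last line of the file. I want to remove the stop words from every line of the text. – J4cK
OK, I edited my answer. – elyase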
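For the per-line requirement raised in the comments, here is a minimal sketch that filters every line and writes one output line per input line. It is not the code from the answer; the function names `remove_stop_words`, `load_nltk_stop_words` and `filter_file` and the `min_len` parameter are assumptions made for illustration. The NLTK import is kept inside a helper so the rest runs without the corpus.

```python
def remove_stop_words(lines, stop_words, min_len=4):
    """Return one list of kept words per input line."""
    cleaned = []
    for line in lines:
        # Same filter as in the question: lower-case, keep words of min_len+ chars
        words = [w.lower() for w in line.split() if len(w) >= min_len]
        cleaned.append([w for w in words if w not in stop_words])
    return cleaned


def load_nltk_stop_words():
    # Needs NLTK plus its 'stopwords' corpus: nltk.download('stopwords')
    from nltk.corpus import stopwords
    return set(stopwords.words('english'))  # a set makes lookups fast


def filter_file(src, dst, stop_words):
    """Read src, filter each line, write one cleaned line per input line."""
    with open(src) as f:
        lines = f.read().splitlines()
    with open(dst, "w") as out:
        for words in remove_stop_words(lines, stop_words):
            out.write(" ".join(words) + "\n")


# Demo with a small hypothetical stop-word set instead of NLTK's list
demo = remove_stop_words(
    ["I a Love this car there", "This a view is amazing there"],
    {"this", "there"},
)
```

With the corpus installed, `filter_file("foo1.txt", "foo2.txt", load_nltk_stop_words())` would process the whole file. Note that the length filter from the question drops three-letter words such as "car"; passing `min_len=3` keeps them.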