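I have a very large .csv file (1065 rows x 1 column). Each row contains a sentence. I want to pick out a few important words from each row, using a word list (also a .csv file), and then build a word-frequency count for each row. How can I extract words from each row of a csv file in Python?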
I just put something together quickly; I hope it helps. It could probably be done more efficiently, but it gets the job done.
Example input file
bla bla bla. bla! bla bla apple!, :banana. apple!!!
banana bla bla, apple and banana
peach 12345 bla bla peach and banana, peach, banana! :apple
Code
# Your inputs
list_words = ['apple', 'banana', 'peach']
filename = 'example.txt'

# Characters to strip from each line before splitting it into words
rm = ",:;?/-!."

# Read the file line by line and count the listed words on each line
with open(filename, 'r') as fin:
    for count_line, line in enumerate(fin, 1):
        # Drop punctuation; join() yields a string on both Python 2 and 3
        clean_line = ''.join(ch for ch in line if ch not in rm)
        # To hold the counts of each word
        words_frequency = {key: 0 for key in list_words}
        for w in clean_line.split():
            if w in list_words:
                words_frequency[w] += 1
        print('Line %d : %s' % (count_line, words_frequency))
Output (the order of the dictionary keys may vary with the Python version):
Line 1 : {'apple': 2, 'peach': 0, 'banana': 1}
Line 2 : {'apple': 1, 'peach': 0, 'banana': 2}
Line 3 : {'apple': 1, 'peach': 3, 'banana': 2}
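As a possible variation on the loop above, here is a minimal sketch that reads the rows with the csv module and counts words with collections.Counter. The helper names count_keywords and read_single_column are made up for illustration, the in-memory sample stands in for the real file, and lowercasing the text before matching is an assumption that the original code does not make (it matches case-sensitively).

```python
import csv
import io
import re
from collections import Counter


def count_keywords(rows, keywords):
    """Return one {keyword: count} dict per row of text."""
    keywords = list(keywords)
    results = []
    for text in rows:
        # Assumption: matching is case-insensitive, so lowercase first.
        # \w+ keeps runs of letters/digits and drops punctuation.
        tokens = re.findall(r"\w+", text.lower())
        counts = Counter(tokens)
        # A Counter returns 0 for missing keys, so absent keywords report 0.
        results.append({word: counts[word] for word in keywords})
    return results


def read_single_column(fileobj):
    """Yield the first cell of each non-empty CSV row."""
    for row in csv.reader(fileobj):
        if row:
            yield row[0]


# In-memory stand-in for the real file. Each sentence is quoted so that
# the commas inside it are not treated as column separators.
sample = io.StringIO(
    '"bla bla bla. bla! bla bla apple!, :banana. apple!!!"\n'
    '"banana bla bla, apple and banana"\n'
    '"peach 12345 bla bla peach and banana, peach, banana! :apple"\n'
)

frequencies = count_keywords(read_single_column(sample),
                             ["apple", "banana", "peach"])
for number, freq in enumerate(frequencies, 1):
    print("Line %d : %s" % (number, freq))
```

For a file on disk, the same functions could be fed from open(path, newline='') on Python 3, and the keyword list could be loaded from its own .csv file with read_single_column in the same way. If the sentences in the file contain unquoted commas, csv.reader will split them into several cells, so reading the file as plain text lines, as in the code above, may be the safer choice.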
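Thank you very much for your comments... –
Have you looked at or tried the csv module? https://docs.python.org/2/library/csv.html – kponz
Is it really a csv file if it only has one column? –
Can you provide a sample? A few lines of the file, the words to look for, and the expected output. – gl051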