2015-11-19 77 views
-1

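Converting a list with duplicate values into a frequency-count dictionary in Python

I have a list of strings that contains duplicate values. I want to create a dictionary of words in which each key is a word and its value is the frequency count, and then write these words and their counts to a CSV.

Here is how I am doing it: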

#!/usr/bin/env python 
# encoding: utf-8 

# -*- coding: utf8 -*- 
import csv 
from nltk.tokenize import TweetTokenizer 
import numpy as np 

tknzr = TweetTokenizer() 

#print tknzr.tokenize(s0) 

with open("dispn.csv","r") as file1,\ 
    open("dispn_tokenized.csv","w") as file2,\ 
    open("dispn_tokenized_count.csv","w") as file3: 

    mycsv = list(csv.reader(file1)) 

    words = [] 
    words_set = [] 
    tokenize_count = {} 
    for row in mycsv: 

     lst = tknzr.tokenize(row[2]) 
     for l in lst: 
      file2.write("\""+str(row[2])+"\""+","+"\""+str(l.encode('utf-8'))+"\""+"\n") 
      l = l.lower() 
      words.append(l) 
    words_set = list(set(words)) 
    print "len of words_set : " + str(len(words_set)) 
    for word in words_set: 
     tokenize_count[word] = 1 

    for word in words: 
     tokenize_count[word] = tokenize_count[word]+1 




    print "len of tokenized words_set : " + str(len(tokenize_count)) 

    #print "Tokenized_words count : " 
    #print tokenize_count 
    #print "=================================================================" 

    i = 0 
    for wrd in words_set: 
     #i = i+1 
     print "i : " +str(i) 
     file3.write("\""+str(i)+"\""+","+"\""+str(wrd.encode('utf-8'))+"\""+","+"\""+str(tokenize_count[wrd])+"\""+"\n") 

But in the CSV I still find some repeated values, such as 1, 5, 4, 7, 9.

Some information about the approach:

- dispn.csv contains the usernames of the users, which I am tokenizing with the help of the nltk module.
- After tokenizing them, I store the words in the list 'words' and write the words corresponding to each username to a CSV.
- I create a set from it to get the unique values out of the list 'words', and store it in 'words_set'.
- I then create the dictionary 'tokenize_count' with the word as key and its frequency count as value, and write it to a CSV.

Why am I getting duplicates for only some of the values? Is there a better way to do the same thing? Please help.

+1

[`from collections import Counter`](https://docs.python.org/2/library/collections.html#collections.Counter) –

+0

[How to count the frequency of the elements in a list?](http://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list) – alfasin

+0

@RNar: Could you post your comment as an answer so that I can accept it? Thanks, it solved my problem. –

Answers
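
As suggested in the comments, `collections.Counter` can replace the manual set-and-dictionary bookkeeping. What follows is a minimal sketch rather than a drop-in fix. It is written for Python 3, the helper names `count_tokens` and `write_counts` are made up for illustration, and the hard-coded sample list stands in for the output of `tknzr.tokenize(row[2])`. Note also that the code in the question initializes every count to 1 and then adds 1 per occurrence, so each count it writes is one higher than the true frequency. `Counter` starts from zero.

```python
import csv
import io
from collections import Counter


def count_tokens(tokens):
    """Return a Counter mapping each lowercased token to its frequency."""
    return Counter(token.lower() for token in tokens)


def write_counts(counts, out_file):
    """Write one fully quoted row per distinct word: index, word, count."""
    writer = csv.writer(out_file, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # most_common() yields (word, count) pairs, highest count first.
    for index, (word, count) in enumerate(counts.most_common()):
        writer.writerow([index, word, count])


# Hypothetical tokens standing in for the tokenizer output (an assumption).
sample_tokens = ["John", "smith", "john", "Doe", "JOHN", "doe"]
counts = count_tokens(sample_tokens)

# StringIO stands in for open("dispn_tokenized_count.csv", "w", newline="").
buffer = io.StringIO()
write_counts(counts, buffer)
print(buffer.getvalue())
```

Here `csv.QUOTE_ALL` reproduces the fully quoted fields that the question builds by hand with string concatenation, and `enumerate` supplies the running index that the original loop never increments. Different words can still share the same count, so repeated numbers in the count column are expected.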