2013-03-20 74 views

This is my code for counting word frequencies:

word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay", "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"] 

arr_stop_kwd=["a","and"] 

frequencies = Hash.new(0)
word_arr.each do |word|
  # skip stop words (compared in lowercase) and any token containing "&&"
  if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
    frequencies["#{word.downcase}"] += 1
  end
end
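A minimal sketch of the same count with two changes that usually help on large inputs: the stop words are kept in a Set from the standard library, so each lookup is a hash lookup instead of a scan of the array, and each word is downcased once instead of twice. The helper name count_frequencies is hypothetical, and the code sticks to constructs that also exist in Ruby 1.8.7. With only two stop words the Set matters little; it pays off as the stop-word list grows.

```ruby
require 'set'

# Hypothetical helper: counts lowercase words, skipping stop words and
# any token that contains "&&" (same filtering as the loop above).
def count_frequencies(words, stop_words)
  stop_set = Set.new(stop_words.map { |w| w.downcase }) # hash-based membership checks
  frequencies = Hash.new(0)
  words.each do |word|
    next if word.include?('&&')   # literal substring test, no regex needed
    key = word.downcase           # downcase once per word
    next if stop_set.include?(key)
    frequencies[key] += 1
  end
  frequencies
end

words = ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "In", "China........;) &&"]
freq = count_frequencies(words, ["a", "and"])
p freq["in"]   # "in" and "In" are counted together, so this prints 2
```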

With 100K words of data this takes 9.03 seconds, which is a lot of time. Is there any other way I can compute this faster?

Thanks in advance.
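To see how much of the 9.03 seconds is spent in the counting step itself, it helps to time that step separately from loading the data. A sketch using the standard Benchmark module (Benchmark.realtime returns elapsed wall-clock seconds as a Float); the 100,000-word input below is synthetic, made up for illustration, and the measured time will vary by machine.

```ruby
require 'benchmark'

# Synthetic input (illustrative only): 100,000 words cycled from a small
# vocabulary, with the stop words "a" and "and" mixed in.
vocab = ["Malaysia", "is", "a", "and", "Made", "In", "China", "read"]
words = Array.new(100_000) { |i| vocab[i % vocab.size] }
stop_words = ["a", "and"]

frequencies = nil
elapsed = Benchmark.realtime do
  frequencies = Hash.new(0)
  words.each do |word|
    key = word.downcase
    next if stop_words.include?(key) || word.include?('&&')
    frequencies[key] += 1
  end
end

puts "counted #{frequencies.size} distinct words in #{'%.3f' % elapsed} s"
```

If the counting loop turns out to be fast on its own, the remaining time is more likely spent building word_arr than in the hash updates.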

Answer


Take a look at the Facets gem.

With its frequency method you can do something like this:

require 'facets'
# Facets adds Enumerable#frequency, which returns a hash of element => count
frequencies = (word_arr - arr_stop_kwd).frequency

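Note that you can remove the stop words by subtracting arr_stop_kwd from word_arr (Array#- drops every occurrence of each stop word). See the Array documentation.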
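If loading Facets is a problem, here is a minimal plain-Ruby sketch that produces the same kind of count: Array#- removes the stop words, and inject builds the counts hash. This is not Facets' own implementation, and the helper name frequency_without_facets is hypothetical. Like the Facets one-liner, it compares words exactly, so unlike the question's loop it neither downcases words nor skips tokens containing "&&".

```ruby
# Hypothetical helper: count occurrences of each element after removing
# stop words. Array#- removes all occurrences of every element in stop_words.
def frequency_without_facets(words, stop_words)
  (words - stop_words).inject(Hash.new(0)) do |counts, word|
    counts[word] += 1
    counts # inject needs the accumulator returned from the block
  end
end

counts = frequency_without_facets(["to", "know", "a", "to", "and", "A"], ["a", "and"])
p counts # "A" survives: the subtraction is case-sensitive
```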


Sir, I am using Ruby 1.8.7. When I require 'facets' I get a "stack level too deep" error. How can I fix this? – 2013-03-20 11:06:49


You need to install the gem. Try running 'gem install facets', or add 'facets' to your Gemfile if you are using Bundler. – 2013-03-20 11:20:15


I have already installed it. – 2013-03-20 11:28:19