Print the frequency of each phrase/word with certain words?

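I have a file that contains a list of bands along with their albums and the year each album was produced. I need to write a function that goes through this file, finds the distinct band names, and counts how many times each band appears in the file.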

The file looks like this:

Beatles - Revolver (1966) 
Nirvana - Nevermind (1991) 
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967) 
U2 - The Joshua Tree (1987) 
Beatles - The Beatles (1968) 
Beatles - Abbey Road (1969) 
Guns N' Roses - Appetite For Destruction (1987) 
Radiohead - Ok Computer (1997) 
Led Zeppelin - Led Zeppelin 4 (1971) 
U2 - Achtung Baby (1991) 
Pink Floyd - Dark Side Of The Moon (1973) 
Michael Jackson -Thriller (1982) 
Rolling Stones - Exile On Main Street (1972) 
Clash - London Calling (1979) 
U2 - All That You Can't Leave Behind (2000) 
Weezer - Pinkerton (1996) 
Radiohead - The Bends (1995) 
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995) 
. 
. 
.

The output must be in descending order of frequency and look like this:

band1: number1 
band2: number2 
band3: number3

Here is my code so far:

def read_albums(filename):

    file = open("albums.txt", "r")
    bands = {}
    for line in file:
        words = line.split()
        for word in words:
            if word in '-':
                del(words[words.index(word):])
        string1 = ""
        for i in words:
            list1 = []

            string1 = string1 + i + " "
            list1.append(string1)
        for k in list1:
            if (k in bands):
                bands[k] = bands[k] + 1
            else:
                bands[k] = 1

    for word in bands:
        frequency = bands[word]
        print(word + ":", len(bands))

I think there is a simpler way to do this, but I'm not sure. Also, I'm not sure how to sort a dictionary by frequency. Do I need to convert it to a list?

Source

2013-08-07 Preston May

Take a look at 'collections.Counter' (http://docs.python.org/2/library/collections.html#collections.Counter) –

You're right, there is a simpler way, using Counter:

from collections import Counter 

with open('bandfile.txt') as f: 
    counts = Counter(line.split('-')[0].strip() for line in f if line) 

for band, count in counts.most_common(): 
    print("{0}: {1}".format(band, count))

What exactly does this part do: if line?

That line is a short form of the following loop:

temp_list = []
for line in f:
    if line:  # meant to skip blank lines; a blank line read from a file is "\n", so line.strip() is the stricter test
        bits = line.split('-')
        temp_list.append(bits[0].strip())

counts = Counter(temp_list)

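Unlike the loop above, however, it does not create an intermediate list. Instead it creates a generator expression, which is a more memory-efficient way to step through the items; the generator expression is passed as the argument to Counter.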
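As a minimal sketch of that point, the snippet below feeds a generator expression straight into Counter, using a few lines from the question's sample in place of the real file. The name sample_lines is made up for illustration. The filter uses line.strip() rather than a bare "if line", on the assumption that blank lines should be skipped: a blank line read from a file is the string "\n", which is truthy, so "if line" alone would let it through.

```python
from collections import Counter

# Hypothetical stand-in for the open file: each entry ends with "\n",
# just like the lines produced by iterating over a file object.
sample_lines = [
    "Beatles - Revolver (1966) \n",
    "Nirvana - Nevermind (1991) \n",
    "\n",  # a blank line
    "Beatles - Abbey Road (1969) \n",
    "U2 - The Joshua Tree (1987) \n",
    "U2 - Achtung Baby (1991) \n",
    "Beatles - The Beatles (1968) \n",
]

# The generator expression yields one band name at a time; no list is built.
band_names = (
    line.split('-')[0].strip()
    for line in sample_lines
    if line.strip()  # skip lines that contain only whitespace
)
counts = Counter(band_names)

for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))
```

With these sample lines the loop prints Beatles: 3, then U2: 2, then Nirvana: 1.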

Source

2013-08-07 16:39:01

Note that 'Counter' is only available in Python 2.7 and later. If you are using something older than that, see the accepted answer here: http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value –

I'm still pretty new to Python, so what does the with statement do? Not in this code specifically, but in general. –

http://docs.python.org/2/reference/compound_stmts.html –
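As a rough illustration of what that page describes: with runs a block and then cleans up afterwards, even if the block raises an exception. For files, the cleanup is closing the file. The sketch below writes a throwaway temp file (the file name and its contents are assumptions made for the example) and shows the with form next to a roughly equivalent try/finally.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "albums_demo.txt")
with open(path, "w") as f:
    f.write("Beatles - Revolver (1966)\n")

# Using with: the file is closed as soon as the block ends.
with open(path) as f:
    first_line = f.readline()
closed_after_with = f.closed

# Roughly the same thing written by hand.
f2 = open(path)
try:
    same_line = f2.readline()
finally:
    f2.close()  # runs even if readline() raised

print(first_line == same_line, closed_after_with)
```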

If you are looking for conciseness, use 'defaultdict' and 'sorted':

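from collections import defaultdict
bands = defaultdict(int)  # missing keys start at 0
with open('tmp.txt') as f:
    for line in f:  # iterate over the file directly; xreadlines() is gone in Python 3
        band = line.split(' - ')[0]
        bands[band] += 1
# sort the (band, count) pairs by count, highest first
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))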

Source

2013-08-07 16:42:59 thierrybm

Why sort? The question does not ask for sorted output. Note that 'collections.Counter().most_common()' would be more concise, since it reverse-sorts by frequency for you. –

Right; I hadn't seen the Counter solution when I wrote mine, and that one is better! – thierrybm
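To make the comparison in these comments concrete, here is a small sketch that counts the same lines both ways. It assumes the band name is everything before the first ' - ', and the list sample_lines is made up from the question's data.

```python
from collections import Counter, defaultdict

# Hypothetical sample taken from the question's file contents.
sample_lines = [
    "Beatles - Revolver (1966)",
    "U2 - The Joshua Tree (1987)",
    "Beatles - Abbey Road (1969)",
    "Radiohead - Ok Computer (1997)",
    "U2 - Achtung Baby (1991)",
    "Beatles - The Beatles (1968)",
]

names = [line.split(' - ')[0] for line in sample_lines]

# defaultdict plus an explicit sort by count, highest first
bands = defaultdict(int)
for name in names:
    bands[name] += 1
by_sorted = sorted(bands.items(), key=lambda t: t[1], reverse=True)

# Counter does both the counting and the ordering
by_counter = Counter(names).most_common()

print(by_sorted)
print(by_counter)
```

Both print the same list of (band, count) pairs for this sample, with Beatles first.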

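My approach is to use the split() method to break each line of the file into a list of component tokens. Then you can grab the band name (the first token in the list) and start adding names to a dictionary to keep track of the counts: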

import operator

def main():
    band_counts = {}

    # build a dictionary that adds each band as it is listed, then increments the count for re-lists
    with open("albums.txt", "r") as f:
        for line in f:
            line_items = line.split("-")  # break up the line into individual tokens
            band = line_items[0].strip()  # drop surrounding whitespace and the newline

            # don't want to add blank lines to the band list
            if not band:
                continue

            if band in band_counts:
                band_counts[band] += 1  # band already in the counts, increment the count
            else:
                band_counts[band] = 1  # band not in the counts yet, add it with a count of 1

    # create a list of results sorted by count, highest first
    sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

    for band, count in sorted_list:
        print("{0}: {1}".format(band, count))

if __name__ == "__main__":
    main()

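Notes:

I followed the advice in this answer to create the sorted results: Sort a Python dictionary by value
If you are new to Python, take a look at Google's Python Class. I found it very useful when I was just starting out: https://developers.google.com/edu/python/?csw=1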

Source

2013-08-07 17:38:11 caffreyd
