2013-08-07 108 views
0

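I have a file containing a list of bands, their albums, and the year each album was produced. I need to write a function that goes through this file, finds the distinct band names, and counts how many times each band appears in the file. In other words: how do I print the frequency of each phrase/word?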

The file looks like this:

Beatles - Revolver (1966) 
Nirvana - Nevermind (1991) 
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967) 
U2 - The Joshua Tree (1987) 
Beatles - The Beatles (1968) 
Beatles - Abbey Road (1969) 
Guns N' Roses - Appetite For Destruction (1987) 
Radiohead - Ok Computer (1997) 
Led Zeppelin - Led Zeppelin 4 (1971) 
U2 - Achtung Baby (1991) 
Pink Floyd - Dark Side Of The Moon (1973) 
Michael Jackson -Thriller (1982) 
Rolling Stones - Exile On Main Street (1972) 
Clash - London Calling (1979) 
U2 - All That You Can't Leave Behind (2000) 
Weezer - Pinkerton (1996) 
Radiohead - The Bends (1995) 
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995) 
. 
. 
. 

The output must be in descending order of frequency and look like this:

band1: number1 
band2: number2 
band3: number3 

Here is my code so far:

def read_albums(filename) : 

    file = open("albums.txt", "r") 
    bands = {} 
    for line in file : 
     words = line.split() 
     for word in words: 
      if word in '-' : 
       del(words[words.index(word):]) 
     string1 = "" 
     for i in words : 
      list1 = [] 

      string1 = string1 + i + " " 
      list1.append(string1) 
     for k in list1 : 
      if (k in bands) : 
       bands[k] = bands[k] +1 
      else : 
       bands[k] = 1 


    for word in bands : 
     frequency = bands[word] 
     print(word + ":", len(bands)) 

I think there is an easier way to do this, but I'm not sure. Also, I'm not sure how to sort a dictionary by frequency. Do I need to convert it to a list?

+1

Take a look at ['collections.Counter'](http://docs.python.org/2/library/collections.html#collections.Counter)

Answers

2

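You're right, there is an easier way, using Counter:

from collections import Counter 

with open('bandfile.txt') as f: 
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip()) 

for band, count in counts.most_common(): 
    print("{0}: {1}".format(band, count)) 

What exactly does this part do: if line.strip()

That condition is part of the generator expression, which is a compact form of the following loop:

temp_list = [] 
for line in f: 
    if line.strip(): # skip blank lines; a bare "\n" is truthy, so strip it first 
        bits = line.split('-') 
        temp_list.append(bits[0].strip()) 

counts = Counter(temp_list) 

However, unlike the loop above, the short form does not create an intermediate list. Instead it creates a generator expression, which is a more memory-efficient way to solve the problem, and passes it as the argument to Counter.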
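To make the list-versus-generator difference concrete, here is a minimal sketch. The names sample_lines and band_name are hypothetical, and the list stands in for the real file so the snippet runs on its own. Note that a blank line read from a file is "\n", which is truthy, so the sketch tests line.strip() to skip it.

```python
from collections import Counter

# Hypothetical sample lines standing in for the album file.
sample_lines = [
    "Beatles - Revolver (1966)\n",
    "Nirvana - Nevermind (1991)\n",
    "\n",
    "Beatles - Abbey Road (1969)\n",
    "U2 - The Joshua Tree (1987)\n",
]


def band_name(line):
    """Return the text before the first '-' with surrounding whitespace removed."""
    return line.split("-")[0].strip()


# List version: builds the whole list of names before counting starts.
names_list = [band_name(line) for line in sample_lines if line.strip()]
counts_from_list = Counter(names_list)

# Generator version: names are produced one at a time as Counter consumes them.
counts_from_generator = Counter(
    band_name(line) for line in sample_lines if line.strip()
)

print(counts_from_list == counts_from_generator)  # True
print(counts_from_generator.most_common(1))       # [('Beatles', 2)]
```

Both versions give the same counts; the generator simply avoids holding every band name in memory at once, which only matters for large files.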

+0

Note that 'Counter' is only available in Python 2.7 and later. If you are using something older than that, see the accepted answer here: http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value
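For readers without Counter, one possible fallback is a plain dictionary plus sorted(). This is only a sketch under assumptions: count_bands, by_frequency, and sample_lines are hypothetical names, and the in-memory list replaces the real file. It also answers the original question about converting to a list: sorted() returns a list of (band, count) pairs, so no manual conversion is needed.

```python
def count_bands(lines):
    """Count band names with a plain dict (no collections.Counter needed)."""
    counts = {}
    for line in lines:
        band = line.split("-")[0].strip()
        if not band:  # skip blank lines
            continue
        counts[band] = counts.get(band, 0) + 1
    return counts


def by_frequency(counts):
    """Return a list of (band, count) pairs, most frequent first."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical sample lines standing in for the album file.
sample_lines = [
    "U2 - The Joshua Tree (1987)\n",
    "Beatles - Revolver (1966)\n",
    "U2 - Achtung Baby (1991)\n",
    "\n",
    "Nirvana - Nevermind (1991)\n",
    "Beatles - Abbey Road (1969)\n",
    "U2 - All That You Can't Leave Behind (2000)\n",
]

ranked = by_frequency(count_bands(sample_lines))
for band, count in ranked:
    print("%s: %d" % (band, count))
```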

+0

I'm still pretty new to Python, so what does the with statement do? Not in this code specifically, but in general.

+0

http://docs.python.org/2/reference/compound_stmts.html##
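As a rough illustration of what with does for files: it closes the file when the block ends, even if an exception is raised inside it, much like a hand-written try/finally. The sketch below assumes a throwaway temporary file created only for the demonstration; the variable names are hypothetical.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("Beatles - Revolver (1966)\n")

# Using with: the file is closed as soon as the block exits.
with open(path) as f:
    first_line = f.readline()
closed_after_with = f.closed

# Roughly equivalent code written by hand with try/finally.
g = open(path)
try:
    same_line = g.readline()
finally:
    g.close()

os.remove(path)
print(closed_after_with, g.closed, first_line == same_line)  # True True True
```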

1

If you are looking for brevity, use 'defaultdict' and 'sorted':

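from collections import defaultdict 

bands = defaultdict(int) # missing keys start at 0 
with open('tmp.txt') as f: 
    for line in f: 
        # split on '-' and strip, so "Michael Jackson -Thriller" is handled too 
        band = line.split('-')[0].strip() 
        if band: # skip blank lines 
            bands[band] += 1 
# sort the (band, count) pairs by count, highest first 
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True): 
    print('%s: %d' % (band, count)) 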
+0

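Why sort? The question does not ask for sorted output. Note that 'collections.Counter().most_common()' would be more concise, since it sorts by frequency in descending order for you.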

+0

Right; I hadn't seen the Counter solution when I wrote mine, and that one is better! – thierrybm
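The two approaches discussed in these comments can be compared side by side. This is a small sketch with a hypothetical bands_seen list in place of names parsed from the file; the counts are chosen without ties so the ordering is unambiguous.

```python
from collections import Counter, defaultdict

# Hypothetical band names, as they might come out of the parsed file.
bands_seen = ["U2", "Beatles", "U2", "Nirvana", "Beatles", "U2"]

# Approach 1: defaultdict plus an explicit sort by count, highest first.
tally = defaultdict(int)
for band in bands_seen:
    tally[band] += 1
ranked_manually = sorted(tally.items(), key=lambda pair: pair[1], reverse=True)

# Approach 2: Counter does the counting and the descending sort.
ranked_by_counter = Counter(bands_seen).most_common()

print(ranked_manually == ranked_by_counter)  # True
print(Counter(bands_seen).most_common(2))    # [('U2', 3), ('Beatles', 2)]
```

Passing a number to most_common() limits the result to that many top entries, which the manual version would need a slice for.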

0

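My approach is to use the split() method to break each line of the file into a list of component tokens. Then you can grab the band name (the first token in the list) and start adding names to a dictionary to keep track of the counts: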

import operator 

def main(): 
    band_counts = {} 

    # Build a dictionary that adds each band the first time it is listed, 
    # then increments its count every time it is listed again. 
    with open("albums.txt", "r") as f: 
        for line in f: 
            line_items = line.split("-") # break the line up at each hyphen 
            band = line_items[0].strip() # band name, without padding or newline 

            # don't want to add blank lines to the band counts 
            if not band: 
                continue 

            if band in band_counts: 
                band_counts[band] += 1 # band already in the counts, increment it 
            else: 
                band_counts[band] = 1 # first appearance, add it with a count of 1 

    # create a list of (band, count) pairs sorted by count, highest first 
    sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True) 

    for item in sorted_list: 
        print("%s: %d" % (item[0], item[1])) 

if __name__ == "__main__": 
    main() 

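Notes:

  1. I followed the advice in this answer to create the sorted results: Sort a Python dictionary by value
  2. If you are new to Python, take a look at Google's Python Class. I found it very useful when I was just starting out: https://developers.google.com/edu/python/?csw=1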