2016-12-09 46 views

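TypeError: unhashable type: 'list' - creating a frequency function

I am reading a text file as input and writing a function that works out which word occurs most often. If two or more words tie for the highest frequency, I print all of them.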

def wordOccurance(userFile):

    userFile.seek(0)
    line = userFile.readline()
    lines = []
    while line != "":
        if line != "\n":
            line = line.lower()       # make lower case
            line = line.rstrip("\n")  # strip the trailing newline
            line = line.rstrip("?")   # strip a trailing "?" from the line
            line = line.rstrip("!")   # strip a trailing "!" from the line
            line = line.rstrip(".")   # strip a trailing "." from the line
            line = line.split(" ")    # split the line on spaces
            lines.append(line)
        line = userFile.readline()  # keep reading lines from the document

    words = lines

    wordDict = {}  # word -> count, built from the cleaned lines above
    for word in words:
        if word in wordDict.keys():
            wordDict[word] = wordDict[word] + 1
        else:
            wordDict[word] = 1

    largest_value = max(wordDict.values())

    for k in wordDict.keys():
        if wordDict[k] == largest_value:
            print(k)

    return wordDict

Please help me with this function.


Which line produces the error? At some point (probably 'wordDict[word] = 1') you are trying to use a list as a dictionary key, which is not allowed. – elethan


This line gives me the error message: if word in wordDict.keys(): –


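I can't think of any way that line could produce that error. Does the solution I posted work for you? If not, could you add the full traceback of the error to your question so that I can help you better? – elethan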

Answers


On this line you create a list of strings:

line = line.split(" ")  # split the line on spaces

Then you append it to a list, so you end up with a list of lists:

lines.append(line) 

Later you loop over that list and try to use each sublist as a key:

for word in words:
    if word in wordDict.keys():  # in Python 3 this test already raises, because `word` is a list
        wordDict[word] = wordDict[word] + 1
    else:
        wordDict[word] = 1  # here you try to use a list (`word`) as a key, which is not allowed
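The failure can be reproduced outside your function. The following is a minimal sketch, assuming Python 3; the names `word`, `counts` and `errors` are made up for the illustration and stand in for one entry of `lines` and for `wordDict`. It suggests that a list is rejected both by the membership test against the keys and by the assignment:

```python
# Hypothetical stand-ins: `word` is one entry of `lines`, `counts` plays the role of `wordDict`.
word = ["words", "on", "line", "one"]
counts = {}

errors = []

# Membership test: the dict has to hash `word` to look it up, and lists are unhashable.
try:
    word in counts.keys()
except TypeError as exc:
    errors.append(("membership test", str(exc)))

# Assignment: using the list as a key fails for the same reason.
try:
    counts[word] = 1
except TypeError as exc:
    errors.append(("assignment", str(exc)))

for label, message in errors:
    print(label, "->", message)
```

Both attempts should report an "unhashable type" error that mentions 'list'; the exact wording of the message can differ between Python versions.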

A simple fix is to flatten the list of lists first:

words = [item for sublist in lines for item in sublist]

wordDict = {}
for word in words:
    if word in wordDict.keys():
        wordDict[word] = wordDict[word] + 1
    else:
        wordDict[word] = 1

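The list comprehension [item for sublist in lines for item in sublist] loops over lines, then loops over each sublist created by line.split(" "), and returns a new list made up of the items from every sublist. In your case, lines might look like this: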

[['words', 'on', 'line', 'one'], ['words', 'on', 'line', 'two']] 

The list comprehension turns it into this:

['words', 'on', 'line', 'one', 'words', 'on', 'line', 'two'] 

If you want something a little less complicated, you can just use nested loops:

# words = lines
# just use `lines` in your for loop instead of creating an identical list

wordDict = {}  # word -> count
for line in lines:
    for word in line:
        if word in wordDict.keys():
            wordDict[word] = wordDict[word] + 1
        else:
            wordDict[word] = 1

largest_value = max(wordDict.values())

This may be slightly less efficient and/or less "Pythonic", but it may be easier to wrap your head around.
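For comparison, the same counting can be written more compactly with collections.Counter from the standard library. This is not what the code above uses; it is swapped in here only as an illustrative sketch, and the sample `lines` value and the names `word_counts` and `most_frequent` are assumptions:

```python
from collections import Counter

# Sample data in the same shape as `lines` (a list of lists of words).
lines = [["words", "on", "line", "one"], ["words", "on", "line", "two"]]

# Flatten with a generator expression and count in one step.
word_counts = Counter(word for line in lines for word in line)

# Collect every word that ties for the highest count.
largest_value = max(word_counts.values())
most_frequent = sorted(w for w, n in word_counts.items() if n == largest_value)

print(most_frequent)
```

Because the generator yields individual strings rather than sublists, every key is hashable and the TypeError does not arise.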

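Also, you may want to consider splitting each line into words before cleaning the data, because if you clean the lines first, punctuation is removed only at the end of each line rather than at the end of each word. Depending on the nature of your data, though, this may not be necessary.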
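A sketch of that ordering, splitting first and then stripping punctuation from each word, might look like the following. It is an assumption-laden rewrite rather than your original function: the names `word_occurrence` and `most_frequent`, the use of split() without an argument, and the in-memory sample file are all chosen for the illustration.

```python
import io


def word_occurrence(user_file):
    """Count words, stripping trailing punctuation from each word, not each line."""
    user_file.seek(0)
    word_dict = {}
    for line in user_file:
        # split() with no argument splits on any whitespace and drops the newline.
        for word in line.lower().split():
            word = word.rstrip("?!.")  # same three characters the original code strips
            if word:
                word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict


def most_frequent(word_dict):
    """Return every word that ties for the highest count."""
    if not word_dict:
        return []
    largest_value = max(word_dict.values())
    return [w for w, n in word_dict.items() if n == largest_value]


# Hypothetical input standing in for the opened text file.
sample = io.StringIO("Hello world. Hello again!\n\nIs the world round? Hello.\n")
counts = word_occurrence(sample)
print(most_frequent(counts))
```

With line-level cleaning, the "world." in the middle of the first sample line would keep its full stop and be counted separately from "world"; cleaning per word counts them together.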
