Read from a text file and save the word frequencies to a new text file, one word per line


Good day. Please help. The language used is Python. The code below reads from a text file and prints the frequency of each word on a new line. I got it from this site: https://rmtheis.wordpress.com/2012/09/26/count-word-frequency-with-python/

import re
from collections import Counter


def openfile(filename):
    # Read the whole file; the with-block closes it automatically
    with open(filename, "r") as fh:
        return fh.read()


def removegarbage(text):
    # Replace one or more non-word (non-alphanumeric) chars with a space
    text = re.sub(r'\W+', ' ', text)
    return text.lower()


def getwordbins(words):
    cnt = Counter()
    for word in words:
        cnt[word] += 1
    return cnt


def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    # split() with no argument avoids counting empty strings as words
    words = txt.split()
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        print(key, value)


main('hamlet.txt', 500)

The code above prints fine in the IDE I use (PyCharm). However, when I add the following code below it,

# Write to file
with open("newFile.txt", "w") as f:
    for word in main('hamlet.txt', 500):
        f.write(word + os.linesep)

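it still prints fine in the console, but then shows an error, and nothing at all is written to the text file I created. Below is a snippet of the sample output printed to the console after reading the text file: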

the 16 
of 12 
to 9 
search 9 
which 6

So now I want to write the output above to a text file. The actual content is much longer than this. Thanks. By the way, the error I get in the console is

Traceback (most recent call last): 
    File "/Users/test/PycharmProjects/Trial/trial.py", line 52, in <module> 
    for word in main("hamlet.txt", 500): 
TypeError: 'NoneType' object is not iterable


2017-02-25 user3761841

If you want to use the function main as shown, i.e.

for word in main('hamlet.txt', 500):

then the function has to be adapted for that. One can use, for example, a generator:

def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    # split() with no argument avoids counting empty strings as words
    words = txt.split()
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        # yield key  # would generate only the word, not its frequency
        yield key, value


with open("newFile.txt", "w") as f:
    for word, freq in main('hamlet.txt', 500):
        f.write('%s\t%d\n' % (word, freq))
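The asker also wants the console output kept. A minimal self-contained sketch of the same generator idea, which both prints each pair and writes it to the file, might look like the following. This is not the answerer's code: the names `word_frequencies` and `save_frequencies` are made up for illustration, and using `re.findall` in place of `re.sub` plus `split` is a swapped-in simplification.

```python
import re
from collections import Counter


def word_frequencies(path, top_n):
    """Yield (word, count) pairs for the top_n most common words in a file."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read().lower()
    # \w+ matches runs of word characters, so no empty strings are produced
    words = re.findall(r"\w+", text)
    yield from Counter(words).most_common(top_n)


def save_frequencies(src, dst, top_n):
    """Print each pair and also write it to dst, one 'word count' per line."""
    with open(dst, "w", encoding="utf-8") as out:
        for word, count in word_frequencies(src, top_n):
            print(word, count)
            out.write(f"{word} {count}\n")
```

With the file names from the question, this would be called as `save_frequencies('hamlet.txt', 'newFile.txt', 500)`.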


2017-02-25 08:27:02 ewcz

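Thanks for your reply. I tried your suggestion. It does save to the text file, but it doesn't show the frequency. As I said before, the result must be saved to the text file together with the frequency, just like the sample output I showed above. So how can you help me? Thanks again... – user3761841

@user3761841 In that case, the generator can yield both values, which are then written to the output file. I have updated the answer accordingly. – ewcz

That did the trick.... wow, thanks a lot. Python seems a bit strange, though.. I come from a Java background... Thankssss a lot.. – user3761841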

You need to return key, value instead of printing them


2017-02-25 08:28:28

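Thanks for your response. I followed your instructions and changed nothing except switching to (return key, value), but this time it doesn't even seem to run. It shows an error. – user3761841

The error is something like TypeError: unsupported operand type(s) for +: 'int' and 'str' – user3761841

I tried following your instructions. It saves to the text file, but only one word is saved, and the frequency isn't shown. What can I do now? Thanks. – user3761841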
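A small sketch of what seems to be happening in the comments above, assuming `return key, value` simply replaced the `print` call inside the loop. A `return` there leaves the function on the first iteration, so the caller gets a single (word, count) tuple. Iterating over that tuple gives the word first (which is written) and then the integer count, and adding a string to that integer raises the TypeError quoted above. The names `first_pair` and `all_pairs` and the sample sentence are made up for illustration.

```python
from collections import Counter


def first_pair(words):
    """Mimics `return key, value` inside the loop: exits on the first pass."""
    for key, value in Counter(words).most_common():
        return key, value  # a single (word, count) tuple, nothing more


def all_pairs(words):
    """Returns every (word, count) pair as a list instead."""
    return Counter(words).most_common()


words = "the cat and the hat and the bat".split()

# Looping over the tuple yields the word, then the int count
items = list(first_pair(words))

# Concatenating the int count with a string fails, as in the comment above
try:
    items[1] + "\n"
except TypeError as exc:
    message = str(exc)
```

Returning the full list, as in `all_pairs`, or yielding each pair as in the accepted answer, gives the caller every word together with its frequency.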
