2015-06-07 53 views
1

[Using Python 3.3.3] Saving unique words to a text file, one word per line

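I'm trying to analyse text files, clean them up, print the number of unique words, and then save the list of unique words to a text file, one word per line, together with the number of times each unique word appears in the cleaned-up list of words. So what I did was take the text file (a speech from Prime Minister Harper), clean it up by keeping only valid alphabetical characters and single spaces, and count the number of unique words. Now I need to save a text file of the unique words, with each unique word on its own line and, beside the word, the number of occurrences of that word in the cleaned-up list. Here's what I have.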

def uniqueFrequency(newWords): 
    '''Function returns a list of unique words with amount of occurances of that 
word in the text file.''' 
    unique = sorted(set(newWords.split())) 
    for i in unique: 
     unique = str(unique) + i + " " + str(newWords.count(i)) + "\n" 
    return unique 

def saveUniqueList(uniqueLines, filename): 
    '''Function saves result of uniqueFrequency into a text file.''' 
    outFile = open(filename, "w") 
    outFile.write(uniqueLines) 
    outFile.close 

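newWords is the cleaned-up version of the text file: only letters and spaces, nothing else. So I want each unique word in newWords saved to a text file, one word per line, and beside each word the number of occurrences of that word in newWords (not in the list of unique words, because there every word would occur exactly once). What is wrong with my functions? Thanks!

Based on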

unique = sorted(set(newWords.split())) 
for i in unique: 
    unique = str(unique) + i + " " + str(newWords.count(i)) + "\n" 

I'm guessing that newWords is not a list of strings but one long string.

+0

How do you know it isn't working? –

Answers

2
unique = str(unique) + i + " " + str(newWords.count(i)) + "\n" 

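The line above appends to the end of the existing collection, `unique`, which is also the list the loop is iterating over. If you accumulate into a different variable instead, such as `var`, the function should return the correct result.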

def uniqueFrequency(newWords):
    '''Return a string with one "word count" line per unique word.'''
    var = ""
    unique = sorted(set(newWords.split()))
    for i in unique:
        # Accumulate into var so the list being iterated is not overwritten.
        var = var + i + " " + str(newWords.count(i)) + "\n"
    return var
+0

Good catch. –

+0

That helped, I can't believe I didn't see it. Thanks!!! – BBEng

1

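If that is the case, newWords.count(i) does not count words: str.count counts substring occurrences, so a word such as "the" is also counted inside "then" and the totals come out too high.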

Try:

def uniqueFrequency(newWords):
    '''Return a string with one "word count" line per unique word.'''
    # Split once so that counting is done on whole words, not substrings.
    wordList = newWords.split()
    unique = sorted(set(wordList))
    ret = ""
    for i in unique:
        ret = ret + i + " " + str(wordList.count(i)) + "\n"
    return ret
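
The difference between counting in the string and counting in the split list can be seen in a small sketch. The sample sentence is made up for illustration and is not taken from the speech file.

```python
# Hypothetical sample text, chosen so that "the" also appears inside other words.
text = "the cat sat then left the other theatre"
words = text.split()

# str.count matches substrings, so "the" is also found inside
# "then", "other" and "theatre".
substring_hits = text.count("the")

# list.count compares whole list items, so only the standalone word matches.
whole_word_hits = words.count("the")

print(substring_hits, whole_word_hits)  # prints: 5 2
```
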
+0

This answer is great too. It got me the correct count for each word, so thank you! – BBEng

0

Try using collections.Counter instead. It is well suited to this situation.

A demonstration in IPython follows:

In [1]: from collections import Counter 

In [2]: txt = """I'm trying to analyse text files, clean them up, print the amount of unique words, then try to save the unique words list to a text file, one word per line with the amount of times each unique word appears in the cleaned up list of words. SO what i did was i took the text file (a speech from prime minister harper), cleaned it up by only counting valid alphabetical characters and single spaces, then i counted the amount of unique words, then i needed to make a saved text file of the unique words, with each unique word being on its own line and beside the word, the number of occurances of that word in the cleaned up list. Here's what i have.""" 

In [3]: Counter(txt.split()) 
Out[3]: Counter({'the': 10, 'of': 7, 'unique': 6, 'i': 5, 'to': 4, 'text': 4, 'word': 4, 'then': 3, 'cleaned': 3, 'up': 3, 'amount': 3, 'words,': 3, 'a': 2, 'with': 2, 'file': 2, 'in': 2, 'line': 2, 'list': 2, 'and': 2, 'each': 2, 'what': 2, 'did': 1, 'took': 1, 'from': 1, 'words.': 1, '(a': 1, 'only': 1, 'harper),': 1, 'was': 1, 'analyse': 1, 'one': 1, 'number': 1, 'them': 1, 'appears': 1, 'it': 1, 'have.': 1, 'characters': 1, 'counted': 1, 'list.': 1, 'its': 1, "I'm": 1, 'own': 1, 'by': 1, 'save': 1, 'spaces,': 1, 'being': 1, 'clean': 1, 'occurances': 1, 'alphabetical': 1, 'files,': 1, 'counting': 1, 'needed': 1, 'that': 1, 'make': 1, "Here's": 1, 'times': 1, 'print': 1, 'up,': 1, 'beside': 1, 'trying': 1, 'on': 1, 'try': 1, 'valid': 1, 'per': 1, 'minister': 1, 'file,': 1, 'saved': 1, 'single': 1, 'words': 1, 'SO': 1, 'prime': 1, 'speech': 1, 'word,': 1}) 

(Note that this solution is not yet complete: the commas and other punctuation have not been removed from the words. Hint: use str.replace.)
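
One way the str.replace hint might be applied is sketched below. The sample text and the set of punctuation characters are assumptions made for illustration; the real input would be the contents of the speech file, and the character set may need extending.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
txt = "unique words, then more words. (words) unique"

# Punctuation characters to drop; this set is an assumption.
for mark in ",.()":
    txt = txt.replace(mark, "")

# With the punctuation gone, "words," and "words." collapse into "words".
counts = Counter(txt.split())
print(counts["words"], counts["unique"])  # prints: 3 2
```
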

Counter is a specialized dict that uses each word as a key and its count as the value. So you can use it like this:

cnts = Counter(txt.split())  # split first; Counter(txt) would count characters
with open('counts.txt', 'w') as outfile:
    for c in cnts:
        outfile.write("{} {}\n".format(c, cnts[c]))
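
If the lines should come out in alphabetical order, as in the question, the same idea could be wrapped in a small function. This is only a sketch: the name save_word_counts and its parameters are hypothetical, and the input is assumed to be already cleaned text.

```python
from collections import Counter


def save_word_counts(text, filename):
    """Write each unique word and its count to filename, one pair per line."""
    counts = Counter(text.split())
    # The with block closes the file automatically, even if a write fails,
    # so no explicit close() call is needed.
    with open(filename, "w") as outfile:
        for word in sorted(counts):
            outfile.write("{} {}\n".format(word, counts[word]))
    return counts
```

Calling save_word_counts(newWords, "counts.txt") would then produce one "word count" line per unique word.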

Note that this solution uses a few nice Python concepts;
