2015-06-06

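I can't find the number of unique words in my speech text files (three files, actually). I'm posting my complete code so there is no misunderstanding. I'm having trouble with two of my functions for text analysis.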

#This program will serve to analyze text files for the number of words in 
#the text file, number of characters, sentences, unique words, and the longest 
#word in the text file. This program will also provide the frequency of unique 
#words. In particular, the text will be three political speeches which we will 
#analyze, building on searching techniques in Python. 

def main(): 
    harper = readFile("Harper's Speech.txt") 
    newWords = cleanUpWords(harper) 
    print(numCharacters(harper), "Characters.") 
    print(numSentances(harper), "Sentences.") 
    print(numWords(newWords), "Words.") 
    print(uniqueWords(newWords), "Unique Words.") 
    print("The longest word is: ", longestWord(newWords)) 
    obama1 = readFile("Obama's 2009 Speech.txt") 
    newWords = cleanUpWords(obama1) 
    print(numCharacters(obama1), "Characters.") 
    print(numSentances(obama1), "Sentences.") 
    print(numWords(newWords), "Words.") 
    print(uniqueWords(newWords), "Unique Words.") 
    print("The longest word is: ", longestWord(newWords)) 
    obama2 = readFile("Obama's 2008 Speech.txt") 
    newWords = cleanUpWords(obama2) 
    print(numCharacters(obama2), "Characters.") 
    print(numSentances(obama2), "Sentences.") 
    print(numWords(newWords), "Words.") 
    print(uniqueWords(newWords), "Unique Words.") 
    print("The longest word is: ", longestWord(newWords)) 

def readFile(filename): 
    '''Function that reads a text file, then prints the name of the file 
    without '.txt'. The function returns the file contents for main() to use, 
    and prints the file's name so the user knows which file was read.''' 
    inFile1 = open(filename, "r") 
    fileContentsList = inFile1.read() 
    inFile1.close() 
    print("\n", filename.replace(".txt", "") + ":") 
    return fileContentsList 

def numCharacters(file): 
    '''Function returns the length of the text from read() (not readlines(), 
    because that returns a list of lines and its length would be the line 
    count), which is the total number of characters in the text file.''' 
    return len(file) 

def numSentances(file): 
    '''Function returns the occurrences of a period, exclamation point, or 
    question mark, thus counting the number of full sentences in the text file.''' 
    return file.count(".") + file.count("!") + file.count("?") 

def cleanUpWords(file): 
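    # Replace hyphens and newlines with spaces, collapsing doubled spaces.
    words = file.replace("-", " ").replace("  ", " ").replace("\n", " ") 
    onlyAlpha = "" 
    # Keep only letters and spaces; punctuation and digits are dropped.
    for i in words: 
        if i.isalpha() or i == " ": 
            onlyAlpha += i 
    return onlyAlpha.replace("  ", " ") 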

def numWords(newWords): 
    '''Function finds the amount of words in the text file by returning 
the length of the cleaned up version of words from cleanUpWords().''' 
    return len(newWords.split()) 

def uniqueWords(newWords): 
    unique = sorted(newWords.split()) 
    unique = set(unique) 
    return str(len(unique)) 

def longestWord(file): 
    max(file.split()) 

main() 

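So my last two functions, uniqueWords and longestWord, don't work properly, or at least my output is wrong. For unique words I should get 527, but for some odd reason I actually get 567. Also, no matter what I do, my longest-word function never prints a result. I have tried many ways of getting the longest word, and the one above is just one of them, but none of them returns anything. Please help me with my two sad functions!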

Answer

Try doing it like this:

def longestWord(file): 
    return sorted(file.split(), key = len)[-1] 
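
For what it's worth, here is a small sketch of why the version in the question never gives a word back: a function with no `return` hands `None` to the caller, and `max` without `key=len` compares strings alphabetically rather than by length. The names `longest_word_buggy` and `longest_word` and the sample sentence are made up for illustration only.

```python
def longest_word_buggy(text):
    # Computes a value but never returns it, so the caller receives None.
    max(text.split())


def longest_word(text):
    # key=len makes max() compare by length instead of alphabetical order.
    # default="" avoids a ValueError when the text contains no words.
    return max(text.split(), key=len, default="")


sample = "we the people of the united states"
print(longest_word_buggy(sample))  # None
print(max(sample.split()))         # alphabetically last word, not the longest
print(longest_word(sample))        # a longest word, measured by length
```

When several words share the maximum length, `max` returns the first one it meets, while `sorted(..., key=len)[-1]` returns the last one, so the two approaches can give different (equally long) words.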

Or it may be easier to do both in uniqueWords:

def uniqueWords(newWords): 
    unique = set(newWords.split()) 
    return (str(len(unique)),max(unique, key=len)) 

info = uniqueWords("My name is Harper") 
print("Unique words: " + info[0]) 
print("Longest word: " + info[1]) 

You don't need sorted before set to get all the unique words, because a set is an unordered collection of unique elements.
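
If only the count is wanted, the set alone is enough. Below is a minimal sketch; `count_unique_words` and its `ignore_case` flag are hypothetical names, and lower-casing is only my guess at why the expected number is smaller (for example `The` and `the` being counted as two words). Nothing in the question confirms that.

```python
def count_unique_words(text, ignore_case=True):
    # split() breaks on any run of whitespace, so doubled spaces are harmless.
    words = text.split()
    if ignore_case:
        # Assumption: "The" and "the" should count as the same word.
        words = [w.lower() for w in words]
    # A set keeps one copy of each word; its length is the unique count.
    return len(set(words))


sample = "The cat saw the other cat"
print(count_unique_words(sample, ignore_case=False))  # 5
print(count_unique_words(sample))                     # 4
```

Returning an `int` rather than a `str` also lets the caller do arithmetic with the result; `print` accepts either.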

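Also take a look at cleanUpWords. Suppose you have a string like Hello I'm Harper. Harper I am.

After cleaning you will get 5 unique words, and one of them is the bogus word Im, because the apostrophe is stripped out.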
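
To see the effect, here is a sketch that applies the same kind of character filter to that string and compares it with a variant that keeps apostrophes. `clean_up_words` is a condensed stand-in for the asker's cleanUpWords, and `clean_keep_apostrophes` is a hypothetical alternative; whether contractions should stay intact depends on what the assignment counts as a word.

```python
def clean_up_words(text):
    # Same idea as cleanUpWords: keep only letters and spaces.
    text = text.replace("-", " ").replace("\n", " ")
    return "".join(ch for ch in text if ch.isalpha() or ch == " ")


def clean_keep_apostrophes(text):
    # Hypothetical variant: also keep apostrophes so "I'm" stays one word.
    text = text.replace("-", " ").replace("\n", " ")
    return "".join(ch for ch in text if ch.isalpha() or ch in " '")


sample = "Hello I'm Harper. Harper I am."
print(sorted(set(clean_up_words(sample).split())))
print(sorted(set(clean_keep_apostrophes(sample).split())))
```

The first line of output contains `Im`, which is not a real word; the second keeps `I'm`.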

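I think you gave me code for the longest unique word, but for unique words I just want it to return how many unique words there are. I'm still confused about my unique-words function. – BBEng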