2017-02-22 45 views

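I'm writing a movie-review sentiment analyzer, and my code raises the error "invalid literal for int() with base 10". The code reads a separate text file that contains movie reviews along with their scores, e.g. "4 This movie was great." Thanks for your help!

Edit: the error occurs here, at line 38: score = int(lineSplits[0].strip()). What does "invalid literal for int() with base 10" mean?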

import re 
class WordStatistic: 
    def __init__(self, keyword, averageScore = 0, occurences = 0): 
     self.keyword = keyword 
     self.averageScore = averageScore 
     self.occurences = occurences 

    def getWord(self) : 
     return self.keyword 

    def getAverageScore(self) : 
     return self.averageScore 

    def getOccurences(self) : 
     return self.occurences 

    def addNewScore(self, newScore) : 
     oldScoreSum = self.averageScore * self.occurences 
     self.occurences = self.occurences + 1 
     self.averageScore = (oldScoreSum + newScore)/(self.occurences) 

    def printWordStatistic(self) : 
      print ("Word   : ", self.keyword) 
      print ("Occurences : ", self.occurences) 
      print ("Average Score : ", self.averageScore, "\n\n")
# "teaching" the code 
wordDictionary = {} 
with open("movieReviews.txt", 'r') as fileInstance:
    fileText = fileInstance.read()

# formatting and splitting 
reviewSplits = fileText.split("movieReviews") 
for review in reviewSplits : 
     review = review.strip() 
     if review == "" : 
      continue 
     lineSplits = review.split("\n") 
     score = int(lineSplits[0].strip()) 
     for i in range(1, len(lineSplits)) : 
      wordSplits = re.split("\t| ", lineSplits[i]) 
      for word in wordSplits : 
       if word == "" : 
        continue 
       # If it is already present, then update the score and count 
       # Otherwise just add the new entry to the dictionary 
       if word in wordDictionary :
        wordStatistic = wordDictionary.get(word) 
        wordStatistic.addNewScore(score) 
       else : 
        wordStatistic = WordStatistic(word, score, 1) 
        wordDictionary[word] = wordStatistic 
# print the stats of the words 
def printAllWordStatistic(wordDictionary) : 
    for wordStatistic in wordDictionary.values() : 
     wordStatistic.printWordStatistic() 
# rating the actual review 
def calculateAverageOfReview(review) : 
    # str.replace returns a new string, so the result must be reassigned
    review = review.replace("\t", " ")
    review = review.replace("\n", " ")
    wordSplits = review.split(" ")
    averageScore = 0.0
    totalCount = 0
    for word in wordSplits : 
     if word in wordDictionary :
      averageScore += wordDictionary.get(word).getAverageScore() 
      totalCount = totalCount + 1 
    if totalCount != 0 : 
     return averageScore/totalCount 
    return -1 
# getting user input and append multi lines of case of multi line review 
while (True) : 
    print ("\nEnter a review : ")
    multiLines = [] 
    while True: 
     line = input() 
     if line: 
      multiLines.append(line) 
     else: 
      break 
    inputReview = '\n'.join(multiLines) 
    averageScore = calculateAverageOfReview(inputReview) 
    if averageScore != -1 :
     if averageScore >= 2.50 :
      print ("Positive Review")
     else :
      print ("Negative Review")
    else :
     print ("Unable to rate the review")
    if input("\nDo you want to continue ? (Y/N) : ") != "Y" :
     print ("Quitting the session.")
     # leave the outer while loop, which ends the script
     break

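It means the string contains a character that isn't valid in a base-10 integer literal. Roughly, int() accepts the digits 0123456789, an optional leading + or -, and surrounding whitespace; anything else, including a decimal point, makes it fail.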
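To make that concrete, here is a small sketch that feeds a few strings to int() and reports either the value or the ValueError message. The helper name try_int is just for illustration. Note that underscores between digits ("1_000") are only accepted on Python 3.6 and later.

```python
def try_int(s):
    """Return int(s), or the ValueError message if s is not a base-10 integer literal."""
    try:
        return int(s)
    except ValueError as err:
        # This is the same error the question hits on line 38
        return f"ValueError: {err}"

# Surrounding whitespace and a leading sign are fine; a decimal point,
# extra words, or an empty string are not.
for s in ["4", "  4\n", "-4", "1_000", "4.5", "4 stars", ""]:
    print(repr(s), "->", try_int(s))
```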

That's a lot of code for a one-sentence question.

You can narrow the problem down to one or two lines. Do that now, and then post your [MCVE].
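Following that advice, a minimal reproduction might look like the sketch below. It assumes (a guess, not something the question states) that each review in movieReviews.txt sits on a single line in the same form as the question's example, "4 This movie was great.". Under that assumption, the question's parsing steps leave the score and the review text on the same line, so the string handed to int() is not a bare number.

```python
# Hypothetical file contents: one review per line, score first (assumed format).
fileText = "4 This movie was great.\n"

# Mirror the question's parsing steps on that text.
review = fileText.split("movieReviews")[0].strip()
lineSplits = review.split("\n")
first_line = lineSplits[0].strip()
print(repr(first_line))  # the score and the review text are still together

error_message = None
try:
    score = int(first_line)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```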

Answers

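It means int doesn't know what to do with characters that aren't 0-9. If you have an arbitrary string that you want to pull a number out of, you can use a regular expression. So instead of: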

score = int(lineSplits[0].strip()) 

use something like

score = int(re.search('[0-9]+', lineSplits[0]).group())

which grabs the first run of digits in the line.
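A slightly more defensive variant of the same regex idea, as a sketch: re.match anchors at the start of the line, so only a leading score is accepted, and a line that does not start with digits raises a clear ValueError instead of the AttributeError you would get from calling .group() on None. The names parse_score and SCORE_PATTERN are hypothetical.

```python
import re

# Optional leading whitespace, then a run of digits at the start of the line.
SCORE_PATTERN = re.compile(r"\s*([0-9]+)")

def parse_score(line):
    """Return the integer score at the start of `line` (hypothetical helper)."""
    match = SCORE_PATTERN.match(line)
    if match is None:
        # re.match returns None when the line does not start with digits,
        # so calling .group() on it directly would raise AttributeError.
        raise ValueError(f"no leading score in line: {line!r}")
    return int(match.group(1))

print(parse_score("4 This movie was great."))
```

Unlike re.search, which would also pick up digits in the middle of the review text, re.match only accepts a score at the very start of the line, which seems closer to the "4 This movie was great." format described in the question.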
