2014-05-23 121 views

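Python text processing: AttributeError: 'list' object has no attribute 'lower'

I am new to Python and to Stack Overflow (please be gentle), and I am trying to learn how to do sentiment analysis. I am using a combination of code I found in a tutorial and here: Python - AttributeError: 'list' object has no attribute. However, I keep getting this error: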

Traceback (most recent call last): 
    File "C:/Python27/training", line 111, in <module> 
    processedTestTweet = processTweet(row) 
    File "C:/Python27/training", line 19, in processTweet 
    tweet = tweet.lower() 
AttributeError: 'list' object has no attribute 'lower'

Here is my code:

import csv 
#import regex 
import re 
import pprint 
import nltk.classify 


#start replaceTwoOrMore 
def replaceTwoOrMore(s): 
    #look for 2 or more repetitions of character 
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL) 
    return pattern.sub(r"\1\1", s) 

# process the tweets 
def processTweet(tweet): 
    #Convert to lower case 
    tweet = tweet.lower() 
    #Convert www.* or https?://* to URL 
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet) 
    #Convert @username to AT_USER 
    tweet = re.sub('@[^\s]+','AT_USER',tweet) 
    #Remove additional white spaces 
    tweet = re.sub('[\s]+', ' ', tweet) 
    #Replace #word with word 
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet) 
    #trim 
    tweet = tweet.strip('\'"') 
    return tweet 

#start getStopWordList 
def getStopWordList(stopWordListFileName): 
    #read the stopwords file and build a list 
    stopWords = [] 
    stopWords.append('AT_USER') 
    stopWords.append('URL') 

    fp = open(stopWordListFileName, 'r') 
    line = fp.readline() 
    while line:
        word = line.strip()
        stopWords.append(word)
        line = fp.readline()
    fp.close() 
    return stopWords 

def getFeatureVector(tweet, stopWords): 
    featureVector = [] 
    words = tweet.split() 
    for w in words:
        #replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        #strip punctuation
        w = w.strip('\'"?,.')
        #check if it consists of only words
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", w)
        #ignore if it is a stopWord
        if(w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector 

def extract_features(tweet): 
    tweet_words = set(tweet) 
    features = {} 
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    return features 


#Read the tweets one by one and process it 
inpTweets = csv.reader(open('C:/GsTraining.csv', 'rb'), 
         delimiter=',', 
         quotechar='|') 
stopWords = getStopWordList('C:/stop.txt') 
count = 0; 
featureList = [] 
tweets = [] 

for row in inpTweets: 
    sentiment = row[0] 
    tweet = row[1] 
    processedTweet = processTweet(tweet) 
    featureVector = getFeatureVector(processedTweet, stopWords) 
    featureList.extend(featureVector) 
    tweets.append((featureVector, sentiment)) 

# Remove featureList duplicates 
featureList = list(set(featureList)) 

# Generate the training set 
training_set = nltk.classify.util.apply_features(extract_features, tweets) 

# Train the Naive Bayes classifier 
NBClassifier = nltk.NaiveBayesClassifier.train(training_set) 

# Test the classifier 
with open('C:/CleanedNewGSMain.txt', 'r') as csvinput:
    with open('GSnewmain.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

        all=[]
        row = next(reader)

        for row in reader:
            processedTestTweet = processTweet(row)
            sentiment = NBClassifier.classify(
                extract_features(getFeatureVector(processedTestTweet, stopWords)))
            row.append(sentiment)
            processTweet(row[1])

        writer.writerows(all)

Any help would be massively appreciated.

Answer


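Each row produced by the csv reader is a list, and lower only works on strings. Presumably it is a list of strings, so there are two options. You can either call lower on each element, or turn the list into a single string and then call lower on that.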

# the first approach 
[item.lower() for item in tweet] 

# the second approach 
' '.join(tweet).lower() 
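
To see the problem in isolation, here is a minimal sketch that reproduces the error and then applies both approaches. It is written for Python 3 (the question uses Python 2.7), and the two-column `sample` data is made up for illustration; it is not the asker's file.

```python
import csv
import io

# Hypothetical two-column sample standing in for the real input file.
sample = io.StringIO("positive,I LOVE this\nnegative,Worst Day EVER\n")

rows = list(csv.reader(sample))
first = rows[0]
print(type(first).__name__)  # each row is a list of strings, not a string

# Calling a string method on the row itself reproduces the error.
try:
    first.lower()
except AttributeError as exc:
    print(exc)

# First approach: lowercase every field separately.
lowered_items = [item.lower() for item in first]

# Second approach: join the fields into one string, then lowercase it.
joined = " ".join(first).lower()

print(lowered_items)
print(joined)
```

Which approach fits depends on whether the fields of a row belong together as one text or are separate columns.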

More likely, though (it is hard to tell without more information), you only want one item out of each row. Something along the lines of:

for row in reader: 
    processedTestTweet = processTweet(row[0]) # Again, can't know if this is actually correct without seeing the file 
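
A fuller sketch of that loop might look like the following. It assumes the text to classify sits in the first column (`TEXT_COLUMN` is a guess, as noted above), and `toy_classifier` is a hypothetical stand-in for the trained model; with the question's code, the callable would wrap `NBClassifier.classify(extract_features(getFeatureVector(processTweet(text), stopWords)))`. It also collects the labelled rows before writing them, since the question's `all` list is never appended to and `writer.writerows(all)` therefore writes nothing. Written for Python 3.

```python
import csv
import io

TEXT_COLUMN = 0  # assumption: the text to classify is the first field


def classify_rows(reader, classify_text, text_column=TEXT_COLUMN):
    """Return every usable row with its predicted label appended."""
    labelled = []
    for row in reader:
        if len(row) <= text_column:
            continue  # skip blank or too-short lines
        label = classify_text(row[text_column])  # pass a string, not the list
        labelled.append(row + [label])
    return labelled


def toy_classifier(text):
    # Hypothetical stand-in for the trained Naive Bayes model.
    return "positive" if "great" in text.lower() else "negative"


# In-memory stand-ins for the input and output files.
source = io.StringIO("That is GREAT!\n\nThis is awful\n")
labelled_rows = classify_rows(csv.reader(source), toy_classifier)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(labelled_rows)
print(out.getvalue())
```

Passing the classifier in as a callable keeps the file handling separate from the model, so the loop can be checked on a few lines before running it on the full file.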

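Also, I am guessing that you aren't using the csv reader quite the way you think you are, because right now you are training a Naive Bayes classifier on a single example every time and then having it predict the one example it was trained on. Maybe explain what you are trying to do?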


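Thank you for the quick response. What I am trying to do is this: I have a small labeled training set in a .csv file with 1000 positive and 1000 negative statements. The training seems to work, since I tested it by hard-coding test statements, e.g. 'That's great!'. However, I also have a file with about 10000 tweets and Facebook posts that I would like to open in this program and test for sentiment using Naive Bayes. I don't think I am using the csv reader correctly, but I can't put my finger on it. – user3670554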
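
One thing worth checking, as a guess only: the test file is a .txt file, and if it holds one post per line rather than real CSV columns, `csv.reader` will split any post that contains a comma into several fields. The sketch below uses made-up sample text (not the asker's data) to show the difference between parsing such lines as CSV and reading them as whole lines. Written for Python 3.

```python
import csv
import io

# Hypothetical contents: one post per line, no CSV structure.
raw = "Great phone, terrible battery\nLoving it!\n"

# Parsed as CSV, the comma inside the first post splits it into two fields.
split_by_csv = list(csv.reader(io.StringIO(raw)))
print(split_by_csv)

# Read line by line, each post stays one string that can be classified directly.
whole_lines = [line.strip() for line in io.StringIO(raw) if line.strip()]
print(whole_lines)
```

If the file really is one post per line, iterating over the lines directly avoids the list-versus-string issue altogether.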
