2017-09-15 65 views

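sentiment_analyser error: 'bytes' object has no attribute 'encode'

I am using nltk to run sentiment analysis for a project. I searched GH and found nothing similar for the sentiment_analyser or polarity_scores calls.

I also looked at Python 3.4 - 'bytes' object has no attribute 'encode'. It is not a duplicate, because I am not calling bcrypt.gensalt().encode('utf-8'), although it does hint at some kind of type mismatch.

Can anyone help me fix this error?

The error I get: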

/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, text)
    154     def __init__(self, text):
    155         if not isinstance(text, str):
--> 156             text = str(text.encode('utf-8'))
    157         self.text = text
    158         self.words_and_emoticons = self._words_and_emoticons()

AttributeError: 'bytes' object has no attribute 'encode'

The DataFrame, df_stocks.head(5):

  prices articles 
2007-01-01 12469 What Sticks from '06. Somalia Orders Islamist... 
2007-01-02 12472 Heart Health: Vitamin Does Not Prevent Death ... 
2007-01-03 12474 Google Answer to Filling Jobs Is an Algorithm... 
2007-01-04 12480 Helping Make the Shift From Combat to Commerc... 
2007-01-05 12398 Rise in Ethanol Raises Concerns About Corn as...     

The code is below; the error occurs on its last line:

import numpy as np 
import pandas as pd 
from nltk.classify import NaiveBayesClassifier 
from nltk.corpus import subjectivity 
from nltk.sentiment import SentimentAnalyzer 
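from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import unicodedata

# Assumed: the analyzer instance `sid` is not defined in the posted snippet
sid = SentimentIntensityAnalyzer()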
for date, row in df_stocks.T.iteritems(): 
    sentence = unicodedata.normalize('NFKD', df_stocks.loc[date, 'articles']).encode('ascii','ignore') 
    ss = sid.polarity_scores(sentence) 

Thanks

Possible duplicate: https://stackoverflow.com/questions/38246412/python-3-4-bytes-object-has-no-attribute-encode

Possible duplicate of [Python 3.4 - 'bytes' object has no attribute 'encode'](https://stackoverflow.com/questions/38246412/python-3-4-bytes-object-has-no-attribute-encode) – eyllanesc

It seems df_stocks.loc[date, 'articles'] is not a unicode str. What is df_stocks? – aircraft

Answer


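According to the unicodedata.normalize() docs, the function takes a Unicode string and returns its normal form (here NFKD). Calling .encode('ascii', 'ignore') on that result then produces a bytes object: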

import unicodedata 

print(unicodedata.normalize('NFKD', u'abcdあäasc').encode('ascii', 'ignore')) 

This prints:

b'abcdaasc' 
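
To connect that output to the traceback, here is a minimal sketch that reproduces the failure without nltk. The function coerce_text is a hypothetical stand-in that only copies the check shown at lines 155-156 of vader.py in the traceback; it is not the library's API.

```python
import unicodedata


def coerce_text(text):
    # Hypothetical stand-in mirroring the check from the traceback
    # (vader.py, lines 155-156): anything that is not a str gets .encode() called on it.
    if not isinstance(text, str):
        text = str(text.encode('utf-8'))
    return text


normalized = unicodedata.normalize('NFKD', 'abcdあäasc')
as_bytes = normalized.encode('ascii', 'ignore')

print(type(normalized).__name__)  # the normalized value is still a str
print(type(as_bytes).__name__)    # after .encode() it is bytes

error_message = None
try:
    coerce_text(as_bytes)
except AttributeError as exc:
    # bytes objects have .decode(), not .encode(), in Python 3
    error_message = str(exc)
    print(error_message)
```

In Python 2 the same .encode() call returned a plain str, which is why code ported from Python 2 can hit this check.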

So the problem is right here: df_stocks.loc[date, 'articles'] is not a Unicode string.
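
One possible way around it, assuming the articles column already holds Python 3 str values and the goal is only to drop non-ASCII characters, is to decode the bytes back to str before scoring. The helper name to_ascii_str is hypothetical, and the commented-out loop is an untested sketch that needs nltk and its vader_lexicon resource.

```python
import unicodedata


def to_ascii_str(text):
    """Hypothetical helper: drop non-ASCII characters but return str, not bytes."""
    normalized = unicodedata.normalize('NFKD', text)
    # encode() yields bytes; decode() turns them back into a str
    return normalized.encode('ascii', 'ignore').decode('ascii')


sentence = to_ascii_str('Café prices rise')
print(repr(sentence))

# Untested sketch of the loop from the question, using the helper above:
# sid = SentimentIntensityAnalyzer()
# for date, text in df_stocks['articles'].items():
#     ss = sid.polarity_scores(to_ascii_str(text))
```

If stripping accents is not actually required, passing the str from the DataFrame straight to polarity_scores would avoid the encode step altogether. Note also that DataFrame.iteritems() was removed in pandas 2.0, so the sketch uses items() instead.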

Yes, you are right... in Python 3 this is of type str, so the mapping to Unicode works now... I just realized the code is a port from Python 2, which probably caused this error – Mike

Glad to help – aircraft
