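Newbie error parsing tweet JSON: UnicodeEncodeError: 'charmap' codec can't encode characters in position 13-63: character maps to <undefined>

I am following the Intro to Data Sci course, and I ran into a problem while parsing the JSON response from Twitter.
I want to retrieve the text from JSON in the following format.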
{u'delete': {u'status': {u'user_id_str': u'702327198', u'user_id': 702327198, u'id': 332772178690981889L, u'id_str': u'332772178690981889'}}}, {u'delete': {u'status': {u'user_id_str': u'864736118', u'user_id': 864736118, u'id': 332770710667792384L, u'id_str': u'332770710667792384'}}}, {u'contributors': None, u'truncated': False, **u'text'**: u'RT @afgansyah_reza: Lagi ngantri. Ada ibu2 & temennya. "Ih dia mukanya mirip banget sama Afgan.", trus ngedeketin gw, "Tuh kan.. Mirip bang\u2026', u'in_reply_to_status_id': None, u'id': 332772350640668672L, u'favorite_count': 0, ....... ]
Here is the code I am using:
def hw():
    data = []
    count = 0
    with open('output.txt') as f:
        for line in f:
            encoded_string = line.strip().encode('utf-8')
            data.append(json.loads(encoded_string))
    print data  # generates the input to next block
    for listval in data:  # individual block
        if "text" in listval:
            print listval["text"]
        else:
            continue
However, when I run it I get the following output and error:
RT @afgansyah_reza: Lagi ngantri. Ada ibu2 & temennya. "Ih dia mukanya mirip banget sama Afgan.", trus ngedeketin gw, "Tuh kan.. Mirip bang…
RT @Dimaz_CSIX: Kolor pakek pita #laguharlemshake
Traceback (most recent call last):
  File "F:\ProgrammingPoint\workspace-new\PyTest\tweet_sentiment.py", line 41, in <module>
    main()
  File "F:\ProgrammingPoint\workspace-new\PyTest\tweet_sentiment.py", line 36, in main
    hw()
  File "F:\ProgrammingPoint\workspace-new\PyTest\tweet_sentiment.py", line 23, in hw
    print listval["text"]
  File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 13-63: character maps to <undefined>
I am new to Python, and any help would be greatly appreciated.
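By the way, one of the benefits of learning Python 3.3 instead of 2.7 is that this stuff is a lot easier. (3.x forces you to deal with Unicode much earlier than 2.x, but since you have already run into it with 2.x, that is no downside.) – abarnert 2013-05-11 00:22:36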
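
For what it's worth, the last frames of the traceback show print encoding the tweet text through C:\Python27\lib\encodings\cp1252.py, so the failure looks like it comes from the console encoding rather than from json.loads. Below is a minimal sketch of one possible workaround, assuming it is acceptable to show unmappable characters as '?'. The helper name safe_for_console and the sample tweet are made up for illustration; they are not from the question. It is written to run on both Python 2.7 and 3.x.

```python
from __future__ import print_function, unicode_literals


def safe_for_console(text, encoding="cp1252"):
    """Hypothetical helper: replace characters the codec cannot map with '?'."""
    # Encode with the 'replace' error handler, then decode back to text.
    return text.encode(encoding, "replace").decode(encoding)


# Made-up sample: the Cyrillic letters have no mapping in cp1252.
tweet = "RT @someone: \u043f\u0440\u0438\u0432\u0435\u0442"

try:
    # Strict encoding is the failure mode from the traceback.
    tweet.encode("cp1252")
except UnicodeEncodeError as exc:
    print("strict encoding fails:", exc.reason)

# The sanitized text only contains characters cp1252 can represent.
print(safe_for_console(tweet))
```

In the loop from the question this would correspond to printing safe_for_console(listval["text"]) instead of listval["text"]. Characters that cp1252 does cover, such as the "…" in the first tweet, pass through unchanged, which matches the first two tweets printing fine before the error.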