Python script receiving UnicodeEncodeError: 'ascii' codec can't encode character

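I have a simple Python script that pulls posts from reddit and posts them on Twitter. Unfortunately, tonight it started having problems, which I assume are caused by a formatting issue in the title of someone's reddit post. The error I'm receiving is: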

  File "redditbot.py", line 82, in <module>
    main()
  File "redditbot.py", line 64, in main
    tweeter(post_dict, post_ids)
  File "redditbot.py", line 74, in tweeter
    print post+" "+post_dict[post]+" #python"
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 34: ordinal not in range(128)

Here is my script:

# encoding=utf8 
import praw 
import json 
import requests 
import tweepy 
import time 
import urllib2 
import sys 
reload(sys) 
sys.setdefaultencoding('utf8') 

access_token = 'hidden' 
access_token_secret = 'hidden' 
consumer_key = 'hidden' 
consumer_secret = 'hidden' 


def strip_title(title):
    if len(title) < 75:
        return title
    else:
        return title[:74] + "..."

def tweet_creator(subreddit_info):
    post_dict = {}
    post_ids = []
    print "[bot] Getting posts from Reddit"
    for submission in subreddit_info.get_hot(limit=2000):
        post_dict[strip_title(submission.title)] = submission.url
        post_ids.append(submission.id)
    print "[bot] Generating short link using goo.gl"
    mini_post_dict = {}
    for post in post_dict:
        post_title = post
        post_link = post_dict[post]

        mini_post_dict[post_title] = post_link
    return mini_post_dict, post_ids

def setup_connection_reddit(subreddit):
    print "[bot] setting up connection with Reddit"
    r = praw.Reddit('PythonReddit PyReTw'
                    'monitoring %s' %(subreddit))
    subreddit = r.get_subreddit('python')
    return subreddit



def duplicate_check(id):
    found = 0
    with open('posted_posts.txt', 'r') as file:
        for line in file:
            if id in line:
                found = 1
    return found

def add_id_to_file(id):
    with open('posted_posts.txt', 'a') as file:
        file.write(str(id) + "\n")

def main():
    subreddit = setup_connection_reddit('python')
    post_dict, post_ids = tweet_creator(subreddit)
    tweeter(post_dict, post_ids)

def tweeter(post_dict, post_ids):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    for post, post_id in zip(post_dict, post_ids):
        found = duplicate_check(post_id)
        if found == 0:
            print "[bot] Posting this link on twitter"
            print post+" "+post_dict[post]+" #python"
            api.update_status(post+" "+post_dict[post]+" #python")
            add_id_to_file(post_id)
            time.sleep(3000)
        else:
            print "[bot] Already posted"

if __name__ == '__main__':
    main()

Any help would be greatly appreciated. Thanks in advance!

2016-01-17 Arbaxas

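Would you mind fixing the indentation of your example? Also, have you tried, for example, encoding post explicitly before formatting and printing the bytes? – karlson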

You may find this article helpful: [Pragmatic Unicode](http://nedbatchelder.com/text/unipain.html), written by SO veteran Ned Batchelder. –

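The problem probably stems from mixing byte strings and unicode strings in a concatenation. As an alternative to prefixing every string literal with u, perhaps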

from __future__ import unicode_literals

fixes things for you. See here for a more in-depth explanation, and to decide whether it is an option for you.
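To make the mixing problem concrete, here is a minimal sketch that converts every piece to unicode text before joining. The helper names to_text and build_tweet and the example URL are hypothetical and not part of the original script, and the sketch assumes any byte strings are UTF-8 encoded.

```python
def to_text(value, encoding='utf-8'):
    """Return value as unicode text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_tweet(title, url, tag=u'#python'):
    """Join title, url and tag with spaces, after converting each to text."""
    return u' '.join(to_text(part) for part in (title, url, tag))


# A unicode title combined with a byte-string URL (hypothetical values).
tweet = build_tweet(u'\u201cHi\u201d', b'http://example.com/x')
```

Decoding at the point where bytes enter the program keeps later concatenations free of implicit ASCII conversions.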

2016-01-17 10:58:43 karlson

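You are trying to print a unicode string to your terminal (or possibly to a file via IO redirection), but the encoding used by your terminal (or file system) is ASCII. Python therefore tries to convert the string from its unicode representation to ASCII, and fails because the code point u'\u201c' (“) cannot be represented in ASCII. Effectively your code is doing this: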

>>> print u'\u201c'.encode('ascii') 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

You could try encoding to UTF-8:

print (post + " " + post_dict[post] + " #python").encode('utf8')

or encoding to ASCII like this:

print (post + " " + post_dict[post] + " #python").encode('ascii', 'replace')

which replaces the characters that ASCII cannot represent with ?.
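As a small illustration of the difference, the sketch below encodes one sample title in several ways. The sample text is made up, and the 'ignore' and 'backslashreplace' handlers are standard codec error handlers added here for comparison; the answer itself only mentions 'replace'.

```python
text = u'\u201cquoted\u201d title'

# UTF-8 can represent every character, so nothing is lost.
as_utf8 = text.encode('utf8')

# ASCII with 'replace' swaps each unencodable character for '?'.
as_ascii_replaced = text.encode('ascii', 'replace')

# ASCII with 'ignore' silently drops unencodable characters.
as_ascii_ignored = text.encode('ascii', 'ignore')

# ASCII with 'backslashreplace' keeps them as escape sequences.
as_ascii_escaped = text.encode('ascii', 'backslashreplace')
```

Only the UTF-8 result can be decoded back to the original text, so the ASCII variants are best kept for logging or debugging output.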

Another option, useful if you are printing for debugging purposes, is to print the repr of the string:

print repr(post + " " + post_dict[post] + " #python")

which produces output like this:

>>> s = u'string with \u201cLEFT DOUBLE QUOTATION MARK\u201c'
>>> print repr(s) 
u'string with \u201cLEFT DOUBLE QUOTATION MARK\u201c'

2016-01-17 11:00:48 mhawke

Consider this simple program:

print(u'\u201c' + "python")

If you print to a terminal (one with a suitable character encoding), you get

“python

But if you redirect the output to a file, you get a UnicodeEncodeError.

script.py > /tmp/out 
Traceback (most recent call last):
  File "/home/unutbu/pybin/script.py", line 4, in <module>
    print(u'\u201c' + "python")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

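When you print to a terminal, Python uses the terminal's character encoding to encode the unicode. (Terminals can only print bytes, so unicode must be encoded before it can be printed.)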

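When you redirect output to a file, Python cannot determine the character encoding, because files have no declared encoding. So by default, Python2 implicitly encodes all unicode with the ascii codec before writing it to the file. Since u'\u201c' cannot be encoded as ascii, you get a UnicodeEncodeError. (Only the first 128 unicode code points, U+0000 through U+007F, can be encoded as ascii.)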
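The behaviour described above can be sketched as follows. The function encode_for_stream and the FakeStream class are hypothetical names used only for this illustration, and falling back to UTF-8 with the 'replace' handler is an assumption, not something Python does on its own. In Python 2, sys.stdout.encoding is None when output is redirected, which is the case the fallback covers.

```python
def encode_for_stream(text, stream, fallback='utf-8'):
    """Encode text with the stream's encoding, or with fallback if it has none."""
    encoding = getattr(stream, 'encoding', None) or fallback
    # 'replace' avoids UnicodeEncodeError on limited encodings such as ascii.
    return text.encode(encoding, 'replace')


class FakeStream(object):
    """Hypothetical stand-in for sys.stdout with a chosen encoding attribute."""

    def __init__(self, encoding):
        self.encoding = encoding


# Redirected output in Python 2: no encoding known, so UTF-8 is used.
redirected = encode_for_stream(u'\u201cpython', FakeStream(None))

# A terminal that only understands ASCII: the quote becomes '?'.
ascii_terminal = encode_for_stream(u'\u201cpython', FakeStream('ascii'))
```

In a real script the second argument would be sys.stdout rather than a stand-in object.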

This problem is explained in more detail in the Why Print Fails wiki.

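To fix the problem, first, avoid adding unicode and byte strings together. Doing so triggers an implicit conversion using the ascii codec in Python2, and raises an exception in Python3. To future-proof your code, it is better to be explicit: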

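# Build the whole line as unicode first (post is still the dict key here),
# then encode it once for output.
line = u'{} {} #python'.format(post, post_dict[post])
print(line.encode('utf-8'))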
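Another way to be explicit, sketched here as an assumption rather than as the answerer's own code, is to write text through a file object that declares its encoding. The helpers write_lines and read_lines, the file name, and the sample line are all hypothetical; io.open behaves the same way in Python 2 and Python 3.

```python
import io
import os
import tempfile


def write_lines(path, lines, encoding='utf-8'):
    """Write unicode lines to path, encoding them explicitly."""
    with io.open(path, 'w', encoding=encoding) as handle:
        for line in lines:
            handle.write(line + u'\n')


def read_lines(path, encoding='utf-8'):
    """Read path back as unicode lines, without the trailing newlines."""
    with io.open(path, 'r', encoding=encoding) as handle:
        return [line.rstrip(u'\n') for line in handle]


# Round-trip one line containing non-ASCII quotes through a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), 'tweets.txt')
write_lines(demo_path, [u'\u201cpython\u201d http://example.com/x #python'])
round_trip = read_lines(demo_path)
```

Because the encoding is stated when the file is opened, the result no longer depends on whether output goes to a terminal or is redirected.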

2016-01-17 11:14:08 unutbu
