Remove all characters from a string whose ordinal is out of range

What is a good way to remove all characters that are out of range (ordinal not in range(128)) from a string in Python?

I am using hashlib.sha256 in Python 2.7 and ran into this exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u200e' in position 13: ordinal not in range(128)

I guess this means that some funky character found its way into the string I am trying to hash.

Thanks!

Source

2012-06-06 Chris Dutrow

You should just use UTF-8, not ASCII. – SLaks

This is an example of the wrong way to handle Unicode. –

new_safe_str = some_string.encode('ascii','ignore')

I think that will work,

or you could use a list comprehension:

"".join([ch for ch in orig_string if ord(ch) < 128])

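[Edit] However, as others have said, it is probably better to figure out how to handle Unicode in general... unless you really need the string encoded as ASCII for some reason.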
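As a rough sketch of the two approaches above side by side: the function names and the sample string are made up for illustration, and the code is written so that it should run on both Python 2.7 and Python 3. Note that ASCII covers code points 0 to 127, so the filter compares with < 128.

```python
# -*- coding: utf-8 -*-
def strip_non_ascii_encode(text):
    """Drop non-ASCII characters by encoding with errors='ignore'."""
    # encode() yields a byte string; decode it so the result is text again.
    return text.encode("ascii", "ignore").decode("ascii")


def strip_non_ascii_filter(text):
    """Drop non-ASCII characters with an explicit ordinal check."""
    # ASCII is code points 0..127, hence the strict < 128 comparison.
    return u"".join(ch for ch in text if ord(ch) < 128)


# Hypothetical input: an invisible U+200E mark and a pound sign mixed in.
sample = u"abc\u200e \xa3100"
print(strip_non_ascii_encode(sample))  # abc 100
print(strip_non_ascii_filter(sample))  # abc 100
```

Both variants silently discard data, so they only make sense when the stripped characters really do not matter for the use case.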

Source

2012-06-06 22:47:52

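This is the accepted answer because it is the only one that works for my use case. Had I known ahead of time that the hash function needed more micromanagement to work correctly, I might have done things differently, but there are now millions of database entries with a secondary key that uses the current hashing method, and I cannot change that. –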

This is an example of where the changes in Python 3 are an improvement, or at least produce a clearer error message.

Python2

>>> import hashlib 
>>> funky_string=u"You owe me £100" 
>>> hashlib.sha256(funky_string) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 11: ordinal not in range(128) 
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest() 
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e' 
>>>

Python3

>>> import hashlib 
>>> funky_string="You owe me £100" 
>>> hashlib.sha256(funky_string) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
TypeError: Unicode-objects must be encoded before hashing 
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest() 
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e' 
>>>

The real problem is that sha256 takes a sequence of bytes, which Python 2 does not have a clear concept of. Using .encode("utf-8") is what I would suggest.
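One way to apply that suggestion consistently is a small wrapper that always turns text into bytes before hashing. This is only a sketch: sha256_hexdigest and its encoding parameter are hypothetical names, not part of hashlib, and UTF-8 as the default is an assumption taken from the advice above.

```python
import hashlib


def sha256_hexdigest(value, encoding="utf-8"):
    """Return the SHA-256 hex digest of text or bytes."""
    # Text must become bytes first; byte strings pass through unchanged.
    if not isinstance(value, bytes):
        value = value.encode(encoding)
    return hashlib.sha256(value).hexdigest()


print(sha256_hexdigest(u"You owe me \xa3100"))
print(sha256_hexdigest(u"abc") == sha256_hexdigest(b"abc"))  # True
```

Routing every hash call through one function like this keeps the choice of encoding in a single place instead of relying on an implicit ASCII conversion.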

Source

2012-06-06 23:04:15

Instead of removing those characters, it would be better to use an encoding that hashlib won't choke on, UTF-8 for example:

>>> import hashlib
>>> data = u'\u200e'
>>> hashlib.sha256(data.encode('utf-8')).hexdigest() 
'e76d0bc0e98b2ad56c38eebda51da277a591043c9bc3f5c5e42cd167abc7393e'
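To illustrate why encoding may be preferable to stripping, here is a small comparison. The helper names and the sample strings are invented for this sketch. Stripping maps distinct inputs to the same bytes, while UTF-8 keeps them apart. For ASCII-only text, both approaches hash the same bytes.

```python
import hashlib


def hash_after_stripping(text):
    # Lossy: non-ASCII characters are discarded before hashing.
    return hashlib.sha256(text.encode("ascii", "ignore")).hexdigest()


def hash_utf8(text):
    # Lossless: every character contributes to the hashed bytes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


plain = u"invoice-42"
marked = u"invoice-42\u200e"  # same text plus an invisible LEFT-TO-RIGHT MARK

print(hash_after_stripping(plain) == hash_after_stripping(marked))  # True
print(hash_utf8(plain) == hash_utf8(marked))                        # False
print(hash_utf8(plain) == hash_after_stripping(plain))              # True
```

Whether the first result counts as a harmless normalization or an unwanted collision depends on what the hash is used for.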

Source

2012-06-06 23:08:05
