Excluding non-ASCII characters in Python

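I have a script that decrypts an encrypted message using a dictionary (word list). The problem is that the decryption produces a lot of garbage (a.k.a. non-ASCII) characters. Here is my code: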

from Crypto.Cipher import AES 
import base64 
import os 

BLOCK_SIZE = 32 

PADDING = '{' 

# Encrypted text to decrypt 
encrypted = "WI4wBGwWWNcxEovAe3p+GrpK1GRRQcwckVXypYlvdHs=" 

DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING) 

adib = open('words.txt') 
for line in adib.readlines(): 
    secret = line.rstrip('\n') 
    if (secret[-1:] == "\n"): 
     print "Error, new line character at the end of the string. This will not match!" 
    elif (len(secret) >= 32): 
     print "Error, string too long. Must be less than 32 characters." 
    else: 
     # create a cipher object using the secret 
     cipher = AES.new(secret + (BLOCK_SIZE - len(secret) % BLOCK_SIZE) * PADDING) 

     # decode the encoded string 
     decoded = DecodeAES(cipher, encrypted) 
     print decoded+"\n"
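The key setup in the loop pads each word from 'words.txt' with PADDING up to the next multiple of BLOCK_SIZE, and DecodeAES strips trailing PADDING characters from the decrypted text. Below is a minimal sketch of only those two string operations, with no cryptography involved. It assumes Python 3, and the helper names 'pad_key' and 'strip_padding' are hypothetical:

```python
BLOCK_SIZE = 32
PADDING = '{'

def pad_key(secret):
    # Append PADDING up to the next multiple of BLOCK_SIZE. Any secret shorter
    # than 32 characters (including an empty line) becomes exactly 32
    # characters, a valid AES-256 key length. A 32-character secret would
    # become 64 characters, which is why the script rejects len >= 32.
    return secret + (BLOCK_SIZE - len(secret) % BLOCK_SIZE) * PADDING

def strip_padding(plaintext):
    # Remove the trailing '{' characters that were added before encryption.
    # Note: this would also remove a '{' that genuinely ended the message.
    return plaintext.rstrip(PADDING)

print(len(pad_key("luffy")))      # 32
print(strip_padding("hello{{{"))  # hello
```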

What I have tried so far is converting the 'decoded' string to ASCII and then excluding the non-ASCII characters, but that did not work.
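Converting to ASCII only strips bytes, so a wrong key still produces (shorter) output. An alternative is to reject any candidate whose decrypted bytes are not all printable ASCII. This is a sketch assuming Python 3, where the cipher's decrypt() returns bytes; the helper name 'is_printable_ascii' and the choice of string.printable as the whitelist are assumptions:

```python
import string

# Acceptable plaintext bytes: string.printable, i.e. ASCII letters, digits,
# punctuation and whitespace (space, \t, \n, \r, \x0b, \x0c).
ALLOWED = frozenset(string.printable.encode('ascii'))

def is_printable_ascii(data):
    """Return True only if every byte of `data` is in ALLOWED."""
    # Iterating over a bytes object yields ints, matching the set's elements.
    return all(b in ALLOWED for b in data)

# Hypothetical decrypt() outputs: one readable, one produced by a wrong key.
candidates = [b"attack at dawn", b"\x96\x1f\xe2Zk"]
for candidate in candidates:
    if is_printable_ascii(candidate):
        print(candidate.decode('ascii'))
```

With a wrong key the output is essentially random bytes, so a whole block rarely passes this test, which makes rejecting candidates more useful than stripping characters from them.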

2016-03-10 shoomy

Could you give an exact example of the contents of the "words.txt" file, please? –

It contains common words, but here are some of them – shoomy

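'the and a one piece episode chapter pirate arc of edit volume he slot name island is luffy is for with part world category special manga wiki encyclopedia is japanese this anime SBS volume page BEGIN END help wiki blue crew from user buggy straw portrait grand his pirates new template marines they not hat devil FLUSH TOP BOXAD Navibox monkey their crocodile Down page start shanks has Shichibukai all have canon rules wiki all pages fruit zoro berry sea name when image an usopp war government guidelines Random' – shoomy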

This version will work:

#!/usr/bin/env python 
# -*- coding: UTF-8 -*- 

def evaluate_string_is_ascii(mystring): 
    is_full_ascii=True 
    for c in mystring: 
     try: 
      if ord(c)>0 and ord(c)<=127: 
       #print c,"strict ascii =KEEP" 
       pass 
      elif ord(c)>127 and ord(c)<=255: 
       #print c,"extended ascii code =TRASH" 
       is_full_ascii=False 
       break 
      else: 
       # print c,"no ascii =TRASH" 
       is_full_ascii=False 
       break 
     except: 
      #print c,"no ascii =TRASH" 
      is_full_ascii=False 
      break 
    return is_full_ascii 


my_text_content="""azertwxcv 
123456789 
456dqsdq13 
��nS��?t#� 
lkjal� 
kfldjkjl&é""" 

for line in my_text_content.split('\n'): 

    #check if line contain only ascii 
    if evaluate_string_is_ascii(line)==True: 

     #print the line 
     print line
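The same check fits in a single all() call. This is a sketch assuming Python 3 str input, where iterating yields characters rather than bytes; 'is_strict_ascii' is a hypothetical name, and it keeps the answer's range of 1 to 127:

```python
def is_strict_ascii(text):
    # Same rule as evaluate_string_is_ascii: every code point must be in 1..127.
    # Like the original, an empty string counts as ASCII.
    return all(0 < ord(c) <= 127 for c in text)

lines = ["azertwxcv", "123456789", "lkjal\ufffd", "kfldjkjl&\u00e9"]
for line in lines:
    if is_strict_ascii(line):
        print(line)
```

On Python 3.7+ the built-in str.isascii() is close, but it also accepts '\x00', which the original function rejects.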

2016-03-10 12:08:23

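Your code works fine, but what I want is to not print lines that contain non-ASCII characters, so if the 'decoded' string contains non-ASCII characters, it should not be printed. – shoomy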

Does it work now? You can reuse the 'evaluate_string_is_ascii(mystring)' function in your own code like this: 'if evaluate_string_is_ascii(decoded) == True: print decoded + "\n"' –

It's working now, thank you, my friend! – shoomy

You can remove the non-ASCII characters like this (edit: updated to decode first):

output = 'string with some non-ascii characters��@$���9�HK��F�23 some more char' 
output = output.decode('utf-8').encode('ascii', 'ignore')
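In Python 3 a string literal like 'output' is already a str, so the decode('utf-8') step is not needed. A minimal sketch of the Python 3 equivalent; note that it strips the offending characters rather than rejecting the whole string:

```python
output = 'string with some non-ascii characters\ufffd\ufffd@$\ufffd9 some more char'

# encode() with errors='ignore' silently drops every character outside ASCII;
# decode() turns the resulting bytes back into a str.
cleaned = output.encode('ascii', 'ignore').decode('ascii')
print(cleaned)  # string with some non-ascii characters@$9 some more char
```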

2016-03-10 12:05:50

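I get an error with this: 'Traceback (most recent call last): File "code.py", line 28, in <module> decoded = decoded.decode('utf-8').encode('ascii', 'ignore') File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte' – shoomy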
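The error happens because the output of a wrong key is arbitrary bytes, not UTF-8 text: 0x96 is a UTF-8 continuation byte, so it cannot start a character. Decoding straight to ASCII with errors='ignore' avoids the UTF-8 step (in Python 2 that is decoded.decode('ascii', 'ignore')). A sketch assuming Python 3, with a made-up byte string standing in for decrypt() output:

```python
raw = b'\x96abc\xe9def'  # hypothetical decrypt() output containing non-UTF-8 bytes

try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    # Same kind of failure as in the traceback above (invalid start byte 0x96).
    print(exc)

# Decoding directly as ASCII with errors='ignore' drops every byte >= 0x80
# instead of raising.
print(raw.decode('ascii', 'ignore'))  # abcdef
```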