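Removing non-ASCII characters from a file using Python

I have a scenario where the log files sent for analytics contain a few non-ASCII characters, which end up breaking one of the analytics tools that I have no control over. So I decided to clean the logs myself and came up with the following function, which does the job except that it skips the entire line whenever it sees one of these characters. I also tried going through the file character by character (see the commented-out code) so that only those characters would be removed and the actual ASCII characters kept, but I could not get it to work. Why does the commented-out logic fail, and are there any suggestions or solutions for this problem?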
1:02: fails
Sample line: 54.934/174573 ENQÎNULSUB AY NULEOT/29/abcdefghijg
Function that reads the file and drops the offending lines:
def readlogfile(self, abs_file_name):
    """
    Reads and skip the non-ascii chars line from the attached log file and populate the list self.data_bytes
    abs_file_name file name should be absolute path
    """
    try:
        infile = open(abs_file_name, 'rb')
        for line in infile:
            try:
                line.decode('ascii')
                self._data_bytes.append(line)
            except UnicodeDecodeError as e :
                # print line + "Invalid line skipped in " + abs_file_name
                print line
                continue
        # while 1: #code that didn't work to remove just the non-ascii chars
        #     char = infile.read(1) # read characters from file
        #     if not char or ord(char) > 127 or ord(char) < 0:
        #         continue
        #     else:
        #         sys.stdout.write(char)
        #         #sys.stdout.write('{}'.format(ord(char)))
        #         #print "%s ord = %d" % (char, ord(char))
        #         self._data_bytes.append(char)
    finally:
        infile.close()
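One thing that stands out in the commented-out loop: when `infile.read(1)` returns an empty string at end of file, `not char` is true and the loop hits `continue` instead of `break`, so it never terminates. Below is a minimal sketch of the "drop only the offending bytes" idea. It assumes the file is read in binary mode, as above. The names `strip_non_ascii` and `read_clean_lines` are hypothetical helpers, not part of the original class. Note that control characters such as ENQ, NUL, SUB and EOT are valid ASCII, so this sketch keeps them.

```python
import io


def strip_non_ascii(line):
    """Return a copy of a bytes line with every non-ASCII byte removed."""
    # The 'ignore' error handler discards bytes >= 0x80 instead of raising
    # UnicodeDecodeError, so the rest of the line survives.
    return line.decode('ascii', 'ignore').encode('ascii')


def read_clean_lines(abs_file_name):
    """Hypothetical helper: read a log file and clean each line in turn."""
    cleaned = []
    # The with-block closes the file even if an exception is raised.
    with io.open(abs_file_name, 'rb') as infile:
        for line in infile:
            cleaned.append(strip_non_ascii(line))
    return cleaned
```

Inside `readlogfile`, the same idea would amount to appending `strip_non_ascii(line)` to `self._data_bytes` instead of skipping the line in the `except` branch.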
http://stackoverflow.com/questions/33511317/removing-non-ascii-characters-from-file-text/33511747#33511747
This guy's original code should work for you.