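I was able to import a text file into an elasticsearch index on my local machine. What is the best practice in python3 for skipping non-ASCII characters in mixed-encoding text?
Despite using a virtual environment, the production machine is a nightmare, because I keep getting errors like this: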
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 79: ordinal not in range(128)
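I am using python3. Personally I had fewer problems with python2, but maybe that is just the frustration of having wasted a few hours.
I can't understand why I am not able to strip or handle the non-ASCII characters.
I tried importing: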
from unidecode import unidecode
def remove_non_ascii(text):
    return unidecode(unicode(text, encoding = "utf-8"))
using python2, no success.
Back to python3:
import string
printable = set(string.printable)
''.join(filter(lambda x: x in printable, 'mixed non ascii string'))
no success
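For reference, a minimal sketch of stripping non-ASCII characters in python3 without any third-party package. It assumes the input is either an already decoded `str` or raw `bytes` that are mostly UTF-8; the helper names `strip_non_ascii` and `strip_non_ascii_bytes` are made up for this illustration.

```python
def strip_non_ascii(text):
    """Return `text` with every character outside the ASCII range dropped."""
    # errors="ignore" discards characters that cannot be encoded as ASCII
    return text.encode("ascii", errors="ignore").decode("ascii")


def strip_non_ascii_bytes(raw, encoding="utf-8"):
    """Decode raw bytes leniently, then drop the non-ASCII characters."""
    # Assumption: the data is mostly UTF-8. Undecodable bytes become U+FFFD,
    # which is itself non-ASCII and is removed by strip_non_ascii().
    return strip_non_ascii(raw.decode(encoding, errors="replace"))
```

Unlike `unidecode`, this drops accented letters instead of transliterating them, so it only fits the case where losing those characters is acceptable.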
import codecs
with codecs.open(path, encoding='utf8') as f:
    ....
no success
Tried:
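A sketch of the same idea with the built-in `open` of python3, which accepts `encoding` and `errors` directly. Passing the encoding explicitly means the result no longer depends on the locale of the machine, and `errors="replace"` turns undecodable bytes into U+FFFD instead of raising. The function name `read_lines` is hypothetical, and UTF-8 is only an assumption about the file.

```python
def read_lines(path, encoding="utf-8"):
    """Yield the lines of a text file, decoded with an explicit encoding."""
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which can be ASCII on a minimal server install.
    with open(path, encoding=encoding, errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")
```

With `errors="ignore"` the bad bytes would be dropped silently instead; `"replace"` keeps a visible marker of where the data was damaged.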
# -*- coding: utf-8 -*-
no success
https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
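no success...
None of the above seems able to strip or handle the non-ASCII characters, which is very cumbersome. Here is the code I run, followed by the error I get: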
with open(path) as f:
    for line in f:
        line = line.replace('\n','')
        el = line.split('\t')
        print(el)
        _id = el[0]
        _source = el[1]
        _name = el[2]
        # _description = ''.join(filter(lambda x: x in printable, el[-1]))
        #
        _description = remove_non_ascii(el[-1])
        print(_id, _source, _name, _description, setTipe(_source))
        action = {
            "_index": _indexName,
            "_type": setTipe(_source),
            "_id": _source,
            "_source": {
                "name": _name,
                "description" : _description
            }
        }
        helpers.bulk(es, [action])
File "<stdin>", line 22, in <module>
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 194, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 87, in _process_bulk_chunk
resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 79: ordinal not in range(128)
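As a sketch only, the loop above could be restructured into a generator that yields one action per line, so that the whole iterable can be handed to `helpers.bulk(es, ...)` in one call rather than once per row. The names `iter_actions`, `index_name` and `type_for` are hypothetical stand-ins (for the loop, `_indexName` and `setTipe`), and using the first column as `_id` is an assumption, since the original passes `_source` there.

```python
def iter_actions(lines, index_name, type_for):
    """Yield one bulk action dict per tab-separated input line."""
    for line in lines:
        el = line.rstrip("\n").split("\t")
        if len(el) < 3:
            continue  # skip rows that lack id, source and name columns
        _id, _source, _name = el[0], el[1], el[2]
        # Drop non-ASCII characters from the last column (the description)
        description = el[-1].encode("ascii", errors="ignore").decode("ascii")
        yield {
            "_index": index_name,
            "_type": type_for(_source),
            "_id": _id,  # assumption: first column is the document id
            "_source": {"name": _name, "description": description},
        }
```

The generator does not touch elasticsearch itself, which makes it easy to check the parsed rows on each machine before anything is sent.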
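I would like to have a "definitive" practice for handling python3 encoding problems. I am using the same script on different machines and getting different results...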
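One thing worth comparing between the machines: the traceback paths mention `python2.7`, so the interpreter that actually runs on the production machine may not be the one I expect. A small sketch that prints the interpreter version and the encodings in effect, using only the standard library; the function name `encoding_report` is made up for this illustration.

```python
import locale
import sys


def encoding_report():
    """Collect the interpreter version and the encodings it will use."""
    return {
        "python_version": sys.version_info[:3],
        "default_encoding": sys.getdefaultencoding(),
        # Encoding used by open() when no encoding= argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


for key, value in encoding_report().items():
    print(key, value)
```

If the reports differ between the local and the production machine, that difference is a likely candidate for the different results.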
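Providing an actual example that reproduces the problem you are trying to solve makes it easier to help. See [How to Ask](https://stackoverflow.com/help/how-to-ask) and [creating a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve).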