How to fix an encoding problem with a hard-coded Python dictionary

Error:

pymysql.err.InternalError: (1366, "Incorrect string value: '\\xEF\\xBF\\xBD 20...' for column 'history' at row 1")

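I get several variations of this error. I have tried hard to clean up my dictionary, but it is always the history column; the only thing that changes is which character it reports as the problem.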

I can't post the dictionary because it contains sensitive information, but here is the gist:

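I started with 200 addresses (including country, postal code, etc.) that needed to be validated and standardized before being inserted into the DB.
I spent a lot of time on validation and standardization with Google Maps.
I decided to get fancy and keep all the crazy accented letters in these worldwide addresses (often copied from Google, because I don't know how to type them, and A vs. o, lol): Singapore, Brazil, everywhere.
After processing, I ended up with 120 unique addresses in my dictionary.
Inserting the data into SQLite and exporting it as CSV both work perfectly. The problem is limited to MySQL and some sneaky invisible characters.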

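Note: I previously spent 7 hours copying/pasting into Notepad to strip the accents, which somehow made all the encoding correct; I encoded it with Notepad++ just to get the data processed. I think I really have lost the accented version and now only have this tool's output.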

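I can't see "\xEF\xBF\xBD 20..." anywhere in my dictionary; I only see normal text. At the moment I don't even see the "20"... those two characters are what helped me track down the earlier problems.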
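Because the offending characters don't show up when the dictionary is printed normally, one way to find them is to walk the dictionary and report every non-ASCII character with its position and Unicode name. This is a minimal sketch: `find_suspect_chars` and the sample data are hypothetical, and it assumes the values are (possibly nested) dicts of strings, like `assets_final` and `ips_final` in the question's code.

```python
import unicodedata


def find_suspect_chars(data, path=()):
    """Yield (key path, index, char, Unicode name) for every non-ASCII
    character found in the string values of a possibly nested dict."""
    for key, value in data.items():
        if isinstance(value, dict):
            # Recurse into nested dicts, remembering how we got there.
            yield from find_suspect_chars(value, path + (key,))
        elif isinstance(value, str):
            for i, ch in enumerate(value):
                if ord(ch) > 127:
                    yield path + (key,), i, ch, unicodedata.name(ch, "<unnamed>")


# Hypothetical sample containing a hidden U+FFFD replacement character.
sample = {"host1": {"history": "Rua S\ufffdo 20", "country": "Brasil"}}
for where, pos, ch, name in find_suspect_chars(sample):
    # ascii() shows the escape sequence instead of an invisible glyph.
    print(where, pos, ascii(ch), name)
```

Any hit named `REPLACEMENT CHARACTER` marks a spot where the original accented letter has already been lost.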

Code I can show:

import sqlite3

import pymysql as mysql  # presumably how the question refers to pymysql as `mysql`

import config  # the asker's own settings module (not shown)


def insert_tables(cursor, assets_final, ips_final):
    # Insert asset data into the asset table.
    # get_asset_field_names / get_ip_field_names are the asker's helpers (not shown).
    field_names_dict = get_asset_field_names(assets_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for key, row in assets_final.items():
        # Values are spliced into the SQL text by hand; double quotes are
        # swapped for single quotes so they don't terminate the literal.
        insert_sql = 'INSERT INTO asset(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in row.values()) + '")'
        print(insert_sql)
        cursor.execute(insert_sql)

    # Insert IP data into the IP table (one row per IP of each host).
    field_names_dict = get_ip_field_names(ips_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for hostname_key, ip_dict in ips_final.items():
        for ip_key, ip_row in ip_dict.items():
            insert_sql = 'INSERT INTO ip(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in ip_row.values()) + '")'
            print(insert_sql)
            cursor.execute(insert_sql)


def output_sqlite_db(sqlite_file, assets_final, ips_final):
    conn = sqlite3.connect(sqlite_file)
    cursor = conn.cursor()
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()


def output_mysql_db(assets_final, ips_final):
    # The connection itself is opened as utf8mb4.
    conn = mysql.connect(host=config.mysql_ip, port=config.mysql_port, user=config.mysql_user, password=config.mysql_password, charset="utf8mb4", use_unicode=True)
    cursor = conn.cursor()
    cursor.execute('USE ' + config.mysql_DB)
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()
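A common alternative to building the INSERT string by hand is to let the database driver bind the values, so quoting and character encoding are handled by the driver rather than by `replace('"', "'")`. The sketch below is hypothetical (`insert_rows` and the table layout are not from the question) and is demonstrated with an in-memory sqlite3 database; with pymysql the same function should work with `placeholder="%s"`, since pymysql uses `%s`-style parameters. Column names can't be bound as parameters, so they must come from a trusted source.

```python
import sqlite3


def insert_rows(cursor, table, rows, placeholder="?"):
    """Insert a list of dicts into `table` with a parameterized query.

    `placeholder` is "?" for sqlite3 and "%s" for pymysql. Column names are
    taken from the first row's keys and must be trusted (not user input).
    """
    if not rows:
        return
    columns = list(rows[0].keys())
    sql = "INSERT INTO {}({}) VALUES ({})".format(
        table,
        ",".join(columns),
        ",".join([placeholder] * len(columns)),
    )
    # The driver quotes each value itself, so embedded quotes are safe.
    cursor.executemany(sql, [tuple(row[c] for c in columns) for row in rows])


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE asset(hostname TEXT, history TEXT)")
insert_rows(cur, "asset", [
    {"hostname": "sg-01", "history": 'Rua São "Paulo"'},
    {"hostname": "pl-02", "history": "ŁA II"},
])
conn.commit()
print(cur.execute("SELECT history FROM asset ORDER BY hostname").fetchall())
```

Binding parameters does not by itself fix data that already contains U+FFFD, but it removes the hand-rolled quoting as a source of errors.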

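EDIT: Could this have something to do with the fact that I'm using Cygwin as my terminal? Ha! I added this line and got a different message (now using the accented version again):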

cursor.execute('SET NAMES utf8')

Error:

pymysql.err.InternalError: (1366, "Incorrect string value: '\\xC5\\x81A II...' for column 'history' at row 1")


2017-02-15 gunslingor

Have a look at my notes (http://mysql.rjweb.org/doc.php/charcoll#python) –

Removed the dictionary tag: this is an encoding problem in a varchar column of a DB table, not something specific to the Python dict type. –

I can shed a little light on the information you have provided:

Case 1:

>>> import unicodedata as ucd 
>>> s1 = b"\xEF\xBF\xBD" 
>>> s1 
b'\xef\xbf\xbd' 
>>> u1 = s1.decode('utf8') 
>>> u1 
'\ufffd' 
>>> ucd.name(u1) 
'REPLACEMENT CHARACTER' 
>>>

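It looks like you got some bytes that were encoded in something other than utf8 (for example cp1252) and then tried bytes.decode(encoding='utf8', errors='strict'). That detected some errors. You then decoded again with errors='replace'. That did not raise an exception, but your data now has the bad bytes replaced by the replacement character (U+FFFD). You then encoded the data with str.encode so you could write it to a file or database, and each replacement character became the 3 bytes EF BF BD (in hex).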
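The sequence described above can be reproduced in a few lines. This is a sketch with a hypothetical input: it assumes the original text was "São Paulo" saved as cp1252, where "ã" is the single byte 0xE3.

```python
# Hypothetical input: "São Paulo" stored as cp1252 ('ã' is one byte, 0xE3).
raw = "São Paulo".encode("cp1252")

# Strict UTF-8 decoding fails: 0xE3 starts a 3-byte UTF-8 sequence,
# but it is followed by the plain ASCII byte 'o'.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# errors="replace" hides the failure by substituting U+FFFD for the bad byte...
text = raw.decode("utf-8", errors="replace")
# ...and encoding back to UTF-8 turns each U+FFFD into the bytes EF BF BD.
stored = text.encode("utf-8")
print(ascii(text), stored)

# Decoding with the codec the bytes were really written in keeps the accent.
print(raw.decode("cp1252"))
```

Once the `errors="replace"` step has run, the accented letter cannot be recovered from the result; it has to be re-read from a source that still has the original bytes.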

... more surprises:

Case 2:

>>> s2 = b"\xC5\x81A II" 
>>> s2 
b'\xc5\x81A II' 
>>> u2 = s2.decode('utf8') 
>>> u2 
'\u0141A II' 
>>> ucd.name(u2[0]) 
'LATIN CAPITAL LETTER L WITH STROKE' 
>>>
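Unlike case 1, these bytes are valid UTF-8: the data itself is fine. One possible explanation, which is an assumption since the table definition isn't shown, is that the history column uses a single-byte character set that has no "Ł". MySQL's latin1 is essentially cp1252, so the sketch below checks which Python codecs can represent the string; `storable_in` is a hypothetical helper.

```python
s2 = b"\xC5\x81A II"
text = s2.decode("utf-8")  # valid UTF-8: U+0141 'Ł' followed by "A II"


def storable_in(text, charset):
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for charset in ("utf-8", "latin-1", "cp1252", "cp1250"):
    print(charset, storable_in(text, charset))
```

If `SHOW CREATE TABLE` reports a latin1 (or similar single-byte) charset for the column, converting that column to utf8mb4 would be the thing to try, matching the `charset="utf8mb4"` already used for the connection.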


2017-02-18 05:07:31
