Encoding, again

2012-10-13

I'm trying to work with SQLite from Python:

from pysqlite2 import dbapi2 as sqlite 
con = sqlite.connect('/home/argon/super.db') 
cur = con.cursor() 
cur.execute('select * from notes') 
for i in cur.fetchall(): 
    print i[2]

Sometimes I get something like this (I'm from Russia):

&#208;&#158;&#209;&#130;&#208;&#178;&#208;&#181;&#209;&#130; etc...

If I pass this string through the following function (it helped me in other projects):

import re
import htmlentitydefs  # Python 2 module (html.entities in Python 3)

def unescape(text):
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text  # leave as is
    return re.sub(r"&#?\w+;", fixup, text)

I get even weirder results:

ÐÑÐ²ÐµÑÐ¸ÑÑ Ñ ÑÐ¸ÑÐ¸ÑÐ¾Ð²Ð°Ð½Ð¸ÐµÐ¼ etc
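Why the output looks like "Ð...": a minimal Python 3 sketch (the question's code is Python 2), assuming the stored text is UTF-8 whose bytes were each written out as a separate &#NNN; reference. The regex only mimics the numeric-reference branch of unescape(): it maps &#NNN; to code point NNN, which is the same as reading every UTF-8 byte as one Latin-1 character.

```python
import re

raw = '&#208;&#158;&#209;&#130;&#208;&#178;'  # first part of the string from the question

# Roughly what unescape() does with decimal references: &#NNN; -> code point NNN.
as_code_points = re.sub(r'&#(\d+);', lambda m: chr(int(m.group(1))), raw)

# The numbers are really UTF-8 bytes; decoding those bytes as Latin-1
# (one character per byte) produces exactly the same mojibake string.
utf8_bytes = bytes(int(n) for n in re.findall(r'&#(\d+);', raw))
assert as_code_points == utf8_bytes.decode('latin-1')

# Undo the mistake: turn the characters back into bytes, then decode as UTF-8.
repaired = as_code_points.encode('latin-1').decode('utf-8')
assert repaired == '\u041e\u0442\u0432'  # 'Отв'
```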

What should I do to get normal Cyrillic characters?


2012-10-13 scythargon

Answer

&#208;&#158; looks like a UTF-8 byte pair: \xD0\x9E encodes U+041E (\u041E, decimal 1054), better known as the Cyrillic character О (capital O).
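The byte-pair claim can be checked by hand. A two-byte UTF-8 sequence has the form 110xxxxx 10yyyyyy, and the code point is the 11 payload bits xxxxxyyyyyy. A small Python 3 sketch of that arithmetic for the bytes 208 and 158:

```python
# Two-byte UTF-8 sequence: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
lead, cont = 0xD0, 0x9E  # &#208; and &#158; from the question

assert lead >> 5 == 0b110  # top bits mark the lead byte of a two-byte sequence
assert cont >> 6 == 0b10   # top bits mark a continuation byte

code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
# (0x10 << 6) | 0x1E == 0x400 | 0x1E == 0x41E == 1054
assert code_point == 0x41E == 1054
assert chr(code_point) == '\u041e'  # CYRILLIC CAPITAL LETTER O
assert bytes([lead, cont]).decode('utf-8') == chr(code_point)
```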

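In other words, you have UTF-8 data that has been encoded in an odd way. Turn the &#NNN; numbers into bytes (chr(208) will do) and then decode the result from UTF-8: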

>>> (chr(208) + chr(158)).decode('utf-8') 
u'\u041e'
>>> print (chr(208) + chr(158)).decode('utf-8') 
О 
>>> print (chr(208) + chr(158) + chr(209) + chr(130) + chr(208) + chr(178)).decode('utf-8') 
Отв
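The same idea can be applied to a whole string. A minimal Python 3 sketch, assuming every &#NNN; reference in a run holds one UTF-8 byte (0-255); the function name decode_utf8_entities is hypothetical, not from any library. Runs containing larger numbers, or bytes that are not valid UTF-8, are left unchanged rather than guessed at.

```python
import re

# A run of one or more decimal character references, e.g. "&#208;&#158;".
_ENTITY_RUN = re.compile(r'(?:&#\d+;)+')
_ENTITY = re.compile(r'&#(\d+);')


def decode_utf8_entities(text):
    """Replace runs of &#NNN; references that spell out UTF-8 bytes with decoded text.

    Hypothetical helper: assumes each reference is one byte (0-255). Runs with
    larger numbers, or runs that are not valid UTF-8, are returned unchanged.
    """
    def fix_run(match):
        numbers = [int(n) for n in _ENTITY.findall(match.group(0))]
        if any(n > 255 for n in numbers):
            return match.group(0)  # not byte values; leave as is
        try:
            return bytes(numbers).decode('utf-8')
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8; leave as is

    return _ENTITY_RUN.sub(fix_run, text)


sample = '&#208;&#158;&#209;&#130;&#208;&#178;&#208;&#181;&#209;&#130; etc...'
fixed = decode_utf8_entities(sample)  # 'Ответ etc...'
```

In the question's loop this would be applied to each i[2] before printing. Working on whole runs, rather than one reference at a time, is what lets the two bytes of each Cyrillic letter be decoded together.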


2012-10-13 20:58:10
