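How to convert unicode to Latin characters in Python

I want to convert unicode escapes to Latin characters using Python. I have a large text file of tweets that contains unicode escapes, and I only want to replace four of them, such as \u00f6, \u015f, .. I just want to see each tweet as it was actually tweeted (in the original language). Here is the code that collects the tweets and saves them to a text file. The file starts with "#!/usr/bin/python # -*- coding: ISO 8859-9 -*- ....", and I get this error: "21, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details"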

import time

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

# ckey, csecret, atoken and asecret are the Twitter API credentials (not shown).

class listener(StreamListener):

    def on_data(self, data):
        try:
            # Save the raw payload, with non-ASCII characters dropped.
            dirty = open('turkeyjson28.txt', 'a')
            encode = data.encode('ascii', 'ignore')
            dirty.write(encode)
            dirty.write('\n')
            dirty.close()
            # Cut the tweet text out of the raw JSON string.
            tweet = data.split(',"text":"')[1].split('","source')[0]
            good = tweet.decode("utf-8")
            # Attempt to put the literal character in place of the escape.
            better = good.decode("utf-8").replace(u"\u00f6", "ö")
            print better
            #saveThis = str(time.time())+'::'+tweet
            saveFile = open('turkey_clean28.txt', 'a')
            saveFile.write(better)
            saveFile.write('\n')
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException, e:
            print 'failed ondata,', str(e)
            time.sleep(5)

    def on_error(self, status):
        print status

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["turkey"])

2014-09-29 sruti hasan

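'.encode("latin1")' might be what you are looking for ... but it is hard to tell ... it would be better if you simplified your question to just a call to 'on_data' with a hard-coded string that does not work the way you want ... utf8 is more commonly used, though – 2014-09-29 18:26:06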

Thanks, but it did not help – 2014-09-29 20:23:46

Change

better = good.decode("utf-8").replace(u"\u00f6", "ö")

to

better = good.decode("utf-8").replace(u"\u00f6", u"\u00f6".encode("utf8"))

or, at the very top of your file, you need

#!/usr/bin/python 
# -*- coding: utf8 -*-

In general, I would try to avoid the encoding-declaration solution and just write the characters you want as unicode escapes.
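To illustrate the point about escapes, here is a minimal sketch in Python 3 (the code in this thread is Python 2). The sample word and the variable names are made up for illustration. An escape such as \u00f6 in a string literal is the same character as a literal ö, so the source file can stay pure ASCII. The byte encoding only matters when the text is written out, and for Turkish text ISO 8859-9 is a Latin encoding that has the needed letters.

```python
# Illustrative Python 3 sketch; `word` is a made-up sample, not data from the question.
word = "ge\u00e7mi\u015f"  # the Turkish word "geçmiş", written with ASCII-only escapes

# The escape and the literal character are the same code point.
same = "\u00f6" == "ö"

# ISO 8859-9 (Latin-5) covers the Turkish letters.
latin5_bytes = word.encode("iso-8859-9")

# Latin-1 has no U+015F, so encoding the same word fails.
try:
    word.encode("latin1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(same, latin5_bytes, latin1_ok)
```

This is why '.encode("latin1")' alone is unlikely to be enough for Turkish tweets, while UTF-8 or ISO 8859-9 can represent them.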

I often write a pair of helper functions to assist with this:

# Python 2 helpers: `str` is a byte string, `unicode` is text.

def decode(byte_str, encodings=("latin1", "utf8", "cp1252")):
    # Return unicode text decoded with the first encoding that works.
    # latin1 accepts any byte sequence, so with this default order the
    # later entries are only reached if the caller passes another order.
    if isinstance(byte_str, unicode):
        byte_str = encode(byte_str, encodings)
    for enc in encodings:
        try:
            return byte_str.decode(enc)
        except UnicodeDecodeError:
            continue

def encode(unicode_txt, encodings=("latin1", "utf8", "cp1252")):
    # Return a byte string in the first encoding that can represent the text.
    if isinstance(unicode_txt, str):
        unicode_txt = decode(unicode_txt, encodings)
    for enc in encodings:
        try:
            return unicode_txt.encode(enc)
        except UnicodeEncodeError:
            continue

# Then you can do something like:
decode(good).replace(u"\u00f6", decode(u"\u00f6", encodings=("utf8", "latin1", "ascii")))
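For readers on Python 3, where `bytes` and `str` replace Python 2's `str` and `unicode`, the same try-each-encoding idea could be sketched as below. This is an assumption-laden illustration, not the answer's code: `decode_bytes` is a hypothetical name, and the order (strict UTF-8 first, latin1 last because it accepts any byte sequence) is one possible choice.

```python
# Python 3 sketch of a decode-with-fallback helper; `decode_bytes` is a
# hypothetical name used only for this illustration.
def decode_bytes(raw, encodings=("utf-8", "cp1252", "latin1")):
    """Return text decoded with the first encoding that accepts `raw`."""
    if isinstance(raw, str):  # already text, nothing to decode
        return raw
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the input")

# Made-up sample bytes in two different encodings.
utf8_sample = "g\u00fcle g\u00fcle".encode("utf-8")
cp1252_sample = "g\u00fcle g\u00fcle".encode("cp1252")
print(decode_bytes(utf8_sample), decode_bytes(cp1252_sample))
```

Guessing an encoding this way is a heuristic: bytes that happen to be valid in an earlier encoding are decoded with it even if they were written in a later one.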

2014-09-29 20:49:22

Thanks. I have a text file of tweets that contains unicode escapes; please help me convert the tweets with unicode escapes into Turkish characters. I am new to Python and I am stuck. This is what my text file looks like: "Turkey got 46 hostages back from ISIS. How did that happen? http:\/\/t.co\/ELxQsArkWR http:\/\/t.co\/JEZF7uSvjY FethullahGöufclenHocaefendi \u200bden Ali Bula \u207a ge\u00e7mi\ufd olsun; \u0130dris Naim \u015eahin\u2019in annesi i\u00e7in taziye mesaj\u0131 http:\/\/t.co\/w2g2fADgbt" – 2014-09-30 02:30:17
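The \uXXXX and \/ sequences in that sample are JSON string escapes, so one way to get the Turkish characters back is to let a JSON parser decode each saved line instead of cutting the text out with split(). A minimal Python 3 sketch follows. It assumes each line is one complete JSON object with a "text" field (the key the question's code splits on), and `raw_line` is a made-up example, not real data.

```python
import json

# A made-up line in the shape the question's code saves: one JSON object per
# tweet with a "text" field. The raw string keeps the backslashes as-is.
raw_line = r'{"id": 1, "text": "ge\u00e7mi\u015f olsun http:\/\/t.co\/w2g2fADgbt", "source": "web"}'

# The JSON parser turns \uXXXX and \/ into the real characters.
tweet = json.loads(raw_line)
text = tweet["text"]

print(text)
```

The resulting `text` is an ordinary string that can be written to a file opened with an explicit encoding such as UTF-8.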
