Encoding as UTF-8 when writing to CSV

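I have some simple code that ingests JSON Twitter data and outputs a few specific fields into separate columns of a CSV file. My problem is that I cannot for the life of me figure out the proper way to encode the output as UTF-8. Below is the closest I have been able to get, with the help of members here, but it still does not run properly and fails because of the unique characters in the tweet text field.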

import json
import sys
import csv
import codecs

def main():

    writer = csv.writer(codecs.getwriter("utf-8")(sys.stdout), delimiter="\t")
    for line in sys.stdin:
        line = line.strip()

        data = []

        try:
            data.append(json.loads(line))
        except ValueError as detail:
            continue

        for tweet in data:

            ## deletes any rate limited data
            if tweet.has_key('limit'):
                pass

            else:
                writer.writerow([
                    tweet['id_str'],
                    tweet['user']['screen_name'],
                    tweet['text']
                ])

if __name__ == '__main__':
    main()

Source

2014-05-13 green_bean_4_u

From the docs: https://docs.python.org/2/howto/unicode.html

a = u"string"  # a unicode object (Python 2)

encodedstring = a.encode('utf-8')  # UTF-8 encoded byte string
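For illustration, here is a minimal sketch of the same encode step and its inverse. It is written for Python 3, which is an assumption (the thread itself targets Python 2), and the sample text is made up.

```python
# Python 3 sketch: str holds Unicode text, bytes holds encoded data.
text = "caf\u00e9 \u2603"  # hypothetical sample text with non-ASCII characters

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(repr(encoded))
print(decoded == text)
```

In Python 3 the two types do not mix implicitly, so a wrong encode or decode call tends to fail immediately rather than only when a non-ASCII character shows up.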

If that doesn't work:

Python DictWriter writing UTF-8 encoded CSV files
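As a rough sketch of the task in the question (one JSON tweet per line in, tab-separated rows out), the following assumes Python 3, where the csv module works on text and the encoding is applied separately. The function name tweets_to_tsv and the sample lines are hypothetical and not taken from the thread.

```python
import csv
import io
import json


def tweets_to_tsv(lines, out):
    """Write id_str, screen_name and text of each tweet as a tab-separated row."""
    writer = csv.writer(out, delimiter="\t")
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(tweet, dict) or "limit" in tweet:
            continue  # skip non-objects and rate-limit notices
        writer.writerow([
            tweet["id_str"],
            tweet["user"]["screen_name"],
            tweet["text"],
        ])


# Hypothetical sample input, one JSON document per line
sample = [
    '{"id_str": "1", "user": {"screen_name": "alice"}, "text": "caf\\u00e9 \\u2603"}',
    '{"limit": {"track": 5}}',
    'not json',
]

buffer = io.StringIO()
tweets_to_tsv(sample, buffer)
utf8_bytes = buffer.getvalue().encode("utf-8")  # encode once, at the output boundary
print(utf8_bytes)
```

To write a real file, the same function could be given a file opened with open(path, "w", encoding="utf-8", newline=""). For standard output on Python 3.7 or later, sys.stdout.reconfigure(encoding="utf-8") is one way to force the encoding.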

Source

2014-05-13 17:53:24 1478963

Thanks @user2100799 - I've been trying every variation of .encode('utf-8'), and I've read the docs, but I still can't seem to get it to work properly with the CSV module. Any other suggestions?

Try here: http://stackoverflow.com/questions/5838605/python-dictwriter-writing-utf-8-encoded-csv-files – 1478963

I had the same problem. I have a large amount of data from the Twitter firehose, so every possible complication can turn up (and has)!

I solved it using try/except as follows:

If the dict value is a string, i.e. if isinstance(value, basestring), I try to encode it straight away. If it is not a string, I convert it to a string and then encode it.

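If that fails, it's because some joker is tweeting odd symbols to mess up my script. In that case, if the value is a string, I first decode it and then re-encode it with value.decode('utf-8').encode('utf-8'); if it is not a string, I convert it to a string, decode it, and re-encode it the same way.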
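The idea behind that logic (bring every value to one consistent text form before it reaches the CSV writer) can be sketched as below. This is a Python 3 illustration, not the author's code: to_text is a hypothetical helper, and using errors="replace" for malformed bytes is an assumption.

```python
def to_text(value):
    """Return value as str: pass text through, decode bytes, stringify the rest."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        # Assumption: malformed bytes become U+FFFD instead of raising an error
        return value.decode("utf-8", errors="replace")
    return str(value)


print(to_text("snowman \u2603"))
print(to_text(b"caf\xc3\xa9"))
print(to_text(42))
```

With every field normalized to str, the only remaining encoding decision is the one made when the output file is opened.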

Have a go with this:

import csv

def export_to_csv(list_of_tweet_dicts, export_name="flat_twitter_output.csv"):

    utf8_flat_tweets = []
    keys = []

    for tweet in list_of_tweet_dicts:
        tmp_tweet = dict(tweet)  # copy, so the caller's dict is not modified
        for key, value in tweet.iteritems():
            if key not in keys:
                keys.append(key)

            # convert fields to utf-8 if text
            try:
                if isinstance(value, basestring):
                    tmp_tweet[key] = value.encode('utf-8')
                else:
                    tmp_tweet[key] = str(value).encode('utf-8')
            except UnicodeError:
                # byte strings holding non-ASCII data end up here:
                # decode them first, then re-encode
                if isinstance(value, basestring):
                    tmp_tweet[key] = value.decode('utf-8').encode('utf-8')
                else:
                    tmp_tweet[key] = str(value).decode('utf-8').encode('utf-8')

        utf8_flat_tweets.append(tmp_tweet)

    # Python 2's csv module expects a file opened in binary mode
    with open(export_name, 'wb') as f:
        dict_writer = csv.DictWriter(f, fieldnames=keys, quoting=csv.QUOTE_ALL)
        dict_writer.writeheader()
        dict_writer.writerows(utf8_flat_tweets)

    print "exported tweets to '" + export_name + "'"

    return utf8_flat_tweets
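For comparison, a minimal Python 3 sketch of the same export is given here. It assumes the tweet dicts already hold str values, so no per-field encoding is needed and the file object does the encoding. The name export_to_csv_py3 is hypothetical and this is not the author's function.

```python
import csv


def export_to_csv_py3(list_of_tweet_dicts, export_name="flat_twitter_output.csv"):
    """Write a list of flat dicts to a UTF-8 CSV file; return the column names."""
    # Collect column names in first-seen order
    keys = []
    for tweet in list_of_tweet_dicts:
        for key in tweet:
            if key not in keys:
                keys.append(key)

    # encoding="utf-8" lets the file object encode the text;
    # newline="" is what the csv module documentation recommends
    with open(export_name, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=keys, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(list_of_tweet_dicts)

    return keys
```

Rows that lack a column are written with an empty value for it, which is csv.DictWriter's default behavior for missing keys.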

Hope this helps.

Source

2014-12-12 13:04:49
