Read many CSV files and re-encode them as UTF-8 using Python

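I am using Python code to read many CSV files and set their encoding to UTF-8. I have a problem: when reading a file I can read all of its lines, but when I write them, only one line gets written. Please help me check my code below: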

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        #print ("Convert {0} from {1} to {2}").format(name, ascii, to)
        with open(name) as f:
            print(name)
            count = 0
            lineno = 0
            #this point I want to write the below text into my each new file at the first line
            #file_source.write('id;nom;prenom;nom_pere;nom_mere;prenom_pere;prenom_mere;civilite (1=homme 2=f);date_naissance;arrondissement;adresse;ville;code_postal;pays;telephone;email;civilite_demandeur (1=homme 2=f);nom_demandeur;prenom_demandeur;qualite_demandeur;type_acte;nombre_actes\n')
            for line in f.readlines():
                lineno +=1
                if lineno == 1 :
                    continue
                file_source = open(name, mode='w', encoding='utf-8', errors='ignore')
                #pass
                #print (line)
                # start writing data to the new file with the encoding

                file_source.write(line)
                #file_source.close

#print unicode(line, "cp866").encode("utf-8")
csv_files = find_csv_filenames('./csv', ".csv")
convert_files(csv_files, "cp866")
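The question calls find_csv_filenames without showing it. For anyone reproducing the problem, here is a hypothetical stand-in, assuming it should return the paths of the files in a directory whose names end with the given suffix (the real helper may differ):

```python
import os

def find_csv_filenames(path_to_dir, suffix=".csv"):
    # Hypothetical helper: the question uses it but does not define it.
    # Returns full paths (directory joined with file name) so the results
    # can be passed straight to open() regardless of the working directory.
    return [os.path.join(path_to_dir, name)
            for name in sorted(os.listdir(path_to_dir))
            if name.endswith(suffix)]
```

Joining the directory onto each name matters here: a bare file name such as "a.csv" would only open correctly if the current directory happened to be ./csv.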

2013-12-13 user3024562

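You are reopening the file on every iteration. Because mode='w' truncates the file each time it is opened, everything written before is thrown away.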

for line in f.readlines():
    lineno +=1
    if lineno == 1 :
        continue
    #move the following line outside of the for block
    file_source = open(name, mode='w', encoding='utf-8', errors='ignore')
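One caveat when moving the open() call: the question writes back to the same path it is reading, and mode 'w' truncates the file immediately. If the call is simply placed before the for loop, while f is still unread, the loop can end up reading an empty file. A minimal sketch that reads all lines first and then opens the output once; passing the source encoding to open() is an assumption about what the ascii argument was meant for, and the header parameter is a hypothetical addition for the header line the question wants to write:

```python
def convert_files(files, from_encoding, to="utf-8", header=None):
    for name in files:
        # Read the whole file first: reopening the same path with mode 'w'
        # truncates it, so that must not happen before the data is read.
        with open(name, encoding=from_encoding, newline='') as f:
            lines = f.readlines()
        # Open the output once, outside the per-line loop, so every line is kept.
        with open(name, mode='w', encoding=to, errors='ignore', newline='') as out:
            if header is not None:
                out.write(header)  # hypothetical: the new first line
            for line in lines[1:]:  # skip the old first line, as the question's code does
                out.write(line)
```

The with blocks also close (and therefore flush) each file deterministically, which the question's commented-out file_source.close (missing the parentheses) would not have done.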

2013-12-13 04:27:58 autodidacticon

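If all you need is to change the character encoding of the files, then it does not matter that they are CSV files, unless the conversion could change which characters are interpreted as the delimiter, quotechar, etc.: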

def convert(filename, from_encoding, to_encoding):
    with open(filename, newline='', encoding=from_encoding) as file:
        data = file.read().encode(to_encoding)
    with open(filename, 'wb') as outfile:
        outfile.write(data)

for path in csv_files: 
    convert(path, "cp866", "utf-8")

Add an errors parameter to change how encoding/decoding errors are handled.
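For example, a variant of the convert() above with a hypothetical errors argument forwarded both to open() (which decodes) and to str.encode() (which encodes). With the default 'strict', bad input raises an exception; with 'replace', undecodable bytes become U+FFFD:

```python
def convert(filename, from_encoding, to_encoding, errors='strict'):
    # errors is applied while decoding the input (open) and while
    # encoding the output (str.encode).
    with open(filename, newline='', encoding=from_encoding, errors=errors) as file:
        data = file.read().encode(to_encoding, errors=errors)
    # The file is only rewritten if the read/encode above succeeded.
    with open(filename, 'wb') as outfile:
        outfile.write(data)
```

Because the output is opened only after the data has been decoded and re-encoded, a strict failure leaves the original file untouched.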

If the files may be large, you can convert the data incrementally:

import os
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

def convert(filename, from_encoding, to_encoding):
    with open(filename, newline='', encoding=from_encoding) as file:
        # delete=False keeps the temp file after it is closed so it can be
        # renamed; assigning tmpfile.delete = False afterwards no longer
        # prevents the deletion on Python 3.4 and later
        with NamedTemporaryFile('w', encoding=to_encoding, newline='',
                                dir=os.path.dirname(filename),
                                delete=False) as tmpfile:
            copyfileobj(file, tmpfile)  # copies in chunks, not all at once
    os.replace(tmpfile.name, filename)  # rename tmpfile -> filename

for path in csv_files: 
    convert(path, "cp866", "utf-8")

2013-12-13 04:40:25 jfs

You can do it like this:

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        with open(name, 'r+') as f:
            data = ''.join(f.readlines())
            data.decode(ascii).encode(to)
            f.seek(0)
            f.write(data)
            f.truncate()

2013-12-13 04:43:09 wcp

''.join(f.readlines()) should be written as f.read(). – jfs

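It looks like the OP is using Python 3; otherwise the 'encoding' parameter would not be available in the built-in open function. So 'data.decode()' will fail, because 'data' is already Unicode. – jfs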

Thanks for the guidance~ – wcp
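Taking both of jfs's comments into account, a Python 3 version of this in-place approach might look like the sketch below. It opens the file in binary mode so that bytes.decode() is available, keeps the result of decode/encode (the line above discards it), and renames the ascii parameter to from_encoding, which is an illustrative choice rather than part of the original answer:

```python
def convert_files(files, from_encoding, to="utf-8"):
    for name in files:
        # Binary mode: read raw bytes, nothing is decoded with the locale default.
        with open(name, 'rb+') as f:
            data = f.read()  # f.read() instead of ''.join(f.readlines())
            # bytes.decode() -> str, str.encode() -> bytes; keep the result.
            converted = data.decode(from_encoding).encode(to)
            f.seek(0)
            f.write(converted)
            f.truncate()  # drop leftover bytes if the new data is shorter
```

The truncate() call matters when the target encoding is more compact than the source (for example UTF-8 to cp866 for Cyrillic text); without it, the tail of the old content would remain at the end of the file.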
