2013-12-13 91 views
0

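Read many CSV files and encode them as UTF-8 using Python

I am using Python code to read many CSV files and re-encode them as UTF-8. I can read all the lines of each file, but when I write them back, only one line is written. Please help me check my code below: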

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        #print ("Convert {0} from {1} to {2}").format(name, ascii, to)
        with open(name) as f:
            print(name)
            count = 0
            lineno = 0
            # at this point I want to write the text below as the first line of each new file
            #file_source.write('id;nom;prenom;nom_pere;nom_mere;prenom_pere;prenom_mere;civilite (1=homme 2=f);date_naissance;arrondissement;adresse;ville;code_postal;pays;telephone;email;civilite_demandeur (1=homme 2=f);nom_demandeur;prenom_demandeur;qualite_demandeur;type_acte;nombre_actes\n')
            for line in f.readlines():
                lineno += 1
                if lineno == 1:
                    continue
                file_source = open(name, mode='w', encoding='utf-8', errors='ignore')
                #pass
                #print (line)
                # start writing data to the new file with the new encoding

                file_source.write(line)
                #file_source.close

#print unicode(line, "cp866").encode("utf-8")
csv_files = find_csv_filenames('./csv', ".csv")
convert_files(csv_files, "cp866")

Answers

1

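You are reopening the file on every iteration. Mode 'w' truncates the file each time it is opened, so only the last line written survives.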

for line in f.readlines():
    lineno += 1
    if lineno == 1:
        continue
    # move the following line outside of the for block
    file_source = open(name, mode='w', encoding='utf-8', errors='ignore')
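A minimal sketch of that fix, assuming the goal is to replace the first line with a new header and re-encode the rest. The name `rewrite_with_header` and the shortened `HEADER` are placeholders for illustration, not part of the question's code. Because the same path is read and then rewritten, the sketch reads all lines before opening the file for writing.

```python
# Shortened placeholder; the question's real header has many more columns.
HEADER = "id;nom;prenom\n"

def rewrite_with_header(name, from_encoding, to_encoding="utf-8", header=HEADER):
    # Read everything first, because the same path is reopened for writing below.
    with open(name, encoding=from_encoding, newline='') as f:
        lines = f.readlines()
    # Open the output exactly once, outside any loop: mode 'w' truncates on open.
    with open(name, mode='w', encoding=to_encoding, newline='') as out:
        out.write(header)
        out.writelines(lines[1:])  # skip the original first line
```

Holding the whole file in memory is fine for small files; for large ones, writing to a temporary file and renaming it afterwards avoids that.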
0

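If all you need is to change the character encoding of the files, then it does not matter that they are CSV files, unless the conversion could change which characters are interpreted as the delimiter, quotechar, and so on: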

def convert(filename, from_encoding, to_encoding):
    with open(filename, newline='', encoding=from_encoding) as file:
        data = file.read().encode(to_encoding)
    with open(filename, 'wb') as outfile:
        outfile.write(data)

for path in csv_files:
    convert(path, "cp866", "utf-8")

Adding the errors parameter changes how encoding/decoding errors are handled.
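As a rough illustration of the errors parameter, the sketch below passes it to both the decoding step (`open`) and the encoding step (`str.encode`). The function name `convert_with_errors` and the default of `"replace"` are assumptions made for this example.

```python
def convert_with_errors(filename, from_encoding, to_encoding, errors="replace"):
    # errors applies when decoding the bytes read from the file ...
    with open(filename, newline='', encoding=from_encoding, errors=errors) as file:
        # ... and again when encoding the text to the target encoding.
        data = file.read().encode(to_encoding, errors=errors)
    with open(filename, 'wb') as outfile:
        outfile.write(data)
```

With `errors="ignore"` undecodable bytes are silently dropped, with `"replace"` they become a replacement character, and with the default `"strict"` the conversion raises an exception instead of losing data.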

If the files may be large, you can convert the data incrementally:

import os
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

def convert(filename, from_encoding, to_encoding):
    with open(filename, newline='', encoding=from_encoding) as file:
        # delete=False keeps the temporary file after it is closed;
        # setting tmpfile.delete afterwards is not reliable across Python versions
        with NamedTemporaryFile('w', encoding=to_encoding, newline='',
                                dir=os.path.dirname(filename),
                                delete=False) as tmpfile:
            copyfileobj(file, tmpfile)  # copies in chunks, not all at once
    os.replace(tmpfile.name, filename)  # rename tmpfile -> filename

for path in csv_files:
    convert(path, "cp866", "utf-8")
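The snippets above assume `csv_files` is a list of file paths. The question's `find_csv_filenames` helper is never shown, so the following is only a guess at what it might look like, based on how it is called (`find_csv_filenames('./csv', ".csv")`).

```python
import os

def find_csv_filenames(directory, suffix=".csv"):
    """Hypothetical stand-in: return paths of files in `directory` ending with `suffix`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix) and os.path.isfile(os.path.join(directory, name))
    )
```

It returns full paths rather than bare file names, so each result can be passed straight to `convert`.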
0

You can do it like this:

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        with open(name, 'r+') as f:
            data = ''.join(f.readlines())
            data.decode(ascii).encode(to)
            f.seek(0)
            f.write(data)
            f.truncate()
+0

`''.join(f.readlines())` should be written as `f.read()` – jfs

+0

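It seems the OP is using Python 3, otherwise the `encoding` parameter would not be available for the built-in open function. Therefore `data.decode()` will fail, because `data` is already Unicode. – jfs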

+0

Thanks for your guidance~ – wcp
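Taking the comments above into account, one way the in-place approach could be adapted to Python 3 is to open the file in binary mode, so that `decode` is called on bytes rather than on text. This is a sketch under that assumption; the name `convert_in_place` is made up for the example.

```python
def convert_in_place(name, from_encoding, to_encoding="utf-8"):
    # 'rb+' opens the existing file for reading and writing raw bytes.
    with open(name, 'rb+') as f:
        data = f.read()  # bytes, so .decode() is available
        converted = data.decode(from_encoding).encode(to_encoding)
        f.seek(0)
        f.write(converted)
        f.truncate()  # drop leftover bytes if the new content is shorter
```

Unlike the temporary-file version, a failure partway through the write can leave the file partly converted.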