2013-04-17 82 views

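Python UTF-8 CSV reader without field names

I am trying to open and read a CSV file that has no field names. Based on the research I have done, I am fairly sure it is UTF-8 encoded. My CSV has this format: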

1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 

I use the following function to open and read it:

def parseCSVCounter(csv_file):
    with codecs.open(csv_file, "r", "utf-8-sig", "strict", -1) as f:
        f = str(f)
        relayreader = csv.reader(f, delimiter=',')
        for row in relayreader:
            print(row)

            try:
                #row[0] = unicode(row[0], 'latin-1')
                counter(row)
                print('starting row..')

            except UnicodeDecodeError, e:
                print('something went wrong1')
                print e

            except Exception, e:
                print('something went wrong')
                print e

This produces:

Starting Command.. 
['<'] 
something went wrong 
invalid literal for int() with base 10: '<' 
['o'] 
something went wrong 
invalid literal for int() with base 10: 'o' 
........ 
starting row.. 
['9'] 
starting row.. 
['3'] 
starting row.. 
['8'] 
starting row.. 
['2'] 
starting row.. 
['8'] 
starting row.. 
['>'] 
something went wrong 
invalid literal for int() with base 10: '>'

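I trimmed the output to illustrate my point. It seems to be auto-generating field names for me. With csv.DictReader(fieldnames='foo') I can specify the field names as a sequence. How do I get csv.reader() to ignore the lack of field names?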


`>>> x = open('aaa', 'w')` `>>> str(x)` `"<open file 'aaa', mode 'w' at 0x01FC9860>"` – jamylak

Answer


You do not need to call str(f); use the file object directly:

with codecs.open(csv_file, "r", "utf-8-sig", "strict") as f: 
    relayreader = csv.reader(f, delimiter=',') 
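
The snippet above is Python 2. For Python 3, a minimal sketch of the same idea might look like the following. The helper name `read_int_rows`, the temporary sample file, and the decision to drop blank trailing fields are assumptions made for illustration, not part of the question. Note that csv.reader never needs field names: every row simply comes back as a list.

```python
import csv
import os
import tempfile


def read_int_rows(csv_path):
    """Return each CSV row as a list of ints (hypothetical helper)."""
    rows = []
    # Pass the open file object itself to csv.reader, never str(f).
    # newline="" is what the csv module documentation recommends for files.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f, delimiter=","):
            # The sample rows end in ", ", so blank trailing fields are
            # dropped before converting (an assumption about the intent).
            rows.append([int(field) for field in row if field.strip()])
    return rows


# Self-check with a temporary file shaped like the sample data.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("1,0,0,0, \n2,0,0,0, \n")
    sample_path = tmp.name

parsed = read_int_rows(sample_path)
os.remove(sample_path)
print(parsed)  # [[1, 0, 0, 0], [2, 0, 0, 0]]
```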

Instead, you are trying to read the output of str(f) as the CSV data, which is a string of the form:

<open file '/path/to/file', mode 'rb' at 0x105f10d20> 

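You can see that in your error output: csv.reader iterates over whatever it is given, and iterating over a string yields one character at a time, so each character becomes its own "row". The output spells out <, o, and so on, all the way through the digits of the memory address to the closing >.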
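
The effect is easy to reproduce. The sketch below uses Python 3 and a hand-written stand-in for the str(f) result (the string `fake_repr` is made up for illustration), and shows csv.reader turning every character into a single-field row:

```python
import csv

# Stand-in for what str(f) returned in the question (contents are made up).
fake_repr = "<open file 'aaa', mode 'r' at 0x01FC9860>"

# csv.reader iterates over its argument; iterating a str yields characters,
# and each character is then parsed as if it were a whole line.
rows = list(csv.reader(fake_repr, delimiter=","))

print(rows[:3])  # [['<'], ['o'], ['p']]
print(rows[-1])  # ['>']
```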

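Note that the utf-8-sig codec can handle a UTF-8-encoded BOM at the start of the file, but unless such a BOM is expected to be present, the regular utf-8 codec will do just fine.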
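
As a rough illustration of the difference between the two codecs (Python 3, with made-up sample bytes), decoding the same data both ways shows where the BOM ends up:

```python
import codecs

# Sample bytes are invented for illustration; BOM_UTF8 is b'\xef\xbb\xbf'.
raw_with_bom = codecs.BOM_UTF8 + b"1,0,0\n"
raw_without_bom = b"1,0,0\n"

plain = raw_with_bom.decode("utf-8")         # BOM survives as '\ufeff'
stripped = raw_with_bom.decode("utf-8-sig")  # BOM is removed
same = raw_without_bom.decode("utf-8-sig")   # no BOM, so nothing changes

print(repr(plain))     # '\ufeff1,0,0\n'
print(repr(stripped))  # '1,0,0\n'
print(repr(same))      # '1,0,0\n'
```

With plain utf-8 on a file that does start with a BOM, the first field would carry the stray '\ufeff' character, which would likely break a later int() conversion, so utf-8-sig is the safer choice when the presence of a BOM is uncertain.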