2012-02-18 182 views
16

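Reading a CSV from a zip file

I have a directory of zip files (about 10,000 small files). Each one contains a CSV file that I am trying to read and split into several different CSV files.

I managed to write code that splits a CSV file taken from a directory of CSVs, shown below. It reads the first attribute of each row and, depending on its value, writes the row to the relevant CSV.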

import csv 
import os 
import sys 
import re 
import glob 

reader = csv.reader(open("C:/Projects/test.csv", "rb"), delimiter=',', quotechar='"') 
write10 = csv.writer(open('ouput10.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC) 
write15 = csv.writer(open('ouput15.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC) 


headings10=["RECORD_IDENTIFIER","CUSTODIAN_NAME","LOCAL_CUSTODIAN_NAME","PROCESS_DATE","VOLUME_NUMBER","ENTRY_DATE","TIME_STAMP","VERSION","FILE_TYPE"] 
write10.writerow(headings10) 

headings15=["RECORD_IDENTIFIER","CHANGE_TYPE","PRO_ORDER","USRN","STREET_DESCRIPTION","LOCALITY_NAME","TOWN_NAME","ADMINSTRATIVE_AREA","LANGUAGE"] 
write15.writerow(headings15) 


for row in reader: 
    record_type = row[0]  # renamed from "type" to avoid shadowing the builtin
    if "10" in record_type: 
        write10.writerow(row) 
    elif "15" in record_type: 
        write15.writerow(row) 
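
The routing step described above can be sketched in isolation. This is a minimal sketch rather than the poster's code: it assumes Python 3 (the post uses Python 2), the helper name `split_rows` is hypothetical, and it matches the first field exactly instead of using the substring test `"10" in type`, which is a deliberate change. In-memory buffers stand in for the real input and output files.

```python
import csv
import io


def split_rows(rows, writers):
    """Send each row to the writer whose key equals the row's first field.

    `writers` maps a record identifier (for example "10") to a csv.writer.
    Blank rows and rows with an unknown identifier are skipped.
    Returns a dict counting how many rows went to each writer.
    """
    counts = {key: 0 for key in writers}
    for row in rows:
        if not row:
            continue  # skip blank lines
        record_type = row[0].strip()
        writer = writers.get(record_type)
        if writer is not None:
            writer.writerow(row)
            counts[record_type] += 1
    return counts


# In-memory demonstration; real code would pass open file objects instead.
source = io.StringIO('10,"Custodian A"\n15,"High Street"\n99,"ignored"\n')
out10, out15 = io.StringIO(), io.StringIO()
writers = {
    "10": csv.writer(out10, lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC),
    "15": csv.writer(out15, lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC),
}
counts = split_rows(csv.reader(source), writers)
```

Keeping the writers in a dict means a further record type only needs another dict entry, not another `elif` branch.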

So I am now trying to read the zip files directly, rather than wasting time extracting them first.

This is what I have so far:

import glob 
import os 
import csv 
import zipfile 
import StringIO 

for name in glob.glob('C:/Projects/abase/*.zip'): 
    base = os.path.basename(name) 
    filename = os.path.splitext(base)[0] 


datadirectory = 'C:/Projects/abase/' 
dataFile = filename 
archive = '.'.join([dataFile, 'zip']) 
fullpath = ''.join([datadirectory, archive]) 
csv = '.'.join([dataFile, 'csv']) 


filehandle = open(fullpath, 'rb') 
zfile = zipfile.ZipFile(filehandle) 
data = StringIO.StringIO(zfile.read(csv)) 
reader = csv.reader(data) 

for row in reader: 
    print row 

After following as many tutorials as I could find, this still throws an error:

AttributeError: 'str' object has no attribute 'reader'

Hopefully someone can show me how to change my CSV-reading code so that it reads from the zip files.

Many thanks


Maybe it is just how you pasted the code, but almost nothing is inside your "for name" loop. Which line does the error refer to? – 2012-02-18 18:27:47

Answer

19

Simple fix. You have overwritten the csv module with a local variable named csv. Just change the name of that variable:

import glob 
import os 
import csv 
import zipfile 
import StringIO 

for name in glob.glob('C:/Projects/abase/*.zip'): 
    base = os.path.basename(name) 
    filename = os.path.splitext(base)[0] 


    datadirectory = 'C:/Projects/abase/' 
    dataFile = filename 
    archive = '.'.join([dataFile, 'zip']) 
    fullpath = ''.join([datadirectory, archive]) 
    csv_file = '.'.join([dataFile, 'csv']) #all fixed 


    filehandle = open(fullpath, 'rb') 
    zfile = zipfile.ZipFile(filehandle) 
    data = StringIO.StringIO(zfile.read(csv_file)) #don't forget this line! 
    reader = csv.reader(data) 

    for row in reader: 
        print row 
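
The cause of the error can be reproduced in a few lines. This is an illustrative sketch, assuming Python 3; the function name `demo_shadowing` and the file name `example.csv` are made up for the demonstration. Once the name `csv` is rebound to a string, `csv.reader` is looked up on the string, not on the module.

```python
import csv as csv_module
import io


def demo_shadowing():
    """Show what happens when the name `csv` is rebound to a string."""
    csv = csv_module                      # here the name refers to the module
    first = list(csv.reader(io.StringIO("a,b\n")))

    csv = "example.csv"                   # rebinding: csv is now a str
    try:
        csv.reader(io.StringIO("a,b\n"))  # attribute lookup on a str fails
    except AttributeError as exc:
        return first, str(exc)
    return first, None


rows, message = demo_shadowing()
```

Here `rows` holds the parsed row from the first call, and `message` holds the text of the AttributeError raised by the second.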

That is brilliant, brilliant. – tjmgis 2012-02-18 18:50:06


However, it doesn't seem to loop through the zip files? – tjmgis 2012-02-18 18:50:30


@user1218419: Check your indentation. As Scott Hunter said, most of your code is not under the "for name in glob.glob('c:etc'):" loop, so it sits outside it. – DSM 2012-02-18 19:03:17
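
To illustrate the indentation point, here is one way the whole loop could look with every per-archive step inside the loop body. It is a sketch under stated assumptions, not the code from the thread: it targets Python 3, swaps `StringIO` for `io.TextIOWrapper`, assumes UTF-8 text, and assumes (as in the question) that each archive contains a member named after the archive with a `.csv` extension. The name `iter_zip_csv_rows` is hypothetical.

```python
import csv
import glob
import io
import os
import zipfile


def iter_zip_csv_rows(zip_dir):
    """Yield (zip_path, row) for the CSV inside every .zip file in zip_dir.

    Assumes each archive holds a member called <archive base name>.csv
    and that the CSV text is UTF-8 encoded.
    """
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
        base = os.path.splitext(os.path.basename(zip_path))[0]
        member_name = base + ".csv"
        # Everything that depends on zip_path stays inside the loop body.
        with zipfile.ZipFile(zip_path) as archive:
            with archive.open(member_name) as raw:
                # Wrap the binary member so csv.reader receives text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.reader(text):
                    yield zip_path, row
```

A caller could then write `for zip_path, row in iter_zip_csv_rows('C:/Projects/abase/'):` and route each row by its first field, without extracting any archive to disk.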