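Python 3: working with csv files inside a tar file
I am trying to work with csv files contained in a tar.gz file, and I am having trouble passing the correct data/object to the csv module.
Say I have a tar.gz file containing a number of csv files formatted as follows.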
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38
I would like to access each csv file in memory, without extracting every file from the tar file and writing it to disk. For example:
import tarfile
import csv

tar = tarfile.open("tar-file.tar.gz")
for member in tar.getmembers():
    f = tar.extractfile(member).read()
    content = csv.reader(f)
    for row in content:
        print(row)
tar.close()
This produces the following error.
for row in content:
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)
I also tried passing f as a string wrapped in a list, as described in the csv module documentation.
content = csv.reader([f])
The above produces the same error.
I tried decoding the contents of f as ASCII.
f = tar.extractfile(member).read().decode('ascii')
But this iterates over each character of the csv data, instead of iterating over rows that each contain a list of fields.
['1']
['0']
['7']
['9']
['', '']
['S']
['A']
['M']
['P']
['L']
['E']
['_']
['A']
['', '']
['G']
['R']
snip...
['2']
['0']
['1']
['7']
['/']
['0']
['2']
['/']
['1']
['5']
[' ']
['2']
['2']
[':']
['5']
['7']
[':']
['3']
['8']
[]
[]
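The character-by-character output above is consistent with how csv.reader consumes its argument: it iterates over whatever it is given and treats each item as one line, and iterating a str yields single characters. A minimal sketch, assuming the decoded text is short enough to split in memory (the variable names text, per_char, and rows are illustrative only and not part of the original code):

```python
import csv

# Sample text standing in for the decoded contents of one archive member.
text = (
    "1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n"
    "1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n"
)

# Iterating a str yields one character at a time, so csv.reader
# treats every character as its own "line".
per_char = list(csv.reader(text))

# splitlines() yields one string per line, which is the shape
# csv.reader expects from its input.
rows = list(csv.reader(text.splitlines()))

print(per_char[:3])
print(rows[0])
```

Splitting on lines this way would not handle quoted fields that contain embedded newlines, so it is only a reasonable shortcut for simple data like the sample shown here.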
Trying both to decode f as ASCII and to pass it as a string wrapped in a list
f = tar.extractfile(member).read().decode('ascii')
content = csv.reader([f])
produces the following output.
for row in content:
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
To show the different results, I used the following code.
import tarfile
import csv

tar = tarfile.open("tar-file.tar.gz")
for member in tar.getmembers():
    f = tar.extractfile(member).read()
    print(member.name)
    print('Raw :', type(f))
    print(f)
    print()
    f = f.decode('ascii')
    print('ASCII:', type(f))
    print(f)
tar.close()
This produces the following output. (Each csv contains the same data in this example.)
./raw_data/csv-file1.csv
Raw : <class 'bytes'>
b'1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31\n1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38\n\n'
ASCII: <class 'str'>
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38
./raw_data/csv-file2.csv
Raw : <class 'bytes'>
b'1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31\n1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38\n\n'
ASCII: <class 'str'>
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38
./raw_data/csv-file3.csv
Raw : <class 'bytes'>
b'1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31\n1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38\n\n'
ASCII: <class 'str'>
1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30
1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26
1077,SAMPLE_C,GROUP,005,,2017/02/15 22:57:31
1079,SAMPLE_A,GROUP,128,,2017/02/15 22:57:38
How can I get the csv module to correctly read the in-memory files provided by the tarfile module? Thanks.
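One possible approach, offered as a sketch rather than a definitive answer: tar.extractfile() returns a binary file object, and io.TextIOWrapper can wrap it so that csv.reader receives decoded text lines without anything being written to disk. The helper names build_sample_archive and read_csv_members are hypothetical, the archive is built in memory only so the example is self-contained, and ASCII is assumed as the encoding because that is what the question uses.

```python
import csv
import io
import tarfile


def build_sample_archive():
    """Build a small tar.gz in memory (a stand-in for tar-file.tar.gz)."""
    data = (
        b"1079,SAMPLE_A,GROUP,001,,2017/02/15 22:57:30\n"
        b"1041,SAMPLE_B,GROUP,023,,2017/02/15 22:57:26\n"
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="raw_data/csv-file1.csv")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def read_csv_members(fileobj):
    """Return {member name: list of csv rows} for every regular file."""
    rows_by_name = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # extractfile() returns None for directories
            binary = tar.extractfile(member)
            # Decode bytes to text lazily; newline="" lets csv handle
            # line endings itself, as the csv docs recommend.
            text = io.TextIOWrapper(binary, encoding="ascii", newline="")
            rows_by_name[member.name] = list(csv.reader(text))
    return rows_by_name


result = read_csv_members(build_sample_archive())
for name, rows in result.items():
    print(name)
    for row in rows:
        print(row)
```

With a real archive on disk, the same loop should work with tarfile.open("tar-file.tar.gz") in place of the in-memory buffer.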
Thanks Martin, this does the trick nicely. – Pobbel