2015-11-12 101 views
0

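Writing output to a CSV file [in the correct format]

I realize this question has been asked a million times and that there is plenty of documentation. However, I cannot get my results written out in the correct format.

The code below was taken from: Replacing empty csv column values with a zero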

# Save below script as RepEmptyCells.py 
# Add #!/usr/bin/python to script 
# Make executable by chmod +x prior to running the script on desired .csv file 

# Below code will look through your .csv file and replace empty spaces with 0s 
# This can be particularly useful for genetic distance matrices 

import csv 
import sys 

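# Python 2: the csv module expects the file to be opened in binary mode
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    # str.join() needs strings, so convert each integer back to str
    print(','.join(str(int(x)) for x in row))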

Currently, to get a correctly formatted output .csv file, I can run the following in bash:

#After making the script executable   
./RepEmptyCells.py input.csv > output.csv # this produces the correct output 

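I tried using the csv.writer function to produce a correctly formatted output.csv file (equivalent to ./RepEmptyCells.py input.csv > output.csv), without much luck.

I would like to know how to add this last step to the code so that the process is automated and I no longer have to do it in bash.

I have tried: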

import csv
import sys

f = open('output2.csv', 'w')

reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    f.write(','.join(str(int(x)) for x in row))

f.close()

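When I look at the raw files produced by this code and by the previous one, they look the same.

However, when I open them in Excel or iNumbers, the latter (i.e. output2.csv) shows only a single row of data.

It is important that both output.csv and output2.csv can be opened in Excel.

Answers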

3

Two options:

  1. Simply add an f.write('\n') statement after your current f.write.

  2. Use csv.writer. You mentioned it, but it is not in your code.

    writer = csv.writer(f) 
    ... 
    writer.writerow([int(x) for x in row]) # Note difference in parameter format 
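
Both options above can be sketched in Python 3 roughly as follows. This is a minimal illustration rather than the answerer's exact code: the function names `fill_with_write` and `fill_with_writer` are made up for the example, and cells are kept as strings instead of being converted with int().

```python
import csv


def fill_with_write(in_path, out_path):
    """Option 1: join the cells by hand and end every row with a newline."""
    with open(in_path, newline='') as fin, open(out_path, 'w') as fout:
        for row in csv.reader(fin):
            cells = [cell if cell else '0' for cell in row]
            # Without the trailing '\n' all rows end up on a single line
            fout.write(','.join(cells) + '\n')


def fill_with_writer(in_path, out_path):
    """Option 2: let csv.writer take care of separators and line endings."""
    with open(in_path, newline='') as fin, \
            open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow([cell if cell else '0' for cell in row])
```

Either function can be called as, for instance, fill_with_writer('input.csv', 'output.csv'), which removes the need for the shell redirection.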
    
+0

Thanks. That did it! So you just need to add the newline ('\n')! 1) works. 2) still doesn't, but that's fine. – Novice

+0

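Note that I am surprised 1) works, because on Unix '\n' is written as LF, and I am fairly sure Excel only accepts CSV files whose lines end in CRLF. In fact, it is a feature of the CSV format that a lone LF denotes a line break inside a cell. That is why you open the file with 'rb' in Python 2 and with newline='' in Python 3: the csv writer handles this particular aspect itself and would be disturbed by Python's default newline abstraction. – Cilyan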
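
The line-ending behaviour described in this comment can be observed with a small Python 3 sketch. It writes to an in-memory io.StringIO buffer instead of a real file (an assumption made only to keep the example self-contained) and prints the raw characters that csv.writer produces with its default "excel" dialect.

```python
import csv
import io

# In-memory text buffer standing in for a file opened with newline=''
buf = io.StringIO()
writer = csv.writer(buf)  # default dialect is "excel"

writer.writerow(['1', '0', '3'])
# A cell containing a line break is quoted so it stays a single cell
writer.writerow(['line one\nline two', 'x'])

raw = buf.getvalue()
print(repr(raw))
```

The printed representation shows each row terminated by '\r\n', while the '\n' inside the quoted cell is left as it is. Opening a real output file with newline='' keeps Python from translating these characters a second time.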

1

A modest proposal

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 

# Use with statement to properly close files 
# Use newline='' which is the right option for Python 3.x 
with open(sys.argv[1], 'r', newline='') as fin, open(sys.argv[2], 'w', newline='') as fout: 
    reader = csv.reader(fin) 
    # You may need to redefine the dialect for some version of Excel that 
    # split cells on semicolons (for _Comma_ Separated Values, yes...) 
    writer = csv.writer(fout, dialect="excel") 
    for row in reader: 
        # Write as reading, let the OS do the caching alone
        # Process the data as it comes in a generator, checking all cells
        # in a row. If cell is empty, the or will return "0"
        # Keep strings all the time: if it's not an int it would fail
        # Converting to int will force the writer to convert it back to str
        # anyway, and Excel doesn't make any difference when loading.
        writer.writerow(cell or "0" for cell in row)

Sample in.csv

1,2,3,,4,5,6, 
7,,8,,9,,10 

Output out.csv

1,2,3,0,4,5,6,0 
7,0,8,0,9,0,10 
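
The code comment about Excel versions that split cells on semicolons could be handled along these lines. This is only a sketch: `write_rows` is a hypothetical helper, and whether a given Excel installation really expects ';' depends on its regional settings, which are not specified here.

```python
import csv
import io


def write_rows(rows, delimiter=','):
    """Return the CSV text for rows, using the given cell delimiter."""
    buf = io.StringIO()
    # Start from the "excel" dialect and override only the delimiter
    writer = csv.writer(buf, dialect='excel', delimiter=delimiter)
    writer.writerows(rows)
    return buf.getvalue()


# Same data, written once with commas and once with semicolons
comma_text = write_rows([['1', '0', '3']])
semicolon_text = write_rows([['1', '0', '3']], delimiter=';')
```

The same delimiter=';' keyword can be passed to the csv.writer call that writes to the real output file.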
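
0

import csv
import sys

# Python 2: the csv module expects files opened in binary mode
with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f)
    # A row is a list of strings, so replace empty cells one by one
    rows = [[cell or '0' for cell in row] for row in reader]

for row in rows:
    print ','.join(row)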

I don't see why you need to use the shell and redirection. A CSV writer is simply:

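# rows is the list of rows to write, each row being a list of cell values
# Python 2: open the output file in binary mode for the csv module
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(rows)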