2016-02-04 97 views
0

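Split a large CSV and remove a column

I am using this code to split a large CSV file. It works flawlessly, but how would I adapt it to drop a column from the output files? I am splitting my CSV based on the value in column 2 and only want column 1 returned.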

#!/usr/bin/env python3
import binascii
import csv
import os.path
import sys
from tkinter.filedialog import askopenfilename, askdirectory
from tkinter.simpledialog import askinteger

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    csv_writers = {}
    for row in csv_reader:
        k = keyfunc(row)
        if k not in csv_writers:
            csv_writers[k] = csv.writer(open(os.path.join(dst_dir, k),
                                             mode='w', newline=''))
        csv_writers[k].writerow(row)

def get_args_from_cli():
    input_filename = sys.argv[1]
    column = int(sys.argv[2])
    dst_dir = sys.argv[3]
    return (input_filename, column, dst_dir)

def get_args_from_gui():
    input_filename = askopenfilename(
        filetypes=(('CSV', '.csv'),),
        title='Select CSV Input File')
    column = askinteger('Choose Table Column', 'Table column')
    dst_dir = askdirectory(title='Select Destination Directory')
    return (input_filename, column, dst_dir)

if __name__ == '__main__':
    if len(sys.argv) == 1:
        input_filename, column, dst_dir = get_args_from_gui()
    elif len(sys.argv) == 4:
        input_filename, column, dst_dir = get_args_from_cli()
    else:
        raise Exception("Invalid number of arguments")
    with open(input_filename, mode='r', newline='') as f:
        split_csv_file(f, dst_dir, lambda r: r[column-1]+'.csv')
        # if the column has funky values resulting in invalid filenames
        # replace the line from above with:
        # split_csv_file(f, dst_dir, lambda r: binascii.b2a_hex(r[column-1].encode('utf-8')).decode('utf-8')+'.csv')

Here is an example of the CSV at the start:

"<option value="""">Choose Year</option>",ParentID 
"<option value=""Civic1990"">1990</option>",Civic 
"<option value=""CRX1990"">1990</option>",CRX 
"<option value=""Prelude1990"">1990</option>",Prelude 
"<option value=""Accord1990"">1990</option>",Accord 
"<option value=""Prelude1991"">1991</option>",Prelude 
"<option value=""Civic1991"">1991</option>",Civic 
"<option value=""CRX1991"">1991</option>",CRX 
"<option value=""Accord1991"">1991</option>",Accord 
"<option value=""Prelude1992"">1992</option>",Prelude 
"<option value=""Civic1992"">1992</option>",Civic 
"<option value=""Accord1992"">1992</option>",Accord 
"<option value=""Prelude1993"">1993</option>",Prelude 
"<option value=""Civic1993"">1993</option>",Civic 
"<option value=""CivicdelSol1993"">1993</option>",CivicdelSol 
"<option value=""Accord1993"">1993</option>",Accord 
"<option value=""Passport1994"">1994</option>",Passport 

And at the end, I would like the result to look like:

<option value="">Choose Year</option> 
<option value="Civic1990">1990</option> 
<option value="Civic1991">1991</option> 
<option value="Civic1992">1992</option> 
<option value="Civic1993">1993</option> 
<option value="Civic1994">1994</option> 
<option value="Civic1995">1995</option> 
<option value="Civic1996">1996</option> 
<option value="Civic1997">1997</option> 
<option value="Civic1998">1998</option> 


<option value="">Choose Year</option> 
<option value="Accord1990">1990</option> 
<option value="Accord1991">1991</option> 
<option value="Accord1992">1992</option> 
<option value="Accord1993">1993</option> 
<option value="Accord1994">1994</option> 
<option value="Accord1995">1995</option> 
<option value="Accord1996">1996</option> 
<option value="Accord1997">1997</option> 
<option value="Accord1998">1998</option> 

And so on, so that the option values for every year of a particular model end up in their own CSV or TXT file.

+1

Can you post a sample of your CSV file, and a sample of the expected output? – haifzhan

Answer

0

The row is just a Python list of strings, so try:

csv_writers[k].writerow(row[0:1]) 

This will write only the first column.
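
As a minimal sketch of why the slice is used here: writerow() expects a sequence of fields, and row[0:1] is a one-element list, whereas a bare row[0] would be a string. The two sample rows are copied from the question; the in-memory buffers are only there for illustration.

```python
import csv
import io

# Two sample rows, copied from the question's CSV.
sample = (
    '"<option value=""Civic1990"">1990</option>",Civic\n'
    '"<option value=""CRX1990"">1990</option>",CRX\n'
)

rows = list(csv.reader(io.StringIO(sample)))

# row[0:1] is a one-element list, which is what writerow() expects.
# Passing row[0] (a bare string) would be treated as a sequence of
# single-character fields instead.
first_columns = [row[0:1] for row in rows]

out = io.StringIO()
writer = csv.writer(out)
for cols in first_columns:
    writer.writerow(cols)

print(out.getvalue())
```

Note that csv.writer still quotes the field and doubles the inner quotes on output, because the field contains the quote character.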

On the second question:

In Python, you can use str.replace(substr, new_substr)

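Here we are potentially dealing with a list of strings (I know that in this scenario the list holds only one string), so a list comprehension comes in handy.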

csv_writers[k].writerow([v.replace('""', '"') for v in row[0:1]]) 

This will generate a new list in which every string has "" replaced with ".
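
One caveat worth checking, offered as a sketch rather than a definitive fix: csv.reader already turns each escaped "" back into a single ", and csv.writer doubles them again on output. If the goal is the plain-text output shown in the question, one option is to write the first field directly as a line of text instead of going through csv.writer. The function name split_first_column and the .txt suffix below are hypothetical and not part of the original script; the sample rows are taken from the question.

```python
import csv
import io
import os
import tempfile
from contextlib import ExitStack


def split_first_column(f, dst_dir, column):
    """Write column 1 of each row to <key>.txt, keyed by a 1-based column.

    Hypothetical helper: fields are written as plain text, without CSV quoting.
    """
    csv_reader = csv.reader(f)
    outputs = {}
    # ExitStack closes every output file when the block exits.
    with ExitStack() as stack:
        for row in csv_reader:
            key = row[column - 1]
            if key not in outputs:
                path = os.path.join(dst_dir, key + '.txt')
                outputs[key] = stack.enter_context(
                    open(path, mode='w', encoding='utf-8'))
            # csv.reader has already unescaped "" to ", so no replace() here.
            outputs[key].write(row[0] + '\n')


# Small demonstration with rows taken from the question.
sample = (
    '"<option value=""Civic1990"">1990</option>",Civic\n'
    '"<option value=""CRX1990"">1990</option>",CRX\n'
    '"<option value=""Civic1991"">1991</option>",Civic\n'
)

with tempfile.TemporaryDirectory() as tmp:
    split_first_column(io.StringIO(sample), tmp, column=2)
    with open(os.path.join(tmp, 'Civic.txt'), encoding='utf-8') as fh:
        civic_text = fh.read()

print(civic_text)
```

With this approach the header row of the real file would land in its own ParentID.txt, so it would still need separate handling if each output file should start with the "Choose Year" line.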

Hope it helps!

+0

Perfect, thank you! –

+0

Also, is there a way to replace "" with "? –

+0

Do you mean replacing the empty string ("") with a question mark ("?")? – totoro