2017-02-28 174 views
0

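Selecting specific columns from a CSV file

My code reads 28 columns from a text file and formats/removes some of the data. How can I select specific columns? The columns I want are 0 through 25 plus column 28. What is the best way to do this?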

Thanks in advance!

import csv
import os

my_file_name = os.path.abspath('NVG.txt')
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']


with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    writer.writerow(next(cr)[:28])
    for line in (r[0:28] for r in cr):
        # Skip any row that contains one of the unwanted words.
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[11] = line[11][:5]
            writer.writerow(line)
# The with block closes both files, so no explicit close() calls are needed.

Answers

3

Take a look at pandas:

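import pandas as pd

# Columns 0-25 plus column 28 (zero-based positions)
usecols = list(range(26)) + [28]
# The input file is pipe-delimited, so pass sep='|'
data = pd.read_csv(my_file_name, sep='|', usecols=usecols)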

You can also conveniently filter the data and write it back to a new file:

# index=False keeps the DataFrame's row index out of the output
data.to_csv(cleaned_file, index=False)
+0

pandas makes data manipulation so easy and practical. +1 from me. –
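
For reference, here is a minimal, self-contained sketch of the pandas approach. It assumes a pipe-delimited input like the one in the question; the in-memory sample and the column names c0 to c28 are made up for illustration, and dtype=str is an added choice so that fields are passed through as text.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited sample with 29 columns (c0 ... c28),
# standing in for NVG.txt.
header = "|".join(f"c{i}" for i in range(29))
row = "|".join(str(i) for i in range(29))
sample = io.StringIO(header + "\n" + row + "\n")

# Keep columns 0-25 and column 28 (zero-based positions).
usecols = list(range(26)) + [28]

# dtype=str reads every field as text instead of inferring numeric types.
data = pd.read_csv(sample, sep="|", usecols=usecols, dtype=str)

# index=False avoids writing the DataFrame's row index as an extra column.
out = io.StringIO()
data.to_csv(out, index=False)
csv_text = out.getvalue()
```

With a real file, the path would replace the StringIO objects, for example pd.read_csv(my_file_name, sep="|", usecols=usecols) followed by data.to_csv(cleaned_file, index=False).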

1

Exclude columns 26 and 27 from each row:

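for row in cr:
    # Caveat: row.index(x) returns the first position of x, so this only
    # works reliably when the values within a row are unique.
    # 25 and 26 are zero-based; use [26, 27] if you number columns from 0.
    content = list(filter(lambda x: row.index(x) not in [25,26], row))
    # work with the selected columns in content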
+0

If you have to call list() anyway, why not use a list comprehension here: 'content = [x for x in cr if cr.index(x) not in [25,26]]' – Ohjeah

+0

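You probably meant to filter the row, not the reader. As written, you exhaust the reader on the first iteration of the for loop. Looking up the index of each value is also wasteful; why not 'enumerate()'? –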

+0

@IljaEverilä Yes, 'row'; fixed the typo. Thanks! – haifzhan
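
The enumerate() suggestion from the comments can be sketched roughly as follows. This is only an illustration: the helper name select_columns, the EXCLUDED set, and the in-memory sample are hypothetical, and the excluded positions follow the question's zero-based numbering (keep 0-25 and 28, so drop 26 and 27).

```python
import csv
import io

# Zero-based positions to drop, following the question's numbering.
# Adjust this set if your columns are counted from 1.
EXCLUDED = {26, 27}


def select_columns(row, excluded=EXCLUDED):
    """Return the fields of row whose position is not in excluded."""
    return [value for i, value in enumerate(row) if i not in excluded]


# Hypothetical pipe-delimited sample with 29 columns, standing in for NVG.txt.
# The data row repeats the same value on purpose.
sample = io.StringIO(
    "|".join(f"c{i}" for i in range(29)) + "\n"
    + "|".join(["x"] * 29) + "\n"
)

out = io.StringIO()
writer = csv.writer(out)
for row in csv.reader(sample, delimiter="|"):
    writer.writerow(select_columns(row))

result = out.getvalue()
```

Because enumerate() yields each position directly, rows with repeated values are handled correctly. A filter based on row.index(x) would see position 0 for every "x" in the sample data row and drop nothing.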