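Python 2 list comparison optimization
Given: two CSV files (1.8 MB each): AllData_1 and AllData_2, each with about 8,000 lines. Each line consists of 8 columns: [txt_0,txt_1,txt_2,txt_3,txt_4,txt_5,txt_6,txt_7,txt_8]
Goal: for rows whose txt_0 matches (that is, AllData_1[0] == AllData_2[0]), compare the contents of the next 4 columns. If the data are not equal, put the entire row from each data set into a list chosen by which column differs, and save the lists to an output file. If a txt_0 value is in one data set but not the other, save that row directly to the output file.
Example:
AllData_1 row x contains: [A1,B2,C3,D4,E5,F6,G7,H8]
AllData_2 row y contains: [A1,B2,C33C,d44d,E5,F6,G7,H8]
The program will save both row x and row y to the corresponding lists, ListCol2 and ListCol3. After all comparisons are complete, the lists are saved to files.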
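To make the example concrete, here is a minimal sketch that reports which of columns 1 to 4 differ between the two example rows. The helper name `differing_columns` is hypothetical and is not part of the code in the question.

```python
def differing_columns(xrow, yrow, columns=(1, 2, 3, 4)):
    """Return the indices in `columns` at which the two rows disagree."""
    return [c for c in columns if xrow[c] != yrow[c]]

# The two example rows from above, written as lists of strings.
row_x = ["A1", "B2", "C3", "D4", "E5", "F6", "G7", "H8"]
row_y = ["A1", "B2", "C33C", "d44d", "E5", "F6", "G7", "H8"]

diff = differing_columns(row_x, row_y)
print(diff)  # [2, 3]
```

Columns 2 and 3 differ, which is why the pair belongs in both ListCol2 and ListCol3.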
How can I make my code faster, or change it to use a faster algorithm?
i = 0
x0list = []
y0list = []
# one list per category; a chained "a = b = []" would bind every name to the same list
col1_diff, col2_diff, col3a_diff, col3b_diff, col4_diff = [], [], [], [], []
#create list out of column 0
for y in AllData_2:
    y0list.append(y[0])
for entry in AllData_1:
    x0list.append(entry[0])
    if entry[0] not in y0list:
        #code to save the line to file...
        pass
for y0 in AllData_2:
    if y0[0] not in x0list:
        #code to save the line to file...
        pass
for yrow in AllData_2:
    i+=1
    for xrow in AllData_1:
        foundit = 0
        if yrow[0] == xrow[0] and foundit == 0 and (yrow[1] != xrow[1] or yrow[2] != xrow[2] or yrow[3] != xrow[3] or yrow[4] != xrow[4]):
            if yrow[1] != xrow[1]:
                col1_diff.append(yrow)
                col1_diff.append(xrow)
                foundit = 1
            elif yrow[2] != xrow[2]:
                col2_diff.append(yrow)
                col2_diff.append(xrow)
                foundit = 1
            elif len(yrow[3]) < len(xrow[3]):
                col3a_diff.append(yrow)
                col3a_diff.append(xrow)
                foundit = 1
            elif len(yrow[3]) >= len(xrow[3]):
                col3b_diff.append(yrow)
                col3b_diff.append(xrow)
                foundit = 1
            else:
                #col4 is actually a catch-all for any other differences between lines if [0]s are equal
                col4_diff.append(yrow)
                col4_diff.append(xrow)
                foundit = 1
Maybe this can help: http://stackoverflow.com/questions/744256/reading-huge-file-in-python
Use sets, not lists, for membership testing. – katrielalex
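A minimal sketch of that suggestion, assuming each data set is already loaded as a list of rows with txt_0 at index 0. The function name `unmatched_rows` and the toy data are hypothetical. Testing `x in some_list` scans the list, while the same test against a set is a hash lookup, so the two "only in one file" passes stop costing roughly 8,000 x 8,000 comparisons.

```python
def unmatched_rows(rows_1, rows_2):
    """Return (rows only in rows_1, rows only in rows_2), matched on column 0."""
    # Build each key set once; membership tests against a set are hash lookups.
    keys_1 = set(row[0] for row in rows_1)
    keys_2 = set(row[0] for row in rows_2)
    only_in_1 = [row for row in rows_1 if row[0] not in keys_2]
    only_in_2 = [row for row in rows_2 if row[0] not in keys_1]
    return only_in_1, only_in_2

# Toy data standing in for AllData_1 and AllData_2.
sample_1 = [["A1", "B2"], ["X9", "B2"]]
sample_2 = [["A1", "B2"], ["Z7", "Q1"]]
only_1, only_2 = unmatched_rows(sample_1, sample_2)
```

Here `only_1` holds the X9 row and `only_2` holds the Z7 row; those are the rows that would be written straight to the output file.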
What is 'foundit == 0' supposed to do? It is always true, because 'foundit' is set to 0 right before the test.
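If the flag was meant to stop searching after the first match, one way to get that effect and drop the nested loop entirely is to index one data set by txt_0 in a dict. This is only a sketch under assumptions the question does not state: txt_0 is taken to be unique within each file, and the name `classify_differences` and the bucket keys are hypothetical. It also departs from the question's if/elif chain in one place: column 3 is tested for inequality before the lengths are compared, so that the col4 catch-all can actually be reached. That is a guess at the intent.

```python
def classify_differences(rows_1, rows_2):
    """Bucket row pairs that share column 0 but differ in columns 1 to 4."""
    # Index the first data set by key; assumes txt_0 is unique within a file.
    index_1 = dict((row[0], row) for row in rows_1)
    buckets = {"col1": [], "col2": [], "col3a": [], "col3b": [], "col4": []}
    for yrow in rows_2:
        xrow = index_1.get(yrow[0])
        # Skip keys with no partner and pairs whose columns 1 to 4 all match.
        if xrow is None or yrow[1:5] == xrow[1:5]:
            continue
        if yrow[1] != xrow[1]:
            name = "col1"
        elif yrow[2] != xrow[2]:
            name = "col2"
        elif yrow[3] != xrow[3]:
            # Same split as the question: shorter value in y goes to col3a.
            name = "col3a" if len(yrow[3]) < len(xrow[3]) else "col3b"
        else:
            name = "col4"
        buckets[name].extend([yrow, xrow])
    return buckets

# Toy data: the example rows plus one pair that differs only in column 4.
data_1 = [["A1", "B2", "C3", "D4", "E5"], ["K1", "a", "b", "c", "d"]]
data_2 = [["A1", "B2", "C33C", "d44d", "E5"], ["K1", "a", "b", "c", "e"]]
result = classify_differences(data_1, data_2)
```

Each row of the second data set is now matched with a single dict lookup, so the work grows with the number of rows rather than with its square.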