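Python: extract unique records from a CSV file
I have a CSV file from which I want to keep only the unique records. The fourth field contains some text followed by a HUMAN or MOUSE suffix, like RHPN1_HUMAN and EPHA5_MOUSE.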
For example, EPHA5 occurs in both human and mouse, so I want to remove its records; RHPN1 occurs only in human, so I want to keep its record.
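The rule described here (keep a gene only if every record for it is a _HUMAN record) can be sketched as a small function. This is a minimal sketch assuming each fourth-field value has the form GENE_SPECIES; `human_only_genes` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict


def human_only_genes(names):
    """Return the gene prefixes whose only species suffix is HUMAN.

    Assumes every name looks like GENE_SPECIES, e.g. "RHPN1_HUMAN".
    """
    species_by_gene = defaultdict(set)
    for name in names:
        # rpartition splits on the last "_": "EPHA5_MOUSE" -> ("EPHA5", "_", "MOUSE")
        gene, _, species = name.rpartition("_")
        species_by_gene[gene].add(species)
    return {gene for gene, species in species_by_gene.items() if species == {"HUMAN"}}


names = ["RHPN1_HUMAN", "EPHA5_MOUSE", "EPHA5_HUMAN"]
print(human_only_genes(names))  # {'RHPN1'}
```

Under this reading, a gene that occurs only in mouse is dropped as well, since its suffix set is {"MOUSE"} rather than {"HUMAN"}.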
file1.csv
meNOG00001 9606 ENSP00000289013 RHPN1_HUMAN
meNOG00005 10090 ENSMUSP00000060646 EPHA5_MOUSE
meNOG00005 9606 ENSP00000273854 EPHA5_HUMAN
meNOG00006 10090 ENSMUSP00000082503 RGPA1_MOUSE
meNOG00006 9606 ENSP00000202677 RGPA2_HUMAN
meNOG00006 9606 ENSP00000302647 RGPA1_HUMAN
meNOG00010 9606 ENSP00000253669 HAUS8_HUMAN
meNOG00011 10090 ENSMUSP00000017629 TOP2B_MOUSE
meNOG00011 10090 ENSMUSP00000068896 TOP2A_MOUSE
meNOG00011 9606 ENSP00000396704 TOP2B_HUMAN
meNOG00011 9606 ENSP00000411532 TOP2A_HUMAN
output.csv
meNOG00001 9606 ENSP00000289013 RHPN1_HUMAN
meNOG00006 9606 ENSP00000202677 RGPA2_HUMAN
meNOG00010 9606 ENSP00000253669 HAUS8_HUMAN
I tried the following, but my code doesn't work the way I want:
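import csv

with open("file1.csv", newline="") as file1:
    # The delimiter must match the file's actual column separator.
    reader1 = csv.reader(file1, delimiter=',')
    d = []
    c = []
    # Collect the gene prefix (text before "_") from the 4th field of each row.
    for row in reader1:
        d.append(row[3].split('_')[0])
    d = list(set(d))
    # For each distinct prefix, rescan the file and collect the matching rows.
    for row1 in d:
        file1.seek(0)  # rewind, otherwise the reader is already exhausted
        for row2 in reader1:
            if row1 == row2[3].split('_')[0]:
                c.append(row2)

with open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out, delimiter=',')
    for k in c:
        writer.writerow(k)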
This gives:
meNOG00001 ENSP0000028901 \t RHPN1_HUMAN
meNOG00005 ENSP00000273854 \t EPHA5_HUMAN
meNOG00006 ENSP00000302647 \t RGPA1_HUMAN
meNOG00006 ENSP00000202677 \t RGPA2_HUMAN
meNOG00010 ENSP00000253669 \t HAUS8_HUMAN
meNOG00011 ENSP00000396704 \t TOP2B_HUMAN
meNOG00011 ENSP00000411532 \t TOP2A_HUMAN
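The comparison in the attempt cannot filter anything out: `d` holds every distinct prefix, so each row's prefix always equals some entry of `d`, and every row ends up in `c`. What actually separates the rows to keep is how often a prefix occurs. Below is a minimal sketch of that idea using `collections.Counter` on three of the sample rows (the `gene` helper is hypothetical). Note that it keeps prefixes occurring exactly once, which matches the desired output for this sample but would differ from a strict human-only rule if a gene had two HUMAN rows and no MOUSE row.

```python
from collections import Counter

rows = [
    ["meNOG00001", "9606", "ENSP00000289013", "RHPN1_HUMAN"],
    ["meNOG00005", "10090", "ENSMUSP00000060646", "EPHA5_MOUSE"],
    ["meNOG00005", "9606", "ENSP00000273854", "EPHA5_HUMAN"],
]


def gene(row):
    # Gene prefix of the 4th field, e.g. "EPHA5" from "EPHA5_MOUSE".
    return row[3].split("_")[0]


# Count how many rows share each prefix, then keep rows whose prefix is unique.
counts = Counter(gene(row) for row in rows)
unique_rows = [row for row in rows if counts[gene(row)] == 1]
print(unique_rows)  # [['meNOG00001', '9606', 'ENSP00000289013', 'RHPN1_HUMAN']]
```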
I need:
meNOG00001 9606 ENSP00000289013 RHPN1_HUMAN
meNOG00006 9606 ENSP00000202677 RGPA2_HUMAN
meNOG00010 9606 ENSP00000253669 HAUS8_HUMAN
– user587739
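Putting it together, one possible end-to-end sketch reads all rows, records which species each gene appears with, and writes only the rows whose gene is human-only, in their original order. Assumptions: the columns are tab-separated (the attempt's output hints at tabs, but adjust `delimiter` to the real file), and `filter_human_only` is a hypothetical name. It takes open file objects, so it can be tried on in-memory data:

```python
import csv
import io
from collections import defaultdict


def filter_human_only(in_file, out_file, delimiter="\t"):
    """Write to out_file only the rows whose gene occurs solely as _HUMAN.

    Assumes the 4th column holds GENE_SPECIES (e.g. "RHPN1_HUMAN") and that
    columns are separated by `delimiter` (tab by default, an assumption).
    """
    rows = list(csv.reader(in_file, delimiter=delimiter))

    # First pass: record every species suffix seen for each gene.
    species_by_gene = defaultdict(set)
    for row in rows:
        gene, _, species = row[3].rpartition("_")
        species_by_gene[gene].add(species)

    # Second pass: keep rows (in original order) whose gene is human-only.
    writer = csv.writer(out_file, delimiter=delimiter)
    for row in rows:
        gene = row[3].rpartition("_")[0]
        if species_by_gene[gene] == {"HUMAN"}:
            writer.writerow(row)


# Demo on three of the sample rows, using in-memory files.
sample = io.StringIO(
    "meNOG00001\t9606\tENSP00000289013\tRHPN1_HUMAN\n"
    "meNOG00005\t10090\tENSMUSP00000060646\tEPHA5_MOUSE\n"
    "meNOG00005\t9606\tENSP00000273854\tEPHA5_HUMAN\n"
)
result = io.StringIO()
filter_human_only(sample, result)
print(result.getvalue(), end="")  # only the RHPN1_HUMAN row is printed
```

For the real files, open file1.csv for reading and output.csv with mode "w" (both with newline="") and pass the two handles to filter_human_only. Loading the rows into a list avoids rewinding the file with seek(0) between passes.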