2015-07-20

python - List index out of range, using CSV?

I have a CSV that looks like this:

F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,, 
F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,, 

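I want to delete every line where the second column equals the fourth column (e.g. when Smith,Andy = Smith,Andy). I tried to do this in Python by using `"` as the delimiter and splitting the columns into this: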

F02303521,Smith,Andy,GHI,Smith,Andy,GHI,,,
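To see why indices 1 and 3 are the ones to compare, here is a small sketch of what `str.split('"')` returns for the first row of the file above (the trailing space is dropped; the row is otherwise copied from the sample):

```python
# One row from the sample CSV, as read from the file (with its trailing newline).
line = 'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,\n'

# Splitting on the quote character alternates between text outside the quotes
# and text inside them, so the quoted fields land at the odd indices.
parts = line.split('"')
print(parts)
# ['F02303521,', 'Smith,Andy', ',GHI,', 'Smith,Andy', ',GHI,,,\n']

PI = parts[1]            # second CSV column
investigator = parts[3]  # fourth CSV column
print(PI == investigator)  # True
```

This only works while every line quotes exactly those two fields; a line with fewer `"` characters produces a shorter list.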

I tried this Python code:

testCSV = 'test.csv'
deletionText = 'linestodelete.txt'
correct = 'correctone.csv'
i = 0
j = 0  # i and j keep track of the line number

with open(deletionText, 'w') as outfile:
    with open(testCSV, 'r') as csvfile:
        for line in csvfile:
            i = i + 1  # on the first line, i will equal 1
            PI = line.split('"')[1]
            investigator = line.split('"')[3]

            # if they equal each other, write that line number into the
            # text file as a line to be deleted
            if PI == investigator:
                outfile.write(str(i) + '\n')  # write() only accepts strings

# From the TXT, create a list of line numbers you do not want to include in the output
with open(deletionText, 'r') as txt:
    lines_to_be_removed_list = []

    # for each line number in the TXT,
    # remove the newline character at the end of the line
    # and add the line number to the lines-to-be-removed list
    for lineNum in txt:
        lineNum = lineNum.rstrip()
        lines_to_be_removed_list.append(int(lineNum))  # int, so it can equal j below

with open(correct, 'w') as outfile:
    with open(testCSV, 'r') as csvfile:

        # for each line in the CSV,
        # count the line number
        for line in csvfile:
            j = j + 1  # so for the first line, the line number will be 1

            # if the CSV line number is not in the lines-to-be-removed list,
            # then write that line to outfile
            if j not in lines_to_be_removed_list:
                outfile.write(line)

But on this line:

PI = line.split('"')[1] 

I get:

Traceback (most recent call last):
  File "C:/Users/sskadamb/PycharmProjects/vastDeleteLine/manipulation.py", line 11, in <module>
    PI = line.split('"')[1]
IndexError: list index out of range

I thought this would give PI = Smith,Andy and investigator = Smith,Andy... why doesn't that happen?

Any help would be greatly appreciated, thanks!

That means there are fewer than two elements in the `list`. Put it in a `try` block and have the matching `except` print `line.split('"')`. – TigerhawkT3
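A minimal sketch of that debugging approach, with the file replaced by an in-memory list so it can run on its own. The function name `find_matching_rows`, the sample lines, and the choice to skip (rather than stop on) a bad line are all illustrative assumptions:

```python
def find_matching_rows(lines):
    """Return the 1-based numbers of lines whose 1st and 2nd quoted fields match.

    Any line that cannot be indexed is printed so you can see what it contains.
    """
    matches = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split('"')
        try:
            PI = parts[1]
            investigator = parts[3]
        except IndexError:
            # Show exactly what split() produced for the offending line, then skip it.
            print("line", lineno, "split into", parts)
            continue
        if PI == investigator:
            matches.append(lineno)
    return matches


sample = [
    'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,\n',
    'F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,\n',
    '\n',  # e.g. an empty line at the end of the file
]
print(find_matching_rows(sample))
```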

Do you have random blank lines? Also, why not use the built-in functions in the `csv` module? – NightShadeQueen
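Blank lines are one plausible cause: a line with no `"` characters splits into a one-element list, so `[1]` is already out of range, and a line with only one quoted field fails at `[3]`. A small sketch that shows this and guards against it; `quoted_fields` is a hypothetical helper and the second and third sample rows are made up for illustration:

```python
def quoted_fields(line):
    """Return the first two quoted fields of a line, or None if it has fewer.

    Hypothetical helper that guards the split('"')[1] / [3] indexing.
    """
    parts = line.split('"')
    if len(parts) < 4:  # indices 1 and 3 need at least four pieces
        return None
    return parts[1], parts[3]


samples = [
    '\n',                                      # blank line: split('"') gives ['\n']
    'F05551234,Doe,GHI,Doe,GHI,,,\n',          # made-up row with no quotes at all
    'F05551234,"Doe,Jane",GHI,Doe,GHI,,,\n',   # made-up row with one quoted field
    'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,\n',
]
for line in samples:
    # Print how many pieces split() produced and what the guard returns.
    print(len(line.split('"')), quoted_fields(line))
```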

Why not use the excellent `csv` module? –
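Following the `csv`-module suggestion, here is a minimal sketch of the whole task without the intermediate line-number file. It assumes the file is ordinary comma-separated CSV with `"` as the quote character (the `csv` module's default dialect); the function names `rows_to_keep` and `remove_matching_rows` are hypothetical:

```python
import csv


def rows_to_keep(lines):
    """Yield parsed rows whose 2nd and 4th columns differ.

    `lines` can be an open file or any iterable of text lines.
    """
    for row in csv.reader(lines):
        # csv.reader has already removed the quotes, so row[1] is 'Smith,Andy'.
        # Rows with fewer than four fields (e.g. blank lines, parsed as []) are kept.
        if len(row) >= 4 and row[1] == row[3]:
            continue
        yield row


def remove_matching_rows(src_path, dst_path):
    """Copy src_path to dst_path without the rows whose 2nd and 4th columns match."""
    # The csv documentation recommends opening files with newline='' for this module.
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(rows_to_keep(src))
```

With the file names from the question, this would be called as `remove_matching_rows('test.csv', 'correctone.csv')`. Because the reader understands quoting, the comma inside `"Smith,Andy"` never needs special handling, and no line numbers have to be tracked.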

Answers

When you think CSV, think pandas, Python's great data-analysis library. Here is how to do what you want:

import pandas as pd 

fields = ['field{}'.format(i) for i in range(8)] 
df = pd.read_csv("data.csv", header=None, names=fields) 
df = df[df['field1'] != df['field3']] 
print(df)

This prints:

 field0  field1 field2 field3 field4 field5 field6 field7 
1 F04300621 Parker,Helen CERT Yu,Betty IOUS  NaN  NaN  NaN 

When I think CSV, I just think [`csv`](https://docs.python.org/3/library/csv.html). – TigerhawkT3

@TigerhawkT3 Maybe that's because you haven't tried pandas yet :) – bananafish

Oh, thanks! I've never used pandas before. I'll look into it more :) – ocean800

Try splitting on the comma instead of the quote.

x.split(",")

If I split on the comma, it will split the names in half, though, which is not what I want. – ocean800

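As mentioned in the comments above, CSV is not a simple comma-separated format: various implementations include extra rules to handle things like values that contain commas, as in the OP's file. – TigerhawkT3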
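To make that concrete, here is a short sketch comparing `str.split(',')` with `csv.reader` on the first row of the question's file (trailing space dropped):

```python
import csv

line = 'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,'

# Naive split: the comma inside each quoted name also splits, and the quotes stay.
naive = line.split(',')
print(naive)
# ['F02303521', '"Smith', 'Andy"', 'GHI', '"Smith', 'Andy"', 'GHI', '', '', '']

# csv.reader applies the quoting rules, so each name stays a single field.
parsed = next(csv.reader([line]))
print(parsed)
# ['F02303521', 'Smith,Andy', 'GHI', 'Smith,Andy', 'GHI', '', '', '']
```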