2015-09-14 82 views

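python - Unable to remove duplicates in a CSV file

I have been working on this code, which reads a file (specifically a CSV file) and drops all the data after the ';' delimiter. The file contains numbers, and from those numbers I need to extract the duplicates. However, when I print the extracted value, it is not treated as a number. I'm a bit stuck and would appreciate any suggestions.

Thank you, here is the code: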

#!/usr/bin/python2 

# import module csv, for csv files 
import csv 

# Create a list 
hid = list() 

# command prompt to enter file 
fname = raw_input('Enter a file name: ') 
# This is to hard code a file name to spare typing when testing, change the file name within ''. 
if len(fname) == 0: fname = '20150909_0.csv' 

# Define the variable to open file, and use from module csv, to read the csv file. 
# alternative open --> fo = open(fname) 
# and mycsv = csv.reader(open('20150909.csv')) 
fo = csv.reader(open(fname)) 

# Initiate count 
count = 0 
# Try to extract the duplicate: 
#unique = 0 
# Loop that reads, for each iteration (in this case the iteration is 'row', and 'fo' is the file), 
# each csv row, strips, the extra characters, and then splits them into a list by delimiter character ";" 
# and prints it 
# To parse the data into a file, please type at cli "C:\python 'filename.py' > output.file 
for row in fo: 
    text = row[0] 
    line = text.strip() 
    parts = line.split(";") 
    col1 = parts[0] 
    print col1 

# Loop within the loop, for every word check if the word is in the list, else append it. 
# then sort it. 
    for parts in col1: 
     if parts in hid: continue 
     if parts != hid: 
      hid.append(parts) 


#  unique = unique + float(parts) 


print "The total number of Hotel ID\'s entries is:", count 
#print "The number of unique Hotel ID\'s is:", unique 


for number in (int(hid) for L in lines): 
    count = count + 1 
    print count 

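Could you also provide the sample data you are trying to parse/process, along with the code? That would make things easier to understand. – Sanju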


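8555; 0989; 3245; 5646; 1212 8855; 0989; 3245; 5646; 1212 8555; 0989; 3245; 5646; 1212 8355; 0989; 3245; 5646; 1212 The above are the entries. There are 4 columns and 4 rows. The column I want is the first one (8555, 8855, 8555, 8355), and I want to remove the duplicates from it. I can't seem to find how to insert code in a comment. – spen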


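OK, I still can't make out the data format. Are you saying that your data has 4 rows and 4 columns, that each column is separated by ";", and that you need the first column as a list? – Sanju

Answer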

#!/usr/bin/python2 

# import module csv, for csv files 
import csv 

# Create a list 
hid = list() 

# command prompt to enter file 
fname = raw_input('Enter a file name: ') 
# This is to hard code a file name to spare typing when testing, change the file name within ''. 
if len(fname) == 0: fname = '20150909_0.csv' 

# Define the variable to open file, and use from module csv, to read the csv file. 
# alternative open --> fo = open(fname) 
# and mycsv = csv.reader(open('20150909.csv')) 
fo = csv.reader(open(fname)) 

# Initiate count 
count = 0 
# Try to extract the duplicate: 
#unique = 0 
# Loop that reads, for each iteration (in this case the iteration is 'row', and 'fo' is the file), 
# each csv row, strips, the extra characters, and then splits them into a list by delimiter character ";" 
# and prints it 
# To parse the data into a file, please type at cli "C:\python 'filename.py' > output.file 
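for row in fo:
    # csv.reader yields an empty list for a blank line, so skip those
    if not row: continue
    text = row[0]
    line = text.strip()
    parts = line.split(";")
    col1 = parts[0].strip()
    # count every entry, whether or not it is a duplicate
    count = count + 1
    # if you want only the first column do the processing here
    if col1 not in hid:
        hid.append(col1)

print "The total number of Hotel ID's entries is:", count
print "The number of unique Hotel ID's is:", len(hid)
print "The unique Hotel ID's are:", hid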
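
A note on why the loop in the question misbehaves: col1 is a string, so "for parts in col1" walks over its individual characters rather than over whole IDs, and the final loop calls int() on the list hid and refers to a name, lines, that is never defined. Below is a minimal Python 3 sketch of the same idea (the code above is Python 2). It assumes one record per line, as the sample in the comments suggests; the function name unique_first_column and the in-memory sample are illustrative and not part of the original code. The IDs are kept as strings and only converted with int() at the end, because converting earlier would drop leading zeros from values such as 0989.

```python
import csv
import io


def unique_first_column(lines, delimiter=";"):
    """Return (total_rows, unique_ids) for the first column.

    `lines` is any iterable of text lines (an open file works too).
    Unique IDs are returned in first-seen order, as strings.
    """
    seen = set()        # fast membership test
    unique_ids = []     # preserves the order in which IDs first appear
    total = 0
    # Let csv.reader split on ';' directly instead of splitting by hand
    for row in csv.reader(lines, delimiter=delimiter):
        if not row or not row[0].strip():
            continue    # skip blank lines and empty first cells
        total += 1
        hotel_id = row[0].strip()
        if hotel_id not in seen:
            seen.add(hotel_id)
            unique_ids.append(hotel_id)
    return total, unique_ids


# Sample data taken from the comments, assumed to be one record per line
sample = io.StringIO(
    "8555; 0989; 3245; 5646; 1212\n"
    "8855; 0989; 3245; 5646; 1212\n"
    "8555; 0989; 3245; 5646; 1212\n"
    "8355; 0989; 3245; 5646; 1212\n"
)

total, unique_ids = unique_first_column(sample)
print("Total entries:", total)
print("Unique IDs:", unique_ids)
print("As numbers:", [int(h) for h in unique_ids])
```

To read a real file instead of the sample, pass an open file object, for example open(fname, newline=''), in place of the StringIO.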