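python - Unable to remove duplicates from a CSV file
I have been working on this code, which reads a file, specifically a CSV file, and drops all the data after the delimiter ';'. The file contains numbers, and from those numbers I need to extract the duplicates. However, when I print the extracted column, it is not treated as a number. I am a bit stuck and would appreciate any suggestions.
Thank you. Here is the code: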
#!/usr/bin/python2
# import module csv, for csv files
import csv
# Create a list
hid = list()
# command prompt to enter file
fname = raw_input('Enter a file name: ')
# This is to hard code a file name to spare typing when testing, change the file name within ''.
if len(fname) == 0: fname = '20150909_0.csv'
# Define the variable to open file, and use from module csv, to read the csv file.
# alternative open --> fo = open(fname)
# and mycsv = csv.reader(open('20150909.csv'))
fo = csv.reader(open(fname))
# Initiate count
count = 0
# Try to extract the duplicate:
#unique = 0
# Loop that reads, for each iteration (in this case the iteration is 'row', and 'fo' is the file),
# each csv row, strips, the extra characters, and then splits them into a list by delimiter character ";"
# and prints it
# To write the output to a file, run at the command line: C:\python filename.py > output.file
for row in fo:
    # csv.reader yields an empty list for a blank line, so skip those
    if not row: continue
    text = row[0]
    line = text.strip()
    parts = line.split(";")
    # Convert the first field to an integer so it is treated as a number
    col1 = int(parts[0])
    print col1
    # Count every entry, then keep the ID only if it is not in the list yet
    count = count + 1
    if col1 in hid: continue
    hid.append(col1)
# Sort the unique IDs once the whole file has been read
hid.sort()
print "The total number of Hotel ID's entries is:", count
print "The number of unique Hotel ID's is:", len(hid)
print hid
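Could you also provide a sample of the data you are trying to parse/process, along with the code? That would make things easier to understand. – Sanju
8555; 0989; 3245; 5646; 1212 8855; 0989; 3245; 5646; 1212 8555; 0989; 3245; 5646; 1212 8355; 0989; 3245; 5646; 1212 Those are the entries. There are 4 rows and 4 columns. The column I want is the first one (8555, 8855, 8555, 8355), and I want to remove the duplicates. I can't seem to find how to insert code in a comment. – spen
OK, I still can't make out the data format. Are you saying that your data has 4 rows and 4 columns, that each column is separated by ";", and that you need the first column as a list? – Sanju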
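For reference, here is a minimal sketch of what the comment above describes: take the first column of the sample rows and drop the repeated values. It is written for Python 3 (the question uses Python 2), it assumes each entry sits on its own line, and the names `SAMPLE` and `unique_first_column` are made up for illustration. The `int()` call is what makes each value behave as a number, since `csv.reader` returns every field as a string.

```python
import csv
import io

# Sample rows copied from the comment above (assumed to be one entry per line);
# a real run would pass an open CSV file instead of this string.
SAMPLE = """8555; 0989; 3245; 5646; 1212
8855; 0989; 3245; 5646; 1212
8555; 0989; 3245; 5646; 1212
8355; 0989; 3245; 5646; 1212
"""


def unique_first_column(handle):
    """Return first-column values as ints, duplicates removed, in first-seen order."""
    seen = set()
    unique = []
    # delimiter=';' lets csv.reader do the splitting;
    # skipinitialspace=True drops the space that follows each ';'
    for row in csv.reader(handle, delimiter=';', skipinitialspace=True):
        if not row:
            continue  # blank lines come back as empty lists
        value = int(row[0].strip())  # fields are strings until converted
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


hotel_ids = unique_first_column(io.StringIO(SAMPLE))
print(hotel_ids)  # [8555, 8855, 8355]
```

A set is used for the membership check because it stays fast on large files, while the separate list keeps the IDs in the order they first appear.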
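If the first column as a list is indeed what is needed, the counts the original script tries to print could be obtained roughly as follows. This is only a sketch in Python 3, not the asker's code: the list `first_column` is typed in by hand from the sample values quoted above, and `collections.Counter` is one possible choice for finding which IDs repeat.

```python
from collections import Counter

# First-column values from the sample rows quoted in the comments
# (entered by hand here; in practice they would be read from the file)
first_column = [8555, 8855, 8555, 8355]

# Counter maps each ID to the number of times it occurs
counts = Counter(first_column)

# IDs that occur more than once, i.e. the duplicates themselves
duplicates = sorted(value for value, n in counts.items() if n > 1)

total_entries = len(first_column)
unique_entries = len(counts)

print("Total entries:", total_entries)    # Total entries: 4
print("Unique entries:", unique_entries)  # Unique entries: 3
print("Duplicated IDs:", duplicates)      # Duplicated IDs: [8555]
```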