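I'm writing a script that is supposed to remove duplicate entries. Some people in the data entered their names twice because they have two phone numbers, and since the phone number field isn't an array, they created multiple entries when entering more than one. Instead of removing the duplicates, the script prints only the final entry.
My script turns each entry into a dictionary whose keys correspond to the column names, then loops over every row. There is a main loop over each row, and then a nested for loop that goes over all the elements and compares them to detect duplicates. When I hit a duplicate, my code should compare the phone, email, and website, then append them to the matching field if they are unique (that is, they don't match).
The script runs, but the CSV it returns is filled with the last person in the CSV repeated 8 times and nothing else.
Here is my code (not a real person! ... made up):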
import csv

# This function takes a tab-delim csv and merges the ones with the same name but different phone/email/websites.
def merge_duplicates(sheet):
    myjson = [] # myjson = list of dictionaries where each dictionary
    with(open("ieca_first_col_fake_text.txt", "rU")) as f:
        sheet = csv.DictReader(f,delimiter="\t")
        for row in sheet:
            myjson.append(row)
    write_file = csv.DictWriter(open('duplicates_deleted.csv','w'), ['name','phone','email','website'], restval='', delimiter = '\t')
    for row in myjson:
        # convert phone, email, and web to lists so that extra can be appended
        row['phone'] = row['phone'].split() if row.get('phone') else []
        row['email'] = row['email'].split() if row.get('email') else []
        row['website'] = row['website'].split() if row.get('website') else []
        print row
    i = 0
    for i in range(len(myjson)):
        # if the names match, check to see if phone, em, web match. If any match, append to first row.
        try:
            print 'trying'
            if myjson[i]['name'] == myjson[i+1]['name']:
                if myjson[i]['phone'] != myjson[i+1]['phone']:
                    print 'detected'
                    myjson[i]['phone'].append(myjson[i+1]['phone'])
                if myjson[i]['email'] != myjson[i+1]['email']:
                    myjson[i]['email'].append(myjson[i+1]['email'])
                if myjson[i]['website'] != myjson[i+1]['website']:
                    myjson[i]['website'].append(myjson[i+1]['website'])
        except IndexError:
            print("We're at the end now")
        write_file.writerow(row)
        print row

merge_duplicates('ieca_first_col_fake_text.txt')
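One possible reading of the code above: by the time the second loop runs, `row` is still bound to the last dictionary left over from the earlier `for row in myjson` loop, so each `writerow(row)` call would write that same row, once per iteration. A minimal Python 3 sketch of that effect, using a hypothetical three-element list with placeholder names rather than the real file:

```python
# Hypothetical stand-in for the parsed rows; the names are placeholders.
rows = [{"name": "A"}, {"name": "B"}, {"name": "C"}]

# First loop: once it finishes, `row` stays bound to the last element.
for row in rows:
    pass

stale = []  # what you get when the second loop keeps using `row`
fresh = []  # what you get when the second loop uses rows[i]
for i in range(len(rows)):
    stale.append(row["name"])      # `row` never changes inside this loop
    fresh.append(rows[i]["name"])  # indexing picks up each element in turn

print(stale)
print(fresh)
```

If that reading is right, writing `myjson[i]` instead of `row` would at least stop the repetition, although it would not by itself drop the duplicate rows.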
Here is the CSV output:
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
"Amy Tramy Lamy Ph.D. [] [] []"
Thanks so much for the help!
Example data, if it helps:
name phone email website
Diane Grant Albrecht M.S.
"Lannister G. Cersei M.A.T., CEP" 111-222-3333 [email protected] www.got.com
Argle D. Bargle Ed.M.
Sam D. Man Ed.M. 000-000-1111 [email protected] www.daManWithThePlan.com
Sam D. Man Ed.M.
Sam D. Man Ed.M. 111-222-333 [email protected] www.daManWithThePlan.com
D G Bamf M.S.
Amy Tramy Lamy Ph.D.
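For data shaped like this, the merge described in the question could be sketched by grouping rows under their name in a dictionary and collecting each distinct phone, email, and website once. This is only a sketch in Python 3 under a few assumptions: the helper `merge_rows` and the constant `FIELDS` are hypothetical names, the input is read from an in-memory string rather than the real file, and the email address is a placeholder because the real ones are redacted in the post. Unlike the adjacent-row comparison in the question, it also merges duplicates that are not next to each other.

```python
import csv
import io

FIELDS = ["phone", "email", "website"]  # hypothetical constant for this sketch

def merge_rows(rows):
    """Merge rows that share a name, keeping each distinct value once."""
    merged = {}  # name -> {field: [unique values in first-seen order]}
    for row in rows:
        entry = merged.setdefault(row["name"], {f: [] for f in FIELDS})
        for field in FIELDS:
            # DictReader fills missing trailing columns with None.
            for value in (row.get(field) or "").split():
                if value not in entry[field]:
                    entry[field].append(value)
    # Join the lists back into strings so each one fits in a single cell.
    result = []
    for name, entry in merged.items():
        out_row = {"name": name}
        for field in FIELDS:
            out_row[field] = " ".join(entry[field])
        result.append(out_row)
    return result

# Placeholder sample modelled on the example data; the email is made up.
sample = (
    "name\tphone\temail\twebsite\n"
    "Sam D. Man Ed.M.\t000-000-1111\tsam@example.com\twww.daManWithThePlan.com\n"
    "Sam D. Man Ed.M.\n"
    "Sam D. Man Ed.M.\t111-222-333\tsam@example.com\twww.daManWithThePlan.com\n"
    "Amy Tramy Lamy Ph.D.\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
result = merge_rows(rows)

# Write the merged rows back out as tab-delimited text.
out = io.StringIO()
writer = csv.DictWriter(out, ["name"] + FIELDS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(result)
print(out.getvalue())
```

Keying on the name keeps the logic in one pass and avoids the `i+1` index bookkeeping, at the cost of treating two different people with identical names as one entry.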
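By the way, try using a list comprehension, i.e. myjson = [row for row in csv.DictReader(f, delimiter="\t")], if you intend to pack an iterator into a list (for example, to read the rows into memory).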
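The suggestion in this comment could look roughly like the following in Python 3. The in-memory text is a hypothetical stand-in for the real tab-delimited file, so the snippet runs on its own:

```python
import csv
import io

# Hypothetical stand-in for the tab-delimited input file.
text = "name\tphone\nD G Bamf M.S.\t\nArgle D. Bargle Ed.M.\t\n"

with io.StringIO(text) as f:
    # Build the list of row dictionaries in a single expression.
    myjson = [row for row in csv.DictReader(f, delimiter="\t")]

print(len(myjson))
```

Calling `list(csv.DictReader(f, delimiter="\t"))` would be an equivalent way to read every row into memory.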
Use elif in the if/if part in case the conditions are even mutually exclusive.