2013-07-03 112 views
0

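I'm writing a script that is supposed to remove duplicate entries. Some people in the data entered their names twice because they have two phone numbers: the phone number field is not an array, so they created a separate entry for each number. How do I deal with a 'NoneType' error?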

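My script turns each entry into a dictionary keyed by column name and then iterates over the rows. There is a main loop over every row, and a nested for loop that goes over the other rows and compares them to detect duplicates. When I hit a duplicate, my code should compare the phone, email, and website, and append each value to the corresponding field of the first row if it is unique (that is, if it doesn't match).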

Here is my code:

import csv 

# This function takes a tab-delim csv and merges the ones with the same name but different phone/email/websites. 
def merge_duplicates(sheet): 

    myjson = [] # myjson = list of dictionaries where each dictionary 

    with(open("ieca_first_col_fake_text.txt", "rU")) as f: 

     sheet = csv.DictReader(f,delimiter="\t") 
     for row in sheet: 
      myjson.append(row) 

     write_file = csv.DictWriter(open('duplicates_deleted.csv','w'), ['name','phone','email','website'], restval='', delimiter = '\t') 

     for row in myjson: 

      # convert phone, email, and web to lists so that extra can be appended 
      row['phone'] = row['phone'].split() 
      row['email'] = row['email'].split() 
      row['website'] = row['website'].split() 
      print row 

     for i in len(myjson): 

      # if the names match, check to see if phone, em, web match. If any match, append to first row. 
      try: 
       if myjson[i]['name'] == myjson[i+1]['name']: 
        if myjson[i]['phone'] != myjson[i+1]['phone']: 
         myjson[i]['phone'].append(myjson[i+1]['phone']) 
#      if row['email'] != myjson[rowvalue+1]['email']: 
#       row['email'].append(myjson[rowvalue+1]['email']) 
#      if row['website'] != myjson[rowvalue+1]['website']: 
#       row['website'].append(myjson[rowvalue+1]['website']) 
      except IndexError: 
       print("We're at the end now") 

      write_file.writerow(row) 

merge_duplicates('ieca_first_col_fake_text.txt') 

Everything in my code goes along just dandy until it reaches the first duplicate, and then I get this error:

{'website': [], 'phone': [], 'name': 'Diane Grant Albrecht M.S.', 'email': []} 
{'website': ['www.got.com'], 'phone': ['111-222-3333'], 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': ['[email protected]']} 
{'website': [], 'phone': [], 'name': 'Argle D. Bargle Ed.M.', 'email': []} 
{'website': ['www.daManWithThePlan.com'], 'phone': ['000-000-1111'], 'name': 'Sam D. Man Ed.M.', 'email': ['[email protected]']} 
Traceback (most recent call last): 
    File "/Users/samuelfinegold/Documents/noodle/delete_duplicates.py", line 40, in <module> 
    merge_duplicates('ieca_first_col_fake_text.txt') 
    File "/Users/samuelfinegold/Documents/noodle/delete_duplicates.py", line 20, in merge_duplicates 
    row['email'] = row['email'].split() 
AttributeError: 'NoneType' object has no attribute 'split' 
logout 

Thanks so much for the help!


Example data, in case it helps:

name phone email website 
Diane Grant Albrecht M.S.   
"Lannister G. Cersei M.A.T., CEP" 111-222-3333 [email protected] www.got.com 
Argle D. Bargle Ed.M.   
Sam D. Man Ed.M. 000-000-1111 [email protected] www.daManWithThePlan.com 
Sam D. Man Ed.M.  
Sam D. Man Ed.M. 111-222-333  [email protected] www.daManWithThePlan.com 
D G Bamf M.S.   
Amy Tramy Lamy Ph.D.    

Answers

3

The error is that if row['phone'] is None, you cannot split it.

You can do this:

row['phone'] = row['phone'].split() if row['phone'] else [] 
row['email'] = row['email'].split() if row['email'] else [] 
row['website'] = row['website'].split() if row['website'] else [] 

The [] can be replaced with whatever default value you want to assign (for example None or "").

A cleaner way is:

row['phone'] = row['phone'].split() if row.get('phone') else [] 
row['email'] = row['email'].split() if row.get('email') else [] 
row['website'] = row['website'].split() if row.get('website') else [] 
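If the same guard is needed for several columns, it can be pulled into a small helper. The sketch below is only an illustration: split_field is a hypothetical name that does not appear in the original script, and the sample row is made up.

```python
def split_field(row, key):
    """Return the whitespace-separated values of row[key].

    Falls back to an empty list when the key is missing, or when the
    value is None or an empty string.
    """
    value = row.get(key)
    return value.split() if value else []


# Made-up row: 'email' is None and 'website' is missing entirely.
row = {'name': 'Sam D. Man Ed.M.', 'phone': '000-000-1111', 'email': None}

for key in ('phone', 'email', 'website'):
    row[key] = split_field(row, key)

print(row)
```

After the loop every one of the three fields holds a list, so later code can append to it without checking for None first.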
+0

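Just checked to make sure, but I don't think that is the error, because the output shows entries with no phone number and the error doesn't occur on those. I've updated the post with the data in case it helps. – goldisfine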

+1

Your stack trace clearly shows 'row['email'] = row['email'].split()', which means row['email'] is None – karthikr

+1

Also, '' is not the same as None. So a blank phone number does not necessarily mean it is None – karthikr
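The difference between '' and None can be reproduced with csv.DictReader itself. When a row has fewer fields than the header, DictReader fills the missing columns with its restval argument, which defaults to None; a field that is present but empty comes back as ''. The sample text below is made up for illustration and is not the poster's file, although a short row like this is a plausible cause of the traceback in the question.

```python
import csv
import io

# Made-up sample: a four-column header, one short row (name only),
# and one row whose trailing fields are present but empty.
sample = (
    "name\tphone\temail\twebsite\n"
    "Short Row\n"
    "Padded Row\t\t\t\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
short_row, padded_row = rows

# Missing column: filled with restval, which is None by default.
print(repr(short_row['email']))

# Column present but empty: an empty string, so .split() works.
print(repr(padded_row['email']))
```

Passing restval='' to DictReader is another way to avoid None for short rows, if treating a missing column as an empty string is acceptable.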

1

Personally, I would do this with and:

row['email'] = row.get('email',[]) and row['email'].split() 

The logic is the same as:

if row.get('email'): 
    row['email'] = row['email'].split() 
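To see where the two forms agree and where they differ, the sketch below runs both on three made-up rows. The function names with_and and with_if and the sample addresses are hypothetical and exist only for this comparison.

```python
def with_and(row):
    # The one-liner from the answer: always assigns to row['email'].
    row['email'] = row.get('email', []) and row['email'].split()
    return row


def with_if(row):
    # The if form: assigns only when the value is present and truthy.
    if row.get('email'):
        row['email'] = row['email'].split()
    return row


cases = {
    'string': {'email': 'a@example.com b@example.com'},
    'none': {'email': None},
    'missing': {},
}

# Copy each row first so the two functions see identical input.
and_results = {name: with_and(dict(row)) for name, row in cases.items()}
if_results = {name: with_if(dict(row)) for name, row in cases.items()}

for name in cases:
    print(name, and_results[name], if_results[name])
```

Both forms split a non-empty string and both leave an existing None as None. They differ only when the key is missing: the and form stores the default [] under 'email', while the if form leaves the row without that key.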

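Strictly speaking, though, it will reassign the value if the key is missing (or if the email has already been turned into a list), so you might want to do this instead: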
