2017-10-17 62 views
3

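How do I remove duplicates from multiple JSON files?

I have multiple JSON files containing capitals and countries. How do I remove the duplicate key-value pairs from all of the files?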

Here is one of the JSON files:

{ 
    "data": [ 
    { 
     "Capital": "Berlin", 
     "Country": "Germany" 
    }, 
    { 
     "Capital": "New Delhi", 
     "Country": "India" 
    }, 
    { 
     "Capital": "Canberra", 
     "Country": "Australia" 
    }, 
    { 
     "Capital": "Beijing.", 
     "Country": "China" 
    }, 
    { 
     "Capital": "Tokyo", 
     "Country": "Japan" 
    }, 
    { 
     "Capital": "Tokyo", 
     "Country": "Japan" 
    }, 
    { 
     "Capital": "Berlin", 
     "Country": "Germany" 
    }, 
    { 
     "Capital": "Moscow", 
     "Country": "Russia" 
    }, 
    { 
     "Capital": "New Delhi", 
     "Country": "India" 
    }, 
    { 
     "Capital": "Ottawa", 
     "Country": "Canada" 
    } 
    ] 

} 

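There are many JSON files like this that contain repeated items. How do I remove the repeated items and keep only the first occurrence? I have tried the following, but it doesn't work: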

dupes = [] 
for f in json_files: 
    with open(f) as json_data: 
     nations = json.load(json_data)['data'] 
     #takes care of duplicates and stores it in dupes 
     dupes.append(x for x in nations if x['Capital'] in seen or seen.add(x['Capital'])) 
     nations = [x for x in nations if x not in dupes] #want to keep the first occurance of the item present in dupes 

    with open(f, 'w') as json_data: 
     json.dump({'data': nations}, json_data) 

Answers

1

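List comprehensions are great! But... when the logic involves an if statement, they can make the code harder to follow.

This is by no means a rule of thumb. On the contrary, I encourage you to use list comprehensions often. In this particular case, though, a more verbose solution is more readable.

My suggestion is this: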

import json

seen = set()  # capitals already encountered
result = []

with open('data.json') as json_data:
    nations = json.load(json_data)['data']
    # keep only the first item seen for each capital
    for item in nations:
        if item['Capital'] not in seen:
            seen.add(item['Capital'])
            result.append(item)

with open('data.no_dup.json', 'w') as json_data:
    json.dump({'data': result}, json_data)

Tested and working on Python 3.5.2.

Note that I removed your outer loop for convenience.
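
If the outer loop is needed again, one possible way to extend this to several files is sketched below. It assumes that a capital already seen in an earlier file should also be dropped from later files, and that each file may be overwritten in place. The function name dedupe_files and its key parameter are made up for illustration.

```python
import json


def dedupe_files(paths, key='Capital'):
    """Rewrite each file so every `key` value appears once across all files."""
    # Assumption: duplicates are tracked across files. Move this line
    # inside the loop to deduplicate each file independently instead.
    seen = set()
    for path in paths:
        with open(path) as json_data:
            nations = json.load(json_data)['data']

        result = []
        for item in nations:
            if item[key] not in seen:
                seen.add(item[key])
                result.append(item)  # the first occurrence is kept

        # Overwrites the input file with the filtered list.
        with open(path, 'w') as json_data:
            json.dump({'data': result}, json_data)
```

Calling dedupe_files(['a.json', 'b.json']) would rewrite both files; writing to new file names instead is safer while testing.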

+0

Your code does exactly what I was hoping to achieve. Thanks! –

0

Here is sample code showing how you can do this for the given JSON:

import json

files = ['countries.json']

for f in files:
    with open(f, 'r') as fp:
        nations = json.load(fp)
    # set() drops exact duplicate pairs but does not preserve order
    result = [dict(tupleized) for tupleized in
              set(tuple(item.items()) for item in nations['data'])]
print(result)
print(len(result))

Output:

[{u'Country': u'Russia', u'Capital': u'Moscow'}, {u'Country': u'Japan', u'Capital': u'Tokyo'}, {u'Country': u'Canada', u'Capital': u'Ottawa'}, {u'Country': u'India', u'Capital': u'New Delhi'}, {u'Country': u'Germany', u'Capital': u'Berlin'}, {u'Country': u'Australia', u'Capital': u'Canberra'}, {u'Country': u'China', u'Capital': u'Beijing.'}] 
7 
+0

Note that this only filters out duplicate pairs, so '{'Country': 'Russia', 'Capital': 'Moscow'}' and '{'Country': 'Zaire', 'Capital': 'Moscow'}' would both be in 'result' – jpyams
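
To make the distinction in this comment concrete, here is a small sketch comparing the two notions of "duplicate". The helper name dedupe and the sample rows are invented for illustration. Unlike set(), the loop keeps the original order.

```python
def dedupe(items, key_func):
    """Keep the first item for each distinct key_func(item), in order."""
    seen = set()
    result = []
    for item in items:
        k = key_func(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


rows = [
    {'Capital': 'Moscow', 'Country': 'Russia'},
    {'Capital': 'Moscow', 'Country': 'Zaire'},   # same capital, different pair
    {'Capital': 'Moscow', 'Country': 'Russia'},  # exact duplicate of the first
]

# Whole pair as the key: only the exact duplicate is dropped.
by_pair = dedupe(rows, lambda d: tuple(sorted(d.items())))

# Capital alone as the key: only the first 'Moscow' entry survives.
by_capital = dedupe(rows, lambda d: d['Capital'])

print(len(by_pair), len(by_capital))  # 2 1
```

Which key to use depends on whether two different countries sharing a capital name should count as duplicates.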

2

You might not be able to use a cool list comprehension, but a regular loop should work:

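used_nations = set()  # a set, not a dict: {} has no .add() method
unique_nations = []
for nation in nations:
    # build a new list instead of removing from 'nations' while iterating
    if nation['Capital'] not in used_nations:
        used_nations.add(nation['Capital'])
        unique_nations.append(nation)
nations = unique_nations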
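
One caveat with loops of this kind: removing items from a list while iterating over it makes the iterator skip elements. The sketch below, using made-up sample data, shows the effect next to the usual alternative of building a new list.

```python
capitals = ['Tokyo', 'Tokyo', 'Tokyo', 'Berlin']

# Removing while iterating: the list shifts left under the iterator,
# so the element right after each removal is never examined.
mutated = list(capitals)
seen = set()
for c in mutated:
    if c in seen:
        mutated.remove(c)
    else:
        seen.add(c)

# Building a new list: every element is examined exactly once.
seen = set()
rebuilt = []
for c in capitals:
    if c not in seen:
        seen.add(c)
        rebuilt.append(c)

print(mutated)  # ['Tokyo', 'Tokyo', 'Berlin']
print(rebuilt)  # ['Tokyo', 'Berlin']
```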
+0

This isn't JS; 'nation.country' doesn't work. – nutmeg64

+0

@nutmeg64 I'm sure someone will create a 'python.js' before long ;) – jpyams