Dictionary in Python does not give me unique IDs

I have the output of an Elasticsearch query saved in a file. The first few lines look like this:

{"took": 1, 
    "timed_out": false, 
    "_shards": {}, 
    "hits": { 
     "total": 27, 
     "max_score": 6.5157733, 
     "hits": [ 
     { 
      "_index": "dbgap_062617", 
      "_type": "dataset", 
      ***"_id": "595189d15152c64c3b0adf16"***, 
      "_score": 6.5157733, 
      "_source": { 
       "dataAcquisition": { 
        "performedBy": "\n\t\tT\n\t\t" 
       }, 
       "provenance": { 
        "ingestTime": "201",      
       }, 
       "studyGroup": [ 
        { 
        "Identifier": "1", 
        "name": "Diseas" 
        } 
       ], 
       "license": { 
        "downloadURL": "http",      
       }, 
       "study": { 
        "alternateIdentifiers": "yes", 
       }, 
       "disease": { 
        "name": [ 
        "Coronary Artery Disease" 
        ] 
       }, 
       "NLP_Fields": { 
        "CellLine": [], 
        "MeshID": [ 
        "C0066533",       
        ], 
        "DiseaseID": [ 
        "C0010068" 
        ], 
        "ChemicalID": [], 
        "Disease": [ 
        "coronary artery disease" 
        ], 
        "Chemical": [], 

        "Meshterm": [ 
        "migen",       
        ] 
       }, 
       "datasetDistributions": [ 
        { 
        "dateReleased": "20150312",       
        } 
       ], 
       "dataset": { 
        "citations": [ 
        "20032323" 
        ], 
        **"description": "The Precoc.",**     
        **"title": "MIGen_ExS: PROCARDIS"** 
       }, 
       .... and the list goes on with a bunch of other items ....

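Of all these nodes, I am interested in the unique _ids, the title and the description. So I created a dictionary and used json to extract the parts I am interested in. Here is my code: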

import json
s = {}
d = open('local file', 'w')
with open('localfile', 'r') as ready:
    for line in ready:
        test = json.loads(line, encoding='utf-8')
        for i in test['hits']['hits']:
            for x in i:
                s.setdefault(i['_id'], [i['_source']['dataset']['description'],
                                        i['_source']['dataset']['title']])
        for k, v in s.items():
            d.write(k + '\t' + v[0] + '\t' + v[1] + '\n')
d.close()

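Now, when I run it, it gives me a file with duplicate _ids! Isn't a dictionary supposed to give me unique _ids? My original output file contains many duplicate ids and I want to get rid of them. Also, when I ran set() on just the _ids to count the unique ones, I got 138. But if I remove the duplicate ids from the file generated by the dictionary, the count drops to 17! Can someone tell me why this is happening?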

Source

2017-10-04 user3026373

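I may be misunderstanding something, but your question seems to be "Do Python dictionaries always have unique IDs?" The answer is no. A dictionary is just an associative array; it knows nothing about the ids in any other dictionary.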
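To illustrate the comment above, here is a minimal sketch of how keys behave. The `hits` list and its `_id` values are made up for the example and are not taken from the question's data. Within one dictionary a key can appear only once, and `setdefault` keeps the first value stored under it, but a second dictionary is free to reuse the same key.

```python
# Hypothetical hits; the _id values are invented for illustration.
hits = [
    {"_id": "a1", "title": "first"},
    {"_id": "a1", "title": "second"},
    {"_id": "b2", "title": "third"},
]

by_id = {}
for hit in hits:
    # setdefault stores the value only if the key is not present yet,
    # so the second "a1" entry is ignored.
    by_id.setdefault(hit["_id"], hit["title"])

# A separate dictionary knows nothing about by_id and may reuse "a1".
other = {"a1": "unrelated"}

print(by_id)
```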

Thanks @Jason Fry. Then, if I want unique IDs together with the title and description, what is the best way to get them? Is it even possible? – user3026373

Why process the output line by line? I think you want to treat the whole thing as a single JSON object, something like `with open('localfile') as inp: d = json.load(inp)` – chepner
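The suggestion above could be sketched as follows, assuming the saved file is one valid JSON document with the `hits.hits[]._id` and `_source.dataset` layout shown in the question. The function names `collect_unique` and `write_tsv` are hypothetical. Note also that the code in the question writes the dictionary inside the per-line loop, so entries collected earlier would be written again for every further line; that may be where the repeated _ids come from. Here the file is written only once, after all hits have been collected.

```python
import json


def collect_unique(path):
    """Map each _id to (description, title), keeping the first occurrence."""
    with open(path, encoding="utf-8") as inp:
        data = json.load(inp)  # parse the whole file as one JSON document

    unique = {}
    for hit in data["hits"]["hits"]:
        dataset = hit["_source"]["dataset"]
        unique.setdefault(hit["_id"], (dataset["description"], dataset["title"]))
    return unique


def write_tsv(unique, out_path):
    """Write one tab-separated row per _id."""
    # Writing happens once, after collection, so no row is repeated.
    with open(out_path, "w", encoding="utf-8") as out:
        for _id, (description, title) in unique.items():
            out.write(f"{_id}\t{description}\t{title}\n")
```

A call such as `write_tsv(collect_unique('localfile'), 'unique_ids.tsv')` would then produce the output file; the output file name here is only a placeholder. If the file instead holds one JSON document per line, the same idea applies: parse each line, but write the result after the loop has finished.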

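If you want a unique ID and you are using a database, the database will create one for you. If you are not, you need to generate a unique number or string yourself. Depending on how the dictionary is created, you could use the timestamp of the moment it is created, or you could use uuid.uuid4(). For more information about uuid, see the docs.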
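As a rough sketch of the uuid.uuid4() option, the snippet below keys each record by a freshly generated UUID. The `records` list is hypothetical; only the first title and description come from the question's sample.

```python
import uuid

# Hypothetical records; the second one is invented for illustration.
records = [
    {"title": "MIGen_ExS: PROCARDIS", "description": "The Precoc."},
    {"title": "another title", "description": "another description"},
]

# uuid4() returns a random UUID; str() turns it into a usable dict key.
keyed = {str(uuid.uuid4()): record for record in records}

for key, record in keyed.items():
    print(key, record["title"])
```

Generated ids give every record a distinct key, but they do not detect records that are duplicates of each other. For removing duplicates, an identifier that already exists in the data, such as the `_id` field, is likely the more useful key.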

Source

2017-10-07 22:50:53
