Dictionary in Python not giving me unique IDs

I have the output of an elasticsearch query saved in a file. The first few lines look like this:
{
  "took": 1,
  "timed_out": false,
  "_shards": {},
  "hits": {
    "total": 27,
    "max_score": 6.5157733,
    "hits": [
      {
        "_index": "dbgap_062617",
        "_type": "dataset",
        ***"_id": "595189d15152c64c3b0adf16"***,
        "_score": 6.5157733,
        "_source": {
          "dataAcquisition": {
            "performedBy": "\n\t\tT\n\t\t"
          },
          "provenance": {
            "ingestTime": "201"
          },
          "studyGroup": [
            {
              "Identifier": "1",
              "name": "Diseas"
            }
          ],
          "license": {
            "downloadURL": "http"
          },
          "study": {
            "alternateIdentifiers": "yes"
          },
          "disease": {
            "name": [
              "Coronary Artery Disease"
            ]
          },
          "NLP_Fields": {
            "CellLine": [],
            "MeshID": [
              "C0066533"
            ],
            "DiseaseID": [
              "C0010068"
            ],
            "ChemicalID": [],
            "Disease": [
              "coronary artery disease"
            ],
            "Chemical": [],
            "Meshterm": [
              "migen"
            ]
          },
          "datasetDistributions": [
            {
              "dateReleased": "20150312"
            }
          ],
          "dataset": {
            "citations": [
              "20032323"
            ],
            **"description": "The Precoc.",**
            **"title": "MIGen_ExS: PROCARDIS"**
          },
.... and the list goes on with a bunch of other items ....
Out of all these nodes, I am interested in the unique _ids, the title, and the description. So I created a dictionary and used json to extract the parts I'm interested in. Here is my code:
import json

s = {}
d = open('local file', 'w')
with open('localfile', 'r') as ready:
    for line in ready:
        test = json.loads(line, encoding='utf-8')
        for i in test['hits']['hits']:
            for x in i:
                s.setdefault(i['_id'], [i['_source']['dataset']['description'],
                                        i['_source']['dataset']['title']])
        for k, v in s.items():
            d.write(k + '\t' + v[0] + '\t' + v[1] + '\n')
d.close()
Now, when I run it, it gives me a file with duplicate _ids! Aren't dictionaries supposed to give me unique _ids? In my original output file I have lots of duplicated ids, and I want to get rid of them. Also, I ran set() on just the _ids to get the number of unique ones, and it came to 138. But if I remove the generated duplicate ids, the dictionary drops to 17! Can someone tell me why this is happening?
I may be misunderstanding something, but your question seems to be "Do Python dictionaries always have unique IDs?" The answer is no. A dictionary is just an associative array; it knows nothing about the ids in any other dictionary. –
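To illustrate the comment above, here is a minimal sketch (with one of the _ids from the sample output reused as a stand-in, and hypothetical values) showing that keys within a single dictionary are unique, and that `setdefault` keeps only the first value stored under a key:

```python
# Keys in a dict are unique: inserting the same key twice does not
# create a second entry, and setdefault keeps the first value ever
# stored under that key.
s = {}
s.setdefault("595189d15152c64c3b0adf16", ["first description", "first title"])
s.setdefault("595189d15152c64c3b0adf16", ["second description", "second title"])

print(len(s))                             # 1 -- only one entry for the key
print(s["595189d15152c64c3b0adf16"][0])   # first description
```

So duplicates in the output file cannot come from the dictionary itself; they have to come from writing its contents more than once.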
Thanks @Jason Fry. Then, if I want to end up with unique ids along with their title and description, what is the best way to do it? Is it possible? – user3026373
Why process the output line by line? I think you want to treat the whole thing as a single JSON object, as in `with open('localfile') as inp: d = json.load(inp)` – chepner
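chepner's suggestion can be sketched as follows. The sample data and the file names (`localfile`, `outfile`) are placeholders; the block writes a tiny stand-in input file first only so it is self-contained:

```python
import json

# Tiny stand-in for the elasticsearch output (hypothetical sample with a
# deliberately duplicated hit); in practice 'localfile' already exists.
sample = {
    "hits": {
        "hits": [
            {"_id": "595189d15152c64c3b0adf16",
             "_source": {"dataset": {"description": "The Precoc.",
                                     "title": "MIGen_ExS: PROCARDIS"}}},
            {"_id": "595189d15152c64c3b0adf16",   # duplicate hit
             "_source": {"dataset": {"description": "The Precoc.",
                                     "title": "MIGen_ExS: PROCARDIS"}}},
        ]
    }
}
with open('localfile', 'w') as f:
    json.dump(sample, f)

# Parse the whole file as ONE JSON object instead of line by line.
with open('localfile') as inp:
    result = json.load(inp)

# Collect each unique _id with its description and title;
# setdefault keeps only the first value seen per _id.
s = {}
for hit in result['hits']['hits']:
    dataset = hit['_source']['dataset']
    s.setdefault(hit['_id'], [dataset['description'], dataset['title']])

# Write the output once, after the dict is complete, to a separate file.
with open('outfile', 'w') as out:
    for k, v in s.items():
        out.write(k + '\t' + v[0] + '\t' + v[1] + '\n')
```

Because the whole response is parsed once and the output loop runs once at the end, each _id appears exactly once in `outfile`.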