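Python: Reshaping JSON after pulling it

I have been struggling with this problem for so long that I have worn myself out and lost productivity.

I'm using Python to open a session with an API and pull its data. The URL has the format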

http://api.kivaws.org/v1/teams/2/loans.json

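where the "2" placeholder in the path is the team ID, and the page that loads lists all the loans that team has made. Don't worry about what that means; just know that my code modifies this URL to iterate over the teams. In fact, here is the code: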

import urllib.request as urllib
import json
import time

team_loans = {}

url = "http://api.kivaws.org/v1/teams/"
# Teams ultimately 1-11885
for i in range(1, 4):
    try:
        handle = urllib.urlopen(url + str(i) + "/loans.json")
    except Exception:
        print("Could not handle url")
        continue
    # reading response and decoding bytes to str
    item_html = handle.read().decode('utf-8')
    # converting to json
    data = json.loads(item_html)
    # getting number of pages to crawl
    numPages = data['paging']['pages']
    # deleting paging data
    data.pop('paging')
    # data is a dictionary, with a list of loans inside for each team
    for item in data['loans']:
        del item['name']
        del item['lender_count']
        del item['loan_amount']
        del item['sector']
        del item['description']
        del item['status']
        del item['funded_amount']
        del item['image']
        del item['activity']
        del item['use']
        del item['location']
        del item['posted_date']
        del item['borrower_count']
        del item['bonus_credit_eligibility']
        del item['tags']
        # these keys are not present on every loan
        item.pop('basket_amount', None)
        item.pop('planned_expiration_date', None)
        item.pop('themes', None)
        item.pop('currency_exchange_loss_amount', None)
        item.pop('video', None)
        item['team_id'] = i

    # More than one page
    if numPages > 1:
        for pa in range(2, numPages + 1):
            handle = urllib.urlopen(url + str(i) + "/loans.json?page=" + str(pa))
            print("Pulling loan data from team " + str(i) + "...")
            # reading response and decoding bytes to str
            item_html = handle.read().decode('utf-8')
            # converting to json
            datatemp = json.loads(item_html)
            datatemp.pop('paging')
            for item in datatemp['loans']:
                del item['name']
                del item['lender_count']
                del item['loan_amount']
                del item['sector']
                del item['description']
                del item['status']
                del item['funded_amount']
                del item['image']
                del item['activity']
                del item['use']
                del item['location']
                del item['posted_date']
                del item['borrower_count']
                del item['bonus_credit_eligibility']
                del item['tags']
                # these keys are not present on every loan
                item.pop('basket_amount', None)
                item.pop('planned_expiration_date', None)
                item.pop('themes', None)
                item.pop('currency_exchange_loss_amount', None)
                item.pop('video', None)
                item['team_id'] = i

            # adding data to initial list
            for loan in datatemp['loans']:
                data['loans'].append(loan)
            time.sleep(1)

    # recording loans by team in dict
    team_loans[i] = data['loans']
    if data['loans']:
        print("===Data added to the team_loan dictionary===")
    else:
        print("!!!FAILURE to add data to team_loan dictionary!!!")
    # recording data to file every 3 teams
    if i % 3 == 0:
        file = "data" + str(i - 3) + "-" + str(i) + ".json"
        with open(file, "w") as outfile:
            print("===Now writing team " + str(i) + " data to outfile===")
            json.dump(team_loans, outfile, sort_keys=True, indent=2, ensure_ascii=True)

    time.sleep(1)

print('Done! Check your outfile (data' + str(i - 3) + '-' + str(i) + '.json)')

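Admittedly an amateurish mess of spaghetti code. Basically, the API pages contain a lot of data, but I only need three elements (the IDs). That part works. The problem is the structure of the data I get back, which is the point of this post. Here is an example: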

{
  "1": [
    {
      "id": 434361,
      "partner_id": 225,
      "team_id": 1
    },
    {
      "id": 431287,
      "partner_id": 122,
      "team_id": 1
    }
  ],
  "2": [
    {
      "id": 1164263,
      "partner_id": 381,
      "team_id": 2
    },
    {
      "id": 1154377,
      "partner_id": 121,
      "team_id": 2
    }
  ],
  "3": [
    {
      "id": 1164263,
      "partner_id": 381,
      "team_id": 3
    },
    {
      "id": 1154377,
      "partner_id": 121,
      "team_id": 3
    }
  ]
}
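
As an aside on the trimming step: the long run of del statements in the code keeps only a couple of fields from each loan. A minimal sketch of the same idea as a whitelist, assuming the fields to keep are id and partner_id (as the example output suggests). The names KEEP_KEYS and trim_loan are hypothetical, not part of the Kiva API:

```python
# Hypothetical helper: keep only the wanted fields instead of deleting the rest.
KEEP_KEYS = ("id", "partner_id")  # assumed from the example output above


def trim_loan(loan, team_id):
    """Return a new dict with only KEEP_KEYS (when present) plus team_id."""
    trimmed = {key: loan[key] for key in KEEP_KEYS if key in loan}
    trimmed["team_id"] = team_id
    return trimmed


# Made-up loan record, only for illustration.
raw = {"id": 434361, "partner_id": 225, "name": "example", "status": "funded"}
print(trim_loan(raw, 1))  # {'id': 434361, 'partner_id': 225, 'team_id': 1}
```

With a whitelist, fields that are missing or newly added on the API side do not need their own try/except.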

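Why is this JSON structure a problem? Notice that each team ID starts a list of key-value records, and these lists all sit inside the larger JSON dictionary. I don't want one list per team; I just want all of the records that the lists contain. This is for loading into a database table. The data should look like the following: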

{ 
    { 
    "id": 434361, 
    "partner_id": 225, 
    "team_id": 1 
    }, 
    { 
    "id": 431287, 
    "partner_id": 122, 
    "team_id": 1 
    }, 
    { 
    "id": 1164263, 
    "partner_id": 381, 
    "team_id": 2 
    }, 
    { 
    "id": 1154377, 
    "partner_id": 121, 
    "team_id": 2 
    }, 
    { 
    "id": 1164263, 
    "partner_id": 381, 
    "team_id": 3 
    }, 
    { 
    "id": 1154377, 
    "partner_id": 121, 
    "team_id": 3 
    } 
}

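With my limited knowledge of dictionaries, my understanding is that if I delete these team keys (in our example, "1", "2", and "3"), the contents of the corresponding lists will be deleted as well, leaving an empty JSON dictionary.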

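So I have tried stripping out the lists by hand (I was thinking of a regular expression that strips strings such as "77": [ and }], and replaces them with whatever is needed to keep the JSON valid). For obvious reasons this is a headache, given the data I am working with. However, I have not found another way.

I have had no success so far. Please post any clarifying questions; I know this is a long post. Thanks.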

Source

2016-10-28 Typhon

Just get the values and flatten them:

import pprint 

data = {'1': [{'id': 434361, 'partner_id': 225, 'team_id': 1}, 
     {'id': 431287, 'partner_id': 122, 'team_id': 1}], 
'2': [{'id': 1164263, 'partner_id': 381, 'team_id': 2}, 
     {'id': 1154377, 'partner_id': 121, 'team_id': 2}], 
'3': [{'id': 1164263, 'partner_id': 381, 'team_id': 3}, 
     {'id': 1154377, 'partner_id': 121, 'team_id': 3}]} 

pprint.pprint(sum(data.values(), []))

Output:

[{'id': 1164263, 'partner_id': 381, 'team_id': 3}, 
{'id': 1154377, 'partner_id': 121, 'team_id': 3}, 
{'id': 1164263, 'partner_id': 381, 'team_id': 2}, 
{'id': 1154377, 'partner_id': 121, 'team_id': 2}, 
{'id': 434361, 'partner_id': 225, 'team_id': 1}, 
{'id': 431287, 'partner_id': 122, 'team_id': 1}]

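Note that this is a list. The outer curly braces in your desired output would make it a set, which cannot work here (dicts are not hashable) and probably would not be useful anyway.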
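
For larger inputs, the same flattening can be done with itertools.chain.from_iterable, which avoids the repeated list copying that sum(..., []) does. A minimal sketch, assuming the per-team dictionary is named team_loans as in the question and using a shortened sample of its data:

```python
import json
from itertools import chain

# Shortened sample in the same shape as the question's data.
team_loans = {
    "1": [{"id": 434361, "partner_id": 225, "team_id": 1}],
    "2": [{"id": 1164263, "partner_id": 381, "team_id": 2},
          {"id": 1154377, "partner_id": 121, "team_id": 2}],
}

# Concatenate every team's list into one flat list of loan records.
flat_loans = list(chain.from_iterable(team_loans.values()))

# A list serializes to a JSON array, which is valid JSON
# (unlike records wrapped directly in curly braces).
flat_json = json.dumps(flat_loans, indent=2)
print(flat_json)
```

The result is a JSON array of records, which is the usual shape for loading rows into a database table.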

Source

2016-10-28 21:36:09

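Would you mind explaining exactly how the pprint method is used? It does not run as is, and I don't know how to change it. – Typhon

It doesn't matter; I only used it to format the output a certain way. Not sure why you can't use it; just build the actual list you want. –

I read the data from the file and stored it as a string, then ran the pprint method and got: pprint.pprint(sum(data.values(), [])) AttributeError: 'str' object has no attribute 'values' – Typhon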
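
The AttributeError in the last comment is what happens when the file contents are still a string: the text has to be parsed into a dictionary before .values() is available. A minimal sketch, assuming a file in the shape the question's script writes; the temporary path and sample data here are made up so the example is self-contained:

```python
import json
import os
import tempfile

# Write a small sample file so the sketch can run on its own (made-up data).
sample = {"1": [{"id": 434361, "partner_id": 225, "team_id": 1}],
          "2": [{"id": 1164263, "partner_id": 381, "team_id": 2}]}
path = os.path.join(tempfile.mkdtemp(), "data0-3.json")
with open(path, "w") as outfile:
    json.dump(sample, outfile)

# infile.read() alone would return a str; json.load parses it into a dict.
with open(path) as infile:
    data = json.load(infile)

# Now data.values() exists and the flattening from the answer works.
flat = sum(data.values(), [])
print(flat)
```

If the text is already in a variable, json.loads(text) does the same parsing step.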
