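Python: Reshaping JSON after pulling it
I have been struggling with this problem for so long that I have worn myself out and lost productivity.
I am using Python to open a session with an API and pull its data. The URL has the format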
http://api.kivaws.org/v1/teams/2/loans.json
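where the "2" placeholder in the path is the team ID, and the page that loads lists all the loans that team has made. Don't worry about what that means; just know that my code modifies this URL to iterate over teams. In fact, here is the code: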
import urllib.request as urllib
import json
import time

team_loans = {}
url = "http://api.kivaws.org/v1/teams/"

# Teams ultimately 1-11885
for i in range(1, 4):
    try:
        handle = urllib.urlopen(url + str(i) + "/loans.json")
    except Exception:
        print("Could not handle url")
        continue
    # reading the response and decoding bytes to str
    item_html = handle.read().decode('utf-8')
    # parsing the JSON text into a dict
    data = json.loads(item_html)
    # getting number of pages to crawl
    numPages = data['paging']['pages']
    # deleting paging data
    data.pop('paging')
    # data is a dictionary, with a list of loans inside for each team
    for item in data['loans']:
        del item['name']
        del item['lender_count']
        del item['loan_amount']
        del item['sector']
        del item['description']
        del item['status']
        del item['funded_amount']
        del item['image']
        del item['activity']
        del item['use']
        del item['location']
        del item['posted_date']
        del item['borrower_count']
        del item['bonus_credit_eligibility']
        del item['tags']
        # these keys are not present on every loan
        item.pop('basket_amount', None)
        item.pop('planned_expiration_date', None)
        item.pop('themes', None)
        item.pop('currency_exchange_loss_amount', None)
        item.pop('video', None)
        item['team_id'] = i
    # More than one page
    if numPages > 1:
        for pa in range(2, numPages + 1):
            handle = urllib.urlopen(url + str(i) + "/loans.json?page=" + str(pa))
            print("Pulling loan data from team " + str(i) + "...")
            # reading the response and decoding bytes to str
            item_html = handle.read().decode('utf-8')
            # parsing the JSON text into a dict
            datatemp = json.loads(item_html)
            datatemp.pop('paging')
            for item in datatemp['loans']:
                del item['name']
                del item['lender_count']
                del item['loan_amount']
                del item['sector']
                del item['description']
                del item['status']
                del item['funded_amount']
                del item['image']
                del item['activity']
                del item['use']
                del item['location']
                del item['posted_date']
                del item['borrower_count']
                del item['bonus_credit_eligibility']
                del item['tags']
                # these keys are not present on every loan
                item.pop('basket_amount', None)
                item.pop('planned_expiration_date', None)
                item.pop('themes', None)
                item.pop('currency_exchange_loss_amount', None)
                item.pop('video', None)
                item['team_id'] = i
            # adding data to initial list
            data['loans'].extend(datatemp['loans'])
    time.sleep(1)
    # recording loans by team in dict
    team_loans[i] = data['loans']
    if data['loans']:
        print("===Data added to the team_loan dictionary===")
    else:
        print("!!!FAILURE to add data to team_loan dictionary!!!")
    # recording data to file every 3 teams
    if i % 3 == 0:
        file = "data" + str(i - 3) + "-" + str(i) + ".json"
        # the with block closes the file, so no explicit close() is needed
        with open(file, "w") as outfile:
            print("===Now writing team " + str(i) + " data to outfile===")
            json.dump(team_loans, outfile, sort_keys=True, indent=2, ensure_ascii=True)
        time.sleep(1)
print('Done! Check your outfile (data' + str(i - 3) + '-' + str(i) + '.json)')
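Admittedly, this is an amateurish mess of spaghetti code. Basically, the API pages contain a lot of data, but I only need three elements (the IDs). The code works for that. The problem is the data structure I get back, which is the point of this post. Here is an example: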
{
  "1": [
    {
      "id": 434361,
      "partner_id": 225,
      "team_id": 1
    },
    {
      "id": 431287,
      "partner_id": 122,
      "team_id": 1
    }
  ],
  "2": [
    {
      "id": 1164263,
      "partner_id": 381,
      "team_id": 2
    },
    {
      "id": 1154377,
      "partner_id": 121,
      "team_id": 2
    }
  ],
  "3": [
    {
      "id": 1164263,
      "partner_id": 381,
      "team_id": 3
    },
    {
      "id": 1154377,
      "partner_id": 121,
      "team_id": 3
    }
  ]
}
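As an aside on the field stripping in the script above: since only a few fields are kept, one alternative to deleting every unwanted key is to copy just the wanted ones. This is only a sketch; `WANTED_KEYS` and `slim_loan` are hypothetical names, and the sample loan uses made-up values for `name` and `status`.

```python
# Hypothetical whitelist of the fields to keep from each loan record.
WANTED_KEYS = ("id", "partner_id")


def slim_loan(loan, team_id):
    """Return a new dict with only the wanted keys, plus the team ID."""
    slim = {key: loan[key] for key in WANTED_KEYS if key in loan}
    slim["team_id"] = team_id
    return slim


# Made-up sample record standing in for one entry of data['loans'].
sample = {"id": 434361, "partner_id": 225, "name": "Example", "status": "funded"}
slimmed = slim_loan(sample, 1)
print(slimmed)
```

With a whitelist, a field that the API adds later is ignored instead of slipping into the output.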
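Why is this JSON structure a problem? Notice that each team ID starts a list of key-value pairs, and these lists all sit inside the larger JSON dictionary. I don't want one list per team; I just want all of the key-value pairs that the lists contain. This is for a database table. The data should look like the following: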
{
  {
    "id": 434361,
    "partner_id": 225,
    "team_id": 1
  },
  {
    "id": 431287,
    "partner_id": 122,
    "team_id": 1
  },
  {
    "id": 1164263,
    "partner_id": 381,
    "team_id": 2
  },
  {
    "id": 1154377,
    "partner_id": 121,
    "team_id": 2
  },
  {
    "id": 1164263,
    "partner_id": 381,
    "team_id": 3
  },
  {
    "id": 1154377,
    "partner_id": 121,
    "team_id": 3
  }
}
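A minimal sketch of the reshaping described above, assuming the per-team dictionary is already in memory. Note that bare objects inside `{ }` are not valid JSON, so this sketch produces a list (`[ ... ]`) of the records instead. The name `flat_loans` is made up for illustration, and the sample is a shortened copy of the structure shown earlier.

```python
from itertools import chain

# Sample shaped like the per-team output shown earlier (shortened).
team_loans = {
    "1": [
        {"id": 434361, "partner_id": 225, "team_id": 1},
        {"id": 431287, "partner_id": 122, "team_id": 1},
    ],
    "2": [
        {"id": 1164263, "partner_id": 381, "team_id": 2},
    ],
}

# Concatenate every per-team list into one flat list of loan records.
# Nothing is deleted: the records are only collected into a new list.
flat_loans = list(chain.from_iterable(team_loans.values()))

print(flat_loans)
```

Each record already carries its `team_id`, so dropping the team keys loses no information.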
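With my limited knowledge of dictionaries, my understanding is that if I were to delete these team keys (in our example, "1", "2" and "3"), the corresponding lists and their contents would be deleted as well, leaving an empty JSON dictionary.
So I have tried removing the enclosing lists by hand (think of a regular expression that strips strings such as `"77": [` and `}],` and replaces them with appropriate strings to keep the JSON valid). For obvious reasons, this is a headache. I am doing this as post-processing on the data. However, I have not found another way.
So far I have not had any success. Please post any clarifying questions; I know this is a long post. Thanks.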
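Would you mind explaining exactly how the pprint method is used? It doesn't run as is, and I don't know where it should go. – Typhon
It isn't important; I only used it to format the output in a certain way. Not sure why you can't use it, but just build the actual list you want. –
I read the data from the file and stored it as a string, then ran the pprint method and got: pprint.pprint(sum(data.values(),[])) AttributeError: 'str' object has no attribute 'values' – Typhon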
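The AttributeError in the last comment suggests that `data` was still the raw text of the file rather than a parsed dictionary. A minimal sketch, assuming the file contents look like the per-team structure above: parse the text with `json.loads` first (or use `json.load` on an open file object), then flatten. The names `raw` and `flat` are placeholders.

```python
import json
import pprint

# Stand-in for the text read from the output file (shortened sample).
raw = (
    '{"1": [{"id": 434361, "partner_id": 225, "team_id": 1}],'
    ' "2": [{"id": 1164263, "partner_id": 381, "team_id": 2}]}'
)

# A str has no .values(); parse the JSON text into a dict first.
data = json.loads(raw)

# Concatenate the per-team lists, starting from an empty list.
flat = sum(data.values(), [])

pprint.pprint(flat)
```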