2017-07-28 228 views
-1

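How to use pandas' json_normalize

I want to use pandas' json_normalize, but so far my efforts have only produced errors. Can someone tell me what I am doing wrong? I have a complex nested JSON file, and I would love to use pandas' powerful tools to analyze it.

Code (current attempt):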

    import json, pandas as pd
    from pandas.io.json import json_normalize

    df = pd.read_json('dir/data.json')
    json_normalize(df,'aaa', 'bbb')

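The errors have ranged from

TypeError: string indices must be integers 

to multiple KeyError: 0 problems. I have tried several keyword arguments for this function, I have tried breaking the data into rows and rebuilding it row by row before normalizing, and I have read the documentation for this function and searched for the function name together with the errors I am getting. All of these attempts failed. I suspect this may be due to the fairly complex nature of data.json. I could use other methods, but they are very time-consuming.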

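Apologies for the formatting; this is my first question. For the awesome people who responded with constructive feedback, here are a few lines taken from the middle of my data file: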

{"_id" : { "$oid" : "52b213b38594d8a2be17c789" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-29T00:00:00Z", "borrower" : "THE KINGDOM OF MOROCCO", "closingdate" : "2014-12-31T00:00:00Z", "country_namecode" : "Kingdom of Morocco!$!MA", "countrycode" : "MA", "countryname" : "Kingdom of Morocco", "countryshortname" : "Morocco", "docty" : "Program Document,Project Information Document,Project Information Document", "grantamt" : 0, "ibrdcommamt" : 200000000, "id" : "P130903", "idacommamt" : 0, "impagency" : "MINISTRY OF FINANCE", "lendinginstr" : "Development Policy Lending", "lendinginstrtype" : "AD", "lendprojectcost" : 200000000, "majorsector_percent" : [ { "Name" : "Public Administration, Law, and Justice", "Percent" : 34 }, { "Name" : "Public Administration, Law, and Justice", "Percent" : 33 }, { "Name" : "Public Administration, Law, and Justice", "Percent" : 33 } ], "mjsector_namecode" : [ { "name" : "Public Administration, Law, and Justice", "code" : "BX" }, { "name" : "Public Administration, Law, and Justice", "code" : "BX" }, { "name" : "Public Administration, Law, and Justice", "code" : "BX" } ], "mjtheme" : [ "Public sector governance", "Public sector governance", "Public sector governance" ], "mjtheme_namecode" : [ { "name" : "Public sector governance", "code" : "2" }, { "name" : "Public sector governance", "code" : "2" }, { "name" : "Public sector governance", "code" : "2" } ], "mjthemecode" : "2,2,2", "prodline" : "PE", "prodlinetext" : "IBRD/IDA", "productlinetype" : "L", "project_abstract" : { "cdata" : "The objective of this First Transparency and Accountability Development Policy Loan (DPL) Program for Morocco is to support the concretization of key new constitutional governance principles and rights, aimed at increasing transparency and accountability and enhancing citizen engagement and access to information. 
The series supports structural reforms strengthening economic governance across the public sector and new policies fostering more inclusive and open governance. The DPL has been prepared jointly with the European Union (EU) and the African Development Bank (AfDB), leveraging a further US$ 250 million in support of common key policy actions such as the budget, procurement and open governance reforms. The programmatic approach is warranted by the scope and depth of the government's governance reform program, the implementation of which will require time, assistance, and flexibility. This operation is complemented by the transition fund project supporting the implementation of Morocco's new governance framework. This US$ 4 million grant provides technical assistance for the implementation of structural reforms fostering public engagement; performance based budgeting and fiscal decentralization. The series adopts a holistic and integrated approach to enhance its impact. It is supporting governance reforms across the public sector covering the central government; State owned Enterprises, or SoEs and agencies, local governments as well as inter-governmental relations. The Bank has provided policy advice and technical assistance for the design of most policy measures and laws supported by this DPL, with the support from the MNA multi-donor trust fund. The transition fund governance project will support the implementation of these structural reforms. While building on the long-standing engagement with public administration reform, under the Public Administration Reform Loan (PARL) series, this program supports the concretization of the performance budgeting reform through the adoption and implementation of the new organic budget law and procurement decree. This DPL series also delves into new reform areas derived from the constitution such as access to information, public petitions, as well as into the governance of SoEs and local finances." 
}, "project_name" : "MA Accountability and Transparency DPL", "projectdocs" : [ { "DocTypeDesc" : "Program Document (PGD), Vol.1 of 1", "DocType" : "PGD", "EntityID" : "000333037_20131009170139", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131009170139", "DocDate" : "30-SEP-2013" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000231615_20121031105539", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000231615_20121031105539", "DocDate" : "04-SEP-2012" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000386194_20121016015521", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000386194_20121016015521", "DocDate" : "04-SEP-2012" } ], "projectfinancialtype" : "IBRD", "projectstatusdisplay" : "Active", "regionname" : "Middle East and North Africa", "sector" : [ { "Name" : "General public administration sector" }, { "Name" : "Central government administration" }, { "Name" : "Public administration- Information and communications" } ], "sector1" : { "Name" : "General public administration sector", "Percent" : 34 }, "sector2" : { "Name" : "Central government administration", "Percent" : 33 }, "sector3" : { "Name" : "Public administration- Information and communications", "Percent" : 33 }, "sector_namecode" : [ { "name" : "General public administration sector", "code" : "BZ" }, { "name" : "Central government administration", "code" : "BC" }, { "name" : "Public administration- Information and communications", "code" : "BM" } ], "sectorcode" : "BM,BC,BZ", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "N", "theme1" : { "Name" : "Other accountability/anti-corruption", "Percent" : 33 }, "theme_namecode" : [ { "name" : "Other accountability/anti-corruption", "code" : "29" }, { "name" : "Other public sector governance", "code" : "30" }, { "name" : 
"Public expenditure, financial management and procurement", "code" : "27" } ], "themecode" : "27,30,29", "totalamt" : 200000000, "totalcommamt" : 200000000, "url" : "http://www.worldbank.org/projects/P130903?lang=en" } 
{ "_id" : { "$oid" : "52b213b38594d8a2be17c78a" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-25T00:00:00Z", "borrower" : "GOVERNMENT OF SOUTH SUDAN", "country_namecode" : "Republic of South Sudan!$!SS", "countrycode" : "SS", "countryname" : "Republic of South Sudan", "countryshortname" : "South Sudan", "docty" : "Project Paper,Project Information Document", "envassesmentcategorycode" : "B", "grantamt" : 7530000, "ibrdcommamt" : 0, "id" : "P145339", "idacommamt" : 0, "impagency" : "MINISTRY OF AGRICULTURE, COOPERATIVES AND RURAL DEVELOPMENT", "lendinginstr" : "Specific Investment Loan", "lendinginstrtype" : "IN", "lendprojectcost" : 7530000, "majorsector_percent" : [ { "Name" : "Agriculture, fishing, and forestry", "Percent" : 50 }, { "Name" : "Health and other social services", "Percent" : 30 }, { "Name" : "Agriculture, fishing, and forestry", "Percent" : 20 } ], "mjsector_namecode" : [ { "name" : "Agriculture, fishing, and forestry", "code" : "AX" }, { "name" : "Health and other social services", "code" : "JX" }, { "name" : "Agriculture, fishing, and forestry", "code" : "AX" } ], "mjtheme" : [ "Rural development" ], "mjtheme_namecode" : [ { "name" : "Rural development", "code" : "10" }, { "name" : "", "code" : "2" } ], "mjthemecode" : "10,2", "prodline" : "RE", "prodlinetext" : "Recipient Executed Activities", "productlinetype" : "L", "project_abstract" : { "cdata" : "The development objective of the Additional Financing (AF) for the Emergency Food Crisis Response Project for South Sudan is to support adoption of improved technologies for food production by eligible beneficiaries, increase storage capacity for staples, and provide cash or food to eligible people participating in public works programs in selected counties in South Sudan. 
This is the third AF to the project and will be primarily used to scale-up and augment benefits to already participating beneficiaries and to expand project activities to four additional counties where recent monitoring points to significantly deteriorating food security. The AF will cover the costs associated with: (i) provision of agricultural inputs, production technology, and advisory services; (ii) rehabilitating a seed processing facility to increase farmer's access to improved seed; (iii) bringing land that is currently out of production back into production; (iv) training farmers on reduction of postharvest losses; (v) building of food storage capacity to support postharvest handling at the household and community levels; and (vi) provision of cash or food for work to eligible individuals. The implementation schedule will be slightly revised and the closing date of both the original project and AF will be extended to April 30, 2015." }, "project_name" : "Southern Sudan Emergency Food Crisis Response Project- AF III", "projectdocs" : [ { "DocTypeDesc" : "Project Paper (PJPR), Vol.1 of 1", "DocType" : "PJPR", "EntityID" : "000442464_20131009102446", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20131009102446", "DocDate" : "01-OCT-2013" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000001843_20130618091419", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000001843_20130618091419", "DocDate" : "07-JUN-2013" } ], "projectfinancialtype" : "OTHER", "projectstatusdisplay" : "Active", "regionname" : "Africa", "sector" : [ { "Name" : "Crops" }, { "Name" : "Other social services" }, { "Name" : "General agriculture, fishing and forestry sector" } ], "sector1" : { "Name" : "Crops", "Percent" : 50 }, "sector2" : { "Name" : "Other social services", "Percent" : 30 }, "sector3" : { "Name" : "General agriculture, fishing and forestry sector", 
"Percent" : 20 }, "sector_namecode" : [ { "name" : "Crops", "code" : "AH" }, { "name" : "Other social services", "code" : "JB" }, { "name" : "General agriculture, fishing and forestry sector", "code" : "AZ" } ], "sectorcode" : "AZ,JB,AH", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "Y", "theme1" : { "Name" : "Global food crisis response", "Percent" : 100 }, "theme_namecode" : [ { "name" : "Global food crisis response", "code" : "91" } ], "themecode" : "91", "totalamt" : 0, "totalcommamt" : 7530000, "url" : "http://www.worldbank.org/projects/P145339?lang=en" } 
{ "_id" : { "$oid" : "52b213b38594d8a2be17c78b" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-25T00:00:00Z", "closingdate" : "2017-12-31T00:00:00Z", "country_namecode" : "Republic of India!$!IN", "countrycode" : "IN", "countryname" : "Republic of India", "countryshortname" : "India", "docty" : "Project Appraisal Document,Environmental Assessment,Project Information Document,Integrated Safeguards Data Sheet,Working Paper", "envassesmentcategorycode" : "B", "grantamt" : 0, "ibrdcommamt" : 0, "id" : "P146653", "idacommamt" : 250000000, "lendinginstr" : "Investment Project Financing", "lendinginstrtype" : "IN", "lendprojectcost" : 250000000, "majorsector_percent" : [ { "Name" : "Transportation", "Percent" : 60 }, { "Name" : "Water, sanitation and flood protection", "Percent" : 25 }, { "Name" : "Industry and trade", "Percent" : 10 }, { "Name" : "Health and other social services", "Percent" : 5 } ], "mjsector_namecode" : [ { "name" : "Transportation", "code" : "TX" }, { "name" : "Water, sanitation and flood protection", "code" : "WX" }, { "name" : "Industry and trade", "code" : "YX" }, { "name" : "Health and other social services", "code" : "JX" } ], "mjtheme" : [ "Rural development", "Social protection and risk management", "Social protection and risk management", "Environment and natural resources management" ], "mjtheme_namecode" : [ { "name" : "Rural development", "code" : "10" }, { "name" : "Social protection and risk management", "code" : "6" }, { "name" : "Social protection and risk management", "code" : "6" }, { "name" : "Environment and natural resources management", "code" : "11" } ], "mjthemecode" : "10,6,6,11", "prodline" : "PE", "prodlinetext" : "IBRD/IDA", "productlinetype" : "L", "project_abstract" : { "cdata" : "The objective of the Uttarakhand Disaster Recovery Project for India is to restore housing, rural connectivity and build resilience of communities in Uttarakhand and increase the technical capacity of 
the state entities to respond promptly and effectively to an eligible crisis or emergency. There are six components to the project, the first component being resilient infrastructure reconstruction. The objective of this component is to focus on the immediate needs of reconstruction of damaged houses and public buildings. The aim is to reduce the vulnerability of the affected population and restore access to the basic services of governance. The second component is the rural road connectivity. The objective of this component is to restore the connectivity lost due to the disaster through the reconstruction of damaged roads and bridges including: village roads, Other District Roads (ODRs), bridle roads and bridle bridges. The third component is the technical assistance and capacity building for disaster risk management. The objective of this component is to enhance the capabilities of government entities and others in risk mitigation and response. The fourth component is the financing disaster response expenses. This component will support the financing of eligible expenses already incurred by the state during the immediate post-disaster response period. The fifth component is the implementation support. This component will support the incremental operating costs of the project, including the operation of the Project Management Unit (PMU) and the respective Project Implementation Units (PIUs). Finally, the sixth component is the contingency emergency response." 
}, "project_name" : "Uttarakhand Disaster Recovery Project", "projectdocs" : [ { "DocTypeDesc" : "Project Appraisal Document (PAD), Vol.1 of 1", "DocType" : "PAD", "EntityID" : "000333037_20131021112627", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131021112627", "DocDate" : "11-OCT-2013" }, { "DocTypeDesc" : "Environmental Assessment (EA), Vol.1 of 1", "DocType" : "EA", "EntityID" : "000442464_20131015112514", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20131015112514", "DocDate" : "10-OCT-2013" }, { "DocTypeDesc" : "Project Information Document (PID), Vol.1 of 1", "DocType" : "PID", "EntityID" : "000356161_20130926131319", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000356161_20130926131319", "DocDate" : "24-SEP-2013" }, { "DocTypeDesc" : "Integrated Safeguards Data Sheet (ISDS), Vol.1 of 1", "DocType" : "ISDS", "EntityID" : "000333037_20130926120720", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20130926120720", "DocDate" : "24-SEP-2013" }, { "DocTypeDesc" : "Working Paper (WP), Vol.1 of 1", "DocType" : "WP", "EntityID" : "000333037_20131115110208", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131115110208", "DocDate" : "01-JUN-2013" } ], "projectfinancialtype" : "IDA", "projectstatusdisplay" : "Active", "regionname" : "South Asia", "sector" : [ { "Name" : "Rural and Inter-Urban Roads and Highways" }, { "Name" : "Flood protection" }, { "Name" : "Housing construction" }, { "Name" : "Other social services" } ], "sector1" : { "Name" : "Rural and Inter-Urban Roads and Highways", "Percent" : 60 }, "sector2" : { "Name" : "Flood protection", "Percent" : 25 }, "sector3" : { "Name" : "Housing construction", "Percent" : 10 }, "sector4" : { "Name" : "Other social services", "Percent" : 5 }, "sector_namecode" : [ { "name" : "Rural and Inter-Urban Roads and 
Highways", "code" : "TI" }, { "name" : "Flood protection", "code" : "WD" }, { "name" : "Housing construction", "code" : "YC" }, { "name" : "Other social services", "code" : "JB" } ], "sectorcode" : "JB,YC,WD,TI", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "N", "theme1" : { "Name" : "Rural services and infrastructure", "Percent" : 60 }, "theme_namecode" : [ { "name" : "Rural services and infrastructure", "code" : "78" }, { "name" : "Natural disaster management", "code" : "52" }, { "name" : "Social risk mitigation", "code" : "87" }, { "name" : "Climate change", "code" : "81" } ], "themecode" : "81,87,52,78", "totalamt" : 250000000, "totalcommamt" : 250000000, "url" : "http://www.worldbank.org/projects/P146653?lang=en" } 

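It is worth mentioning that not every field in every row holds valid information, and finding and correcting that is my job. I am not asking for that answer; I only want to know how to use json_normalize to get this information into a pandas DataFrame.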

+1

Can you add a snippet of your 'data.json'? – Dark

+0

If you need any more information, let me know. I still haven't got this squared away. –

Answers

0

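This worked for me:

  1. Read the data line by line as strings (I copy-pasted the text into a file).

  2. Use the json module to convert each string into a Python dictionary.

  3. Use pandas' json_normalize to convert each dictionary into a one-row DataFrame and, if needed, concatenate all of the DataFrames.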

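    import json
    import pandas as pd

    # 'data.json' is the name of the file; it holds one JSON object per line
    with open('data.json', 'r') as f:
        data = [line for line in f if line.strip()]  # skip blank lines

    # pandas >= 1.0 exposes this as pd.json_normalize; older versions need
    # `from pandas.io.json import json_normalize` instead
    pd.concat([pd.json_normalize(json.loads(j)) for j in data], ignore_index=True)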
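
The three steps above can be tried on a small inline sample without the original file. The sketch below is only an illustration under stated assumptions: the two records are trimmed-down versions of the rows in the question (only a handful of fields are kept), the variable names are hypothetical, and it assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`. It also shows that json_normalize is given parsed dictionaries rather than a DataFrame, and how `record_path` and `meta` can expand one nested list (here `mjtheme_namecode`) into rows.

```python
import json

import pandas as pd

# Hypothetical, trimmed-down versions of two records from the question.
sample_records = [
    {
        "_id": {"$oid": "52b213b38594d8a2be17c789"},
        "id": "P130903",
        "countryname": "Kingdom of Morocco",
        "sector1": {"Name": "General public administration sector", "Percent": 34},
        "mjtheme_namecode": [
            {"name": "Public sector governance", "code": "2"},
            {"name": "Public sector governance", "code": "2"},
        ],
    },
    {
        "_id": {"$oid": "52b213b38594d8a2be17c78a"},
        "id": "P145339",
        "countryname": "Republic of South Sudan",
        "sector1": {"Name": "Crops", "Percent": 50},
        "mjtheme_namecode": [
            {"name": "Rural development", "code": "10"},
            {"name": "", "code": "2"},
        ],
    },
]

# Step 1: emulate a file with one JSON object per line.
lines = [json.dumps(record) for record in sample_records]

# Step 2: parse each non-empty line into a Python dictionary.
records = [json.loads(line) for line in lines if line.strip()]

# Step 3: flatten. Nested dictionaries become dotted column names such as
# "_id.$oid" and "sector1.Name"; list-valued fields stay as Python lists.
flat = pd.json_normalize(records)

# Optional: expand one nested list into one row per list element, carrying
# the chosen top-level fields along as extra columns.
themes = pd.json_normalize(
    records,
    record_path="mjtheme_namecode",
    meta=["id", "countryname"],
)

print(flat.columns.tolist())
print(themes)
```

Passing the whole list of dictionaries in one call gives the same kind of result as concatenating one-row DataFrames, and it avoids building many small intermediate frames.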