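Python: read several json files from a folder
I would like to know how to read several json files from a single folder (without specifying the file names, just that they are json files).
Also, is it possible to turn them into a pandas DataFrame?
Can you give me a basic example?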
One option is to list all the files in a directory with os.listdir and then keep only those that end in '.json':
import os, json
import pandas as pd
path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files) # for me this prints ['foo.json']
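Now you can use pandas DataFrame.from_dict to read the JSON (a Python dictionary at this point) into a pandas DataFrame:
montreal_json = pd.DataFrame.from_dict(many_jsons[0])
print(montreal_json['features'][0]['geometry'])
Prints:
{'type': 'Point', 'coordinates': [-73.6051013, 45.5115944]}
In this case I had appended some jsons to a list, many_jsons. The first json in my list is actually a geojson with some geographic data on Montreal. I am already familiar with the content, so I print out the 'geometry', which gives me the lon/lat of Montreal.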
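The snippets above use many_jsons without showing how it was filled. A minimal sketch of one way to build such a list, assuming every .json file in the folder parses cleanly; the helper name load_json_files is hypothetical and not part of the original answer:

```python
import json
import os


def load_json_files(path_to_json):
    """Return a list of parsed JSON objects, one per .json file in the folder."""
    many_jsons = []
    # sorted() gives a stable order; os.listdir itself guarantees none
    for name in sorted(os.listdir(path_to_json)):
        if name.endswith('.json'):
            with open(os.path.join(path_to_json, name)) as json_file:
                many_jsons.append(json.load(json_file))
    return many_jsons
```

Calling many_jsons = load_json_files('somedir/') would then produce a list that can be indexed as in many_jsons[0].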
The following code sums up everything above:
import os, json
import pandas as pd
# this finds our json files
path_to_json = 'json/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
# here I define my pandas Dataframe with the columns I want to get from the json
jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])
# we need both the json and an index number so use enumerate()
for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)
        # here you need to know the layout of your json and each json has to have
        # the same structure (obviously not the structure I have here)
        country = json_text['features'][0]['properties']['country']
        city = json_text['features'][0]['properties']['name']
        lonlat = json_text['features'][0]['geometry']['coordinates']
        # here I push a list of data into a pandas DataFrame at row given by 'index'
        jsons_data.loc[index] = [country, city, lonlat]
# now that we have the pertinent json data in our DataFrame let's look at it
print(jsons_data)
For me this prints:
  country           city                   long/lat
0  Canada  Montreal city  [-73.6051013, 45.5115944]
1  Canada        Toronto  [-79.3849008, 43.6529206]
It may help to know that for this code I had two geojsons in a directory named 'json'. Each json had the following structure:
{"features":
[{"properties":
{"osm_key":"boundary","extent":
[-73.9729016,45.7047897,-73.4734865,45.4100756],
"name":"Montreal city","state":"Quebec","osm_id":1634158,
"osm_type":"R","osm_value":"administrative","country":"Canada"},
"type":"Feature","geometry":
{"type":"Point","coordinates":
[-73.6051013,45.5115944]}}],
"type":"FeatureCollection"}
To read the JSON files:
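import glob
import json
import os
contents = []
json_dir_name = "/path/to/json/dir"
# match every file ending in .json inside the directory
json_pattern = os.path.join(json_dir_name, '*.json')
file_list = glob.glob(json_pattern)
for file in file_list:
    # parse each file and collect the resulting Python object
    with open(file) as f:
        contents.append(json.load(f))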
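If it matters which file each object came from, the list can be swapped for a dict keyed by file name. A sketch using pathlib instead of os and glob; read_json_dir is a hypothetical helper, and it assumes file names (without the extension) are unique within the folder:

```python
import json
from pathlib import Path


def read_json_dir(json_dir_name):
    """Map each file name (without its .json extension) to the parsed content."""
    contents = {}
    for path in sorted(Path(json_dir_name).glob('*.json')):
        with path.open() as f:
            contents[path.stem] = json.load(f)
    return contents
```

With a folder holding montreal.json and toronto.json (made-up names), read_json_dir would return a dict with the keys 'montreal' and 'toronto'.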
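Is contents.append creating a dictionary to which all the json files that were found get added? Thanks @Saravana! – donpresente
'contents.append' adds an element to the list 'contents'. –
There should be a closing parenthesis ')' after '*.json'. –
Really helpful. Instead of printing, my idea is to save them all into one pandas DataFrame; what would the right code look like? Create an empty DataFrame and start adding rows to it? Thanks @Scott for this detailed answer! – donpresente
@donpresente Good question. I will post an edit that addresses how to get the desired data from the json and then push that data row by row into a pandas DataFrame. – Scott
@donpresente Did the code under **EDIT** help you? – Scott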
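On the question of getting everything into one DataFrame without picking fields by hand, pandas.json_normalize may be worth a look. A minimal sketch, assuming every file is a FeatureCollection shaped like the geojson shown earlier and a pandas version that exposes pd.json_normalize (1.0 or later); the function name and the source_file column are hypothetical additions:

```python
import json
import os

import pandas as pd


def geojson_folder_to_frame(path_to_json):
    """Flatten the 'features' of every .json file in a folder into one DataFrame."""
    frames = []
    for name in sorted(os.listdir(path_to_json)):
        if not name.endswith('.json'):
            continue
        with open(os.path.join(path_to_json, name)) as json_file:
            data = json.load(json_file)
        # one row per feature; nested keys become dotted column names
        # such as 'properties.name' and 'geometry.coordinates'
        frame = pd.json_normalize(data, record_path='features')
        frame['source_file'] = name  # hypothetical extra column for traceability
        frames.append(frame)
    if not frames:
        return pd.DataFrame()  # no .json files found
    return pd.concat(frames, ignore_index=True)
```

Building one frame per file and concatenating once at the end tends to be cheaper than growing a DataFrame row by row with .loc, although for a handful of files either approach should be fine.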