Pandas: how to MultiIndex an existing DataFrame created from JSON

I have a JSON file containing objects like the following:
{"v": "1","uuid": "c62f3e001c5a43d7bc663eef7db5372c","source": 3,"uniqueName": "hive","sensorId": 8324,"alarm": false,"date": 1497387606620,"movement": 49280,"rssi": 362,"lux": 16,"magnet": 16,"ageSent": 69206224,"ipAddress": "0.0.0.0","locationSensorId": 0,"locationCounter": 0,"readerId": 67,"geo": {"x": "1","y": "1","z": "1"},"sys": {},"fa": {},"requestOriginTypeId": 2,"failover": {"adv": 1,"oi": 1,"c": 1,"cr": 1},"D": "3","W": 24,"M": 5,"Y": 2017,"user": {"ui": "0","id": "0","cntry": "UK","cty": "NEWBY","gender": 0,"age": 0,"dt": 0,"scr": 0},"resp": {"rid": 67,"adv": 10000001,"oi": 1,"c": 1,"cr": 1,"p": 1.0,"b": 1.0,"curr": "£","rb": 1}}
One problem I have is that I need to access the value "adv": 10000001 in here:
"resp": {"rid": 67,"adv": 10000001,"oi": 1,"c": 1,"cr": 1,"p": 1.0,"b": 1.0,"curr": "£","rb": 1}
Because of the format, my DataFrame contains a column "resp" whose value is:
{"rid": 67,"adv": 10000001,"oi": 1,"c": 1,"cr": 1,"p": 1.0,"b": 1.0,"curr": "£","rb": 1}
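What is the best way to access that value? I was thinking of creating a Series from {u'adv':1,u'cr':1,u'c':1,u'oi':1} (the value under "resp").
I have another problem, which is my main one. The DataFrame I create from the JSON above will eventually contain only these columns: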
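For reference, a minimal sketch of two common ways to reach a nested value such as "adv". The `records` list is a hypothetical, trimmed-down stand-in for the JSON objects above, and `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical records with the same nested "resp" shape as the JSON above.
records = [
    {"sensorId": 8324, "rssi": 362, "resp": {"rid": 67, "adv": 10000001, "oi": 1}},
    {"sensorId": 8325, "rssi": 374, "resp": {"rid": 68, "adv": 10000002, "oi": 1}},
]

df = pd.DataFrame(records)

# Option 1: pull a single key out of the dict-valued "resp" column.
adv = df["resp"].apply(lambda d: d["adv"])

# Option 2: flatten the nested dicts into dotted columns such as "resp.adv".
flat = pd.json_normalize(records)
```

Option 1 keeps the DataFrame as it is and extracts one value per row; option 2 may be more convenient when several nested fields are needed at once.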
df_json = df_json[['day_time','sensor_id','customer_id','rssi','date','time']]
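Some columns were renamed before this step, which is why you may not see a direct correspondence with the larger DataFrame built from the JSON.
The data currently looks like the following (day_time holds only the date in the first row; there is also a separate date column, but it sits near the end of the DataFrame):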
day_time sensor_id customer_id rssi advertiser_id \
0 2017-03-17 4000068 76 352 1000001
0 2017-03-17 09:20:17.708 4000068 56 374 1000001
1 2017-03-17 09:20:42.561 4000068 60 392 1000001
0 2017-03-17 09:44:21.728 4000514 76 352 1000001
0 2017-03-17 10:32:45.227 4000461 76 332 1000001
0 2017-03-17 12:47:06.639 4000046 43 364 1000001
0 2017-03-17 12:49:34.438 4000046 62 423 1000001
0 2017-03-17 12:52:28.430 4000072 62 430 1000001
1 2017-03-17 12:52:32.593 4000072 62 394 1000001
0 2017-03-17 12:53:17.708 4000917 76 335 1000001
I need this DataFrame to be MultiIndexed by day_stamp and sensor_id, so that the data (please correct me if I am wrong) would be displayed as:
date sensor_id customer_id rssi advertiser_id \
0 2017-03-17 4000068 76 352 1000001
0 56 374 1000001
1 60 392 1000001
0 2017-03-17 4000514 76 352 1000001
0 2017-03-17 4000461 76 332 1000001
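A layout like this can probably be reached with `set_index` followed by `sort_index`. The sketch below uses a few rows modelled on the sample data; the `day` column name is an assumption introduced for illustration, derived by truncating `day_time` to the calendar day.

```python
import pandas as pd

# A few rows modelled on the sample data above.
df_json = pd.DataFrame({
    "day_time": pd.to_datetime([
        "2017-03-17 09:20:17.708",
        "2017-03-17 09:44:21.728",
        "2017-03-17 09:20:42.561",
        "2017-03-17 10:32:45.227",
    ]),
    "sensor_id": [4000068, 4000514, 4000068, 4000461],
    "customer_id": [56, 76, 60, 76],
    "rssi": [374, 352, 392, 332],
})

# Hypothetical helper column: the calendar day, used as the outer index level.
df_json["day"] = df_json["day_time"].dt.normalize()

# Move both columns into a two-level index, then sort so that
# rows for the same day and sensor end up next to each other.
indexed = df_json.set_index(["day", "sensor_id"]).sort_index()
```

When printed, pandas leaves repeated outer index labels blank, which gives the grouped look shown in the desired output.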
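The reason I want the data in this format is so that I can apply the .diff() function to the time column and compute the time difference between consecutive records for each sensor_id.
I believe the problem exists here as well, since time.diff() would end up computing the time difference between one ID and another. Is there a way to use diff() to find the time differences only between records that share the same sensor_id?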
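A per-sensor difference does not strictly require a MultiIndex. One option, sketched here on rows modelled on the sample data, is to group by `sensor_id` and call `diff()` within each group. The `time_diff` column name is an assumption for illustration.

```python
import pandas as pd

# Rows modelled on the sample data above.
df = pd.DataFrame({
    "sensor_id": [4000068, 4000068, 4000514, 4000072, 4000072],
    "day_time": pd.to_datetime([
        "2017-03-17 09:20:17.708",
        "2017-03-17 09:20:42.561",
        "2017-03-17 09:44:21.728",
        "2017-03-17 12:52:28.430",
        "2017-03-17 12:52:32.593",
    ]),
})

# Sort so that each sensor's records are in chronological order.
df = df.sort_values(["sensor_id", "day_time"])

# diff() is computed inside each sensor_id group, so the first record
# of every sensor gets NaT instead of a difference to another sensor.
df["time_diff"] = df.groupby("sensor_id")["day_time"].diff()
```

Because the difference is taken within each group, a value is never computed across two different sensors.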
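I want to stress again that my main problem is MultiIndexing an existing DataFrame (it feels like there are five questions in here). How do I output day_time and sensor_id as valid arrays that can be used for a MultiIndex?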
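This produces an index by day_time, but the sensor_ids are still mixed, e.g. sensor_id still looks the way it does in my sample data. –
Maybe [sort_index](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html) helps – jezrael
I find some of the pandas documentation hard to understand. Would I do sort_index(axis=index, level=not quite sure)? –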
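As a rough illustration of those two parameters: `axis=0` (or `"index"`) sorts the row labels, and `level` accepts a level name, a position, or a list of either. The sketch below builds a small MultiIndex with `pd.MultiIndex.from_arrays` on made-up rows in the shape of the sample data.

```python
import pandas as pd

# Small frame whose index levels are built from two plain arrays.
df = pd.DataFrame(
    {"rssi": [352, 374, 332, 392]},
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(["2017-03-17"] * 4),
            [4000514, 4000068, 4000461, 4000068],
        ],
        names=["day_time", "sensor_id"],
    ),
)

# Sort the rows by one named level (remaining levels break ties by default).
by_sensor = df.sort_index(axis=0, level="sensor_id")

# Sort by both levels explicitly, outer level first.
by_both = df.sort_index(axis=0, level=["day_time", "sensor_id"])
```

Either call should group equal sensor_id values together, which is the part that was still mixed after indexing by day_time alone.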