2017-06-14

Pandas: how to MultiIndex an existing DataFrame built from JSON, and how to access nested JSON objects

I have a JSON file containing objects like the following:

{"v": "1","uuid": "c62f3e001c5a43d7bc663eef7db5372c","source": 3,"uniqueName": "hive","sensorId": 8324,"alarm": false,"date": 1497387606620,"movement": 49280,"rssi": 362,"lux": 16,"magnet": 16,"ageSent": 69206224,"ipAddress": "0.0.0.0","locationSensorId": 0,"locationCounter": 0,"readerId": 67,"geo": {"x": "1","y": "1","z": "1"},"sys": {},"fa": {},"requestOriginTypeId": 2,"failover": {"adv": 1,"oi": 1,"c": 1,"cr": 1},"D": "3","W": 24,"M": 5,"Y": 2017,"user": {"ui": "0","id": "0","cntry": "UK","cty": "NEWBY","gender": 0,"age": 0,"dt": 0,"scr": 0},"resp": {"rid": 67,"adv": 10000001,"oi": 1,"c": 1,"cr": 1,"p": 1.0,"b": 1.0,"curr": "£","rb": 1}} 

One problem is that I need to access the value of "adv": 10000001 here:

"resp": {"rid": 67,"adv": 10000001,"oi": 1,"c": 1,"cr": 1,"p": 1.0,"b": 1.0,"curr": "£","rb": 1} 

Because of this format, my DataFrame has a column "resp" whose value is:

{"rid": 67,"adv": 10000001,"oi": 1,"c": 1,"cr": 1,"p": 1.0,"b": 1.0,"curr": "£","rb": 1} 

What is the best way to access this value? I was thinking of creating a Series from the value under "resp", e.g. {u'adv': 1, u'cr': 1, u'c': 1, u'oi': 1}.
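A minimal sketch of two ways to get at "adv", assuming the records are loaded with the standard json module and a pandas version that has the top-level pd.json_normalize (pandas 1.0 or later). The record is trimmed from the sample above, and the column name advertiser_id is only an illustrative choice.

```python
import json

import pandas as pd

# One record with the same shape as the sample, trimmed to a few keys (illustrative)
raw = '{"sensorId": 8324, "rssi": 362, "resp": {"rid": 67, "adv": 10000001, "oi": 1}}'
records = [json.loads(raw)]

# Option 1: the "resp" column holds one dict per row; pull a single key out of each
df = pd.DataFrame(records)
df["advertiser_id"] = df["resp"].map(lambda d: d["adv"])

# Option 2: flatten every nested object into dotted column names such as "resp.adv"
flat = pd.json_normalize(records)

print(df["advertiser_id"].iloc[0])
print(flat["resp.adv"].iloc[0])
```

Option 2 is handy when several nested keys are needed at once; Option 1 keeps the existing DataFrame and only adds the one column.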

My other question is the main one. The DataFrame I build from the JSON above will eventually contain only these columns:

df_json = df_json[['day_time','sensor_id','customer_id','rssi','date','time']] 

Some columns are renamed before this step, which is why the JSON may not obviously match the larger df.

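The data currently looks like this (in the first row, day_time holds a date only; the separate date column sits near the end of the df and is not shown):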

                  day_time  sensor_id  customer_id  rssi  advertiser_id  \
0               2017-03-17    4000068           76   352        1000001
0  2017-03-17 09:20:17.708    4000068           56   374        1000001
1  2017-03-17 09:20:42.561    4000068           60   392        1000001
0  2017-03-17 09:44:21.728    4000514           76   352        1000001
0  2017-03-17 10:32:45.227    4000461           76   332        1000001
0  2017-03-17 12:47:06.639    4000046           43   364        1000001
0  2017-03-17 12:49:34.438    4000046           62   423        1000001
0  2017-03-17 12:52:28.430    4000072           62   430        1000001
1  2017-03-17 12:52:32.593    4000072           62   394        1000001
0  2017-03-17 12:53:17.708    4000917           76   335        1000001

I need to MultiIndex this df by day_time (its date part) and sensor_id, so that the data would display as follows (please correct me if I'm wrong!):

         date  sensor_id  customer_id  rssi  advertiser_id  \
0  2017-03-17    4000068           76   352        1000001
0                                  56   374        1000001
1                                  60   392        1000001
0  2017-03-17    4000514           76   352        1000001
0  2017-03-17    4000461           76   332        1000001

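The reason I want the data in this format is so that I can apply the .diff() function to the time and calculate the time difference between consecutive records for each sensor_id.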

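I believe there is a problem with this too: time.diff() on its own will end up computing the time difference between one ID and another. Is there a way to use diff() so that it only finds the time difference between records with the same sensor_id?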
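A minimal sketch of a per-sensor difference, assuming day_time holds full timestamps that pd.to_datetime can parse. It uses a few rows copied from the sample above. Because the diff runs inside groupby("sensor_id"), one sensor's time is never subtracted from another's, and the first reading of each sensor gets NaT.

```python
import pandas as pd

# A few rows from the sample data above (times as strings, parsed below)
df = pd.DataFrame({
    "day_time": ["2017-03-17 09:20:17.708", "2017-03-17 09:20:42.561",
                 "2017-03-17 12:47:06.639", "2017-03-17 12:49:34.438",
                 "2017-03-17 12:52:28.430", "2017-03-17 12:52:32.593"],
    "sensor_id": [4000068, 4000068, 4000046, 4000046, 4000072, 4000072],
})
df["day_time"] = pd.to_datetime(df["day_time"])

# Sort so each sensor's readings are consecutive and in time order
df = df.sort_values(["sensor_id", "day_time"]).reset_index(drop=True)

# diff() only within each sensor_id group; the first row of every group is NaT
df["time_diff"] = df.groupby("sensor_id")["day_time"].diff()
print(df)
```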

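To stress again, my main question is MultiIndexing the existing df (it feels like there are five questions here). How do I output day_time and sensor_id as valid arrays that can be used for a MultiIndex?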
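One possible way to build the index, sketched under the assumption that day_time can be parsed with pd.to_datetime: pass two parallel arrays (the date part of day_time and sensor_id) to pd.MultiIndex.from_arrays, attach the result as the index, and sort so identical (date, sensor_id) pairs end up next to each other. The level name "date" is an illustrative choice.

```python
import pandas as pd

# A few rows from the sample data above, deliberately left unsorted
df = pd.DataFrame({
    "day_time": ["2017-03-17 09:20:17.708", "2017-03-17 09:44:21.728",
                 "2017-03-17 09:20:42.561", "2017-03-17 10:32:45.227"],
    "sensor_id": [4000068, 4000514, 4000068, 4000461],
    "rssi": [374, 352, 392, 332],
})
df["day_time"] = pd.to_datetime(df["day_time"])

# Two parallel arrays, one per index level: the calendar date and the sensor id
idx = pd.MultiIndex.from_arrays(
    [df["day_time"].dt.date, df["sensor_id"]], names=["date", "sensor_id"]
)

# Attach the MultiIndex and sort so each (date, sensor_id) block is contiguous
indexed = df.copy()
indexed.index = idx
indexed = indexed.sort_index()
print(indexed)
```

The full day_time column is kept as a regular column here, so it is still available for computing time differences afterwards.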

Answers


I think you need:

print (df) 
                  day_time  sensor_id  customer_id  rssi  advertiser_id
0               2017-03-17    4000068           76   352        1000001
0  2017-03-17 09:20:17.708    4000068           56   374        1000001
1  2017-03-17 09:20:42.561    4000068           60   392        1000001
0  2017-03-17 09:44:21.728    4000514           76   352        1000001
0  2017-03-17 10:32:45.227    4000461           76   332        1000001
0  2017-03-17 12:47:06.639    4000046           43   364        1000001
0  2017-03-17 12:49:34.438    4000046           62   423        1000001
0  2017-03-17 12:52:28.430    4000072           62   430        1000001
1  2017-03-17 12:52:32.593    4000072           62   394        1000001
0  2017-03-17 12:53:17.708    4000917           76   335        1000001

# keep only the calendar date, then index by (day_time, sensor_id) and sort
df['day_time'] = pd.to_datetime(df['day_time']).dt.date
df = df.set_index(['day_time','sensor_id']).sort_index()
print (df) 
                      customer_id  rssi  advertiser_id
day_time   sensor_id
2017-03-17 4000046             43   364        1000001
           4000046             62   423        1000001
           4000068             76   352        1000001
           4000068             56   374        1000001
           4000068             60   392        1000001
           4000072             62   430        1000001
           4000072             62   394        1000001
           4000461             76   332        1000001
           4000514             76   352        1000001
           4000917             76   335        1000001

This indexes by day_time, but the sensor_ids are still mixed up, e.g. sensor_id still looks the way it does in my sample data.


Maybe ['sort_index'](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html) helps. – jezrael


I find some of the pandas docs hard to understand. Would I do sort_index(axis=index, level=...)? I'm not sure what level should be.
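For reference, a small sketch of sort_index on a two-level index, using made-up index values and rssi numbers. With no arguments it sorts by every level, outermost first; level= picks which level to sort by first, and with the default sort_remaining=True the other levels are then used as tie-breakers. axis already defaults to the rows, so it does not need to be passed.

```python
import pandas as pd

# Hypothetical two-level index in the order the rows arrived (values are made up)
idx = pd.MultiIndex.from_tuples(
    [("2017-03-17", 4000068), ("2017-03-17", 4000514),
     ("2017-03-17", 4000068), ("2017-03-16", 4000461)],
    names=["day_time", "sensor_id"],
)
df = pd.DataFrame({"rssi": [352, 352, 374, 332]}, index=idx)

# Sort by every level: day_time first, then sensor_id
by_all = df.sort_index()

# Sort by sensor_id first; the remaining level (day_time) breaks ties
by_sensor = df.sort_index(level="sensor_id")
print(by_all)
print(by_sensor)
```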


If you just need the difference, you can use transform, e.g.

df['time_diff'] = df.groupby(['date', 'sensor_id'])['day_time'].transform('diff')

should work.
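A runnable sketch of the same idea on a few sample rows, assuming the frame has a date column holding the calendar date and that day_time has been converted to datetimes first. The grouping keys are passed as a list, and transform returns a result aligned with the original rows, so it can be assigned straight back as a column even when different sensors' rows are interleaved.

```python
import pandas as pd

# Sample rows with two sensors interleaved (not sorted by sensor)
df = pd.DataFrame({
    "day_time": ["2017-03-17 09:20:17.708", "2017-03-17 12:52:28.430",
                 "2017-03-17 09:20:42.561", "2017-03-17 12:52:32.593"],
    "sensor_id": [4000068, 4000072, 4000068, 4000072],
})
df["day_time"] = pd.to_datetime(df["day_time"])
df["date"] = df["day_time"].dt.date  # calendar date used as the outer group key

# diff runs inside each (date, sensor_id) group; results line up with the original rows
df["time_diff"] = df.groupby(["date", "sensor_id"])["day_time"].transform("diff")
print(df)
```

Within a group, diff follows the current row order, so sort by day_time first if the rows are not already in time order.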


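That's actually the old way I did it, but I'm interested in transform('diff'). There is also another reason groupby wasn't useful, which I can't remember at the moment :(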


I found this [blog post](http://pbpython.com/pandas_transform.html) about 'transform' very inspiring.


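I was very pleased when I saw the picture of how it works! Am I right in saying this is exactly what I need? That is, with the df indexed correctly, transform is applied to groups of the same sensor_id, so the diff method won't compute the time difference between two different IDs?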