Assigning custom categories to JSON data - pandas

I want to assign labels to the original data instead of getting new indicator columns from get_dummies. What I have in mind looks like this:

json_input:

[{"id": 100, "vehicle_type": "Car", "time": "2017-04-06 01:39:43", "zone": "A", "type": "Checked"},
 {"id": 101, "vehicle_type": "Truck", "time": "2017-04-06 02:35:45", "zone": "B", "type": "Unchecked"},
 {"id": 102, "vehicle_type": "Truck", "time": "2017-04-05 03:20:12", "zone": "A", "type": "Checked"},
 {"id": 103, "vehicle_type": "Car", "time": "2017-04-04 10:05:04", "zone": "C", "type": "Unchecked"}]

Result:

id, vehicle_type, time range (as listed), zone, type
100, 0, 1, 1, 1
101, 1, 1, 2, 0
102, 1, 2, 1, 1
103, 0, 3, 3, 0

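Here TS stands for timestamp. Columns: vehicle_type and type are binary; the time range is as listed (1 -> (TS1-TS2), 2 -> (TS3-TS4), 3 -> (TS5-TS6)); zone is categorical (1, 2 or 3). I want these labels to be assigned automatically when I feed the flattened JSON into a pandas DataFrame. Is this possible? (I don't want the zone_1, type_1, vehicle_type_3 indicator columns that get_dummies produces in pandas.) If it isn't possible in pandas, please suggest a Python library for this automation.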

Source

2017-04-10 Milee

Show us your JSON and what you want the result to look like.

This is what I could come up with. I'm not sure what kind of time ranges you are looking for.

import io
import pandas as pd

df_string = '[{"id":100,"vehicle_type":"Car","time":"2017-04-06 01:39:43","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","time":"2017-04-06 02:35:45","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","time":"2017-04-05 03:20:12","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","time":"2017-04-04 10:05:04","zone":"C","type":"Unchecked"}]'
df = pd.read_json(io.StringIO(df_string))

# Convert each label column to a categorical dtype
df['zone'] = pd.Categorical(df['zone'])
df['vehicle_type'] = pd.Categorical(df['vehicle_type'])
df['type'] = pd.Categorical(df['type'])

# cat.codes yields one integer per category (sorted order by default)
df['zone_int'] = df['zone'].cat.codes
df['vehicle_type_int'] = df['vehicle_type'].cat.codes
df['type_int'] = df['type'].cat.codes
df.head()
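One thing to watch: with the default sorted category order, `cat.codes` gives `Checked` -> 0 and `Unchecked` -> 1, and zones start at 0, which does not match the table in the question. A minimal sketch of one way to control the numbering, assuming the label orders below are what the question intends (the `label_orders` dict and the +1 shift for zones are illustrative choices, not part of the answer above):

```python
import pandas as pd

records = [
    {"id": 100, "vehicle_type": "Car", "zone": "A", "type": "Checked"},
    {"id": 101, "vehicle_type": "Truck", "zone": "B", "type": "Unchecked"},
    {"id": 102, "vehicle_type": "Truck", "zone": "A", "type": "Checked"},
    {"id": 103, "vehicle_type": "Car", "zone": "C", "type": "Unchecked"},
]
df = pd.DataFrame(records)

# Assumed label orders: the position in each list becomes the integer code.
label_orders = {
    "vehicle_type": ["Car", "Truck"],   # Car -> 0, Truck -> 1
    "type": ["Unchecked", "Checked"],   # Unchecked -> 0, Checked -> 1
    "zone": ["A", "B", "C"],            # A -> 0, B -> 1, C -> 2
}

for column, order in label_orders.items():
    # Values missing from `order` would get the code -1.
    categorical = pd.Categorical(df[column], categories=order)
    df[column + "_int"] = categorical.codes

# The question numbers zones from 1, so shift the zero-based codes.
df["zone_int"] = df["zone_int"] + 1

print(df[["id", "vehicle_type_int", "zone_int", "type_int"]])
```

Passing `categories=` explicitly keeps the codes stable even when a later batch of JSON is missing one of the labels.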

Edit: this is what I could come up with

import datetime
import io
import math
import pandas as pd

# Taken from http://stackoverflow.com/questions/13071384/python-ceil-a-datetime-to-next-quarter-of-an-hour
def ceil_dt(dt, num_seconds=900):
    # Round dt up to the next multiple of num_seconds past the hour
    nsecs = dt.minute*60 + dt.second + dt.microsecond*1e-6
    delta = math.ceil(nsecs/num_seconds) * num_seconds - nsecs
    return dt + datetime.timedelta(seconds=delta)

df_string = '[{"id":100,"vehicle_type":"Car","time":"2017-04-06 01:39:43","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","time":"2017-04-06 02:35:45","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","time":"2017-04-05 03:20:12","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","time":"2017-04-04 10:05:04","zone":"C","type":"Unchecked"}]'
df = pd.read_json(io.StringIO(df_string))

# Categorical columns and their integer codes
df['zone'] = pd.Categorical(df['zone'])
df['vehicle_type'] = pd.Categorical(df['vehicle_type'])
df['type'] = pd.Categorical(df['type'])
df['zone_int'] = df['zone'].cat.codes
df['vehicle_type_int'] = df['vehicle_type'].cat.codes
df['type_int'] = df['type'].cat.codes

# Calendar components of the timestamp
df['time'] = pd.to_datetime(df['time'])
df['dayofweek'] = df['time'].dt.dayofweek
df['month_int'] = df['time'].dt.month
df['year_int'] = df['time'].dt.year
df['day'] = df['time'].dt.day
df['date'] = df['time'].dt.date
df['month'] = df['date'].apply(lambda x: datetime.date(x.year, x.month, 1))
df['year'] = df['date'].apply(lambda x: datetime.date(x.year, 1, 1))
df['hour'] = df['time'].dt.hour
df['mins'] = df['time'].dt.minute
df['seconds'] = df['time'].dt.second

# Fixed-width buckets over the hour of day, numbered from 1
df['time_interval_3hour'] = df['hour'] // 3 + 1
df['time_interval_6hour'] = df['hour'] // 6 + 1
df['time_interval_12hour'] = df['hour'] // 12 + 1

# dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday
df['weekend'] = df['dayofweek'] > 4

df['ceil_quarter_an_hour'] = df['time'].apply(ceil_dt)
df['ceil_half_an_hour'] = df['time'].apply(lambda x: ceil_dt(x, num_seconds=1800))
df.head()
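The hour buckets above all have equal width. For ranges of arbitrary width, `pd.cut` may be a closer fit. A minimal sketch, assuming hypothetical bin edges of 0, 3, 6 and 24 hours (the question does not say what TS1 to TS6 actually are, so `hour_edges` and `hour_labels` are placeholders to replace with the real boundaries):

```python
import pandas as pd

times = pd.to_datetime(pd.Series([
    "2017-04-06 01:39:43",
    "2017-04-06 02:35:45",
    "2017-04-05 03:20:12",
    "2017-04-04 10:05:04",
]))

# Hypothetical bin edges in hours of the day: [0, 3), [3, 6), [6, 24).
hour_edges = [0, 3, 6, 24]
hour_labels = [1, 2, 3]

# right=False makes each bin include its left edge and exclude its right edge.
time_range = pd.cut(times.dt.hour, bins=hour_edges, labels=hour_labels, right=False)

print(time_range.astype(int).tolist())
```

The same call works on any numeric column, so range-based labels for plain numbers can be produced the same way by changing the edges.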

Source

2017-04-10 23:23:44 atkawa7

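I'm looking for something like ranges over the same hours of the day, which are then grouped into one category. Basically, a way to categorize based on ranges, for both times and numbers. – Milee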

Thanks. Perfect. – Milee
