2016-08-15 206 views
Pandas: create a JSON file from a DataFrame

I have a DataFrame:

ID, visiting 
111, 03.2015 
111, 07.2015 
111, 05.2016 
222, 12.2013 
222, 04.2016 
333, 02.2014 
333, 06.2015
333, 11.2015 

I need a file like this (it must list every month from December 2013 through June 2016):

{
    "111": {
        "2013-12": 0,
        "2014-01": 0,
        "2014-02": 0,
        "2014-03": 0,
        "2014-04": 0,
        "2014-05": 0,
        "2014-06": 0,
        "2014-07": 0,
        "2014-08": 0,
        "2014-09": 0,
        "2014-10": 0,
        "2014-11": 0,
        "2014-12": 0,
        "2015-01": 0,
        "2015-02": 0,
        "2015-03": 1,
        "2015-04": 0,
        "2015-05": 0,
        "2015-06": 0,
        "2015-07": 1,
        "2015-08": 0,
        "2015-09": 0,
        "2015-10": 0,
        "2015-11": 0,
        "2015-12": 0,
        "2016-01": 0,
        "2016-02": 0,
        "2016-03": 0,
        "2016-04": 0,
        "2016-05": 1,
        "2016-06": 0
    },
    "222": { ...
    }
}

How can I get this from pandas?

Answer

It would be more efficient to use groupby, but this gets the job done:

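import json

import pandas as pd

txt = """ID, visiting
111, 03.2015
111, 07.2015
111, 05.2016
222, 12.2013
222, 04.2016
333, 02.2014
333, 06.2015
333, 11.2015"""

# Build the DataFrame from the sample text; both columns stay as strings.
split = [line.split(', ') for line in txt.split('\n')]

df = pd.DataFrame(split[1:], columns=split[0])

# One (initially empty) month -> flag mapping per ID.
results = {}
for ID in df.ID.unique():
    results[ID] = {}

# Walk every month from 2013-12 through 2016-06.
for year in range(2013, 2017):
    if year == 2013:
        months = [12]
    elif year == 2016:
        months = range(1, 7)
    else:
        months = range(1, 13)
    for month in months:
        search_text = '{:02d}.{}'.format(month, year)  # format used in the data, e.g. 03.2015
        result_text = '{}-{:02d}'.format(year, month)  # format used in the output, e.g. 2015-03
        for ID in df.ID.unique():
            # 1 if this ID has at least one visit in this month, otherwise 0.
            if len(df[(df.ID == ID) & (df.visiting == search_text)]) > 0:
                boolean = 1
            else:
                boolean = 0
            results[ID][result_text] = boolean

# Serialize the nested dict to a JSON string.
results = json.dumps(results)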
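
For comparison, here is a minimal sketch of the grouped approach mentioned above. It is not the original answer's code: it assumes the same sample data, uses pd.crosstab as the grouped count, and the names all_months, table and json_text are introduced here only for illustration.

```python
import json

import pandas as pd

# Same sample data as in the question (IDs kept as strings).
df = pd.DataFrame({
    "ID": ["111", "111", "111", "222", "222", "333", "333", "333"],
    "visiting": ["03.2015", "07.2015", "05.2016", "12.2013", "04.2016",
                 "02.2014", "06.2015", "11.2015"],
})

# Convert "MM.YYYY" into the "YYYY-MM" keys wanted in the output.
df["month"] = pd.to_datetime(df["visiting"], format="%m.%Y").dt.strftime("%Y-%m")

# Every month from December 2013 through June 2016, as "YYYY-MM" strings.
all_months = pd.period_range("2013-12", "2016-06", freq="M").strftime("%Y-%m")

# Count visits per (ID, month), add the months nobody visited as 0,
# and cap the counts at 1 so each cell is a visited / not visited flag.
table = (
    pd.crosstab(df["ID"], df["month"])
    .reindex(columns=all_months, fill_value=0)
    .clip(upper=1)
)

# Round-trip through to_json so the values are plain Python ints.
results = json.loads(table.to_json(orient="index"))
json_text = json.dumps(results, indent=4)
```

To write the result to a file instead of a string, json.dump(results, f, indent=4) on an open file handle should work the same way.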