Recreate the DataFrame row by row:
In [39]: df = pd.DataFrame({"id": [3001, 3849, 8927], "start_date": ['1-1-2000', '1-5-2001', '1-6-2006'], "end_date": ['1-5-2000', '1-8-2001', '1-9-2006']})
Set the index:
In [40]: df = df.set_index('id')
Iterate over the rows, building one date range per id:
In [41]: newdf = pd.DataFrame()
In [42]: for id, row in df.iterrows():
   ....:     newdf = pd.concat([newdf, pd.DataFrame({"id": id, "date": pd.date_range(start=row.start_date, end=row.end_date, freq='D')})], ignore_index=True)
   ....:     print(id)
   ....:
3001
3849
8927
In [43]: newdf = newdf.set_index('id')
In [44]: newdf
Out[44]:
           date
id
3001 2000-01-01
3001 2000-01-02
3001 2000-01-03
3001 2000-01-04
3001 2000-01-05
3849 2001-01-05
3849 2001-01-06
3849 2001-01-07
3849 2001-01-08
8927 2006-01-06
8927 2006-01-07
8927 2006-01-08
8927 2006-01-09
And that's it.
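If the frame is large, calling pd.concat inside the loop gets slow, because every call copies everything accumulated so far. Here is a minimal sketch of the same idea that collects the pieces in a list and concatenates once. It assumes the dates are month-day-year (the question does not say), and the names start, end and frames are just mine for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [3001, 3849, 8927],
    "start_date": ["1-1-2000", "1-5-2001", "1-6-2006"],
    "end_date": ["1-5-2000", "1-8-2001", "1-9-2006"],
})

# Assumption: the strings are month-day-year. Swap to "%d-%m-%Y" if day comes first.
start = pd.to_datetime(df["start_date"], format="%m-%d-%Y")
end = pd.to_datetime(df["end_date"], format="%m-%d-%Y")

# One small frame per id, each holding the daily range for that id.
frames = [
    pd.DataFrame({"id": i, "date": pd.date_range(start=s, end=e, freq="D")})
    for i, s, e in zip(df["id"], start, end)
]

# Concatenate once at the end instead of once per row.
newdf = pd.concat(frames, ignore_index=True).set_index("id")
print(newdf)
```

The result should have the same shape as Out[44] above: one row per day, indexed by id.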
It isn't clear what your date format is: is it day first, or month first? You can check here: Specifying date format when converting with pandas.to_datetime
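To show why the format matters, here is a small sketch that parses the same strings both ways with pd.to_datetime and an explicit format. The variable names (raw, month_first, day_first) are only for illustration; pick whichever format matches your data:

```python
import pandas as pd

raw = pd.Series(["1-5-2000", "1-8-2001", "1-9-2006"])

# Read as month-day-year: "1-5-2000" is 5 January 2000.
month_first = pd.to_datetime(raw, format="%m-%d-%Y")

# Read as day-month-year: "1-5-2000" is 1 May 2000.
day_first = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first)
print(day_first)
```

Passing format explicitly avoids relying on pandas guessing, which can silently pick the wrong interpretation for ambiguous strings like these.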
Sure, edited to answer the extra question:
In [32]: b = newdf.reset_index().groupby('id').date.transform(
   ....:     lambda ii: ii.max())
In [33]: b
Out[33]:
0 2000-01-05
1 2000-01-05
2 2000-01-05
3 2000-01-05
4 2000-01-05
5 2001-01-08
6 2001-01-08
7 2001-01-08
8 2001-01-08
9 2006-01-09
10 2006-01-09
11 2006-01-09
12 2006-01-09
Name: date, dtype: datetime64[ns]
In [37]: newdf['new_col'] = (newdf.date.values == b.values).astype(int)  # compare by position: newdf is indexed by id, b by 0..12
In [38]: newdf
Out[38]:
           date  new_col
id
3001 2000-01-01        0
3001 2000-01-02        0
3001 2000-01-03        0
3001 2000-01-04        0
3001 2000-01-05        1
3849 2001-01-05        0
3849 2001-01-06        0
3849 2001-01-07        0
3849 2001-01-08        1
8927 2006-01-06        0
8927 2006-01-07        0
8927 2006-01-08        0
8927 2006-01-09        1
For some reason, I can't just do:
newdf['new_col'] = newdf.reset_index().groupby('id').date.transform(lambda ii: ii == ii.max())
....I'm not sure why.
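One likely cause (my guess, not checked against the pandas version used above) is index alignment: after reset_index() the result is labeled 0, 1, 2, ... while newdf is labeled by id, so the assignment cannot line the rows up. A sketch that avoids the mismatch groups on the index level directly, so the transformed Series keeps newdf's own index. The small frame below is made-up sample data in the same layout:

```python
import pandas as pd

# Sample frame shaped like newdf above: repeated ids in the index, one date per row.
newdf = pd.DataFrame(
    {"date": pd.to_datetime([
        "2000-01-01", "2000-01-02",
        "2001-01-05", "2001-01-06", "2001-01-07",
    ])},
    index=pd.Index([3001, 3001, 3849, 3849, 3849], name="id"),
)

# Group on the index level, so the result carries the same index as newdf.
last = newdf.groupby(level="id")["date"].transform("max")

# Flag the row holding each id's latest date.
newdf["new_col"] = (newdf["date"] == last).astype(int)
print(newdf)
```

Because last and newdf["date"] share the same index, the comparison and the assignment both line up without going through reset_index().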
You can iterate over the rows with iterrows(), and use date_range() and concat. –
I'm new to pandas, could you provide some code I can follow? – user3816493