You can use:
import pandas as pd

df1 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [20, 20, 33, 33, 33]})
print(df1)
    Datetime  IDXType
0 2015-01-04       20
1 2015-01-05       20
2 2015-01-06       33
3 2015-01-07       33
4 2015-01-08       33
df2 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [30, 30, 21, 21, 10]})
print(df2)
    Datetime  IDXType
0 2015-01-04       30
1 2015-01-05       30
2 2015-01-06       21
3 2015-01-07       21
4 2015-01-08       10
df3 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [20, 20, 30, 31, 31]})
print(df3)
    Datetime  IDXType
0 2015-01-04       20
1 2015-01-05       20
2 2015-01-06       30
3 2015-01-07       31
4 2015-01-08       31
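index_list = [df1, df2, df3]
value_list = [20, 22, 28, 29, 30, 31, 32, 33]
myarray = []

def minimum(dataframe, value):
    # Earliest Datetime among rows whose IDXType equals value (NaT if no row matches)
    return dataframe.loc[dataframe["IDXType"] == value, 'Datetime'].min()

# Collect the minimum for every (DataFrame, value) pair, one DataFrame at a time
for i in index_list:
    for value_i in value_list:
        myarray.append(minimum(i, value_i))

#print (myarray)
# Each DataFrame contributed len(value_list) == 8 consecutive entries
result = {
    'df1': pd.Series(myarray[0:8], index=value_list),
    'df2': pd.Series(myarray[8:16], index=value_list),
    'df3': pd.Series(myarray[16:24], index=value_list)
}
result = pd.DataFrame(result)
print(result)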
          df1        df2        df3
20 2015-01-04        NaT 2015-01-04
22        NaT        NaT        NaT
28        NaT        NaT        NaT
29        NaT        NaT        NaT
30        NaT 2015-01-04 2015-01-06
31        NaT        NaT 2015-01-07
32        NaT        NaT        NaT
33 2015-01-06        NaT        NaT
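The loop above relies on hard-coded slices (0:8, 8:16, 16:24), which have to change whenever value_list or the number of DataFrames changes. A possible sketch that avoids the slices by building each column directly is shown here. The names frames and result_by_name are only illustrative and are not from the question; the sample data is the same as above.

```python
import pandas as pd

dates = pd.date_range('2015-01-04', '2015-01-08')
# Hypothetical container: maps a column label to its DataFrame
frames = {
    'df1': pd.DataFrame({'Datetime': dates, 'IDXType': [20, 20, 33, 33, 33]}),
    'df2': pd.DataFrame({'Datetime': dates, 'IDXType': [30, 30, 21, 21, 10]}),
    'df3': pd.DataFrame({'Datetime': dates, 'IDXType': [20, 20, 30, 31, 31]}),
}
value_list = [20, 22, 28, 29, 30, 31, 32, 33]

def minimum(dataframe, value):
    # Earliest Datetime among rows whose IDXType equals value (NaT if none match)
    return dataframe.loc[dataframe['IDXType'] == value, 'Datetime'].min()

# One column per DataFrame, one row per value; no slice arithmetic needed
result_by_name = pd.DataFrame(
    {name: [minimum(frame, value) for value in value_list]
     for name, frame in frames.items()},
    index=value_list,
)
print(result_by_name)
```

This still scans each DataFrame once per value, so for large inputs the groupby approach is likely the better choice.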
My solution uses groupby with the min aggregation, then concat and reindex, and finally removes the index name with rename_axis (new in pandas 0.18.0):
print (df1.groupby('IDXType')['Datetime'].min())
IDXType
20 2015-01-04
33 2015-01-06
Name: Datetime, dtype: datetime64[ns]
df = pd.concat([df1.groupby('IDXType')['Datetime'].min(),
                df2.groupby('IDXType')['Datetime'].min(),
                df3.groupby('IDXType')['Datetime'].min()],
               axis=1,
               keys=('df1', 'df2', 'df3')).reindex(value_list).rename_axis(None)
print(df)
          df1        df2        df3
20 2015-01-04        NaT 2015-01-04
22        NaT        NaT        NaT
28        NaT        NaT        NaT
29        NaT        NaT        NaT
30        NaT 2015-01-04 2015-01-06
31        NaT        NaT 2015-01-07
32        NaT        NaT        NaT
33 2015-01-06        NaT        NaT
You can also use a more dynamic solution with a list comprehension inside concat, but then you need to supply a list of column names for the new df5:
index_list = [df1, df2, df3]
value_list = [20, 22, 28, 29, 30, 31, 32, 33]
namesdf = ['df1', 'df2', 'df3']
df5 = pd.concat([x.groupby('IDXType')['Datetime'].min() for x in index_list],
                axis=1,
                keys=namesdf).reindex(value_list).rename_axis(None)
print(df5)
          df1        df2        df3
20 2015-01-04        NaT 2015-01-04
22        NaT        NaT        NaT
28        NaT        NaT        NaT
29        NaT        NaT        NaT
30        NaT 2015-01-04 2015-01-06
31        NaT        NaT 2015-01-07
32        NaT        NaT        NaT
33 2015-01-06        NaT        NaT
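If keeping the list of names in sync with the list of DataFrames is a concern, one option is to pass a dict to concat, since concat uses the dict keys as the keys when none are given. The following is only a sketch under that assumption; frames, mins and df6 are illustrative names that do not appear in the question.

```python
import pandas as pd

dates = pd.date_range('2015-01-04', '2015-01-08')
# Hypothetical container: column label -> DataFrame, so names and data stay paired
frames = {
    'df1': pd.DataFrame({'Datetime': dates, 'IDXType': [20, 20, 33, 33, 33]}),
    'df2': pd.DataFrame({'Datetime': dates, 'IDXType': [30, 30, 21, 21, 10]}),
    'df3': pd.DataFrame({'Datetime': dates, 'IDXType': [20, 20, 30, 31, 31]}),
}
value_list = [20, 22, 28, 29, 30, 31, 32, 33]

# Earliest Datetime per IDXType for each DataFrame
mins = {name: frame.groupby('IDXType')['Datetime'].min()
        for name, frame in frames.items()}

# The dict keys become the column names; reindex adds the missing values as NaT rows
df6 = pd.concat(mins, axis=1).reindex(value_list).rename_axis(None)
print(df6)
```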
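Is `index_list` a `list` of `DataFrames` with columns `Datetime` and `IDXType`? – jezrael
index is a list of DataFrames that contain these columns. Datetime and IDXType are the two columns I have to check in the original source dataframe. –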