2017-01-21

Compare dates to decide the output

This is DataFrame 1:

          Date      Serial Number  Type
0   2014-12-17  1N4AL2EP8DC270200   New
1   2015-10-28  1N4AL2EP8DC270200  Used
2   2015-01-22  1N4AL3AP1EN239307   New
3   2015-11-22  1N4AL3AP1EN239307  Used
4   2015-05-22  1N4AL3AP1FC235402   New
5   2016-12-02  1N4AL3AP1FC235402  Used
6   2015-01-22  1N4AL3AP2FC213098   New
7   2016-05-13  1N4AL3AP2FC213098  Used
8   2014-05-14  1N4AL3AP3EC132416   New
9   2016-04-07  1N4AL3AP3EC132416  Used
10  2014-05-24  1N4AL3AP5EC316644   New
11  2014-12-18  1N4AL3AP5EC316644  Used
12  2014-12-11  1N4AL3AP6EC322517   New
13  2015-10-04  1N4AL3AP6EC322517  Used
14  2016-06-06  1N4AL3AP6EC322517  Used
...

This is DataFrame 2:

          Date      Serial Number
0   2014-03-12  5N1AA08C78N611573
1   2014-03-12  JN8AS5MT3EW604277
2   2014-03-12  1N6AF0LX5DN114710
3   2014-03-12  1N4AL3AP8DN447876
4   2014-03-12  JN8AZ1MU8AW021145
5   2014-03-12  JN1AZ4EH0AM500138
6   2014-03-12  JN8AF5MR3BT013548
7   2014-03-12  3N1AB61E17L629049
8   2014-03-12  3N1BC13E87L368844
9   2014-03-13  1N6AD07W95C431183
10  2014-03-13  1N6AA07A25N543180
11  2014-03-13  1N4CL2AP1BC110185
12  2014-03-13  JN8AZ1MW1BW181306
13  2014-03-13  5N1BV28U46N116791
...

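These are only samples of the data frames, not the entire data frames. For each serial number, I need to retrieve the first date whose Type is 'Used' in DataFrame 1 (for example, for serial number '1N4AL3AP6EC322517', 2015-10-04 is the date I am looking for), and then compare this date with the date in DataFrame 2 for the same serial number. If the date in DataFrame 2 is earlier than the one in DataFrame 1 for the same serial number, the record should be marked 'A'; otherwise it should be marked 'B'. What is an efficient way to do this?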

Answer


I think you can use merge_asof:
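To make the matching rule concrete, here is a minimal sketch with made-up keys (`k1`, `k2`) and dates that are not taken from the question. With its default `direction='backward'`, `pd.merge_asof` attaches to each left row the most recent right row whose `Date` is earlier than or equal to the left `Date`, within the same `by` group; left rows without such a match get NaN.

```python
import pandas as pd

# Hypothetical toy frames; the keys, dates and the `hit` column are illustrative only.
left = pd.DataFrame({
    'Date': pd.to_datetime(['2015-01-10', '2015-06-01']),
    'key': ['k1', 'k2'],
})
right = pd.DataFrame({
    'Date': pd.to_datetime(['2014-12-01', '2015-09-01']),
    'key': ['k1', 'k2'],
    'hit': ['yes', 'yes'],
})

# Both inputs must be sorted by the `on` column.
out = pd.merge_asof(left.sort_values('Date'), right.sort_values('Date'),
                    on='Date', by='key')
print(out)
# k1: the right date (2014-12-01) is earlier, so `hit` is 'yes'
# k2: the right date (2015-09-01) is later, so `hit` is NaN
```

A missing value after the merge therefore means either "no row for this key on the right" or "the right row is later", which is why a second step is needed to tell the two cases apart.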

print(df2)
        Date      Serial Number
0 2016-03-12  1N4AL3AP6EC322517
1 2013-03-12  1N4AL3AP5EC316644
2 2014-03-12  1N4AL3AP3EC132416
3 2016-08-12  1N4AL3AP2FC213098
4 2014-03-12  JN8AZ1MU8AW021145

import pandas as pd

# if necessary, cast the Date columns to datetime
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
# keep the first 'Used' row per Serial Number
# (assumes df1 is already sorted by Date within each Serial Number)
df = df1[df1['Type'] == 'Used'].drop_duplicates(['Serial Number'])
print(df)
         Date      Serial Number  Type
1  2015-10-28  1N4AL2EP8DC270200  Used
3  2015-11-22  1N4AL3AP1EN239307  Used
5  2016-12-02  1N4AL3AP1FC235402  Used
7  2016-05-13  1N4AL3AP2FC213098  Used
9  2016-04-07  1N4AL3AP3EC132416  Used
11 2014-12-18  1N4AL3AP5EC316644  Used
13 2015-10-04  1N4AL3AP6EC322517  Used

# the question asks for 'A' when the DataFrame 2 date is earlier, so tag df2 rows with 'A'
df2['Mark'] = 'A'
# merge_asof matches backward by default: each row of df gets the last df2 row
# with the same Serial Number whose Date is earlier than or equal to its own
df = pd.merge_asof(df.sort_values(['Date']),
                   df2.sort_values(['Date']), on='Date', by='Serial Number')
print(df)
        Date      Serial Number  Type Mark
0 2014-12-18  1N4AL3AP5EC316644  Used    A
1 2015-10-04  1N4AL3AP6EC322517  Used  NaN
2 2015-10-28  1N4AL2EP8DC270200  Used  NaN
3 2015-11-22  1N4AL3AP1EN239307  Used  NaN
4 2016-04-07  1N4AL3AP3EC132416  Used    A
5 2016-05-13  1N4AL3AP2FC213098  Used  NaN
6 2016-12-02  1N4AL3AP1FC235402  Used  NaN
# serial numbers that exist in df2 but have no earlier date there get 'B';
# serial numbers missing from df2 stay NaN
mask = df['Serial Number'].isin(df2['Serial Number'])
df.loc[mask, 'Mark'] = df.loc[mask, 'Mark'].fillna('B')
print(df)
        Date      Serial Number  Type Mark
0 2014-12-18  1N4AL3AP5EC316644  Used    A
1 2015-10-04  1N4AL3AP6EC322517  Used    B
2 2015-10-28  1N4AL2EP8DC270200  Used  NaN
3 2015-11-22  1N4AL3AP1EN239307  Used  NaN
4 2016-04-07  1N4AL3AP3EC132416  Used    A
5 2016-05-13  1N4AL3AP2FC213098  Used    B
6 2016-12-02  1N4AL3AP1FC235402  Used  NaN
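
If only one DataFrame 2 date per serial number matters, a plain merge may be a simpler alternative to merge_asof. The following is a minimal sketch under that assumption and is not part of the original answer: the helper name `mark_serials`, the intermediate column `Date2`, and the choice to compare against the earliest DataFrame 2 date per serial number are all assumptions. It follows the rule stated in the question ('A' when the DataFrame 2 date is earlier, 'B' otherwise) and leaves serial numbers that are missing from DataFrame 2 unmarked. The sample rows are a subset of the data shown above.

```python
import numpy as np
import pandas as pd

def mark_serials(df1, df2):
    """Hypothetical helper: mark each serial number's first 'Used' date."""
    # First 'Used' date per serial number in DataFrame 1.
    used = (df1[df1['Type'] == 'Used']
            .groupby('Serial Number', as_index=False)['Date'].min())
    # Assumption: compare against the earliest DataFrame 2 date per serial number.
    other = (df2.groupby('Serial Number', as_index=False)['Date'].min()
             .rename(columns={'Date': 'Date2'}))
    out = used.merge(other, on='Serial Number', how='left')
    # 'A' if the DataFrame 2 date is earlier, 'B' otherwise.
    out['Mark'] = np.where(out['Date2'] < out['Date'], 'A', 'B')
    # Serial numbers absent from DataFrame 2 stay unmarked (NaN).
    out['Mark'] = out['Mark'].where(out['Date2'].notna())
    return out

df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2014-12-17', '2015-10-28', '2014-05-24', '2014-12-18',
                            '2014-12-11', '2015-10-04', '2016-06-06']),
    'Serial Number': ['1N4AL2EP8DC270200', '1N4AL2EP8DC270200',
                      '1N4AL3AP5EC316644', '1N4AL3AP5EC316644',
                      '1N4AL3AP6EC322517', '1N4AL3AP6EC322517',
                      '1N4AL3AP6EC322517'],
    'Type': ['New', 'Used', 'New', 'Used', 'New', 'Used', 'Used'],
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2016-03-12', '2013-03-12']),
    'Serial Number': ['1N4AL3AP6EC322517', '1N4AL3AP5EC316644'],
})

result = mark_serials(df1, df2)
print(result)
# 1N4AL3AP5EC316644 is marked 'A', 1N4AL3AP6EC322517 is marked 'B',
# and 1N4AL2EP8DC270200 (not in df2) stays NaN
```

Unlike the drop_duplicates approach, taking the minimum per group does not depend on the rows already being sorted by date, at the cost of one extra groupby on each frame.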