2016-07-15 34 views

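How to fill missing values in an irregular drug/medication time series when the half-life is known

I have a DataFrame (df) in which column A holds the drug units dosed at the time given by Timestamp. I want to fill the missing values (NaN) with the drug concentration, given the drug's half-life (180 minutes). I am struggling with the code in pandas. I would really appreciate any help and insight. Thanks in advance.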

df 
         A  
Timestamp              
1991-04-21 09:09:00 9.0   
1991-04-21 3:00:00 NaN  
1991-04-21 9:00:00 NaN  
1991-04-22 07:35:00 10.0  
1991-04-22 13:40:00 NaN   
1991-04-22 16:56:00 NaN  

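Given that the drug's half-life is 180 minutes, I want to fill the NaN values (fillna) as a function of the elapsed time and the half-life of the drug, like this: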

Timestamp    A  

1991-04-21 09:00:00 9.0 
1991-04-21 3:00:00 ~2.25 
1991-04-21 9:00:00 ~0.55 
1991-04-22 07:35:00 10.0 
1991-04-22 13:40:00 ~2.5 
1991-04-22 16:56:00 ~0.75 
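
For reference, the fill described above corresponds to ordinary first-order (exponential) decay measured from the most recent dose. The relation below is the standard half-life formula, not something stated in the post; the symbols $C_0$ (last recorded dose), $\Delta t$ (time elapsed since that dose) and $t_{1/2}$ (half-life) are introduced here purely for illustration, and it assumes nothing carries over from earlier doses.

```latex
C(\Delta t) \;=\; C_0 \left(\tfrac{1}{2}\right)^{\Delta t / t_{1/2}}
            \;=\; C_0 \, e^{-\Delta t \,\ln 2 \,/\, t_{1/2}},
\qquad t_{1/2} = 180\ \text{min}
```

As a quick check, a dose of 9.0 units would fall to 4.5 after 180 minutes and to 2.25 after 360 minutes.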

Answer


Your timestamps are not sorted, and I'm assuming this was a typo. I fixed it below.

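import pandas as pd
import numpy as np
from io import StringIO  # Python 3 location; the Python 2 `StringIO` module is gone

# columns are separated by two or more spaces, so the single space
# inside each timestamp is not treated as a delimiter
text = """TimeStamp            A
1991-04-21 09:09:00  9.0
1991-04-21 13:00:00  NaN
1991-04-21 19:00:00  NaN
1991-04-22 07:35:00  10.0
1991-04-22 13:40:00  NaN
1991-04-22 16:56:00  NaN"""

df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[0])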
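
The approach that follows measures elapsed time from the most recent dose, so it relies on the rows being in chronological order. A minimal sketch of how one might check that before going further is shown here; the series `ts` and the variable `is_sorted` are illustrative names, and the times are the ones written literally in the question.

```python
import pandas as pd

# Times as literally written in the question: the second and third
# entries fall before the first one, so the series is out of order.
ts = pd.Series(pd.to_datetime([
    "1991-04-21 09:09:00",
    "1991-04-21 03:00:00",
    "1991-04-21 09:00:00",
]))

is_sorted = ts.is_monotonic_increasing
print(is_sorted)  # False
```

If the check fails, it is worth deciding whether the data contain typos (as assumed in this answer) or simply need sorting, since the two fixes give different results.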

Here is the magic code.

# half-life of 180 minutes is 10,800 seconds
# lamda (spelled this way because `lambda` is a Python keyword) is the
# time constant half_life / ln(2), so exp(-t/lamda) == 0.5 ** (t/half_life)
lamda = 10800/np.log(2)

# returns time difference for each element 
# relative to first element 
def time_diff(x): 
    return x - x.iloc[0] 

# create partition of non-nulls with subsequent nulls 
partition = df.A.notnull().cumsum() 

# calculate time differences in seconds for each
# element relative to most recent non-null observation
# use .dt accessor and method .total_seconds()
# group_keys=False keeps the original row index so the result aligns with df.A
tdiffs = df.TimeStamp.groupby(partition, group_keys=False).apply(time_diff).dt.total_seconds()

# apply exponential decay 
decay = np.exp(-tdiffs/lamda) 

# finally, forward fill the observations and multiply by decay 
decay * df.A.ffill() 

0  9.000000 
1  3.697606 
2  0.924402 
3 10.000000 
4  2.452325 
5  1.152895 
dtype: float64 
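
The same idea can also be written without a groupby. The following is a minimal sketch rather than the method used in this answer: the function name `fill_with_decay` and its parameters are hypothetical, and it assumes the timestamps are sorted and that each recorded dose resets the level (no carry-over from earlier doses).

```python
import numpy as np
import pandas as pd


def fill_with_decay(values, times, half_life):
    """Fill NaNs in `values` by decaying the last observed value.

    values    : pd.Series of floats, NaN where a value is missing
    times     : pd.Series of datetimes, same index, sorted ascending
    half_life : pd.Timedelta
    (Hypothetical helper, written for illustration only.)
    """
    # Timestamp of the most recent non-null observation for every row.
    last_obs_time = times.where(values.notna()).ffill()
    # Elapsed time expressed as a (fractional) number of half-lives.
    n_half_lives = (times - last_obs_time) / half_life
    # Carry the last observation forward and halve it once per half-life.
    return values.ffill() * 0.5 ** n_half_lives


df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "1991-04-21 09:09:00", "1991-04-21 13:00:00", "1991-04-21 19:00:00",
        "1991-04-22 07:35:00", "1991-04-22 13:40:00", "1991-04-22 16:56:00",
    ]),
    "A": [9.0, np.nan, np.nan, 10.0, np.nan, np.nan],
})

filled = fill_with_decay(df["A"], df["TimeStamp"], pd.Timedelta(minutes=180))
print(filled)
```

Passing the half-life as a `pd.Timedelta` avoids converting to seconds by hand; rows that come before the first recorded dose stay NaN.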

Thank you so much. That's perfect! – Pearl


@Pearl Glad I could help. – piRSquared