2017-03-02 63 views

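Cumsum with reset and delay

I'm looking for a way to calculate delinquency buckets. I've figured out the part that resets the cumsum, but I'm stuck on how to "delay" the cumsum based on a trigger. See my example below, where the result I want is in correct_bucket: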

df <- data.frame(id = c(1,1,1,1,2,2,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,7,7,7,7,7,8,8,8,8), 
      min_due = c(25,50,50,75,25,50,25,50,25,25,25,50,75,100,25,50,75,100,100,25,50,25,14.99,0,25,60,60,0,25,50,75,100,75,25,50,25,50), 
      payment = c(0,0,25,0,0,0,0,0,50,25,0,0,0,0,0,0,0,0,25,100,0,150,25,14.99,0,25,60,60,0,0,0,0,50,0,0,25,0), 
      past_due_amt = c(0,25,25,50,0,25,0,25,0,0,0,25,50,75,0,25,50,75,75,0,25,0,0,0,0,0,0,0,0,25,50,75,50,0,25,0,25), 
      correct_bucket = c(0,1,1,2,0,1,0,1,0,0,0,1,2,3,0,1,2,3,3,0,1,0,0,0,0,0,0,0,0,1,2,3,2,0,1,0,1)) 

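Explanation of correct_bucket: within each id, it shows whether the min_due was met (or not) by a payment greater than or equal to the previous row's (lag 1) min_due. For example, ID #1 has a min_due of 25 (row 1) and a payment of 0 (row 2), so correct_bucket = 1. As you can see in each example, the value of the correct bucket has to be computed iteratively, depending on whether a payment was made and how much was paid.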
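To make the "lag 1" comparison concrete, here is a minimal sketch that lines up each payment with the previous row's min_due within the same id. It uses only ids 1 and 3 from the sample data. The column names prev_min_due and met are hypothetical helpers, and the flag is only the raw comparison, not the full bucket logic.

```r
# Subset of the sample data (ids 1 and 3 only)
ex <- data.frame(
  id      = c(1, 1, 1, 1, 3, 3, 3, 3),
  min_due = c(25, 50, 50, 75, 25, 50, 25, 25),
  payment = c(0, 0, 25, 0, 0, 0, 50, 25)
)

# Previous row's min_due within each id (0 for the first row of an id)
ex$prev_min_due <- ave(ex$min_due, ex$id,
                       FUN = function(v) c(0, head(v, -1)))

# TRUE when the payment covers the previous minimum due
ex$met <- ex$payment >= ex$prev_min_due
ex
```

Rows where met is FALSE are the ones where a bucket would be expected to grow or hold; how far it moves still depends on the payment amount, as described above.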

Any thoughts? Please ask whatever clarifying questions you need. I'm so close, and any extra help is welcome!

Thanks!

Sorry, I couldn't understand your question. I've deleted my answer. – akrun

Bummer, sorry to have wasted your time. –

No problem. Someone else may understand your question better than I did. – akrun

Answer

df$original_order = 1:nrow(df) #In case you need later. OPTIONAL 

#Obtain the incremental min_due for each id 
df$b2 = unlist(lapply(split(df, df$id), function(a) c(0, diff(a$min_due)))) 

#Function to get your values from incremental min_due 
ff = function(x){
    x$b3 = 0
    for (i in 2:NROW(x)){
        if (x$b2[i] > 0){
            x$b3[i] = x$b3[i-1] + 1
        }
        if (x$b2[i] == 0){
            x$b3[i] = x$b3[i-1]
        }
        if (x$b2[i] < 0){
            x$b3[i] = 0
        }
    }
    return(x)
}

#Split df by id and use the above function on each sub group 
#'b3' is the value you want 
do.call(rbind, lapply(split(df, df$id), function(a) ff(a))) 
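
The unlist(lapply(split(...))) step above relies on the rows already being sorted by id, because split() gathers the pieces by id before they are pasted back into the column. A possible alternative, sketched here on ids 1 and 5 from the sample data, is to use ave(), which writes each group's result back into its original row positions. The helper name bucket_from_steps is hypothetical; it applies the same three rules as ff (increase, hold, reset).

```r
# Subset of the sample data (ids 1 and 5 only)
ex <- data.frame(
  id      = c(1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5),
  min_due = c(25, 50, 50, 75, 25, 50, 75, 100, 100, 25, 50, 25, 14.99, 0)
)

# Incremental min_due within each id, kept aligned with the original rows
ex$b2 <- ave(ex$min_due, ex$id, FUN = function(v) c(0, diff(v)))

# Same rules as ff: +1 on an increase, hold on no change, reset on a decrease.
# Starting from init = 0 and skipping the first step also covers one-row groups.
bucket_from_steps <- function(step) {
  Reduce(function(prev, s) {
    if (s > 0) prev + 1 else if (s < 0) 0 else prev
  }, step[-1], init = 0, accumulate = TRUE)
}

ex$b3 <- ave(ex$b2, ex$id, FUN = bucket_from_steps)
ex
```

For these two ids, b3 equals the correct_bucket values given in the question; other ids should be checked separately before relying on the rule.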

New ff (returns early for groups with fewer than two rows):

ff = function(x){
    x$b3 = 0

    if(NROW(x) < 2){
        return(x)
    }

    for (i in 2:NROW(x)){
        if (x$b2[i] > 0){
            x$b3[i] = x$b3[i-1] + 1
        }
        if (x$b2[i] == 0){
            x$b3[i] = x$b3[i-1]
        }
        if (x$b2[i] < 0){
            x$b3[i] = 0
        }
    }
    return(x)
}
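
The early return matters because 2:NROW(x) counts downward when a group has a single row: 2:1 is c(2, 1), so the loop reads x$b2[2], which is NA, and the if() stops with "missing value where TRUE/FALSE needed" even though b2 itself contains no NA. A small sketch of that behaviour, with seq_len(n)[-1] shown as one possible alternative way to build the loop index (the variable names are only for illustration):

```r
n  <- 1                  # a group with a single row
b2 <- 0                  # its only increment

2:n                      # counts down instead of being empty
is.na(b2[2])             # indexing past the end of a vector gives NA

idx <- seq_len(n)[-1]    # empty when n < 2, so a loop over it never runs
length(idx)
```

With an index built this way, a loop such as for (i in idx) is simply skipped for one-row groups, so the explicit NROW(x) < 2 check becomes optional.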

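That's exactly what I had in mind! (I'm a SAS programmer by training, so I didn't know how to do it here.) However, when I run it on my actual dataset I get this error: Error in if (x$b2[i] > 0) { : missing value where TRUE/FALSE needed. I don't have any NAs in b2, so I'm a bit confused. Any ideas? –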

It is numeric; am I just being an idiot? summary(df$b2) shows no missing values, and the rest of my table has none either: table(is.na(df)). –

anyNA(df$b2) [1] FALSE –