2014-02-07 40 views

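I am running the following R code, shown here with a small sample dataset. The goal is a cumulative sum of the previous values, except for the first value.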

library(plyr) 
Ex<-structure(list(X1 = c(-36.8598, -37.1726, -36.4343, -36.8644, 
         -37.0599, -34.8818, -31.9907, -37.8304, 
         -34.3367, -31.2984, -33.5731), 
       X2 = c(64.26, 63.085, 66.36, 61.08, 61.57, 65.04, 72.69, 63.83, 
         67.555, 76.06, 68.61), 
       Y1 = c(493.81544, 493.81544, 494.54173, 
         494.61364, 494.61381, 494.38717, 494.64122, 493.73265,    494.04246, 
         494.92989, 494.98384), 
       Y2 = c(489.704166, 489.704166, 490.710962, 
         490.653212, 490.710612, 489.822928, 
         488.160904, 489.747776, 490.600579, 
         488.946738, 490.398958), 
       Y3 = c(19L, 19L, 19L, 23L, 30L,43L,43L,2L, 58L, 47L, 61L), 
       date = c("2013-06-01","2013-06-02","2013-06-03","2013-06-04", 
         "2013-06-05","2013-06-06","2013-06-07","2013-06-08", 
         "2013-06-09","2013-06-10","2013-06-11")), 
      .Names = c("X1", "X2", "Y1", "Y2", "Y3", "date"), 
      row.names = c(NA, 11L), class = "data.frame") 

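Ex <- arrange(Ex, Y3)  # sort by Y3 so that equal Y3 values are adjacent

# Dup: Y3 already seen above; Dup_rev: Y3 appears again below
Ex$Dup <- as.numeric(duplicated(Ex$Y3))
Ex$Dup_rev <- as.numeric(duplicated(Ex$Y3, fromLast = TRUE))

## Testing If Else
# Refer to columns through Ex$ instead of attach(Ex): an attached copy
# does not see columns added or changed afterwards, such as X5
Ex$X5 <- 0
for (i in seq_len(nrow(Ex))) {
    if (Ex$Dup[i] == 0 && Ex$Dup_rev[i] == 0) {
        Ex$X5[i] <- Ex$Y2[i]                # Y3 value occurs only once
    } else if (Ex$Dup[i] == 0) {
        Ex$X5[i] <- Ex$Y2[i]                # first row of a repeated Y3 value
    } else {
        Ex$X5[i] <- Ex$Y2[i] + Ex$X5[i - 1] # add to the running sum of the group
    }
}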

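What this does: for every row, it creates a column X5 holding the cumulative sum of the preceding Y2 values, unless the row's Y3 value appears in the dataset for the first time, in which case X5 is just that row's Y2. Because my data is very large (about 110k rows), this code takes a long time to run. Is there a simpler way to do the same thing?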

        X1     X2       Y1       Y2 Y3       date Dup Dup_rev        X5
1 -37.8304 63.830 493.7326 489.7478  2 2013-06-08   0       0  489.7478
2 -36.8598 64.260 493.8154 489.7042 19 2013-06-01   0       1  489.7042
3 -37.1726 63.085 493.8154 489.7042 19 2013-06-02   1       1 1469.1125
4 -36.4343 66.360 494.5417 490.7110 19 2013-06-03   1       0 1470.1193
5 -36.8644 61.080 494.6136 490.6532 23 2013-06-04   0       0  490.6532

Can you post the output you want? The output I get from running your code does not match your description of what you are looking for. –

My mistake. For example, with a = c(1,2,3,4,5), I want to create b such that b[i] = a[i] + b[i-1], where b[1] = 0. – RHelp

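Can you use the variable names from your example? In the output you just posted, the value of 'X5' in the third row is '1469.1125'. From your explanation it should equal 'Y2[3] + X5[2]', which is '489.7042 + 489.7042 = 979.4084'. Sorry if I am missing something very obvious, but I cannot figure out where '1469.1125' comes from. –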

Answers

1

Here is a dplyr solution. dplyr is the next iteration of plyr, and it is very fast.

library(dplyr) 
# %>% replaces the %.% operator used in early dplyr releases, which was later removed
Ex %>% group_by(Y3) %>% mutate(X5 = cumsum(Y2)) 
#> Source: local data frame [11 x 7] 
#> Groups: Y3 
#> 
#>          X1     X2       Y1       Y2 Y3       date        X5
#> 1  -36.8598 64.260 493.8154 489.7042 19 2013-06-01  489.7042
#> 2  -37.1726 63.085 493.8154 489.7042 19 2013-06-02  979.4083
#> 3  -36.4343 66.360 494.5417 490.7110 19 2013-06-03 1470.1193
#> 4  -36.8644 61.080 494.6136 490.6532 23 2013-06-04  490.6532
#> 5  -37.0599 61.570 494.6138 490.7106 30 2013-06-05  490.7106
#> 6  -34.8818 65.040 494.3872 489.8229 43 2013-06-06  489.8229
#> 7  -31.9907 72.690 494.6412 488.1609 43 2013-06-07  977.9838
#> 8  -37.8304 63.830 493.7326 489.7478  2 2013-06-08  489.7478
#> 9  -34.3367 67.555 494.0425 490.6006 58 2013-06-09  490.6006
#> 10 -31.2984 76.060 494.9299 488.9467 47 2013-06-10  488.9467
#> 11 -33.5731 68.610 494.9838 490.3990 61 2013-06-11  490.3990
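
For comparison, the same grouped running sum can probably be written in base R with `ave()`, without loading a package. This is only a sketch: the vectors `y2` and `y3` below are made-up stand-ins for the Y2 and Y3 columns, not the question's data.

```r
# Illustrative data (hypothetical, not from the question):
# y2 holds the values to accumulate, y3 is the grouping key
y2 <- c(10, 20, 30, 5, 7)
y3 <- c(19L, 19L, 19L, 23L, 43L)

# ave() applies FUN within each distinct value of the grouping variable
# and returns the results in the original row order
x5 <- ave(y2, y3, FUN = cumsum)
print(x5)
#> [1] 10 30 60  5  7
```

Applied to the data frame from the question, the equivalent call would be `Ex$X5 <- ave(Ex$Y2, Ex$Y3, FUN = cumsum)`.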
2

Here is a solution using data.table, which is very fast for this type of analysis, where you split by a "factor" (in this case, Y3):

library(data.table) 
DT <- data.table(Ex)[, X5:=cumsum(Y2), by=Y3] 
DT 
#           X1     X2       Y1       Y2 Y3       date        X5
#  1: -37.8304 63.830 493.7326 489.7478  2 2013-06-08  489.7478
#  2: -36.8598 64.260 493.8154 489.7042 19 2013-06-01  489.7042
#  3: -37.1726 63.085 493.8154 489.7042 19 2013-06-02  979.4083
#  4: -36.4343 66.360 494.5417 490.7110 19 2013-06-03 1470.1193
#  5: -36.8644 61.080 494.6136 490.6532 23 2013-06-04  490.6532
#  6: -37.0599 61.570 494.6138 490.7106 30 2013-06-05  490.7106
#  7: -34.8818 65.040 494.3872 489.8229 43 2013-06-06  489.8229
#  8: -31.9907 72.690 494.6412 488.1609 43 2013-06-07  977.9838
#  9: -31.2984 76.060 494.9299 488.9467 47 2013-06-10  488.9467
# 10: -34.3367 67.555 494.0425 490.6006 58 2013-06-09  490.6006
# 11: -33.5731 68.610 494.9838 490.3990 61 2013-06-11  490.3990

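Note, though, that like Jake, I don't understand how you get 1469 for the third row rather than 979.4083. Also, I just ran your code and got the same answer as mine, so I'm guessing there is a typo in your example results, or the data changed?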
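
One way to check a discrepancy like this is to run the row-by-row logic and a grouped cumulative sum side by side and compare the results. The sketch below does that in base R on simulated data; `key` and `val` are hypothetical stand-ins for Y3 and Y2, and the loop is a reading of the logic described in the question, not the original code.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n   <- 1000
key <- sort(sample(1:50, n, replace = TRUE))  # hypothetical grouping column, sorted like Y3
val <- runif(n)                               # hypothetical values, standing in for Y2

# Row-by-row version: restart the sum whenever the key changes
loop_sum <- numeric(n)
for (i in seq_len(n)) {
  if (i == 1 || key[i] != key[i - 1]) {
    loop_sum[i] <- val[i]
  } else {
    loop_sum[i] <- val[i] + loop_sum[i - 1]
  }
}

# Grouped version: cumulative sum computed within each key
grouped_sum <- ave(val, key, FUN = cumsum)

# Compare with a numeric tolerance rather than exact equality
all.equal(loop_sum, grouped_sum)
```

If the two vectors agree, a mismatch in posted output most likely comes from the data or from the session state rather than from the grouped approach.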

Thank you very much! This is exactly what I wanted :) – RHelp

@RHelp, if this answers your question, please consider marking it as answered. – BrodieG