2009-10-02 47 views
0

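Selectively replacing columns in R with their delta values

I have data being read into a data frame in R, by column. Some of the columns only ever increase in value; for those columns only, I want to replace each value (n) with its difference from the previous value in that column. For example, looking at an individual column, I want

c(1,2,5,7,8) 

to be replaced by

c(1,3,2,1) 

which are the differences between successive elements. However, it's getting really late in the day and I think my brain has just stopped working. Here is the code I have so far: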

col1 <- c(1,2,3,4,NA,2,3,1) # This column rises and falls, so we want to ignore it 
col2 <- c(1,2,3,5,NA,5,6,7) # Note: this column always rises in value, so we want to replace it with deltas 
col3 <- c(5,4,6,7,NA,9,3,5) # This column rises and falls, so we want to ignore it 
d <- cbind(col1, col2, col3) 
d 
fix_data <- function(data) { 
    # Iterate through each column... 
    for (column in data[,1:dim(data)[2]]) { 
     lastvalue <- 0 
     # Now walk through each value in the column, 
     # checking to see if the column consistently rises in value 
     for (value in column) { 
      if (is.na(value) == FALSE) { # Need to ignore NAs 
       if (value >= lastvalue) { 
        alwaysIncrementing <- TRUE 
       } else { 
        alwaysIncrementing <- FALSE 
        break 
       } 
      } 
     } 

     if (alwaysIncrementing) { 
      print(paste("Column", column, "always increments")) 
     } 

     # If a column is always incrementing, alwaysIncrementing will now be TRUE 
     # In this case, I want to replace each element in the column with the delta between successive 
     # elements. The size of the column shrinks by 1 in doing this, so just prepend a copy of 
     # the 1st element to the start of the list to ensure the column length remains the same 
     if (alwaysIncrementing) { 
      print(paste("This is an incrementing column:", colnames(column))) 
      column <- c(column[1], diff(column, lag=1)) 
     } 
    } 
    data 
} 

fix_data(d) 
d 

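If you copy/paste this code into RGui, you'll see that it doesn't do anything to the supplied data frame.

Besides losing my mind, what am I doing wrong?

Thanks in advance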

+0

You're not assigning lastvalue anywhere... – Shane 2009-10-02 10:44:06

Answers

3

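Without going through the code in detail: you're assigning values to column, which is a local variable within the loop (i.e. there is no relationship between column and data in that context). You need to assign those values to the appropriate elements of data.

Also, data is local to your function, so after running the function you need to assign its return value back, e.g. d <- fix_data(d).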
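A minimal sketch of both fixes, assuming the matrix d from the question. The loop index j and the flag is_rising are illustrative names, not part of the original code, and the rising check is done with diff rather than the inner loop:

```r
# Example data from the question
col1 <- c(1, 2, 3, 4, NA, 2, 3, 1)
col2 <- c(1, 2, 3, 5, NA, 5, 6, 7)
col3 <- c(5, 4, 6, 7, NA, 9, 3, 5)
d <- cbind(col1, col2, col3)

fix_data <- function(data) {
  # Loop over column positions so results can be written back into data
  for (j in seq_len(ncol(data))) {
    column <- data[, j]
    # A column counts as rising if no non-NA value is below its predecessor
    is_rising <- all(diff(column[!is.na(column)]) >= 0)
    if (is_rising) {
      # Prepend the first element so the column length stays the same
      data[, j] <- c(column[1], diff(column))
    }
  }
  data
}

# The function works on a copy, so the result has to be assigned back
d <- fix_data(d)
```

With this data only col2 is replaced. Note that the deltas on either side of an NA are themselves NA, since diff does not skip missing values.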

Incidentally, you can use diff to check whether a column's values are always increasing, rather than looping over every value:

idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0)) 
d[,idx] <- blah  # 'blah' is a placeholder for the replacement delta values
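One possible way to fill in the blah placeholder, sketched under the assumption that the first element of each rising column should be kept so the column length is unchanged (as the question describes). The data is the matrix d from the question:

```r
# Example data from the question
col1 <- c(1, 2, 3, 4, NA, 2, 3, 1)
col2 <- c(1, 2, 3, 5, NA, 5, 6, 7)
col3 <- c(5, 4, 6, 7, NA, 9, 3, 5)
d <- cbind(col1, col2, col3)

# TRUE for columns whose non-NA values never decrease
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))

if (any(idx)) {
  # drop = FALSE keeps the selection a matrix even if only one column matches
  d[, idx] <- apply(d[, idx, drop = FALSE], 2, function(x) c(x[1], diff(x)))
}
```

Here idx is TRUE only for col2, so only that column is overwritten with its deltas.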

2

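diff calculates the difference between consecutive values in a vector. You can apply it to each column of a data frame using, for example, lapply: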

dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2) 
as.data.frame(lapply(dfr, diff)) 

    x y 
1 1 3 
2 3 5 
3 2 7 
4 1 9 

Edit: I just noticed something. You are using a matrix, not a data frame (as you stated in the question). For your matrix 'd', you can use

d_diff <- apply(d, 2, diff) 
#Find columns that are (strictly) increasing 
incr <- apply(d_diff, 2, function(x) all(x > 0, na.rm=TRUE)) 
#Replace values in the appropriate columns 
d[2:nrow(d),incr] <- d_diff[,incr]
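
If the data really is a data frame, the same idea might be adapted roughly as follows. This is only a sketch: dfr reuses the example above, the extra column z is a hypothetical non-rising column added for illustration, and the first value of each rising column is kept so the number of rows is unchanged.

```r
# x and y rise strictly; z (hypothetical) rises and falls
dfr <- data.frame(x = c(1, 2, 5, 7, 8), y = (1:5)^2, z = c(3, 1, 4, 1, 5))

# Find columns that are (strictly) increasing, ignoring NAs
incr <- vapply(dfr, function(x) all(diff(x) > 0, na.rm = TRUE), logical(1))

# Replace only those columns with their first value followed by the deltas
dfr[incr] <- lapply(dfr[incr], function(x) c(x[1], diff(x)))
```

Unlike the matrix version, this overwrites whole columns, so no row offset such as 2:nrow(d) is needed.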