2016-08-04 97 views
3

我有這樣通過基於R中的鍵減去值來創建新列?

ID  DAYS FREQUENCY 
"ads" 20  3 
"jwa" 45  2 
"mno" 4  1 
"ads" 13  3 
"jwa" 60  2 
"ads" 18  3 

數據表我想補充一點,減去根據id的日子一列,減去最接近在一起的日子。 我的新表想是這樣的:

ID  DAYS FREQUENCY DAYS DIFF 
"ads" 20  3   2 (because 20-18) 
"jwa" 45  2   NA (because no value greater than 45 for that id) 
"mno" 4  1   NA 
"ads" 13  3   NA 
"jwa" 60  2   15 
"ads" 18  3   5 

獎勵:有沒有使用合併功能的方法嗎?

+0

爲什麼你想/希望在這裏使用合併? Fwiw,如果你願意安裝一個軟件包,可以使用'library(data.table); setDT(DF)[order(DAYS),dd:= DAYS - shift(DAYS),by = ID]' – Frank

回答

1

下面是一個使用dplyr答案:

require(dplyr) 
mydata %>% 
    mutate(row.order = row_number()) %>% # row numbers added to preserve original row order 
    group_by(ID) %>% 
    arrange(DAYS) %>% 
    mutate(lag = lag(DAYS)) %>% 
    mutate(days.diff = DAYS - lag) %>% 
    ungroup() %>% 
    arrange(row.order) %>% 
    select(ID, DAYS, FREQUENCY, days.diff) 

輸出:

 ID DAYS FREQUENCY days.diff 
    <fctr> <int>  <int>  <int> 
1 ads 20   3   2 
2 jwa 45   2  NA 
3 mno  4   1  NA 
4 ads 13   3  NA 
5 jwa 60   2  15 
6 ads 18   3   5 
+0

您不需要連續進行兩個mutate調用。 mutate(x = g(z),y = f(x))'是可行的。 – Frank

+1

謝謝@Frank,學到了新的東西! –

0

你可以做到這一點使用dplyr和快速循環:

library(dplyr) 

# Rowwise data.frame creation because I'm too lazy not to copy-paste the example data 
df <- tibble::frame_data(
    ~ID, ~DAYS, ~FREQUENCY, 
    "ads", 20,  3, 
    "jwa", 45,  2, 
    "mno", 4,  1, 
    "ads", 13,  3, 
    "jwa", 60,  2, 
    "ads", 18,  3 
) 

# Subtract each number in a numeric vector with the one following it 
rolling_subtraction <- function(x) { 
    out <- vector('numeric', length(x)) 
    for (i in seq_along(out)) { 
    out[[i]] <- x[i] - x[i + 1] # x[i + 1] is NA if the index is out of bounds 
    } 

    out 
} 

# Arrange data.frame in order of ID/Days and apply rolling subtraction 
df %>% 
    arrange(ID, desc(DAYS)) %>% 
    group_by(ID) %>% 
    mutate(days_diff = rolling_subtraction(DAYS))