2015-10-10

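Difference between rows in R, grouped by column

I want to get the difference in count between successive versions of each app_name. My dataset has the columns app_name, version_id and count, and I want to add a [diff] column.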

Here is the dataset:

data = structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L)), .Names = c("app_name", "version_id", 
"count"), class = "data.frame", row.names = c(NA, -9L)) 

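Given this data.frame, how can I get the lagged difference of count by app_name and version_id? The difference for the initial (first) version of each app should be zero, since there is no previous version to compare it with.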

Here is what the final result should look like, with the new "diff" column at the end:

structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L), diff = c(0, 20, 0, 0, 0, 1.25, 
0, 0, 2.4)), .Names = c("app_name", "version_id", "count", "diff" 
), class = "data.frame", row.names = c(NA, -9L)) 

What have you tried so far?


@Pascal I have been trying to use mutate(), without success, following this thread: http://stackoverflow.com/questions/31362397/calculating-the-difference-between-rows-in-a-dataframe-using-dplyr

Answers


Try using lag from dplyr, for example:

library(dplyr) 
data %>% group_by(app_name) %>% 
     mutate(diffvers = version_id - dplyr::lag(version_id, default = version_id[1]), 
       diffcount = count - dplyr::lag(count, default = count[1])) 

Source: local data frame [9 x 5] 
Groups: app_name [3] 

    app_name version_id count diffvers diffcount 
    (fctr)  (dbl) (int) (dbl)  (int) 
1  a  1.0 600  0.0   0 
2  a  1.1 620  0.1  20 
3  a  2.3 620  1.2   0 
4  b  2.0 200  0.0   0 
5  b  3.1 200  1.1   0 
6  b  3.3 250  0.2  50 
7  b  4.0 250  0.7   0 
8  c  1.1 15  0.0   0 
9  c  2.4 36  1.3  21 

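We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'app_name', loop (lapply(...)) over the columns specified in .SDcols, take the difference between each element and its lag (shift uses type='lag' by default), and assign (:=) the output to create the new columns.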

library(data.table) # v1.9.6
setDT(data)[, c('diffvers', 'diffcount') := lapply(.SD, 
       function(x) x-shift(x, fill=x[1L])), by = app_name, .SDcols=2:3] 

data 
# app_name version_id count diffvers diffcount 
#1:  a  1.0 600  0.0   0 
#2:  a  1.1 620  0.1  20 
#3:  a  2.3 620  1.2   0 
#4:  b  2.0 200  0.0   0 
#5:  b  3.1 200  1.1   0 
#6:  b  3.3 250  0.2  50 
#7:  b  4.0 250  0.7   0 
#8:  c  1.1 15  0.0   0 
#9:  c  2.4 36  1.3  21
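For comparison, here is a minimal base-R sketch of the same grouped lagged difference with no package dependencies. It swaps lag()/shift() for ave() combined with c(0, diff(x)), which yields 0 for the first version of each app and the row-to-row difference afterwards. It assumes versions should be compared in ascending version_id order, so it sorts first; lag_diff is a hypothetical helper name introduced only for this example.

```r
# Rebuild the example data from the question
data <- data.frame(
  app_name   = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c")),
  version_id = c(1, 1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4),
  count      = c(600L, 620L, 620L, 200L, 200L, 250L, 250L, 15L, 36L)
)

# Assumption: compare versions in ascending order within each app
data <- data[order(data$app_name, data$version_id), ]

# Hypothetical helper: diff() returns n - 1 values, so prepend 0
# for the first (initial) version of each group
lag_diff <- function(x) c(0, diff(x))

# ave() applies lag_diff separately within each app_name group
data$diffvers  <- ave(data$version_id, data$app_name, FUN = lag_diff)
data$diffcount <- ave(data$count, data$app_name, FUN = lag_diff)
data
```

With the question's data this gives the same diffvers and diffcount values as the dplyr and data.table answers; note that ave() returns diffcount as a double rather than an integer column.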