2017-09-13

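R: How to check for modified and unmodified values in a data frame

How can I get the modified and unmodified rows from a data frame, by group?

Data frame: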

U_ID  process  value1  value2

1     Fetch    A       A
2     Review   C       C
1     Review   A       H
1     Fetch    B       C
2     Review   NA      F
3     Fetch    A       D
4     Fetch    R       J
4     Review   H       J

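The data frame below is a sample showing whether each value differs from the value in the previous row, after grouping by the U_ID and process columns (1 = modified, 0 = unmodified; the first row of each group counts as unmodified).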

U_ID  process  value1  value2  value1modified  value2modified

1     Fetch    A       A       0               0
1     Fetch    B       C       1               1
1     Review   A       H       0               0
2     Review   C       C       0               0
2     Review   NA      F       1               1
3     Fetch    A       D       0               0
4     Fetch    R       J       0               0
4     Review   H       J       0               0

My expected data frame:

U_ID  process  value1modcount  value1unmodcount  value2modcount  value2unmodcount

1     Fetch    1               1                 1               1
1     Review   0               1                 0               1
2     Review   1               1                 1               1
3     Fetch    0               1                 0               1
4     Fetch    0               1                 0               1
4     Review   0               1                 0               1

DATA

structure(list(U_ID = c(1, 2, 1, 1, 2, 3, 4, 4), process = c("Fetch", 
"Review", "Review", "Fetch", "Review", "Fetch", "Fetch", "Review" 
), value1 = c("A", "C", "A", "B", NA, "A", "R", "H"), value2 = c("A", 
"C", "H", "C", "F", "D", "J", "j")), .Names = c("U_ID", "process", 
"value1", "value2"), row.names = c(NA, -8L), class = "data.frame") 
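
How is the order determined? Is there an id or timestamp column?

Yes, we need to apply the ordering by U_ID.

Please upvote and accept the answer if it deserves it.

Answer

It can be done with dplyr.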

library(dplyr) 

data <- structure(list(U_ID = c(1, 2, 1, 1, 2, 3, 4, 4), process = c("Fetch", 
"Review", "Review", "Fetch", "Review", "Fetch", "Fetch", "Review" 
), value1 = c("A", "C", "A", "B", NA, "A", "R", "H"), value2 = c("A", 
"C", "H", "C", "F", "D", "J", "j")), .Names = c("U_ID", "process", 
"value1", "value2"), row.names = c(NA, -8L), class = "data.frame") 

data %>%
    group_by(U_ID, process) %>%
    mutate(
    # lag() returns the previous row's value within each (U_ID, process) group
    value1.prev = lag(value1),
    value2.prev = lag(value2),
    rn = row_number(),
    # first row of a group has nothing to compare against: unmodified (0);
    # a change to or from NA is modified (1); NA followed by NA is unmodified
    value1modified = ifelse(rn == 1, 0,
          ifelse(xor(is.na(value1), is.na(value1.prev)), 1,
            ifelse(!is.na(value1) & value1 != value1.prev, 1, 0))),
    value2modified = ifelse(rn == 1, 0,
          ifelse(xor(is.na(value2), is.na(value2.prev)), 1,
            ifelse(!is.na(value2) & value2 != value2.prev, 1, 0)))) %>%
    # the data is still grouped by U_ID and process, so summarise per group
    summarise(v1modcount = sum(value1modified == 1),
      v1unmodcount = sum(value1modified == 0),
      v2modcount = sum(value2modified == 1),
      v2unmodcount = sum(value2modified == 0))

Output:

U_ID  process  v1modcount  v1unmodcount  v2modcount  v2unmodcount
1     Fetch    1           1             1           1
1     Review   0           1             0           1
2     Review   1           1             1           1
3     Fetch    0           1             0           1
4     Fetch    0           1             0           1
4     Review   0           1             0           1
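
Thank you, but I have a question: what if I have more than 100 columns, such as value1, value2, value3 ... value100? How can I do the same thing dynamically?

You can do this with the spread and gather functions from the tidyr library. Please upvote or accept the answer if it deserves it.

OK, thanks, I'll check.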
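
The gather/spread suggestion in the comments was never spelled out in the thread, so the following is only one possible sketch of it, not the answerer's own code. It assumes that every column to check starts with "value", that the existing row order is the order to compare in (the thread names no timestamp column), and that a change to or from NA counts as modified. The names long_counts and wide are illustrative, and the sample data is rebuilt with data.frame() rather than the dput output above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
data <- data.frame(
    U_ID    = c(1, 2, 1, 1, 2, 3, 4, 4),
    process = c("Fetch", "Review", "Review", "Fetch",
                "Review", "Fetch", "Fetch", "Review"),
    value1  = c("A", "C", "A", "B", NA, "A", "R", "H"),
    value2  = c("A", "C", "H", "C", "F", "D", "J", "j"),
    stringsAsFactors = FALSE
)

long_counts <- data %>%
    # long format: one row per (original row, value column)
    gather(key = "column", value = "value", starts_with("value")) %>%
    group_by(U_ID, process, column) %>%
    mutate(
        prev = lag(value),
        # first row of a group is never modified; NA <-> non-NA is modified;
        # %in% TRUE turns an NA comparison result into FALSE
        modified = row_number() > 1 &
            (xor(is.na(value), is.na(prev)) | (value != prev) %in% TRUE)
    ) %>%
    summarise(modcount = sum(modified), unmodcount = sum(!modified)) %>%
    ungroup()

wide <- long_counts %>%
    # stack the two counts, build names such as "value1modcount", then widen
    gather(key = "stat", value = "n", modcount, unmodcount) %>%
    unite("name", column, stat, sep = "") %>%
    spread(name, n) %>%
    arrange(U_ID, process)

print(wide)
```

Because the column names are never written out by hand, the same pipeline should apply unchanged to value1 ... value100. On newer versions of tidyr, pivot_longer() and pivot_wider() are the recommended replacements for gather() and spread().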