2017-08-10

Problem description

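The start and end values in the frame_type column (below) mark the beginning and the end of a lane change made by the car's driver. For each driver id, I want to label every row from start through end as "LC" in a new column, numbering the lane changes within each driver (LC1, LC2, ...). How can I mark all rows between two specific character values in a column in R?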

Data

foo <- data.frame(id = c(rep(1, 20), rep(2, 10)), 
        frame_type = rep(c(".", ".", ".", 
            "start", ".", "lcf", ".", 
            ".", "end", "."), 3)) 
> foo
   id frame_type
1   1          .
2   1          .
3   1          .
4   1      start
5   1          .
6   1        lcf
7   1          .
8   1          .
9   1        end
10  1          .
11  1          .
12  1          .
13  1          .
14  1      start
15  1          .
16  1        lcf
17  1          .
18  1          .
19  1        end
20  1          .
21  2          .
22  2          .
23  2          .
24  2      start
25  2          .
26  2        lcf
27  2          .
28  2          .
29  2        end
30  2          .

Desired output:

> foo
   id frame_type  LC
1   1          .   .
2   1          .   .
3   1          .   .
4   1      start LC1
5   1          . LC1
6   1        lcf LC1
7   1          . LC1
8   1          . LC1
9   1        end LC1
10  1          .   .
11  1          .   .
12  1          .   .
13  1          .   .
14  1      start LC2
15  1          . LC2
16  1        lcf LC2
17  1          . LC2
18  1          . LC2
19  1        end LC2
20  1          .   .
21  2          .   .
22  2          .   .
23  2          .   .
24  2      start LC1
25  2          . LC1
26  2        lcf LC1
27  2          . LC1
28  2          . LC1
29  2        end LC1
30  2          .   .

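I have searched a lot but could not come up with any idea for solving this. The closest thing I know of is tidyr::fill(), but it does not work in this case. I would like to use dplyr::group_by(), because there are several ids. Please help.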
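Since the question asks for a dplyr::group_by() approach, here is a minimal sketch that uses the same counting idea as the answers below: within each id, a row is inside a lane change when the number of 'start' rows up to and including it is larger than the number of 'end' rows strictly before it. It assumes that every 'start' is closed by a matching 'end' within the same id, with no nesting; the helper columns n_start, n_end_before and inside are illustrative names, not part of the original data.

```r
library(dplyr)

foo <- data.frame(id = c(rep(1, 20), rep(2, 10)),
                  frame_type = rep(c(".", ".", ".", "start", ".", "lcf", ".",
                                     ".", "end", "."), 3),
                  stringsAsFactors = FALSE)

# Assumes each 'start' has a matching 'end' later in the same id (no nesting)
foo2 <- foo %>%
  group_by(id) %>%
  mutate(
    # starts seen so far in this id (inclusive); doubles as the lane-change number
    n_start = cumsum(frame_type == "start"),
    # ends seen strictly before this row, so the 'end' row itself stays inside
    n_end_before = dplyr::lag(cumsum(frame_type == "end"), default = 0L),
    inside = n_start - n_end_before > 0,
    LC = ifelse(inside, paste0("LC", n_start), ".")
  ) %>%
  ungroup() %>%
  select(id, frame_type, LC)
```

Because cumsum() restarts inside each group, the numbering starts again at LC1 for every id. Lagging the 'end' count by one row is what keeps the 'end' row labelled while the row after it falls outside the window.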

Answers

do.call(rbind, lapply(split(foo, foo$id), function(a){ 
    temp = inverse.rle(with(rle(cumsum(a$frame_type == "start") - 
            cumsum(head(c(FALSE, a$frame_type == "end"), -1))), 
          list(lengths = lengths, 
           values = replace(values, values == 1, 
                seq_along(values[values == 1]))))) 
    a$LC = replace(paste0("LC", temp), temp == 0, ".") 
    a 
})) 
#  id frame_type LC 
#1.1 1   . . 
#1.2 1   . . 
#1.3 1   . . 
#1.4 1  start LC1 
#1.5 1   . LC1 
#1.6 1  lcf LC1 
#1.7 1   . LC1 
#1.8 1   . LC1 
#1.9 1  end LC1 
#1.10 1   . . 
#1.11 1   . . 
#1.12 1   . . 
#1.13 1   . . 
#1.14 1  start LC2 
#1.15 1   . LC2 
#1.16 1  lcf LC2 
#1.17 1   . LC2 
#1.18 1   . LC2 
#1.19 1  end LC2 
#1.20 1   . . 
#2.21 2   . . 
#2.22 2   . . 
#2.23 2   . . 
#2.24 2  start LC1 
#2.25 2   . LC1 
#2.26 2  lcf LC1 
#2.27 2   . LC1 
#2.28 2   . LC1 
#2.29 2  end LC1 
#2.30 2   . . 
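To see what the rle() answer computes, the sketch below traces its steps on the frame_type values of driver 1 alone. The variable names (ft, starts, ends_before, inside, r, lc_num) are illustrative. The difference of the two cumulative sums is 1 inside a start..end window and 0 outside; rle() then finds the runs of 1s so that each run can be numbered.

```r
# frame_type values for driver 1 only (rows 1-20 of foo)
ft <- rep(c(".", ".", ".", "start", ".", "lcf", ".", ".", "end", "."), 2)

# starts seen so far, including the current row
starts <- cumsum(ft == "start")
# ends seen strictly before each row: shift the 'end' flags down by one position
ends_before <- cumsum(head(c(FALSE, ft == "end"), -1))
# 1 inside a start..end window, 0 outside
inside <- starts - ends_before

# run-length encode, then number each run of 1s as 1, 2, ...; runs of 0 stay 0
r <- rle(inside)
r$values[r$values == 1] <- seq_len(sum(r$values == 1))
lc_num <- inverse.rle(r)

LC <- ifelse(lc_num == 0, ".", paste0("LC", lc_num))
```

One property of this run-based numbering: if an 'end' row is immediately followed by a 'start' row, the two windows form a single run of 1s and receive the same LC number, whereas numbering by the running count of starts keeps them apart.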

Thanks! 'rle()' is very useful.


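We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(foo)) and group by the cumulative sum of the logical vector (frame_type == "start"). If any 'frame_type' value in a group is the 'start' string, take the sequence of row indices (.I) from the 'start' position to the 'end' position. Extract that column ($V1) and use it as i to create a new column 'LC' by pasting the string "LC" onto the cumulative sum of the logical index, grouped by 'id'. Finally, replace the remaining NA values in 'LC' with ".".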

library(data.table) 
i1 <- setDT(foo)[ , if(any(frame_type == "start")) .I[which(frame_type == 
     "start"):which(frame_type == "end")], cumsum(frame_type == "start")]$V1 
foo[i1, LC := paste0("LC", cumsum(frame_type == "start")), id 
    ][is.na(LC), LC := "."][] 
# id frame_type LC 
# 1: 1   . . 
# 2: 1   . . 
# 3: 1   . . 
# 4: 1  start LC1 
# 5: 1   . LC1 
# 6: 1  lcf LC1 
# 7: 1   . LC1 
# 8: 1   . LC1 
# 9: 1  end LC1 
#10: 1   . . 
#11: 1   . . 
#12: 1   . . 
#13: 1   . . 
#14: 1  start LC2 
#15: 1   . LC2 
#16: 1  lcf LC2 
#17: 1   . LC2 
#18: 1   . LC2 
#19: 1  end LC2 
#20: 1   . . 
#21: 2   . . 
#22: 2   . . 
#23: 2   . . 
#24: 2  start LC1 
#25: 2   . LC1 
#26: 2  lcf LC1 
#27: 2   . LC1 
#28: 2   . LC1 
#29: 2  end LC1 
#30: 2   . . 

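Thanks for your answer. The NA values can be changed to "." if needed. It works for this sample data, but on my original data it throws an error for some reason: Error in which(frame_type == "start"):which(frame_type == "end") : argument of length 0.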


@umairdurrani I have already handled the case where a 'cumsum' group has no 'start'. Do you also have cases with only an 'end'? – akrun
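The "argument of length 0" error in the comment above is what `:` raises when one side is empty; for example, a cumsum group that contains a 'start' but no 'end' makes which(frame_type == "end") return integer(0). A possible way around this, sketched below under the assumption that each 'start' is followed by its 'end' within the same id, is to skip which() and compute the start/end counts per id with data.table's shift(). A group with no 'start' then simply gets "." instead of an error; a stray 'end' with no preceding 'start' would still throw off the counts for later rows of that id, so the data should be checked for that case.

```r
library(data.table)

foo <- data.table(id = c(rep(1, 20), rep(2, 10)),
                  frame_type = rep(c(".", ".", ".", "start", ".", "lcf", ".",
                                     ".", "end", "."), 3))

# Assumes each 'start' has a matching 'end' later in the same id (no nesting)
foo[, LC := {
  s <- cumsum(frame_type == "start")                  # starts so far in this id
  e <- shift(cumsum(frame_type == "end"), fill = 0L)  # ends strictly before this row
  ifelse(s - e > 0, paste0("LC", s), ".")
}, by = id]
```

Here shift() lags the running 'end' count by one row, which keeps the 'end' row itself labelled, and by = id restarts the numbering at LC1 for every driver.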
