2017-06-12
0

Insert an empty row every time a keyword is encountered

I have a dataset like this:

structure(list(Event = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("Insert", 
"Ok"), class = "factor")), .Names = "Event", class = "data.frame", row.names = c(NA, 
-18L)) 


I want to insert an empty row below every row where Event is "Insert".

How can I do this in R?

Answers

1

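We can create a logical vector, use it to split the column into groups that each end with "Insert", and append an empty string to every group: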

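# df1 is the data frame from the question
i1 <- df1$Event == "Insert"
# start a new group right after each "Insert", then append "" to every group
Event <- unlist(lapply(split(df1$Event,
    cumsum(c(TRUE, i1[-length(i1)]))), function(x) c(as.character(x), "")),
    use.names = FALSE)
df2 <- data.frame(Event, stringsAsFactors = FALSE)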
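To make the grouping step easier to follow, here is a minimal sketch on a short toy vector. The names `ev`, `is_insert`, `grp` and `out` are made up for illustration and are not part of the answer above.

```r
# toy input (hypothetical, not the question's data)
ev <- c("Ok", "Insert", "Ok", "Ok", "Insert")
is_insert <- ev == "Insert"

# shift the flags down by one so that a new group starts
# on the element right after each "Insert"
grp <- cumsum(c(TRUE, is_insert[-length(is_insert)]))

# append "" to every group and flatten back into one vector
out <- unlist(lapply(split(ev, grp), function(x) c(x, "")),
              use.names = FALSE)
print(grp)
print(out)
```

Because every group gets a trailing "", input that does not end with "Insert" would also gain an empty last element; drop it afterwards if that is not wanted.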

Another option is to use data.table:

library(data.table) 
setDT(df1)[, grp := cumsum(shift(Event == "Insert", fill = TRUE)) 
     ][, .SD[c(seq_len(.N), .N+1)] , grp 
     ][is.na(Event), Event := "" 
     ][, grp := NULL][] 
#  Event 
# 1:  Ok 
# 2:  Ok 
# 3: Insert 
# 4:  
# 5:  Ok 
# 6:  Ok 
# 7:  Ok 
# 8:  Ok 
# 9: Insert 
#10:  
#11: Insert 
#12:  
#13:  Ok 
#14:  Ok 
#15:  Ok 
#16:  Ok 
#17: Insert 
#18:  
#19:  Ok 
#20:  Ok 
#21:  Ok 
#22: Insert 
#23:  
5

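Here is a second base R method, which uses integer indexing. I will use a character vector here, because that makes more sense given the data provided.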

# get integer index with repeats for observations with "Insert" 
myRows <- sort(c(seq_along(temp), which(temp == "Insert"))) 
# set second row index to missing 
is.na(myRows) <- duplicated(myRows) 

Now, feed this index to the character vector.

temp[myRows] 
[1] "Ok"  "Ok"  "Insert" NA  "Ok"  "Ok"  "Ok"  "Ok"  "Insert" NA  "Insert" NA  "Ok"  
[14] "Ok"  "Ok"  "Ok"  "Insert" NA  "Ok"  "Ok"  "Ok"  "Insert" NA 
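If empty strings are preferred over NA, as in the question's desired output, the result can be post-processed. The following is only a sketch: it redefines `temp` as a short toy vector so that it runs on its own, and `res` and `df_out` are names chosen here for illustration.

```r
# toy stand-in for the character vector (hypothetical data)
temp <- c("Ok", "Ok", "Insert", "Ok", "Insert")

# same indexing idea as above: repeat the positions of "Insert",
# then turn the repeated position into NA
myRows <- sort(c(seq_along(temp), which(temp == "Insert")))
is.na(myRows) <- duplicated(myRows)

res <- temp[myRows]
# replace the NA placeholders with empty strings
res[is.na(res)] <- ""

# wrap the result in a one-column data frame
df_out <- data.frame(Event = res, stringsAsFactors = FALSE)
print(df_out)
```

Note that this replacement would also blank out any NA values that were already present in the input.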

Data

temp <- 
structure(list(Event = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("Insert", 
"Ok"), class = "factor")), .Names = "Event", class = "data.frame", row.names = c(NA, 
-18L)) 

temp <- as.character(temp$Event) 
0

Here is an idea:

mydf <- structure(list(Event = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("Insert", 
"Ok"), class = "factor")), .Names = "Event", class = "data.frame", row.names = c(NA, 
-18L)) 

mydf$rowindex <- 1:nrow(mydf) 
mydf$repeats <- 1 
mydf$repeats[which(mydf$Event=="Insert")] <- 2 
mydf2 <- mydf[rep(mydf$rowindex,mydf$repeats),] 
mydf2[which(grepl("\\.",row.names(mydf2))),"Event"] <- NA 
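The last line works because R makes row names unique when rows are repeated, so each duplicated row gets a suffix such as ".1". A compact sketch of the same idea on toy data follows; `d` and `d2` are hypothetical names, and it writes "" rather than NA into the inserted rows.

```r
# toy data (hypothetical), using a character column instead of a factor
d <- data.frame(Event = c("Ok", "Insert", "Ok"), stringsAsFactors = FALSE)

# repeat each "Insert" row twice, every other row once
times <- ifelse(d$Event == "Insert", 2, 1)
d2 <- d[rep(seq_len(nrow(d)), times), , drop = FALSE]

# the repeated row is the one whose row name gained a ".1" suffix
print(row.names(d2))

# blank out the duplicated rows
d2[grepl(".", row.names(d2), fixed = TRUE), "Event"] <- ""
print(d2)
```

This relies on the original row names not containing a dot themselves.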

Let me know if this helps.
