2017-09-22

Create a new data frame in R with aggregated rows

The data frame I have contains two columns: ID and type (character). See below:

set.seed(123) 
ID <- seq(1,25) 
type <- sample(letters[1:26], 25, replace=TRUE) 

df <- data.frame(ID, type) 

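I need to create a new data frame with only one column. The first observation should be the first three letters of the type column, the second observation the next three letters, and so on.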

The new data frame should look like this:

ndf <- data.frame(ntype=c("huk", "wyb", "nxo", "lyl", "roc", "xgb", "iyx", "sqz", "r")) 

Answers

We create a grouping variable with gl, then use tapply to paste the elements of each group together:

n <- 3
# gl() labels the rows 1,1,1,2,2,2,...; tapply() pastes the type values within each label
ndf <- data.frame(ntype = with(df, unname(tapply(type, as.integer(gl(nrow(df), n,
     nrow(df))), FUN = paste, collapse = ""))), stringsAsFactors = FALSE)
ndf$ntype 
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 
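To see what the grouping variable looks like, here is a minimal sketch on a made-up vector of 7 letters in blocks of 3 (the vector x and the sizes are illustrative assumptions, not part of the question's data). It also shows why the answer wraps gl() in as.integer(): gl(nrow(df), n, nrow(df)) creates as many levels as rows, most of them unused.

```r
# gl(n, k, length) builds a factor with levels 1..n, each repeated k times,
# cut off after `length` elements: here 7 rows in blocks of 3
f <- gl(7, 3, 7)
g <- as.integer(f)
print(g)
#[1] 1 1 1 2 2 2 3

# Converting to integer drops the unused levels 4..7, which tapply()
# would otherwise return as NA entries
x <- c("a", "b", "c", "d", "e", "f", "g")
res <- unname(tapply(x, g, paste, collapse = ""))
print(res)
#[1] "abc" "def" "g"
```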

Another option is to paste the whole column into one string and then split it after every three characters:

# the lookbehind (?<=.{3}) matches the empty position after each block of 3 characters
strsplit(paste(df$type, collapse=""), "(?<=.{3})", perl = TRUE)[[1]] 
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 

Yet another option is substring combined with paste:

substring(paste(df$type, collapse=""), seq(1, nrow(df), by = 3), 
     c(seq(3, nrow(df), by = 3), nrow(df))) 
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 
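The end positions in the substring() call can be simplified, because substring() stops at the end of the string when last runs past it. A minimal sketch, wrapped in a hypothetical helper chunk_string() (not from any package) and assuming every type entry is a single character, so that splitting the pasted string by characters is the same as grouping rows:

```r
# Hypothetical helper: split string `s` into consecutive pieces of `n` characters;
# the final piece is shorter when nchar(s) is not a multiple of n
chunk_string <- function(s, n) {
  len <- nchar(s)
  if (len == 0) return(character(0))
  starts <- seq(1, len, by = n)
  # substring() clips at the end of the string, so starts + n - 1 may overshoot
  substring(s, starts, starts + n - 1)
}

chunk_string("abcdefg", 3)
#[1] "abc" "def" "g"
```

With the question's data, chunk_string(paste(df$type, collapse = ""), 3) returns the same nine strings as the substring() call above.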

Note: all of the above are base R solutions.

Thanks, it works! – user9292

'(?<=.{3})' +1! – PoGibas

1) Use rollapply along the input vector:

library(zoo) 

rollapply(df$type, 3, by = 3, paste, collapse = "", partial = TRUE, align = "left") 

giving:

[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 

2) This alternative uses aggregate and needs no packages:

n <- nrow(df) 
aggregate(type ~ gl(n, 3, n), df, paste, collapse = "")[2] 

giving:

type 
1 huk 
2 wyb 
3 nxo 
4 lyl 
5 roc 
6 xgb 
7 iyx 
8 sqz 
9 r 

Using dplyr:

library(dplyr)
# (ID - 1) %/% 3 gives 0 for IDs 1-3, 1 for IDs 4-6, and so on
df$group <- (df$ID - 1) %/% 3
df %>% group_by(group) %>% summarise(ntype = paste0(type, collapse = ''))
# A tibble: 9 x 2 
    group ntype 
    <dbl> <chr> 
1  0 huk 
2  1 wyb 
3  2 nxo 
4  3 lyl 
5  4 roc 
6  5 xgb 
7  6 iyx 
8  7 sqz 
9  8  r 
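The group column relies on integer division: subtracting 1 before dividing by 3 maps IDs 1 to 3 to 0, IDs 4 to 6 to 1, and so on. A small base R sketch (no dplyr needed) that checks this mapping for IDs 1 to 7, assuming the IDs are consecutive integers starting at 1:

```r
ID <- 1:7
# %/% is integer division; (ID - 1) %/% 3 gives 0 for IDs 1-3, 1 for IDs 4-6, ...
group <- (ID - 1) %/% 3
print(group)
#[1] 0 0 0 1 1 1 2
```

If the IDs are not consecutive, (seq_len(nrow(df)) - 1) %/% 3 builds the same blocks from row positions instead.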