2014-02-27

Reshaping a data frame to long format by expanding the elements of an existing column

I have a data frame with 3 columns:

A <- c("stringA", "stringA", "stringB", "stringB") 
B <- c(1, 2, 1, 2) 
C <- c("abcd", "abcd", "abcde", "bbc") 

df <- data.frame(A, B, C) 

> df
        A B     C
1 stringA 1  abcd
2 stringA 2  abcd
3 stringB 1 abcde
4 stringB 2   bbc

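I want to reshape it so that the values of column B become the column headers and each value in column C is split into individual letters, giving: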

      A 1  2
stringA a  a
stringA b  b
stringA c  c
stringA d  d
stringB a  b
stringB b  b
stringB c  c
stringB d NA
stringB e NA

Answer


Here is an approach using "data.table" and "reshape2". First, make sure you are using at least version 1.8.11 of the "data.table" package.

library(reshape2)
library(data.table)
packageVersion("data.table")
# [1] ‘1.8.11’

## Key the table by the grouping columns
DT <- data.table(df, key = "A,B")
## Split each value of "C" into single letters, one row per letter
DT <- DT[, list(C = unlist(strsplit(as.character(C), ""))), by = c("A", "B")]
## Number the letters within each A/B group
DT[, N := sequence(.N), by = c("A", "B")]
## Spread the values of "B" into columns
dcast.data.table(DT, A + N ~ B, value.var = "C")
#          A N 1  2
# 1: stringA 1 a  a
# 2: stringA 2 b  b
# 3: stringA 3 c  c
# 4: stringA 4 d  d
# 5: stringB 1 a  b
# 6: stringB 2 b  b
# 7: stringB 3 c  c
# 8: stringB 4 d NA
# 9: stringB 5 e NA
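On more recent data.table releases (an assumption: roughly 1.9.8 or later, where `rowid()` exists), `dcast()` can be called on a data.table directly without loading reshape2, and `rowid(A, B)` can stand in for `sequence(.N)` grouped by A and B. A minimal sketch under that assumption, rebuilding the question's data so it runs on its own:

```r
library(data.table)

# Same input as in the question
df <- data.frame(A = c("stringA", "stringA", "stringB", "stringB"),
                 B = c(1, 2, 1, 2),
                 C = c("abcd", "abcd", "abcde", "bbc"))

# One row per letter, grouped by the A/B combination
long <- as.data.table(df)[, .(C = unlist(strsplit(as.character(C), ""))),
                          by = .(A, B)]

# Position of each letter within its A/B group (1, 2, 3, ...);
# assumes a data.table version that provides rowid()
long[, N := rowid(A, B)]

# Spread the values of B into columns; positions missing for the
# shorter string are filled with NA
wide <- dcast(long, A + N ~ B, value.var = "C")
print(wide)
```

The result should have the same shape as the `dcast.data.table()` output: one row per A/letter position, with NA where the shorter string runs out.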

If you prefer to stick with base R, this approach is quite similar:

## Split the "C" column up 
X <- strsplit(as.character(df$C), "") 

## "Expand" your data.frame 
df2 <- df[rep(seq_along(X), sapply(X, length)), ] 

## Create an additional "id" 
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along)) 

## Replace your "C" values 
df2$C <- unlist(X) 

## Reshape your data 
reshape(df2, direction = "wide", idvar=c("A", "id"), timevar="B") 
#           A id C.1  C.2
# 1   stringA  1   a    a
# 1.1 stringA  2   b    b
# 1.2 stringA  3   c    c
# 1.3 stringA  4   d    d
# 3   stringB  1   a    b
# 3.1 stringB  2   b    b
# 3.2 stringB  3   c    c
# 3.3 stringB  4   d <NA>
# 3.4 stringB  5   e <NA>
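The "expand" and "id" steps are easier to follow with the intermediate vectors printed. A small base R sketch that rebuilds the question's data; the names `n_letters` and `row_index` are only illustrative:

```r
df <- data.frame(A = c("stringA", "stringA", "stringB", "stringB"),
                 B = c(1, 2, 1, 2),
                 C = c("abcd", "abcd", "abcde", "bbc"))

X <- strsplit(as.character(df$C), "")

# Number of letters in each original value of C
n_letters <- sapply(X, length)
print(n_letters)   # 4 4 5 3

# Each original row index, repeated once per letter
row_index <- rep(seq_along(X), n_letters)
print(row_index)   # 1 1 1 1 2 2 2 2 3 3 3 3 3 4 4 4

# Expanded data frame plus the running position within each A/B group
df2 <- df[row_index, ]
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along))
print(df2$id)      # "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "5" "1" "2" "3"
```

Because `ave()` returns a vector of the same type as its first argument, `id` comes back as character here; passing an integer vector instead, e.g. `ave(seq_len(nrow(df2)), df2$A, df2$B, FUN = seq_along)`, would give integer ids.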