2014-05-10
0

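Roll up data with a transpose

I want to roll up my data to the unique customer ID level, with each customer's observations transposed into columns against that ID. Below is a snapshot of my data: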

basedata <- structure(list(customer = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "d"), class = "factor"), obs = c(12L, 
11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)), .Names = c("customer", "obs" 
), class = "data.frame", row.names = c(NA, -9L)) 

or, printed as a table:

customer obs 
a   12 
a   11 
a   12 
a   10 
b   3 
b   5 
b   7 
d   8 
d   1 

I want to convert it to the following form:

customer obs1 obs2 obs3 obs4 
a 12 11 12 10 
b 3 5 7 - 
d 8 1 - - 

I used the following code:

basedata$shopping <- unlist(tapply(rawdata$customer, rawdata$customer, 
         function (x) seq(1, len = length(x)))) 
reshape(basedata, idvar = "customer", direction = "wide") 

It gives the following error:

Error in `[.data.frame`(data, , timevar) : undefined columns selected 

How can I do this in R and in Excel? Thanks.

Is "basedata" the same as "rawdata"? Is that a typo in your question? – A5C1D2H2I1M1N2O1R2T1

Answers

2
x <- structure(list(customer = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "d"), class = "factor"), obs = c(12L, 
11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L)), .Names = c("customer", "obs" 
), class = "data.frame", row.names = c(NA, -9L)) 

I chose to use a couple of additional packages (plyr and reshape2), because I find them easier and more general to use than reshape from the base package.

library(plyr) 
library(reshape2) 
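## add an observation number within each customer (1, 2, 3, ...)
x2 <- ddply(x, "customer", transform, num = seq_along(customer))
## reshape to wide: one row per customer, one column per observation number
dcast(x2, customer ~ num, value.var = "obs")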
1

A base R approach, assuming dat is the data:

> s <- split(dat$obs, dat$customer) 
> df <- data.frame(do.call(rbind, lapply(s, function(x){ length(x) <- 4; x }))) 
> names(df) <- paste0('obs', seq(df)) 
> df 
# obs1 obs2 obs3 obs4 
# a 12 11 12 10 
# b 3 5 7 NA 
# d 8 1 NA NA 

If you want the unique customer ID to be a column:

> df2 <- cbind(customer = rownames(df), df) 
> rownames(df2) <- seq(nrow(df2)) 
> df2 
# customer obs1 obs2 obs3 obs4 
# 1  a 12 11 12 10 
# 2  b 3 5 7 NA 
# 3  d 8 1 NA NA 
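
The `length(x) <- 4` step above hard-codes the size of the largest group. A minimal sketch that derives the width from the data instead, assuming `dat` has the same `customer` and `obs` columns as in the question (the names `n_max`, `padded` and `wide` are introduced here only for illustration):

```r
# Example data matching the question (dat is assumed to look like this)
dat <- data.frame(customer = c("a", "a", "a", "a", "b", "b", "b", "d", "d"),
                  obs = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L))

# Split the observations by customer, then find the longest group
s <- split(dat$obs, dat$customer)
n_max <- max(lengths(s))  # the widest group decides the number of obs columns

# Pad every group with NA up to n_max and stack the groups as rows
padded <- lapply(s, function(x) { length(x) <- n_max; x })
df <- data.frame(do.call(rbind, padded))
names(df) <- paste0("obs", seq_along(df))

# Move the customer ID from the row names into a regular column
wide <- cbind(customer = rownames(df), df)
rownames(wide) <- NULL
wide
```

This should behave the same as the version above on the sample data, but it keeps working if some customer later has more than four observations.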
0

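I'm assuming that "basedata" and "rawdata" are supposed to be the same (or at least copies of each other). If that is the case, you are just missing the timevar argument, which tells reshape which column holds the observation number.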

Continuing from where you left off:

rawdata$shopping <- unlist(tapply(rawdata$customer, rawdata$customer, 
            function (x) seq(1, len = length(x)))) 
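## alternative (use a numeric first argument; passing the factor itself to ave() yields NAs):
## rawdata$shopping <- with(rawdata, ave(seq_along(customer), customer, FUN = seq_along))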

Here is the actual reshaping step:

reshape(rawdata, idvar = "customer", timevar="shopping", direction = "wide") 
# customer obs.1 obs.2 obs.3 obs.4 
# 1  a 12 11 12 10 
# 5  b  3  5  7 NA 
# 8  d  8  1 NA NA
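
For reference, the steps above can be combined into one self-contained sketch. `reshape` defaults to `timevar = "time"`, and because the data has no column named `time`, the call in the question fails with "undefined columns selected". The sketch below builds the index with `ave` on a row counter, so it does not rely on the rows being sorted by customer. The column name `shopping` follows the question; the final renaming to `obs1`, `obs2`, ... and the row-name reset are added steps to match the requested layout, not part of the answer above.

```r
# Sample data from the question
rawdata <- data.frame(customer = c("a", "a", "a", "a", "b", "b", "b", "d", "d"),
                      obs = c(12L, 11L, 12L, 10L, 3L, 5L, 7L, 8L, 1L))

# Number each observation within its customer: 1, 2, 3, ...
rawdata$shopping <- ave(seq_along(rawdata$customer), rawdata$customer,
                        FUN = seq_along)

# Long to wide: one row per customer, one obs.<n> column per observation number
wide <- reshape(rawdata, idvar = "customer", timevar = "shopping",
                direction = "wide")

# Optional tidy-up (an added step): obs.1 -> obs1, and sequential row names
names(wide) <- sub("^obs\\.", "obs", names(wide))
rownames(wide) <- NULL
wide
```

Customers with fewer observations than the largest group get `NA` in the trailing columns, which corresponds to the `-` placeholders in the desired output.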