Trouble reshaping an unbalanced df from wide to long

I have an unbalanced, wide data frame that looks like this:
set.seed(1)
df <- data.frame(id1=seq(1:10),
id2=runif(10),
v1.a=runif(10),
v1.b=runif(10),
v1.c=runif(10),
v2.a=runif(10),
v2.b=runif(10),
v2.c=runif(10),
v3.a=runif(10),
#v3.b=runif(10),
v3.c=runif(10),
v4.a=runif(10),
v4.b=runif(10),
v4.c=runif(10),
#v5.a=runif(10),
#v5.b=runif(10),
v5.c=runif(10),
v6.a=runif(10),
v6.b=runif(10),
v6.c=runif(10),
v7.a=rep(NA, 10),
v7.b=rep(NA, 10),
v7.c=rep(NA, 10),
v8.d=runif(10))
I am trying to convert it to long format. reshape fails because not every varying column is present at every time, so I turned to Reshape from splitstackshape.
library(splitstackshape)
vary <- grep("\\.a$|\\.b$|\\.c$|\\.d$", names(df))
stubs <- unique(sub("\\..*$", "", names(df[vary])))
df2 <- Reshape(df,
id.vars=c("id1", "id2"),
var.stubs=stubs,
sep=".")
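However, the end result does not look quite right. For example, v3 is missing an entry for "b", which I would consider to be time 2. In df2, though, v3 has values at times 1 and 2, but not at time 3.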
id1 id2 time v1 v2 v3
1 1 0.26550866 1 0.20597457 0.82094629 0.3390729
2 2 0.37212390 1 0.17655675 0.64706019 0.8394404
3 3 0.57285336 1 0.68702285 0.78293276 0.3466835
4 4 0.90820779 1 0.38410372 0.55303631 0.3337749
5 5 0.20168193 1 0.76984142 0.52971958 0.4763512
6 6 0.89838968 1 0.49769924 0.78935623 0.8921983
7 7 0.94467527 1 0.71761851 0.02333120 0.8643395
8 8 0.66079779 1 0.99190609 0.47723007 0.3899895
9 9 0.62911404 1 0.38003518 0.73231374 0.7773207
10 10 0.06178627 1 0.77744522 0.69273156 0.9606180
11 1 0.26550866 2 0.93470523 0.47761962 0.4346595
12 2 0.37212390 2 0.21214252 0.86120948 0.7125147
13 3 0.57285336 2 0.65167377 0.43809711 0.3999944
14 4 0.90820779 2 0.12555510 0.24479728 0.3253522
15 5 0.20168193 2 0.26722067 0.07067905 0.7570871
16 6 0.89838968 2 0.38611409 0.09946616 0.2026923
17 7 0.94467527 2 0.01339033 0.31627171 0.7111212
18 8 0.66079779 2 0.38238796 0.51863426 0.1216919
19 9 0.62911404 2 0.86969085 0.66200508 0.2454885
20 10 0.06178627 2 0.34034900 0.40683019 0.1433044
21 1 0.26550866 3 0.48208012 0.91287592 NA
22 2 0.37212390 3 0.59956583 0.29360337 NA
23 3 0.57285336 3 0.49354131 0.45906573 NA
24 4 0.90820779 3 0.18621760 0.33239467 NA
25 5 0.20168193 3 0.82737332 0.65087047 NA
26 6 0.89838968 3 0.66846674 0.25801678 NA
27 7 0.94467527 3 0.79423986 0.47854525 NA
28 8 0.66079779 3 0.10794363 0.76631067 NA
29 9 0.62911404 3 0.72371095 0.08424691 NA
30 10 0.06178627 3 0.41127443 0.87532133 NA
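Am I doing something wrong?

Is there a better option using melt or gather? I have tried a few approaches without much luck. My actual use case has 1302 of what I call vary columns, 3 time periods (a, b, c), and 821 unique stubs (so it is very clearly unbalanced).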
What is your expected output? – akrun
Related, for when you don't have all combinations of variables: http://stackoverflow.com/questions/34713846/reshape-messy-longitudinal-survey-data-containing-multiple-different-variables/34714134 – thelatemail
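The df2 output in the question is consistent with values being matched to times by their position within each stub rather than by their suffix, which would explain v3.c landing at time 2. One possible workaround, sketched here in base R only, is to add every missing stub/time column as NA before calling reshape, so that each stub has the same suffixes in the same order. The data frame below is a smaller stand-in for the one in the question, and the names full, padded and long are illustrative, not from the original post.

```r
set.seed(1)
# Small unbalanced stand-in for the question's df: v3 has no ".b", v8 has only ".d"
df <- data.frame(id1  = 1:3,
                 v1.a = runif(3), v1.b = runif(3), v1.c = runif(3),
                 v3.a = runif(3), v3.c = runif(3),
                 v8.d = runif(3))

# Varying columns, their stubs, and the set of time suffixes that occur anywhere
vary  <- grep("\\.[a-d]$", names(df), value = TRUE)
stubs <- unique(sub("\\..*$", "", vary))
times <- sort(unique(sub("^.*\\.", "", vary)))

# Every stub/time combination; within each stub the suffixes run a, b, c, d
full <- as.vector(outer(stubs, times, paste, sep = "."))

# Add the combinations that are absent from df as all-NA columns
padded <- df
padded[setdiff(full, names(df))] <- NA_real_

# With a balanced set of columns, reshape can split the names on "."
long <- reshape(padded, direction = "long",
                idvar = "id1", varying = full, sep = ".")

# v3 is now NA at time "b" and keeps its ".c" values at time "c"
long[long$time %in% c("b", "c"), c("id1", "time", "v3")]
```

Here the time column holds the suffixes a to d rather than 1 to 3, so a missing period shows up as a row of NA instead of shifting later values forward. With 821 stubs this adds many all-NA columns, so memory use may be worth checking on the real data.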