Using the reshape2 and dplyr packages
Load the libraries and the data:
library(reshape2)
library(dplyr)
x <- structure(
  list(
    nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
    timeline.compressed = c(79, 79, 95, 79, 28, 28),
    sel01 = c(NA, 6L, NA, NA, NA, 7L),
    sel02 = c(NA, 6L, NA, NA, NA, 7L),
    sel03 = c(NA, 3L, NA, NA, NA, 5L),
    sel04 = c(NA, 6L, NA, NA, NA, 6L),
    close_num = c(1, NA, 0.2, 1, 0.8, NA),
    gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
  ),
  .Names = c(
    "nomem_encr", "timeline.compressed",
    "sel01", "sel02", "sel03", "sel04", "close_num", "gener_sat"
  ),
  class = "data.frame",
  row.names = c(NA, 6L)
)
x
This is what your data looks like:
  nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat
1     800009                  79    NA    NA    NA    NA       1.0         7
2     800009                  79     6     6     3     6        NA        NA
3     800009                  95    NA    NA    NA    NA       0.2         7
4     800012                  79    NA    NA    NA    NA       1.0         8
5     800015                  28    NA    NA    NA    NA       0.8         7
6     800015                  28     7     7     5     6        NA        NA
Now we melt the data into long format:
melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  head(15)
Output:
   nomem_encr timeline.compressed variable value
1      800009                  79    sel01    NA
2      800009                  79    sel01     6
3      800009                  95    sel01    NA
4      800012                  79    sel01    NA
5      800015                  28    sel01    NA
6      800015                  28    sel01     7
7      800009                  79    sel02    NA
8      800009                  79    sel02     6
9      800009                  95    sel02    NA
10     800012                  79    sel02    NA
11     800015                  28    sel02    NA
12     800015                  28    sel02     7
13     800009                  79    sel03    NA
14     800009                  79    sel03     3
15     800009                  95    sel03    NA
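As a side note, melt() also accepts an na.rm argument that drops the long-format rows whose value is NA. A minimal sketch, rebuilding the same example data with data.frame() so the snippet stands alone (the name long is only for illustration):

```r
library(reshape2)

# Same example data as above, rebuilt compactly so the snippet is self-contained.
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L),
  sel02 = c(NA, 6L, NA, NA, NA, 7L),
  sel03 = c(NA, 3L, NA, NA, NA, 5L),
  sel04 = c(NA, 6L, NA, NA, NA, 6L),
  close_num = c(1, NA, 0.2, 1, 0.8, NA),
  gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
)

# na.rm = TRUE removes every long-format row whose value is NA.
long <- melt(
  data = x,
  id.vars = c("nomem_encr", "timeline.compressed"),
  na.rm = TRUE
)
nrow(long)
```

For this data, 16 of the 36 long-format rows hold a non-missing value, so only those remain.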
If we cast the melted data frame, the default behavior is to count how many entries we have for each item:
melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  dcast(
    formula = nomem_encr + timeline.compressed ~ variable
  )
Output:
Aggregation function missing: defaulting to length
  nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat
1     800009                  79     2     2     2     2         2         2
2     800009                  95     1     1     1     1         1         1
3     800012                  79     1     1     1     1         1         1
4     800015                  28     2     2     2     2         2         2
We have 2 entries for the item identified by 800009 79 (using nomem_encr and timeline.compressed as the identifying variables).
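To see where those counts come from, one can count the rows per identifier pair directly. A small sketch using dplyr's count(); only the two identifier columns are rebuilt here, and the name counts is just for illustration:

```r
library(dplyr)

# Only the two identifier columns are needed to count rows per pair.
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28)
)

# One row per (nomem_encr, timeline.compressed) pair, with the row count in n.
counts <- x %>% count(nomem_encr, timeline.compressed)
counts
```

Pairs with n greater than 1 are the ones for which dcast() has to aggregate several values into a single cell.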
We can change the default behavior to something else, such as sum:
melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  dcast(
    formula = nomem_encr + timeline.compressed ~ variable,
    fun.aggregate = function(xs) sum(xs, na.rm = TRUE)
  )
Output:
  nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat
1     800009                  79     6     6     3     6       1.0         7
2     800009                  95     0     0     0     0       0.2         7
3     800012                  79     0     0     0     0       1.0         8
4     800015                  28     7     7     5     6       0.8         7
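Note that sum(xs, na.rm = TRUE) turns cells with no answers into 0. If the goal is to merge the rows while keeping NA where nothing was answered, one option is an aggregation function that returns the first non-missing value. This is only a sketch: first_non_na is a hypothetical helper, not part of reshape2, and it assumes each cell holds at most one real answer.

```r
library(reshape2)
library(dplyr)

# Same example data as above, rebuilt so the snippet is self-contained.
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L),
  sel02 = c(NA, 6L, NA, NA, NA, 7L),
  sel03 = c(NA, 3L, NA, NA, NA, 5L),
  sel04 = c(NA, 6L, NA, NA, NA, 6L),
  close_num = c(1, NA, 0.2, 1, 0.8, NA),
  gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
)

# Hypothetical helper: first non-missing value, or NA if there is none.
# It also copes with empty input, which the cast may pass when it
# works out a fill value.
first_non_na <- function(xs) {
  xs <- xs[!is.na(xs)]
  if (length(xs) == 0) NA_real_ else xs[1]
}

wide <- melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  dcast(
    formula = nomem_encr + timeline.compressed ~ variable,
    fun.aggregate = first_non_na
  )
wide
```

With this data, the rows for 800009 79 and 800015 28 are each merged into a single row, and the sel columns stay NA for 800009 95 and 800012 79 instead of becoming 0.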
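You could also provide sample data. Use 'head' to create a subset and 'dput' to show us how to reproduce it. – Olivia
In reply to your first comment: I'm afraid I don't fully understand it. My guess is that in each row either the X variables or the Y variables were answered. Sometimes, however, two rows share the same time variable, i.e. the X and Y variables were answered at the same time. What I want is to combine such rows into a single row in which both the X and Y variables are answered. – Elisabeth
How do we know which rows you need to trim? – jaySf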