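R - character column loses its values when stacking columns
I've run into a strange problem while stacking the columns of a data frame into 3 columns. For some reason, the factor column loses its values when stacked.
When I use the code below, the Treatment values should in theory be stacked on top of each other rather than replaced by a single value.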
library(reshape2)
test1<-reshape(df, direction="long", varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))
I won't paste the full results, but this frequency table should be enough:
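The duplicated column names are what cause this problem. A better approach is to split the columns into groups, correct the column names, and then bind the groups together with rbind. I tried to keep all the information by creating two new columns that store the q3_... names.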
do.call('rbind', lapply(seq(3, 12, by = 3), function(x) { y <- df1[,(x-2):x ];
y <- do.call("cbind", list(mo = colnames(y)[1], yr = colnames(y)[2], y));
colnames(y)[3:4] <- c('mo_val', 'yr_val');
y }))
# mo yr mo_val yr_val Treatment
# 1: q3_1mo q3_1yr NA NA anti-androgen
# 2: q3_1mo q3_1yr 5 2012 anti-androgen
# 3: q3_1mo q3_1yr 4 2008 anti-androgen
# 4: q3_1mo q3_1yr 4 2010 anti-androgen
# 5: q3_1mo q3_1yr NA NA anti-androgen
# 6: q3_1mo q3_1yr 2 2008 anti-androgen
# 7: q3_2mo q3_2yr 8 2010 docetaxel
# 8: q3_2mo q3_2yr 5 2012 docetaxel
# 9: q3_2mo q3_2yr 4 2008 docetaxel
# 10: q3_2mo q3_2yr 4 2010 docetaxel
# 11: q3_2mo q3_2yr 8 2011 docetaxel
# 12: q3_2mo q3_2yr 2 2008 docetaxel
# 13: q3_3mo q3_3yr NA NA abiraterone
# 14: q3_3mo q3_3yr 5 2012 abiraterone
# 15: q3_3mo q3_3yr 4 2008 abiraterone
# 16: q3_3mo q3_3yr 4 2010 abiraterone
# 17: q3_3mo q3_3yr 8 2011 abiraterone
# 18: q3_3mo q3_3yr 2 2008 abiraterone
# 19: q3_3mo q3_3yr NA NA other
# 20: q3_3mo q3_3yr 5 2012 other
# 21: q3_3mo q3_3yr 4 2008 other
# 22: q3_3mo q3_3yr 4 2010 other
# 23: q3_3mo q3_3yr 8 2011 other
# 24: q3_3mo q3_3yr 2 2008 other
# mo yr mo_val yr_val Treatment
Data:
df1 <- structure(list(q3_1mo = c(NA, 5L, 4L, 4L, NA, 2L),
q3_1yr = c(NA, 2012L, 2008L, 2010L, NA, 2008L),
Treatment = c("anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen"),
q3_2mo = c(8L, 5L, 4L, 4L, 8L, 2L),
q3_2yr = c(2010L, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel"),
q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone"),
q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("other", "other", "other", "other", "other", "other")),
.Names = c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment"),
row.names = c(NA, -6L), class = "data.frame")
You can also work around the problem and keep essentially the same reshape call by giving your variables unique names with make.unique.
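names(df) <- make.unique(names(df))
# 12 columns = 4 groups of 3, so the grouping vector 1:3 is repeated 4 times
# to match length(names(df)); repeating it 3 times makes split() recycle with a warning
test1 <- reshape(df, direction="long",
varying=split(names(df), rep(seq_len(ncol(df)/4), 4)))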
This returns
test1
time q3_1mo q3_1yr Treatment id
1.1 1 NA NA anti-androgen 1
2.1 1 5 2012 anti-androgen 2
3.1 1 4 2008 anti-androgen 3
4.1 1 4 2010 anti-androgen 4
5.1 1 NA NA anti-androgen 5
6.1 1 2 2008 anti-androgen 6
1.2 2 8 2010 docetaxel 1
2.2 2 5 2012 docetaxel 2
3.2 2 4 2008 docetaxel 3
4.2 2 4 2010 docetaxel 4
5.2 2 8 2011 docetaxel 5
6.2 2 2 2008 docetaxel 6
1.3 3 NA NA abiraterone 1
2.3 3 5 2012 abiraterone 2
3.3 3 4 2008 abiraterone 3
4.3 3 4 2010 abiraterone 4
5.3 3 8 2011 abiraterone 5
6.3 3 2 2008 abiraterone 6
1.4 4 NA NA other 1
2.4 4 5 2012 other 2
3.4 4 4 2008 other 3
4.4 4 4 2010 other 4
5.4 4 8 2011 other 5
6.4 4 2 2008 other 6
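You will have to spend a few lines cleaning up the names, and perhaps dropping some columns, but your code will run. Also note that reshape is a base R function, so loading reshape2 is unnecessary.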
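The cleanup step might look roughly like the sketch below. It uses a cut-down, two-group stand-in for the question's data rather than the real df. The explicit varying list, the v.names argument, and the column names mo_val and yr_val (borrowed from the other answer) are illustrative choices, not something the original code used.

```r
# Cut-down stand-in for the question's data: two groups, repeated "Treatment" name.
df <- data.frame(
  q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L),
  Treatment = c("anti-androgen", "anti-androgen"),
  q3_2mo = c(8L, 5L), q3_2yr = c(2010L, 2012L),
  Treatment = c("docetaxel", "docetaxel"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Make the names unique: the second "Treatment" becomes "Treatment.1".
names(df) <- make.unique(names(df))

# Spell out which columns belong together and name the stacked columns
# directly, so no renaming is needed afterwards.
long <- reshape(
  df, direction = "long",
  varying = list(c("q3_1mo", "q3_2mo"),
                 c("q3_1yr", "q3_2yr"),
                 c("Treatment", "Treatment.1")),
  v.names = c("mo_val", "yr_val", "Treatment")
)

# Drop the helper columns (time, id) and the composite row names.
rownames(long) <- NULL
long <- long[, c("mo_val", "yr_val", "Treatment")]
print(long)
```

Passing v.names up front avoids ending up with stacked columns that are still called q3_1mo and q3_1yr.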
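A data frame (or any named object, such as a list) with non-unique colnames is seriously broken (what does df[["Treatment"]] refer to?). You should avoid constructing one in the first place.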
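To make the comment's point concrete, here is a minimal sketch on a hypothetical two-column frame (d is not the question's data). Lookup by name returns only the first matching column, so the second one can be reached only by position.

```r
# Hypothetical frame with a repeated column name; check.names = FALSE keeps
# data.frame() from renaming the second column.
d <- data.frame(Treatment = c("anti-androgen", "anti-androgen"),
                Treatment = c("docetaxel", "docetaxel"),
                check.names = FALSE, stringsAsFactors = FALSE)

# Index of the first repeated name (0 would mean all names are unique).
dup_index <- anyDuplicated(names(d))

# Name-based lookup stops at the first match.
first_by_name <- d[["Treatment"]]

# The second "Treatment" column is only reachable by position.
second_by_pos <- d[[2]]

print(dup_index)
print(first_by_name)
print(second_by_pos)
```

A check such as anyDuplicated(names(df)) before reshaping is a cheap way to catch this situation early.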