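R - Character column loses values when stacking columns

2017-02-11

I've run into a strange problem stacking data frame columns into 3 columns. For some reason, the factor column loses its values when stacked.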

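When I use the code below, the Treatment values should, in theory, be stacked on top of each other instead of all being replaced by a single value.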

library(reshape2) 
test1<-reshape(df, direction="long", varying=split(names(df), rep(seq_len(ncol(df)/4), 3))) 
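One way to see what reshape actually receives is to run the same split on the column names alone. This is a small sketch using the 12 names from the data posted in the answer; nms is a hypothetical stand-in for names(df). The third group is the string "Treatment" four times, so reshape can only look that one name up, and looking up a duplicated name in a data frame returns the first matching column. (The posted rep(seq_len(ncol(df)/4), 3) gives only 9 labels for 12 names; split recycles them, with a warning, into the same grouping as rep(1:3, 4).)

```r
# Hypothetical stand-in for names(df): the 12 column names from the posted data
nms <- c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment",
         "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment")

# Label each name with its position (1, 2 or 3) inside its 3-column block,
# then group together the names that share a position
groups <- split(nms, rep(seq_len(3), times = 4))

groups[["1"]]  # "q3_1mo" "q3_2mo" "q3_3mo" "q3_3mo"
groups[["2"]]  # "q3_1yr" "q3_2yr" "q3_3yr" "q3_3yr"
groups[["3"]]  # "Treatment" four times: one name standing for four columns
```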

I won't paste the whole result, but this frequency table is enough:


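A data frame (or any named object, such as a list) with non-unique column names is seriously broken (what is df[["Treatment"]] supposed to refer to?). You should avoid building one in the first place.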
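The comment's point is easy to check on a tiny example. In this sketch, df_dup is a hypothetical data frame with two columns named "Treatment" (check.names = FALSE is what lets the duplicate name through). Both [[ and [ with the name "Treatment" return only the first matching column; the second one can only be reached by position.

```r
# Hypothetical toy data frame; check.names = FALSE keeps the duplicated name
df_dup <- data.frame(id = 1:2,
                     Treatment = c("docetaxel", "docetaxel"),
                     Treatment = c("other", "other"),
                     check.names = FALSE, stringsAsFactors = FALSE)

names(df_dup)          # "id" "Treatment" "Treatment"
df_dup[["Treatment"]]  # "docetaxel" "docetaxel"  (first match only)
df_dup[, "Treatment"]  # same result: lookup by name stops at the first match
df_dup[[3]]            # "other" "other"  (only reachable by position)
```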

Answers


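The duplicate column names are what's causing this problem. A better approach is to split the data frame into its column blocks, correct the column names, and then bind the blocks together with rbind. I tried to keep all of the information stored in q3_... by creating two new columns: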

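do.call('rbind', lapply(seq(3, 12, by = 3), function(x) {
  y <- df1[, (x - 2):x]                      # one block: mo, yr, Treatment
  # prepend the source column names, recycled down every row
  y <- do.call("cbind", list(mo = colnames(y)[1], yr = colnames(y)[2], y))
  colnames(y)[3:4] <- c('mo_val', 'yr_val')  # common names so rbind lines up
  y
}))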

#   mo  yr mo_val yr_val  Treatment 
# 1: q3_1mo q3_1yr  NA  NA anti-androgen 
# 2: q3_1mo q3_1yr  5 2012 anti-androgen 
# 3: q3_1mo q3_1yr  4 2008 anti-androgen 
# 4: q3_1mo q3_1yr  4 2010 anti-androgen 
# 5: q3_1mo q3_1yr  NA  NA anti-androgen 
# 6: q3_1mo q3_1yr  2 2008 anti-androgen 
# 7: q3_2mo q3_2yr  8 2010  docetaxel 
# 8: q3_2mo q3_2yr  5 2012  docetaxel 
# 9: q3_2mo q3_2yr  4 2008  docetaxel 
# 10: q3_2mo q3_2yr  4 2010  docetaxel 
# 11: q3_2mo q3_2yr  8 2011  docetaxel 
# 12: q3_2mo q3_2yr  2 2008  docetaxel 
# 13: q3_3mo q3_3yr  NA  NA abiraterone 
# 14: q3_3mo q3_3yr  5 2012 abiraterone 
# 15: q3_3mo q3_3yr  4 2008 abiraterone 
# 16: q3_3mo q3_3yr  4 2010 abiraterone 
# 17: q3_3mo q3_3yr  8 2011 abiraterone 
# 18: q3_3mo q3_3yr  2 2008 abiraterone 
# 19: q3_3mo q3_3yr  NA  NA   other 
# 20: q3_3mo q3_3yr  5 2012   other 
# 21: q3_3mo q3_3yr  4 2008   other 
# 22: q3_3mo q3_3yr  4 2010   other 
# 23: q3_3mo q3_3yr  8 2011   other 
# 24: q3_3mo q3_3yr  2 2008   other 
#   mo  yr mo_val yr_val  Treatment 

Data:

df1 <- structure(list(q3_1mo = c(NA, 5L, 4L, 4L, NA, 2L), 
         q3_1yr = c(NA, 2012L, 2008L, 2010L, NA, 2008L), 
         Treatment = c("anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen"), 
         q3_2mo = c(8L, 5L, 4L, 4L, 8L, 2L), 
         q3_2yr = c(2010L, 2012L, 2008L, 2010L, 2011L, 2008L), 
         Treatment = c("docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel"), 
         q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L), 
         q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L), 
         Treatment = c("abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone"), 
         q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L), 
         q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L), 
         Treatment = c("other", "other", "other", "other", "other", "other")), 
       .Names = c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment"), 
       row.names = c(NA, -6L), class = "data.frame") 

You can also fix this and keep your original code by giving your variables unique names with make.unique:

names(df) <- make.unique(names(df)) 
test1 <- reshape(df, direction="long", 
       varying=split(names(df), rep(seq_len(ncol(df)/4), 3))) 

This returns the following for test1:

time q3_1mo q3_1yr  Treatment id 
1.1 1  NA  NA anti-androgen 1 
2.1 1  5 2012 anti-androgen 2 
3.1 1  4 2008 anti-androgen 3 
4.1 1  4 2010 anti-androgen 4 
5.1 1  NA  NA anti-androgen 5 
6.1 1  2 2008 anti-androgen 6 
1.2 2  8 2010  docetaxel 1 
2.2 2  5 2012  docetaxel 2 
3.2 2  4 2008  docetaxel 3 
4.2 2  4 2010  docetaxel 4 
5.2 2  8 2011  docetaxel 5 
6.2 2  2 2008  docetaxel 6 
1.3 3  NA  NA abiraterone 1 
2.3 3  5 2012 abiraterone 2 
3.3 3  4 2008 abiraterone 3 
4.3 3  4 2010 abiraterone 4 
5.3 3  8 2011 abiraterone 5 
6.3 3  2 2008 abiraterone 6 
1.4 4  NA  NA   other 1 
2.4 4  5 2012   other 2 
3.4 4  4 2008   other 3 
4.4 4  4 2010   other 4 
5.4 4  8 2011   other 5 
6.4 4  2 2008   other 6 
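For reference, this is what make.unique does to the duplicated names (a sketch using the names from the posted data; nms is a hypothetical stand-in for names(df)). The first occurrence of each name is left unchanged and later repeats get .1, .2, ... appended, so the first name in each varying group, which reshape uses to name the stacked column, stays the same.

```r
# Hypothetical stand-in for names(df): the 12 column names from the posted data
nms <- c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment",
         "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment")

make.unique(nms)
# "q3_1mo" "q3_1yr" "Treatment" "q3_2mo" "q3_2yr" "Treatment.1"
# "q3_3mo" "q3_3yr" "Treatment.2" "q3_3mo.1" "q3_3yr.1" "Treatment.3"
```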

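You'll need a few more lines to clean up the names and perhaps drop some columns, but your code will run. Also note that reshape is a base R function, so loading reshape2 is unnecessary.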
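As one possible version of that cleanup, here is a self-contained sketch on a hypothetical two-row data frame with the same block layout (the q3_4 names, the values, and the new mo/yr names are made up for the example). It labels each name by its position inside its 3-column block, so all 12 names get a label (rep(..., 3) on 12 columns only produces 9), then renames the stacked columns and drops time, since Treatment already identifies the block.

```r
# Hypothetical toy data: four (month, year, Treatment) blocks, Treatment name repeated
df <- data.frame(q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L), Treatment = "anti-androgen",
                 q3_2mo = c(8L, 5L),  q3_2yr = c(2010L, 2012L), Treatment = "docetaxel",
                 q3_3mo = c(NA, 4L),  q3_3yr = c(NA, 2008L), Treatment = "abiraterone",
                 q3_4mo = c(2L, 8L),  q3_4yr = c(2008L, 2011L), Treatment = "other",
                 check.names = FALSE, stringsAsFactors = FALSE)

# Make the names unique, then stack position 1, 2 and 3 of every block
names(df) <- make.unique(names(df))
long <- reshape(df, direction = "long",
                varying = split(names(df), rep(seq_len(3), times = ncol(df) / 3)))

# Cleanup: generic names for the stacked columns, drop the block counter
names(long)[names(long) == "q3_1mo"] <- "mo"
names(long)[names(long) == "q3_1yr"] <- "yr"
long$time <- NULL
rownames(long) <- NULL
long
```

Dropping time is a judgment call; keep it if you want an explicit block index alongside Treatment.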