2015-05-26 38 views
1

Split a variable and insert NA in between

I have a variable that looks like this:

Var 
[1] 3, 4, 5  2, 4, 5  2, 4  1, 4, 5 

I need to split it into a data frame that looks like this:

V1 V2 V3 V4 V5 
NA NA 3 4 5 
NA 2 NA 4 5 
NA 2 NA 4 NA 
1 NA NA 4 5 

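Sorry, I could not find a post that solves my problem. Does anyone know how I can do this? Thank you very much in advance!

Edit: I found a solution based on your answers and posted it below.

Edit 2: I improved the efficiency of my code using Ananda's solution.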

+3

Is 'Var' a 'list' or a 'vector' or what? Your example is not reproducible. Is it 'c(3,4,5,2,4,5,2,4,1,4,5)' or 'list(c(3,4,5),c(2,4,5), c(2,4),c(1,4,5))' or 'c("3,4,5,2,4,5,2,4 1,4,5")'? – thelatemail

Answers

1

Judging from the OP's answer, "Var" is a character vector of strings, such as:

var <- c("3, 4, 5", "2, 4, 5", "2, 4", "1, 4, 5") 

If that is the case, you could consider cSplit_e from my "splitstackshape" package:

library(splitstackshape) 
cSplit_e(data.frame(var), "var", ",", mode = "value", drop = TRUE) 
# var_1 var_2 var_3 var_4 var_5 
# 1 NA NA  3  4  5 
# 2 NA  2 NA  4  5 
# 3 NA  2 NA  4 NA 
# 4  1 NA NA  4  5 

If it is a list, as the other answers assume, you can use the (unexported) numMat function from "splitstackshape", which underlies cSplit_e.

var <- list(c(3,4,5), c(2,4,5), c(2,4), c(1,4,5)) 
splitstackshape:::numMat(var, mode = "value") 
#  1 2 3 4 5 
# [1,] NA NA 3 4 5 
# [2,] NA 2 NA 4 5 
# [3,] NA 2 NA 4 NA 
# [4,] 1 NA NA 4 5 

Under the hood, numMat takes an approach very similar to the one in @thelatemail's answer.


If you have -99 representing NA and want to exclude those values, you could try:

var <- c("3, 4, 5", "2, -99, 4, 5", "2, 4", "1, 4, 5, -99") 
splitstackshape:::numMat(
    lapply(strsplit(var, ","), function(x) as.numeric(x)[as.numeric(x) > 0]), 
    mode = "value") 
#  1 2 3 4 5 
# [1,] NA NA 3 4 5 
# [2,] NA 2 NA 4 5 
# [3,] NA 2 NA 4 NA 
# [4,] 1 NA NA 4 5 
+0

Thank you very much! Your first solution works great and makes my code much shorter! – JSP

0

Assuming your var is a list, this seems to work:

var <- list(c(3,4,5),c(2,4,5),c(2,4),c(1,4,5)) 

#define function find_num to essentially create 
#5 new functions (called closures) inside the for-loop below 
find_num <- function(x) { 
    num <- function(mylist) { 
    sapply(mylist, function(i) if(x %in% i) return(x) else return(NA)) 
    } 
} 

#initiate list 
new_list <- list() 
#find_num is initiated with 5 different values essentially (in each iteration) 
#creating 5 new functions (closures) each for the number we want 
for (i in 1:5){ 
    myfunc <- find_num(i) 
    #this creates the list we want. Each element is a column 
    new_list[[length(new_list)+1]] <- myfunc(var) 
} 

#combine the columns into a new matrix 
new_list <- do.call(cbind, new_list) 

Output:

> new_list 
    [,1] [,2] [,3] [,4] [,5] 
[1,] NA NA 3 4 5 
[2,] NA 2 NA 4 5 
[3,] NA 2 NA 4 NA 
[4,] 1 NA NA 4 5 
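
The closure-based loop above can also be written as two nested sapply calls. This is only a compact sketch of the same idea, not a change to the answer: for each candidate number k, check every list element and return k if it is present, NA otherwise. The name res is introduced here purely for illustration.

```r
var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))

# The outer sapply builds one column per candidate number k;
# the inner sapply builds one entry per list element.
res <- sapply(1:5, function(k) {
  sapply(var, function(v) if (k %in% v) k else NA)
})
res
```

The upper bound 5 is hard-coded as in the original loop; max(unlist(var)) would derive it from the data.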
5

Using matrix indexing:

Var <- list(c(3,4,5),c(2,4,5),c(2,4),c(1,4,5)) 
unVar <- unlist(Var) 
out <- matrix(NA, nrow=length(Var), ncol=max(unVar)) 

out[cbind(rep(seq_along(Var),sapply(Var,length)),unVar)] <- unVar 
# and if you're using R >= 3.2.0, lengths() lets you simplify a little: 
out[cbind(rep(seq_along(Var),lengths(Var)),unVar)] <- unVar 

#  [,1] [,2] [,3] [,4] [,5] 
#[1,] NA NA 3 4 5 
#[2,] NA 2 NA 4 5 
#[3,] NA 2 NA 4 NA 
#[4,] 1 NA NA 4 5 
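
Since the question asks for a data frame with columns V1 to V5 rather than a matrix, a final conversion step may be useful. The sketch below repeats the matrix-indexing code and then calls as.data.frame(); the name df is an illustrative choice and not part of the answer.

```r
Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))
unVar <- unlist(Var)
out <- matrix(NA, nrow = length(Var), ncol = max(unVar))
out[cbind(rep(seq_along(Var), lengths(Var)), unVar)] <- unVar

# as.data.frame() names the columns of an unnamed matrix V1, V2, ...
df <- as.data.frame(out)
names(df)
```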
0

If Var is just a vector, then I would do the following:

Var = c(3,4,5,2,4,5,2,4,1,4,5) 
RowIdx = c(rep(1,3),rep(2,3),rep(3,2),rep(4,3)) 
DF = matrix(NA,nrow=4,ncol=5) 

for (idx in 1:length(Var)){ 
    DF[RowIdx[idx],Var[idx]] = Var[idx] 
} 

Of course, if you have more data, you will probably want to find a more automated way to generate the row indices.
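
As a rough sketch of how the row indices might be generated automatically, here are two options. The first assumes the number of values per row is known (the vector n_per_row is hypothetical). The second assumes the values within each row are strictly increasing, so that a new row starts wherever the sequence stops increasing. Neither assumption is stated in the question.

```r
Var <- c(3, 4, 5, 2, 4, 5, 2, 4, 1, 4, 5)

# Option 1 (assumption: the count of values per row is known)
n_per_row <- c(3, 3, 2, 3)
RowIdx <- rep(seq_along(n_per_row), times = n_per_row)

# Option 2 (assumption: values within a row are strictly increasing)
RowIdx2 <- cumsum(c(TRUE, diff(Var) <= 0))

# Matrix indexing then replaces the explicit for-loop
DF <- matrix(NA, nrow = max(RowIdx), ncol = max(Var))
DF[cbind(RowIdx, Var)] <- Var
DF
```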

1
Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5)) 
M <- matrix(NA, nrow=length(Var), ncol=max(sapply(Var,max))) 
for(L in seq(Var)) { M [ cbind(rep(L, length(Var[[L]])), Var[[L]]) ] <- Var[[L]]} 
M 
    [,1] [,2] [,3] [,4] [,5] 
[1,] NA NA 3 4 5 
[2,] NA 2 NA 4 5 
[3,] NA 2 NA 4 NA 
[4,] 1 NA NA 4 5 

Personally, my vote would go to thelatemail's version, which is essentially isomorphic to this.

0

I managed to find a solution based on your answers! My final solution looks like this:

# I had the additional problem that my variable was a factor, therefore I had to transform it first. 
df <- data.frame(Var) 
Var <- lapply(strsplit(as.character(df$Var), ", "), "[") 
for(i in 1:length(Var)){ 
    Var[[i]] <- as.numeric(Var[[i]]) 
} 

# Then I created a matrix based on thelatemail's and BondedDust's approach. 
M <- matrix(NA, nrow=length(Var), ncol=max(sapply(Var,max))) 

# Additionally, I had the problem that there were some lines with a single -99, which indicates a missing value for the complete line. I had some problems with this negative value. For this reason, I assigned NA's first. 
for(i in 1:length(Var)){ 
    Var[[i]][Var[[i]] == -99] <- NA 
} 

# Final assignment as suggested by BondedDust. 
for(L in seq(Var)) { M [ cbind(rep(L, length(Var[[L]])), Var[[L]]) ] <- Var[[L]]} 
M 

I'm not sure whether this is the fastest solution, but everything works now! Thank you very much for your quick and extensive answers!
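
For comparison, the same steps can be sketched without explicit loops. The input below is a made-up factor that includes one row consisting of a single -99; the real data is not shown in the thread, so treat the values as placeholders. Instead of converting -99 to NA, this variant drops it, which leaves an empty vector for a fully missing row and therefore a row of NA in the result.

```r
# Hypothetical factor input with one fully missing row coded as -99
Var <- factor(c("3, 4, 5", "2, 4, 5", "-99", "1, 4, 5"))

# Factor -> character -> list of numeric vectors
vals <- lapply(strsplit(as.character(Var), ", "), as.numeric)

# Drop the -99 markers; a row that was only -99 becomes numeric(0)
vals <- lapply(vals, function(x) x[x != -99])

flat <- unlist(vals)
M <- matrix(NA_real_, nrow = length(vals), ncol = max(flat))

# Rows with no values contribute no index pairs and stay NA
M[cbind(rep(seq_along(vals), lengths(vals)), flat)] <- flat
M
```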