Variable-length core name identification

I have a dataset with the following row naming scheme:

a.X.V 
where: 
a is a fixed-length core ID 
X is a variable-length string that subsets a, which means I should keep X 
V is a variable-length ID which specifies the individual elements of a.X to be averaged 
. is one of {-,_}

What I'm trying to do is take the column means across all rows that share the same a.X. Sample:

sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6)) 
sampleList 
$a.12.1 
[1] 1 2 3 4 5 

$b.1.23 
[1] 3 4 1 4 5 

$a.12.21 
[1] 5 7 2 8 9 

$b.1.555 
[1] 6 8 9 0 6

Currently, I'm manually gsubbing out the .V's to get the general list:

sampleList <- t(as.data.frame(sampleList))
y <- rownames(sampleList)
y <- gsub("(\\w\\.\\d+)\\.\\d+", "\\1", y)

Is there a faster way to do this?

This is one half of two problems I've run into in my workflow. The other half is answered here.

Source

2012-10-19 learner

What do you mean by 'manually gsubbing'? Do you mean calling 'gsub' multiple times?

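You can use a vector of patterns to find the positions of the columns you want to group. I included a pattern that I know won't match anything, to show that the solution still works in that case.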

# A *named* vector of patterns you want to group by
# (dots are escaped so they match a literal "."; c.12 deliberately matches nothing)
patterns <- c(a.12="^a\\.12\\.", b.1="^b\\.1\\.", c.12="^c\\.12\\.")
# Find the locations of those patterns in your list
inds <- lapply(patterns, grep, x=names(sampleList))
# Calculate the element-wise mean of the list elements that match each pattern
out <- lapply(inds, function(i)
    if(l <- length(i)) Reduce("+",sampleList[i])/l else NULL)
# Set the names of the output
names(out) <- names(patterns)
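
The pattern vector above is written by hand. As a minimal sketch (not part of the original answer), the group keys could instead be derived from the names themselves by stripping the trailing V part, assuming the separator is one of ".", "-" or "_" as described in the question. The names keys, groups and groupMeans are illustrative.

```r
sampleList <- list("a.12.1" = c(1, 2, 3, 4, 5), "b.1.23" = c(3, 4, 1, 4, 5),
                   "a.12.21" = c(5, 7, 2, 8, 9), "b.1.555" = c(6, 8, 9, 0, 6))

# Drop the last separator and everything after it ("a.12.21" -> "a.12").
keys <- sub("[._-][^._-]+$", "", names(sampleList))

# Group the list elements by key, then average each group position by position.
groups <- split(sampleList, keys)
groupMeans <- lapply(groups, function(g) Reduce(`+`, g) / length(g))
```

This avoids one grep call per pattern, but it only sees groups that actually occur in the data, so a key with no matching rows simply does not appear in the result.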

Source

2012-10-19 14:43:58

Perhaps you could consider rearranging your data structure to make it easier to apply some standard tools:

sampleList <- list("a.12.1"=c(1,2,3,4,5), 
    "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), 
    "b.1.555"=c(6,8,9,0,6)) 
library(reshape2) 
m1 <- melt(do.call(cbind,sampleList)) 
m2 <- cbind(m1,colsplit(m1$Var2,"\\.",c("coreID","val1","val2")))

The result looks like this:

head(m2) 
  Var1   Var2 value coreID val1 val2
1    1 a.12.1     1      a   12    1
2    2 a.12.1     2      a   12    1
3    3 a.12.1     3      a   12    1

Then you can more easily do things like this:

aggregate(value~val1,mean,data=subset(m2,coreID=="a"))
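
The aggregate call above collapses every value in a group into a single number. If position-wise means are wanted instead, the position index can be added to the formula. The following is only a sketch: it builds a comparable long table with base R's stack() rather than reshape2, and the column names pos and key are assumptions introduced for illustration.

```r
sampleList <- list("a.12.1" = c(1, 2, 3, 4, 5), "b.1.23" = c(3, 4, 1, 4, 5),
                   "a.12.21" = c(5, 7, 2, 8, 9), "b.1.555" = c(6, 8, 9, 0, 6))

# Long format: one row per value, with the originating list name in `ind`.
long <- stack(sampleList)

# Position of each value within its original vector (1..5 for every element).
long$pos <- sequence(lengths(sampleList))

# Group key: the name with its trailing V part removed ("a.12.21" -> "a.12").
long$key <- sub("[._-][^._-]+$", "", as.character(long$ind))

# Mean per position within each key.
posMeans <- aggregate(values ~ pos + key, data = long, FUN = mean)
```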

Source

2012-10-19 14:47:12

R is well equipped to do this kind of thing if you just move to data.frames instead of lists. Put your 'a', 'X' and 'V' into their own columns. Then you can use ave, by, aggregate, subset, etc.

data.frame(do.call(rbind, sampleList), 
      do.call(rbind, strsplit(names(sampleList), '\\.'))) 

#         X1 X2 X3 X4 X5 X1.1 X2.1 X3.1
# a.12.1   1  2  3  4  5    a   12    1
# b.1.23   3  4  1  4  5    b    1   23
# a.12.21  5  7  2  8  9    a   12   21
# b.1.555  6  8  9  0  6    b    1  555
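
Once the name parts sit in their own columns, the per-group column means asked for in the question could be computed with aggregate. This is a rough sketch under the assumption that every name has exactly three parts; the column names coreID, subsetID and element are made up for readability and do not come from the original answer.

```r
sampleList <- list("a.12.1" = c(1, 2, 3, 4, 5), "b.1.23" = c(3, 4, 1, 4, 5),
                   "a.12.21" = c(5, 7, 2, 8, 9), "b.1.555" = c(6, 8, 9, 0, 6))

# Split each name on any of the allowed separators into a, X and V.
parts <- do.call(rbind, strsplit(names(sampleList), "[._-]"))

df <- data.frame(coreID = parts[, 1], subsetID = parts[, 2], element = parts[, 3],
                 do.call(rbind, sampleList), stringsAsFactors = FALSE)

# Average every value column (X1..X5) within each coreID/subsetID pair.
# The element column is dropped first so it is not treated as a value.
groupMeans <- aggregate(. ~ coreID + subsetID,
                        data = df[, names(df) != "element"], FUN = mean)
```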

Source

2012-10-19 14:52:36
