2013-03-27

Why can't I use aggregate with cbind on a range of columns in a data.frame?

Here are 20 rows of the data I'm working with:

Zv9_NA110 6176 7276 5'to3'IntronExon 0 + 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA110 10126 11226 5'to3'IntronExon 0 + 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 9 9 15 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA110 11219 12319 5'to3'ExonIntron 0 + 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA110 14887 15987 5'to3'IntronExon 0 + 1100 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 
Zv9_NA110 18923 20023 5'to3'IntronExon 0 + 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA110 21069 22169 5'to3'ExonIntron 0 + 1100 0 135 115 65 54 45 36 27 16 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA113 1615 2715 5'to3'IntronExon 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA113 2335 3435 5'to3'ExonIntron 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA113 5398 6498 5'to3'IntronExon 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA113 7173 8273 5'to3'ExonIntron 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA118 11674 12774 5'to3'IntronExon 0 + 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA118 12711 13811 5'to3'ExonIntron 0 + 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA123 38151 39251 5'to3'ExonIntron 0 - 1100 0 1061 958 844 796 695 600 464 346 265 210 150 133 94 81 72 46 18 4 0 0 0 0 0 0 0 0 0 7 9 9 9 11 21 35 43 58 91 108 180 268 406 547 712 833 882 960 1094 1172 1245 1331 1432 1510 1604 1711 1810 1830 1837 1823 1781 1690 1638 1560 1489 1257 854 731 631 589 551 497 439 404 369 301 231 168 123 76 58 50 42 28 20 11 9 9 24 27 27 27 27 27 25 18 18 18 18 18 18 18 18 18 18 18 18 18 14 5 0 0 
Zv9_NA124 2578 3678 5'to3'ExonIntron 0 + 1100 0 423 407 401 377 357 345 324 304 249 185 111 54 30 12 0 0 0 0 0 0 0 0 0 0 0 0 0 1 9 9 9 9 14 18 25 27 27 27 27 27 27 27 27 27 27 27 26 18 18 18 18 18 18 16 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA129 4939 6039 5'to3'IntronExon 0 + 1100 226 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 9 9 9 9 9 9 9 9 9 9 9 9 14 34 45 60 97 128 175 293 395 524 621 764 894 1036 1164 1334 1469 1639 1801 1885 1983 
Zv9_NA132 12589 13689 5'to3'ExonIntron 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA132 13634 14734 5'to3'IntronExon 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA132 14481 15581 5'to3'ExonIntron 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 9 9 9 9 9 9 9 9 9 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA132 19534 20634 5'to3'IntronExon 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Zv9_NA132 28708 29808 5'to3'ExonIntron 0 - 1100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 9 15 18 24 27 42 46 73 112 142 157 162 162 162 162 162 162 162 162 159 153 153 153 153 153 150 144 132 112 76 52 30 25 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

I read it in as follows:

> dat <- read.table("dat.dat", header = FALSE) 

I need to get the means of columns 9 to 118, grouped by column 4.

This works:

> all_means <- aggregate(cbind(V9,V10,V11)~V4,data=dat,FUN=mean) 

       V4 V9 V10 V11 
1 5'to3'ExonIntron 0 0.00 0 
2 5'to3'IntronExon 0 0.75 1 

But there is no way I'm typing every column name all the way up to V118.

I've tried this:

> aggregate(cbind(9:118)~V4,data=blah,FUN=mean) 

But I get this error:

Error in model.frame.default(formula = cbind(9:118) ~ V4, data = blah) : 
    variable lengths differ (found for 'V4') 

What silly thing am I missing?
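A minimal sketch of why that error appears, using a small hypothetical data frame in place of dat (the values are made up): cbind(9:118) does not select columns 9 to 118. It simply builds a 110 x 1 matrix holding the integers 9 through 118, so model.frame() sees a 110-row response next to a V4 that has only as many rows as the data.

```r
# Hypothetical 4-row stand-in for dat; only V4, V9 and V10 are included.
dat <- data.frame(V4  = c("a", "b", "a", "b"),
                  V9  = c(1, 2, 3, 4),
                  V10 = c(10, 20, 30, 40))

# cbind(9:118) is just a one-column matrix of the numbers 9..118,
# not a selection of columns from dat.
m <- cbind(9:118)
print(dim(m))

# The formula response therefore has 110 rows while V4 has 4,
# so model.frame() stops with "variable lengths differ".
err <- tryCatch(aggregate(cbind(9:118) ~ V4, data = dat, FUN = mean),
                error = function(e) conditionMessage(e))
print(err)
```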

Answers


You can use the data.frame method of aggregate:

## S3 method for class 'data.frame' 
aggregate(x, by, FUN, ..., simplify = TRUE) 

With your data, assuming it is in a data frame called DF:

# txt is assumed to hold the data rows from the question as a single string
DF <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE) 
result <- aggregate(DF[, 9:118], by = list(DF[, 4]), FUN = mean) 

# Using pander to print result table nicely. It's not needed for aggregation :) 
require(pander) 
pandoc.table(result) 
## 
## ---------------------------------------------------- 
##  Group.1  V9 V10 V11 V12 V13 V14 
## ---------------- ----- ----- ----- ----- ----- ----- 
## 5'to3'ExonIntron 161.9 148 131 122.7 109.7 98.1 
## 
## 5'to3'IntronExon 0.0 0  0 0.0 0.0 0.0 
## ---------------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V15 V16 V17 V18 V19 V20 V21 V22 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 81.5 66.6 52.3 39.5 26.1 18.7 12.4 9.3 
## 
## 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V23 V24 V25 V26 V27 V28 V29 V30 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 7.2 4.6 1.8 0.4 0  0  0 0.5 
## 
## 0.0 0.0 0.0 0.0 0  0  0 0.0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V31 V32 V33 V34 V35 V36 V37 V38 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 0.9 1.5 1.8 2.4 2.7 5 6.4 9.1 
## 
## 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V39 V40 V41 V42 V43 V44 V45 V46 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 13 16.2 19.2 21.5 23 24.7 28 29.7 
## 
## 0 0.0 0.0 0.0 0 0.0 0 0.0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V47 V48 V49 V50 V51 V52 V53 V54 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 36.9 45.7 59.5 73.3 89.2 101.3 106.2 114 
## 
## 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V55 V56 V57 V58 V59 V60 V61 V62 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 127.3 134 140.7 148.1 156.2 160.4 167.4 175.7 
## 
## 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V63 V64 V65 V66 V67 V68 V69 V70 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 183.9 183.1 183.7 182.3 178.1 169 163.8 156.7 
## 
## 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V71 V72 V73 V74 V75 V76 V77 V78 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 149.8 126.6 86.3 74.0 64.0 59.8 56.0 50.6 
## 
## 0.7 0.9 0.9 1.5 1.8 1.8 1.8 1.8 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V79 V80 V81 V82 V83 V84 V85 V86 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 45.6 42.2 38.7 31.9 24.9 18.6 14.1 9.4 
## 
## 1.8 1.8 1.8 1.8 1.8 1.8 2.2 2.7 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ----------------------------------------------- 
## V87 V88 V89 V90 V91 V92 V93 V94 
## ----- ----- ----- ----- ----- ----- ----- ----- 
## 7.6 6.2 5.1 3.7 2.9 2.5 2.7 2.7 
## 
## 2.7 2.7 2.7 2.7 2.7 2.7 2.2 0.9 
## ----------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## -------------------------------------------------- 
## V95 V96 V97 V98 V99 V100 V101 V102 
## ----- ----- ----- ----- ----- ------ ------ ------ 
## 4.2 4.5 4.5 4.5 4.5 4.5 4.3 2.5 
## 
## 0.9 0.9 0.9 1.4 4.1 5.4 6.9 10.6 
## -------------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ------------------------------------------------------- 
## V103 V104 V105 V106 V107 V108 V109 V110 
## ------ ------ ------ ------ ------ ------ ------ ------ 
## 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 
## 
## 13.7 18.4 30.2 40.4 53.3 63.0 77.3 90.3 
## ------------------------------------------------------- 
## 
## Table: Table continues below 
## 
## 
## ------------------------------------------------------- 
## V111 V112 V113 V114 V115 V116 V117 V118 
## ------ ------ ------ ------ ------ ------ ------ ------ 
## 1.8 1.8 1.8 1.8 1.4 0.5 0.0 0.0 
## 
## 104.5 117.3 134.3 147.8 164.8 181.0 189.4 199.2 
## ------------------------------------------------------- 
## 
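The same data.frame method can also select the value columns by name rather than by position, and naming the by list gives the grouping column a real name instead of Group.1. A sketch on a small hypothetical data frame (values made up; on the real data value_cols would be paste0("V", 9:118)). drop = FALSE keeps the selection a data frame even if only one column is chosen.

```r
# Hypothetical stand-in for DF; column names mimic read.table's defaults.
DF <- data.frame(V4  = c("5'to3'ExonIntron", "5'to3'IntronExon",
                         "5'to3'ExonIntron", "5'to3'IntronExon"),
                 V9  = c(2, 0, 4, 0),
                 V10 = c(1, 3, 5, 7))

# On the real data this would be paste0("V", 9:118).
value_cols <- c("V9", "V10")

# drop = FALSE keeps a data frame even when value_cols has length 1;
# naming the by list yields a "V4" column instead of "Group.1".
result <- aggregate(DF[, value_cols, drop = FALSE],
                    by = list(V4 = DF$V4), FUN = mean)
print(result)
```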

Thanks - it's hard to pick a "winner". But this answer is the clearest (i.e., the closest to the way I was trying to solve the problem). – 2013-03-27 11:26:51


You have several options.

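Use . on the left-hand side of the formula (it stands for every column in data other than V4) and pass only the subset of columns you need:

aggregate(. ~ V4, data = dat[,c(4,9:118)], FUN = mean) 

You can also create a vector of column names with paste0 and refer to the columns by name:

nn <- paste0('V', 9:118) 

aggregate(. ~ V4, data = dat[,c('V4',nn)], FUN = mean) 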
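A quick check of the subset-plus-dot approach on a hypothetical data frame that, like the real one, also contains a non-numeric column (V1). Because . expands to every column of data other than V4, subsetting first is what keeps columns like V1 out of mean(). The values below are made up for illustration.

```r
# Hypothetical stand-in for dat, including a character column V1
# that must not be averaged.
dat <- data.frame(V1  = "Zv9_NA110",
                  V4  = c("a", "b", "a", "b"),
                  V9  = c(1, 2, 3, 4),
                  V10 = c(10, 20, 30, 40))

# On the real data: nn <- paste0('V', 9:118)
nn <- c("V9", "V10")

# '.' on the left-hand side means "all remaining columns of data",
# i.e. exactly the columns in nn once data is subset to V4 + nn.
res <- aggregate(. ~ V4, data = dat[, c("V4", nn)], FUN = mean)
print(res)
```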

Using cbind doesn't make much sense here, given that the formula approach above works, but for example:

aggregate(do.call(cbind,lapply(nn, as.name)) ~ V4, data = dat, FUN = mean) 

But this is messy, because it doesn't name the columns nicely (and it is hard to follow).
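If you do want the cbind form the question started from, one alternative (not part of the original answer) is to build the formula text with paste and convert it with as.formula, so the left-hand side is a plain cbind(V9, V10, ...) call and the result keeps the column names. A sketch on hypothetical data; passing the stored formula through do.call() inlines the formula object into the call, which is done here on the assumption that some R versions' aggregate.formula read the cbind() names from the call itself rather than from a variable.

```r
# Hypothetical stand-in for dat.
dat <- data.frame(V4  = c("a", "b", "a", "b"),
                  V9  = c(1, 2, 3, 4),
                  V10 = c(10, 20, 30, 40),
                  V11 = c(0, 1, 0, 1))

# On the real data: nn <- paste0("V", 9:118)
nn <- paste0("V", 9:11)

# Builds the text "cbind(V9, V10, V11) ~ V4" and parses it into a formula.
f <- as.formula(paste0("cbind(", paste(nn, collapse = ", "), ") ~ V4"))

# do.call() places the formula object itself (not the name f) in the call.
res <- do.call(aggregate, list(f, data = dat, FUN = mean))
print(res)
```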


If speed is an issue in general (it isn't necessarily one for this operation), you'll want to use the data.table package, as follows:

Safer solution

Following mnel's comment, I would use:

library(data.table) 
dat <- as.data.table(dat) 
dat[,lapply(.SD,mean),by="V4",.SDcols=paste0("V", 9:118)] 

Old approach

dat[,lapply(.SD,mean),by="V4",.SDcols=9:118] 

It is still safer to give '.SDcols' the column names, i.e. paste0('V',9:118) – mnel 2013-03-27 03:14:53


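@mnel Thanks for the tip. I completely agree, but I didn't know you could also pass column names to '.SDcols'... definitely worth remembering. I've updated the answer accordingly. – 2013-03-27 03:17:07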


This is certainly the fastest solution. On over 100,000 rows it is about 10 times faster than the other two solutions. Thanks! – 2013-03-27 11:23:56
