2012-07-25 106 views
2

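R: select entire columns where at least one value meets a condition

I have a large matrix, ~300 rows and 200,000 columns. I want to reduce it by selecting whole columns (not just particular values) that contain at least one value greater than 0.5 or less than -0.5. I want to keep the row and column names. By running tmp <- mymat > 0.5 | mymat < -0.5 I can get a matrix of TRUE/FALSE values, and I want to extract every column that contains at least one TRUE. I tried mymat[tmp], but that only returns a vector of the values that meet the condition. How can I get the actual columns of the original matrix? Thanks.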

Answers

6

Try this:

> set.seed(007) # make the example reproducible
> X <- matrix(rnorm(100), 20) # generate some data
> X <- cbind(X, runif(20, max=.48)) # add a column whose values are all < 0.5
> colnames(X) <- paste('col', 1:ncol(X), sep='') # some column names
> X # this is what the matrix looks like
       col1  col2   col3  col4   col5  col6 
[1,] 2.287247161 0.83975036 1.218550535 0.07637147 0.342585350 0.335107187 
[2,] -1.196771682 0.70534183 -0.699317079 0.15915528 0.004248236 0.419502015 
[3,] -0.694292510 1.30596472 -0.285432752 0.54367418 0.029219842 0.346358090 
[4,] -0.412292951 -1.38799622 -1.311552673 0.70480735 -0.393423429 0.212185020 
[5,] -0.970673341 1.27291686 -0.391012431 0.31896914 -0.792704563 0.224824248 
[6,] -0.947279945 0.18419277 -0.401526613 1.10924979 -0.311701865 0.415837389 
[7,] 0.748139340 0.75227990 1.350517581 0.76915419 -0.346068592 0.057660111 
[8,] -0.116955226 0.59174505 0.591190027 1.15347367 -0.304607588 0.007812921 
[9,] 0.152657626 -0.98305260 0.100525456 1.26068350 -1.785893487 0.298192099 
[10,] 2.189978107 -0.27606396 0.931071996 0.70062351 0.587274672 0.216225091 
[11,] 0.356986230 -0.87085102 -0.262742349 0.43262716 1.635794434 0.026097800 
[12,] 2.716751783 0.71871055 -0.007668105 -0.92260172 -0.645423474 0.190567072 
[13,] 2.281451926 0.11065288 0.367153007 -0.61558421 0.618992169 0.402829397 
[14,] 0.324020540 -0.07846677 1.707162545 -0.86665969 0.236393598 0.248196976 
[15,] 1.896067067 -0.42049046 0.723740263 -1.63951709 0.846500899 0.406511129 
[16,] 0.467680511 -0.56212588 0.481036049 -1.32583924 -0.573645739 0.162457572 
[17,] -0.893800723 0.99751344 -1.567868244 -0.88903673 1.117993204 0.383801555 
[18,] -0.307328300 -1.10513006 0.318250283 -0.55760233 -1.540001132 0.347037954 
[19,] -0.004822422 -0.14228783 0.165991451 -0.06240231 -0.438123899 0.262938992 
[20,] 0.988164149 0.31499490 -0.899907630 2.42269298 -0.150672971 0.139233120 
> 
> # define an index that is TRUE for each column meeting the condition
> ind <- apply(X, 2, function(col) any(abs(col) > 0.5))
> X[, ind] # col6 only has values below 0.5, so it is not selected
       col1  col2   col3  col4   col5 
[1,] 2.287247161 0.83975036 1.218550535 0.07637147 0.342585350 
[2,] -1.196771682 0.70534183 -0.699317079 0.15915528 0.004248236 
[3,] -0.694292510 1.30596472 -0.285432752 0.54367418 0.029219842 
[4,] -0.412292951 -1.38799622 -1.311552673 0.70480735 -0.393423429 
[5,] -0.970673341 1.27291686 -0.391012431 0.31896914 -0.792704563 
[6,] -0.947279945 0.18419277 -0.401526613 1.10924979 -0.311701865 
[7,] 0.748139340 0.75227990 1.350517581 0.76915419 -0.346068592 
[8,] -0.116955226 0.59174505 0.591190027 1.15347367 -0.304607588 
[9,] 0.152657626 -0.98305260 0.100525456 1.26068350 -1.785893487 
[10,] 2.189978107 -0.27606396 0.931071996 0.70062351 0.587274672 
[11,] 0.356986230 -0.87085102 -0.262742349 0.43262716 1.635794434 
[12,] 2.716751783 0.71871055 -0.007668105 -0.92260172 -0.645423474 
[13,] 2.281451926 0.11065288 0.367153007 -0.61558421 0.618992169 
[14,] 0.324020540 -0.07846677 1.707162545 -0.86665969 0.236393598 
[15,] 1.896067067 -0.42049046 0.723740263 -1.63951709 0.846500899 
[16,] 0.467680511 -0.56212588 0.481036049 -1.32583924 -0.573645739 
[17,] -0.893800723 0.99751344 -1.567868244 -0.88903673 1.117993204 
[18,] -0.307328300 -1.10513006 0.318250283 -0.55760233 -1.540001132 
[19,] -0.004822422 -0.14228783 0.165991451 -0.06240231 -0.438123899 
[20,] 0.988164149 0.31499490 -0.899907630 2.42269298 -0.150672971 

# This can also be done in one step, without 'ind'
X[, apply(X, 2, function(col) any(abs(col) > 0.5))]
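Since the question already builds the logical matrix tmp, a variant that reuses it with colSums() may also be worth trying; on a matrix with 200,000 columns it avoids calling an R function once per column. This is only a sketch: the small mymat below and its row and column names are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the question's matrix (3 rows, 4 named columns).
mymat <- matrix(c( 0.9,  0.1, -0.2,   # col1: contains 0.9  -> should be kept
                   0.1, -0.2,  0.3,   # col2: all within [-0.5, 0.5]
                  -0.7,  0.0,  0.4,   # col3: contains -0.7 -> should be kept
                   0.2,  0.3, -0.1),  # col4: all within [-0.5, 0.5]
                nrow = 3,
                dimnames = list(paste0("row", 1:3), paste0("col", 1:4)))

tmp  <- mymat > 0.5 | mymat < -0.5     # logical matrix, as in the question
keep <- colSums(tmp) > 0               # TRUE where a column has at least one TRUE
result <- mymat[, keep, drop = FALSE]  # subset columns; dimnames are preserved

print(result)
```

Here keep plays the same role as ind above, so X[, ind] and mymat[, keep] select columns in the same way.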

Thanks, this works! – helicase 2012-07-25 16:16:51

1

In addition to Jilber's answer, for the case where only one column remains after filtering:

X[, apply(X, 2, function(X) any(abs(X)>0.5)), drop=FALSE] 

Without the drop=FALSE argument, the remaining column is converted to a vector and you lose the column name information.
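A small illustration of the difference, assuming a hypothetical 2 x 2 matrix in which only one column passes the filter:

```r
# Hypothetical matrix: only col1 has a value with absolute value above 0.5.
X <- matrix(c(0.9, 0.1,    # col1
              0.2, 0.3),   # col2
            nrow = 2,
            dimnames = list(c("r1", "r2"), c("col1", "col2")))

ind <- apply(X, 2, function(col) any(abs(col) > 0.5))

dropped <- X[, ind]                # collapses to a numeric vector, "col1" is lost
kept    <- X[, ind, drop = FALSE]  # stays a 2 x 1 matrix with its column name

print(dim(dropped))    # NULL, because it is no longer a matrix
print(dim(kept))
print(colnames(kept))
```

The vector in dropped still carries the row names, but nothing records which column it came from.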
