2015-02-10 19 views
2
> matrix(c(c(0, 3.75882e-06, 3.71645e-05, 2.16088e-06, 1.357e-06, 1.19274e-06, NaN, 1.14748e-06, 9.3314e-07), c(3.75882e-06, 0, 3.94165e-05, 3.58464e-06, 3.60392e-06, 3.43881e-06, NaN, 3.39315e-06, 3.17616e-06), c(3.71645e-05, 3.94165e-05, 0, 3.78173e-05, 3.70121e-05, 3.68449e-05, NaN, 3.6798e-05, 3.65591e-05), c(2.16088e-06, 3.58464e-06, 3.78173e-05, 0, 2.00581e-06, 1.84085e-06, NaN, 1.79527e-06, 1.57976e-06), c(1.357e-06, 3.60392e-06, 3.70121e-05, 2.00581e-06, 0, 1.03709e-06, NaN, 9.91615e-07, 7.77135e-07), c(1.19274e-06, 3.43881e-06, 3.68449e-05, 1.84085e-06, 1.03709e-06, 0, NaN, 8.27333e-07, 6.12979e-07), c(NaN, NaN, NaN, NaN, NaN, NaN, 0, NaN, NaN), c(1.14748e-06, 3.39315e-06, 3.6798e-05, 1.79527e-06, 9.91615e-07, 8.27333e-07, NaN, 0, 5.67856e-07), c(9.3314e-07, 3.17616e-06, 3.65591e-05, 1.57976e-06, 7.77135e-07, 6.12979e-07, NaN, 5.67856e-07, 0)), ncol=9) 

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6] [,7]  [,8]  [,9] 
[1,] 0.00000e+00 3.75882e-06 3.71645e-05 2.16088e-06 1.35700e-06 1.19274e-06 NaN 1.14748e-06 9.33140e-07 
[2,] 3.75882e-06 0.00000e+00 3.94165e-05 3.58464e-06 3.60392e-06 3.43881e-06 NaN 3.39315e-06 3.17616e-06 
[3,] 3.71645e-05 3.94165e-05 0.00000e+00 3.78173e-05 3.70121e-05 3.68449e-05 NaN 3.67980e-05 3.65591e-05 
[4,] 2.16088e-06 3.58464e-06 3.78173e-05 0.00000e+00 2.00581e-06 1.84085e-06 NaN 1.79527e-06 1.57976e-06 
[5,] 1.35700e-06 3.60392e-06 3.70121e-05 2.00581e-06 0.00000e+00 1.03709e-06 NaN 9.91615e-07 7.77135e-07 
[6,] 1.19274e-06 3.43881e-06 3.68449e-05 1.84085e-06 1.03709e-06 0.00000e+00 NaN 8.27333e-07 6.12979e-07 
[7,]   NaN   NaN   NaN   NaN   NaN   NaN 0   NaN   NaN 
[8,] 1.14748e-06 3.39315e-06 3.67980e-05 1.79527e-06 9.91615e-07 8.27333e-07 NaN 0.00000e+00 5.67856e-07 
[9,] 9.33140e-07 3.17616e-06 3.65591e-05 1.57976e-06 7.77135e-07 6.12979e-07 NaN 5.67856e-07 0.00000e+00 

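I have a bunch of matrices like the one above. They are filled with numeric elements, except for particular rows and columns that consist of NaN. At the intersection of a NaN row and a NaN column there is always a zero. Note that in the example above only one row and one column contain NaN, but in practice I may have several such rows and columns. How can I remove the rows and columns that consist (almost) entirely of NaN?

I intend to write a function that automatically removes the rows and columns that consist almost entirely of NaN. How can I do this?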

Answer

5

Logical indexing with rowSums and colSums (in the right positions) gives a very compact, efficient answer, assuming the matrix is stored in M:

M[rowSums(is.na(M)) < 0.8*ncol(M), ][ , colSums(is.na(M)) < 0.8*nrow(M)] 

      [,1]  [,2]  [,3]  [,4]  [,5] 
[1,] 0.00000e+00 3.75882e-06 3.71645e-05 2.16088e-06 1.35700e-06 
[2,] 3.75882e-06 0.00000e+00 3.94165e-05 3.58464e-06 3.60392e-06 
[3,] 3.71645e-05 3.94165e-05 0.00000e+00 3.78173e-05 3.70121e-05 
[4,] 2.16088e-06 3.58464e-06 3.78173e-05 0.00000e+00 2.00581e-06 
[5,] 1.35700e-06 3.60392e-06 3.70121e-05 2.00581e-06 0.00000e+00 
[6,] 1.19274e-06 3.43881e-06 3.68449e-05 1.84085e-06 1.03709e-06 
[7,] 1.14748e-06 3.39315e-06 3.67980e-05 1.79527e-06 9.91615e-07 
[8,] 9.33140e-07 3.17616e-06 3.65591e-05 1.57976e-06 7.77135e-07 
      [,6]  [,7]  [,8] 
[1,] 1.19274e-06 1.14748e-06 9.33140e-07 
[2,] 3.43881e-06 3.39315e-06 3.17616e-06 
[3,] 3.68449e-05 3.67980e-05 3.65591e-05 
[4,] 1.84085e-06 1.79527e-06 1.57976e-06 
[5,] 1.03709e-06 9.91615e-07 7.77135e-07 
[6,] 0.00000e+00 8.27333e-07 6.12979e-07 
[7,] 8.27333e-07 0.00000e+00 5.67856e-07 
[8,] 6.12979e-07 5.67856e-07 0.00000e+00 

It can even be done in a single indexing step:

M[rowSums(is.na(M)) < 0.8*ncol(M), colSums(is.na(M)) < 0.8*nrow(M)] 

      [,1]  [,2]  [,3]  [,4]  [,5] 
[1,] 0.00000e+00 3.75882e-06 3.71645e-05 2.16088e-06 1.35700e-06 
[2,] 3.75882e-06 0.00000e+00 3.94165e-05 3.58464e-06 3.60392e-06 
[3,] 3.71645e-05 3.94165e-05 0.00000e+00 3.78173e-05 3.70121e-05 
[4,] 2.16088e-06 3.58464e-06 3.78173e-05 0.00000e+00 2.00581e-06 
[5,] 1.35700e-06 3.60392e-06 3.70121e-05 2.00581e-06 0.00000e+00 
[6,] 1.19274e-06 3.43881e-06 3.68449e-05 1.84085e-06 1.03709e-06 
[7,] 1.14748e-06 3.39315e-06 3.67980e-05 1.79527e-06 9.91615e-07 
[8,] 9.33140e-07 3.17616e-06 3.65591e-05 1.57976e-06 7.77135e-07 
      [,6]  [,7]  [,8] 
[1,] 1.19274e-06 1.14748e-06 9.33140e-07 
[2,] 3.43881e-06 3.39315e-06 3.17616e-06 
[3,] 3.68449e-05 3.67980e-05 3.65591e-05 
[4,] 1.84085e-06 1.79527e-06 1.57976e-06 
[5,] 1.03709e-06 9.91615e-07 7.77135e-07 
[6,] 0.00000e+00 8.27333e-07 6.12979e-07 
[7,] 8.27333e-07 0.00000e+00 5.67856e-07 
[8,] 6.12979e-07 5.67856e-07 0.00000e+00 
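
To see why the logical index works, it can help to inspect the intermediate values on a smaller matrix. The following is only a sketch: the 6x6 matrix A and the variable names are made up for illustration and do not come from the question. Each row has ncol(A) entries, so the per-row NaN count is compared against a fraction of ncol(A), and the per-column count against a fraction of nrow(A); for a square matrix the two are the same.

```r
# Hypothetical 6x6 example (not from the question): row and column 3 are NaN,
# except for the 0 at their intersection.
A <- matrix(as.numeric(1:36), nrow = 6)
A[3, ] <- NaN
A[, 3] <- NaN
A[3, 3] <- 0

na_per_row <- rowSums(is.na(A))  # number of NaN entries in each row
na_per_col <- colSums(is.na(A))  # number of NaN entries in each column

# Keep a row/column only if fewer than 80% of its entries are NaN.
keep_rows <- na_per_row < 0.8 * ncol(A)
keep_cols <- na_per_col < 0.8 * nrow(A)

A_clean <- A[keep_rows, keep_cols]
print(dim(A_clean))
```

Here row 3 holds 5 NaN values out of 6 (about 83%), so it fails the 80% test and is dropped, while every other row holds a single NaN and is kept.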

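If you are sure that each such row or column has exactly one non-NaN entry (the zero at the intersection), the logical test could instead be < (ncol(M)-1) for the rows and < (nrow(M)-1) for the columns.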
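
Since the question asks for a function, the indexing above could be wrapped along the following lines. This is a minimal sketch, not the answerer's code: the name drop_nan_dims and the threshold argument are illustrative assumptions. It uses rowMeans and colMeans so the threshold is directly a fraction of NaN entries, and drop = FALSE so the result stays a matrix even when a single row or column survives.

```r
# Sketch of a helper; `drop_nan_dims` and `threshold` are hypothetical names,
# not part of the original answer.
drop_nan_dims <- function(M, threshold = 0.8) {
  stopifnot(is.matrix(M))
  na_mask <- is.na(M)                         # TRUE for NaN (and NA) entries
  keep_rows <- rowMeans(na_mask) < threshold  # fraction of NaN per row
  keep_cols <- colMeans(na_mask) < threshold  # fraction of NaN per column
  # drop = FALSE keeps the result a matrix even if one row/column remains
  M[keep_rows, keep_cols, drop = FALSE]
}

# Usage on a made-up 6x6 matrix with two mostly-NaN rows and columns.
# Assumption: only the diagonal entries of those rows/columns are set to 0.
B <- matrix(as.numeric(1:36), nrow = 6)
B[c(2, 5), ] <- NaN
B[, c(2, 5)] <- NaN
B[2, 2] <- 0
B[5, 5] <- 0

cleaned <- drop_nan_dims(B)
print(dim(cleaned))
```

The 0.8 default is a judgment call. In a 4x4 matrix, for instance, a row with three NaN values and one zero is only 75% NaN and would be kept, so for small matrices, or when many entries of a NaN row are zeros rather than NaN, a lower threshold may be needed.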

+0

Very good solution. Thank you. – Sulawesi 2015-02-10 03:16:13
