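Normalize the columns of a matrix between -1 and 1

I have a large matrix (thousands of rows and hundreds of columns) and I would like to normalize each column to the range -1 to 1. This is the code I wrote: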

normalize <- function(x) { 
    for(j in 1:length(x[1,])){ 
     print(j) 
     min <- min(x[,j]) 
     max <- max(x[,j]) 
     for(i in 1:length(x[,j])){ 
      x[i,j] <- 2 * (x[i,j] - min)/(max - min) - 1 
     } 
    } 
    return(x) 
}

Unfortunately, it is slow. I have seen this:

normalize <- function(x) { 
    x <- sweep(x, 2, apply(x, 2, min)) 
    sweep(x, 2, apply(x, 2, max), "/") 
}

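It is fast, but it normalizes between 0 and 1. Could you please help me modify it for my purpose? I apologize, but I am just starting to learn R.

Source

2013-01-11 endamaco

First write some tests to make sure your slow code gives the correct answer. Then you can check that all the "solutions" about to be thrown at you do what you want. Writing tests is fun! – Spacedman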
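A minimal sketch of such a test, assuming the explicit min-max formula from the question is the reference. The names `normalize_reference`, `agrees_with_reference`, `zero_one` and `test_matrix` are made up for illustration and are not part of the original thread.

```r
# Reference: the min-max formula from the question, one column at a time.
normalize_reference <- function(x) {
  for (j in seq_len(ncol(x))) {
    lo <- min(x[, j])
    hi <- max(x[, j])
    x[, j] <- 2 * (x[, j] - lo) / (hi - lo) - 1
  }
  x
}

# Hypothetical helper: TRUE if `candidate` reproduces the reference on `x`.
agrees_with_reference <- function(candidate, x) {
  isTRUE(all.equal(candidate(x), normalize_reference(x),
                   check.attributes = FALSE))
}

# The fast version quoted in the question; it rescales to [0, 1],
# so it is expected to fail the check.
zero_one <- function(x) {
  x <- sweep(x, 2, apply(x, 2, min))
  sweep(x, 2, apply(x, 2, max), "/")
}

set.seed(1)
test_matrix <- matrix(rnorm(60), nrow = 20, ncol = 3)

agrees_with_reference(normalize_reference, test_matrix)  # sanity check
agrees_with_reference(zero_one, test_matrix)             # expected to disagree
```

Any proposed replacement can be passed to `agrees_with_reference()` in the same way before it is trusted on the real data.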

How about rescaling the matrix x at the end of your own function?

normalize <- function(x) { 
    x <- sweep(x, 2, apply(x, 2, min)) 
    x <- sweep(x, 2, apply(x, 2, max), "/") 
    2*x - 1 
}
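A short usage sketch for the function above, on a small made-up matrix `m`. It also illustrates one assumption worth checking on real data: a constant column has max equal to min, so the division becomes 0 / 0 and yields NaN.

```r
normalize <- function(x) {
  x <- sweep(x, 2, apply(x, 2, min))       # shift each column so its min is 0
  x <- sweep(x, 2, apply(x, 2, max), "/")  # divide by the shifted max (the range)
  2 * x - 1                                # map [0, 1] onto [-1, 1]
}

# Illustrative data; column "c" is constant on purpose.
m <- cbind(a = c(1, 2, 5), b = c(10, 30, 20), c = c(7, 7, 7))
res <- normalize(m)

apply(res[, c("a", "b")], 2, range)  # each of these columns spans -1 to 1
res[, "c"]                           # constant column: 0 / 0, so NaN
```

If constant columns can occur, they would need to be handled separately before normalizing.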

Source

2013-01-11 16:37:14

Thanks, that works. – endamaco

How about just:

x[,1] <- (x[,1]-mean(x[,1]))/(max(x[,1])-min(x[,1]))

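Most base R functions are vectorized, so there is no need for for loops in your code. This snippet scales everything in column 1 (you can also use the function scale(), but it has no option for min/max).

To do the whole data set, you could do something like this: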

Scale <- function(y) (y - mean(y)) / (max(y) - min(y))   # centre on the mean, divide by the range
DataFrame.Scaled <- apply(DataFrame, 2, Scale)           # apply() returns a matrix
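Note that centring on the mean and dividing by the range gives values whose spread is exactly 1, which is not the same as spanning -1 to 1. The sketch below contrasts it with a min-based variant; `scale_mean`, `scale_minmax` and the vector `y` are illustrative names, not from the answer.

```r
# Mean-based scaling, as in the answer above.
scale_mean <- function(y) (y - mean(y)) / (max(y) - min(y))

# Hypothetical min-max variant that maps min to -1 and max to 1.
scale_minmax <- function(y) 2 * (y - min(y)) / (max(y) - min(y)) - 1

y <- c(0, 0, 10)        # deliberately skewed example
range(scale_mean(y))    # an interval of width 1 around 0
range(scale_minmax(y))  # exactly -1 and 1
```

Either function can be passed to `apply(..., 2, ...)` in the same way as `Scale`.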

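Edit: It is also worth pointing out that you should not name values after functions. When you do min <- min(x), it may confuse R the next time you ask for min.

Source

2013-01-11 16:30:40

Re the edit: good advice in general, but it won't cause a problem in this case; 'min(foo)' will only match a function object 'min'. –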
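The point made in this comment can be checked with a small sketch. The function `f` and its input are made up for illustration; the behaviour relied on is that R looks for a function when a name is used in call position and skips non-function objects of the same name.

```r
f <- function(x) {
  min <- min(x)   # local numeric variable sharing the function's name
  max <- max(x)
  # min(x) below still resolves to base::min, because the local `min`
  # is not a function and is skipped when the name is called.
  c(value = min, recomputed = min(x), span = max - min)
}

f(c(4, 9, 2))
```

The code keeps working, although distinct names such as `xmin` and `xmax` are still easier to read.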

This uses the same approach:

normalize <- function(x) { 
    x <- sweep(x, 2, apply(x, 2, mean))   # subtract each column's mean 
    2* sweep(x, 2, apply(x, 2, function(y) max(y)-min(y)), "/") 
}

Edit

Rescaling the matrix with colMeans, as suggested in the comments, is of course faster:

normalize <- function(x) { 
    aa <- colMeans(x) 
    x <- sweep(x, 2, aa)   # subtract each column's mean 

    2* sweep(x, 2, apply(x, 2, function(y) max(y)-min(y)), "/") 
} 
A <- matrix(1:24, ncol=3) 

> normalize(A) 
           [,1]       [,2]       [,3]
[1,] -1.0000000 -1.0000000 -1.0000000
[2,] -0.7142857 -0.7142857 -0.7142857
[3,] -0.4285714 -0.4285714 -0.4285714
[4,] -0.1428571 -0.1428571 -0.1428571
[5,]  0.1428571  0.1428571  0.1428571
[6,]  0.4285714  0.4285714  0.4285714
[7,]  0.7142857  0.7142857  0.7142857
[8,]  1.0000000  1.0000000  1.0000000

Edit: with the base package

scale(A,center=TRUE,scale=apply(A,2,function(x) 0.5*(max(x)-min(x)))) 
           [,1]       [,2]       [,3]
[1,] -1.0000000 -1.0000000 -1.0000000
[2,] -0.7142857 -0.7142857 -0.7142857
[3,] -0.4285714 -0.4285714 -0.4285714
[4,] -0.1428571 -0.1428571 -0.1428571
[5,]  0.1428571  0.1428571  0.1428571
[6,]  0.4285714  0.4285714  0.4285714
[7,]  0.7142857  0.7142857  0.7142857
[8,]  1.0000000  1.0000000  1.0000000
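The columns of A are evenly spaced, so each column's mean coincides with the midpoint between its min and max, and mean-centring lands exactly on -1 and 1. For skewed columns that is not guaranteed. A possible sketch, assuming the goal is the min-max mapping from the question, centres on the midpoint instead; `normalize_scale` and the matrix `B` are illustrative names.

```r
normalize_scale <- function(x) {
  rng  <- apply(x, 2, range)           # 2 x ncol: row 1 = min, row 2 = max
  half <- (rng[2, ] - rng[1, ]) / 2    # half of each column's range
  mid  <- rng[1, ] + half              # midpoint between min and max
  scale(x, center = mid, scale = half) # (x - mid) / half, column by column
}

B <- cbind(c(0, 0, 10), c(1, 2, 4))    # deliberately skewed columns
out <- normalize_scale(B)
apply(out, 2, range)                   # each column spans -1 to 1
```

As with any scale() result, the returned matrix carries the "scaled:center" and "scaled:scale" attributes.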

Source

2013-01-11 16:39:43 agstudy

'colMeans()' would be faster than the first 'apply()' call. –

@Gavin Simpson Yes! You are right! I was looking for a way to do this with an existing package. – agstudy

@GavinSimpson I think the best solution is the one with scale :) – agstudy

Benchmarks, including the scale function:

normalize2 <- function(A) { 
    scale(A,center=TRUE,scale=apply(A,2,function(x) 0.5*(max(x)-min(x)))) 
} 

normalize3 <- function(mat) { 
    apply(mat,2,function(x) {xmin <- min(x); 2*(x-xmin)/(max(x)-xmin)-1}) 
} 

normalize4 <- function(x) { 
    aa <- colMeans(x) 
    x <- sweep(x, 2, aa)   # subtract each column's mean 

    2* sweep(x, 2, apply(x, 2, function(y) max(y)-min(y)), "/") 
} 


set.seed(42) 
mat <- matrix(sample(1:10,1e5,TRUE),1e3) 
erg2 <- normalize2(mat) 
attributes(erg2) <- attributes(normalize3(mat)) 
all.equal( 
    erg2, 
    normalize3(mat), 
    normalize4(mat) 
) 

[1] TRUE 
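One caveat about the check above, as I read the documented signature `all.equal(target, current, ...)`: only the first two arguments are compared, and a third positional argument is matched to a later parameter (`tolerance` for numeric input) rather than treated as a third object. The sketch below compares results pairwise instead. The helper `all_agree` and the example matrices are hypothetical, and the two scaling functions only mirror the min-based and mean-based formulas used in this thread.

```r
# Hypothetical helper: compare every result against the first one, pairwise.
all_agree <- function(...) {
  results <- list(...)
  checks <- vapply(
    results[-1],
    function(r) isTRUE(all.equal(results[[1]], r, check.attributes = FALSE)),
    logical(1)
  )
  all(checks)
}

min_based  <- function(y) 2 * (y - min(y)) / (max(y) - min(y)) - 1
mean_based <- function(y) 2 * (y - mean(y)) / (max(y) - min(y))

symmetric <- matrix(1:24, ncol = 3)          # column mean equals the midrange
skewed    <- cbind(c(0, 0, 10), c(1, 1, 7))  # column mean differs from it

# Expected to agree on the symmetric data and to differ on the skewed data.
all_agree(apply(symmetric, 2, min_based), apply(symmetric, 2, mean_based))
all_agree(apply(skewed, 2, min_based), apply(skewed, 2, mean_based))
```

The two formulas coincide only when a column's mean sits midway between its min and max, so it is worth running such a check on data that resembles the real matrix.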

library(microbenchmark) 
microbenchmark(normalize4(mat),normalize3(mat),normalize2(mat)) 

Unit: milliseconds 
             expr      min       lq   median       uq      max
1 normalize2(mat) 4.846551 5.486845 5.597799 5.861976 30.46634 
2 normalize3(mat) 4.191677 4.862655 4.980571 5.153438 28.94257 
3 normalize4(mat) 4.960790 5.648666 5.766207 5.972404 30.08334 

set.seed(42) 
mat <- matrix(sample(1:10,1e4,TRUE),10) 
microbenchmark(normalize4(mat),normalize3(mat),normalize2(mat)) 

Unit: milliseconds 
             expr      min       lq   median       uq       max
1 normalize2(mat) 4.319131 4.445384 4.556756 4.821512  9.116263
2 normalize3(mat) 5.743305 5.927829 6.098392 6.454875 13.439526
3 normalize4(mat) 3.955712 4.102306 4.175394 4.402710  5.773221

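In these runs, the apply solution is slightly faster for the matrix with 100 columns (1,000 rows) and slightly slower for the one with 1,000 columns (10 rows). Overall, performance is about the same.

Source

2013-01-11 17:35:47 Roland

+1 for benchmarking. – agstudy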
