How much space does ridge regression require?

In Haskell, ridge regression can be expressed as:
import Numeric.LinearAlgebra

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> (tr a)) + (mu * (ident $ rows a))
    oB = a <> (tr b)
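However, the operation is very expensive in memory. Here is a minimal example that requires more than 2 GB on my machine and takes 3 minutes to execute.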
import Numeric.LinearAlgebra
import System.Random

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> (tr a)) + (mu * (ident $ rows a))
    oB = a <> (tr b)

-- One block per sample: the row matching the label is all ones, the rest zeros.
teacher :: [Int] -> Int -> Int -> Matrix Double
teacher labelsList cols' correctRow = fromBlocks $ f <$> labelsList
  where
    ones = konst 1.0 (1, cols')
    zeros = konst 0.0 (1, cols')
    rows' = length labelsList
    f i | i == correctRow = [ones]
        | otherwise = [zeros]

-- Concatenate matrices side by side.
glue :: Element t => [Matrix t] -> Matrix t
glue xs = fromBlocks [xs]

main :: IO ()
main = do
  let n = 1500 -- <- The constant to be increased
      m = 10000
      cols' = 12

  g <- newStdGen

  -- Stub data
  let labels = take m . map (`mod` 10) . randoms $ g :: [Int]
      a = (n >< (cols' * m)) $ take (cols' * m * n) $ randoms g :: Matrix Double
      teachers = zipWith (teacher [0..9]) (repeat cols') labels
      b = glue teachers

  print $ maxElement $ createReadout a b
  return ()
$ stack exec ghc -- -O2 Test.hs
$ time ./Test
./Test  190.16s user 5.22s system 106% cpu 3:03.93 total
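The problem is to increase the constant n, at least to n = 4000, while RAM is limited to 5 GB. What is the minimal space that the matrix inversion operation theoretically requires? How can this operation be optimized in terms of space? Can ridge regression be efficiently replaced with a cheaper method?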
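For a rough sense of scale, the dense storage implied by the example can be estimated directly. The sketch below assumes 8 bytes per `Double` and counts only the matrices that appear in `createReadout`. `bytesOfDoubles` and `footprint` are hypothetical helper names, and the figures ignore the intermediate Haskell list passed to `><` and any workspace the solver may allocate.

```haskell
-- Back-of-the-envelope storage for dense Double matrices.
-- Assumption: 8 bytes per entry, no per-matrix overhead counted.
bytesOfDoubles :: Integer -> Integer -> Integer
bytesOfDoubles r c = 8 * r * c

-- | Sizes in bytes of a, b, oA = a * a^T + mu * I and oB = a * b^T,
-- where a is n x (cols' * m) and b is classes x (cols' * m).
footprint :: Integer -> Integer -> Integer -> Integer -> [(String, Integer)]
footprint n m cols' classes =
  [ ("a",  bytesOfDoubles n k)
  , ("b",  bytesOfDoubles classes k)
  , ("oA", bytesOfDoubles n n)
  , ("oB", bytesOfDoubles n classes)
  ]
  where
    k = cols' * m

main :: IO ()
main = mapM_ report [1500, 4000]
  where
    report n = do
      putStrLn ("n = " ++ show n)
      mapM_ (\(name, size) -> putStrLn ("  " ++ name ++ ": " ++ show size ++ " bytes"))
            (footprint n 10000 12 10)
```

Under these assumptions `a` alone takes 1.44 GB at n = 1500 and 3.84 GB at n = 4000, whereas `oA` and `oB`, the only inputs to `<\>`, take about 128 MB and 320 kB at n = 4000.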
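Am I reading this right: 'a' is a 1500 x 120000 matrix? –
Exactly. It may be even bigger. – penkovsky
Is the matrix [sparse](https://en.wikipedia.org/wiki/Sparse_matrix)? That could save you a lot of time and space (but you would probably need a specialised algorithm like [conjugate gradient](https://en.wikipedia.org/wiki/Conjugate_gradient_method)). – leftaroundabout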
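On the space question in the post, one possible direction (a sketch, not a tested answer for the full-size problem) relies on the fact that `a <> tr a` and `a <> tr b` are sums of the same products taken over column blocks of `a` and `b`. If the blocks can be produced one at a time, the full `a` never has to be held in memory. `createReadoutChunked` is a hypothetical name, and the block list is assumed to be generated lazily by the caller.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')
import Numeric.LinearAlgebra

-- | Ridge readout accumulated over column blocks (hypothetical helper).
-- n is the number of rows of a, k the number of rows of b; each pair
-- (aK, bK) holds the same range of columns of a and b.
createReadoutChunked :: Int -> Int -> [(Matrix Double, Matrix Double)] -> Matrix Double
createReadoutChunked n k chunks = oA <\> oB
  where
    mu = 1e-4
    -- Start from mu * I so the regulariser is added exactly once.
    start = (scale mu (ident n), konst 0 (n, k))
    (oA, oB) = foldl' step start chunks
    -- a * a^T and a * b^T are the sums of the per-block products.
    step (!accA, !accB) (aK, bK) =
      (accA + (aK <> tr aK), accB + (aK <> tr bK))

main :: IO ()
main = do
  let a = (2 >< 4) [1 .. 8] :: Matrix Double
      b = (1 >< 4) [1, 0, 1, 0] :: Matrix Double
      -- Two blocks of two columns each.
      chunks = [ (takeColumns 2 a, takeColumns 2 b)
               , (dropColumns 2 a, dropColumns 2 b) ]
      -- Same formula as createReadout, computed on the whole matrices.
      direct = ((a <> tr a) + scale 1e-4 (ident 2)) <\> (a <> tr b)
  -- Largest absolute difference between the chunked and direct results.
  print $ maxElement $ abs (createReadoutChunked 2 1 chunks - direct)
```

In this form only the n x n and n x k accumulators plus one block are live at a time, provided the caller does not keep the block list around. Whether this fits the 5 GB budget at n = 4000 would still need to be measured.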