2010-01-17 201 views

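MySQL multivariable linear regression

I want to run a multivariable (9-variable) linear regression on data in my MySQL 5.0 database. The outcome field has only two possible values, 1 and 0.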

I did some searching and found that I can use:

mysql> SELECT 
    -> @n := COUNT(score) AS N, 
    -> @meanX := AVG(age) AS "X mean", 
    -> @sumX := SUM(age) AS "X sum", 
    -> @sumXX := SUM(age*age) "X sum of squares", 
    -> @meanY := AVG(score) AS "Y mean", 
    -> @sumY := SUM(score) AS "Y sum", 
    -> @sumYY := SUM(score*score) "Y sum of square", 
    -> @sumXY := SUM(age*score) AS "X*Y sum" 

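to get many of the basic regression quantities, but I really don't want to type this out for every combination of the 9 variables. Every resource I can find on multivariable regression requires matrix operations. Can I do matrix operations in MySQL, or is there another way to run a 9-variable linear regression?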

Should I export the data from MySQL first? It's ~80,000 rows, so moving it is feasible; I just don't know what I should use.

Thanks, Dan

Answer


Storing this data in MySQL is fine, but you can process it from any language that can access the database. Pseudocode:

variables = [ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I' ]; 

# Column names are spliced into the SQL, so each query covers one (X, Y) pair. 
for X in $variables do 
    for Y in $variables do 
     query = 'SELECT 
      @n'+$X+$Y+' := COUNT(*) AS N, 
      @mean'+$X+' := AVG('+$X+') AS "X mean", 
      @sum'+$X+' := SUM('+$X+') AS "X sum", 
      @sum'+$X+$X+' := SUM('+$X+'*'+$X+') AS "X sum of squares", 
      @mean'+$Y+' := AVG('+$Y+') AS "Y mean", 
      @sum'+$Y+' := SUM('+$Y+') AS "Y sum", 
      @sum'+$Y+$Y+' := SUM('+$Y+'*'+$Y+') AS "Y sum of squares", 
      @sum'+$X+$Y+' := SUM('+$X+'*'+$Y+') AS "X*Y sum" 
      FROM measurements'; 
     db_execute(query); 
    done 
done 
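
The loop above issues one query per pair of variables. As a minimal sketch, the same sums can be generated as a single SELECT from Python, so that no combination has to be typed by hand. It assumes the nine columns really are named A through I as in the pseudocode, that the table is called measurements and that the outcome column is score; none of these names has been checked against the real schema, and the helper name build_sums_query is hypothetical.

```python
from itertools import combinations_with_replacement


def build_sums_query(columns, outcome="score", table="measurements"):
    """Build one SELECT returning every sum needed for the normal equations.

    All names are assumptions for illustration. They are interpolated
    directly into the SQL, so they must come from trusted code, never
    from user input.
    """
    columns = list(columns)
    terms = ["COUNT(*) AS n"]
    # First-order sums: SUM(A), SUM(B), ..., and SUM(score).
    for col in columns + [outcome]:
        terms.append(f"SUM({col}) AS sum_{col}")
    # Second-order sums among the predictors: SUM(A*A), SUM(A*B), ...
    for a, b in combinations_with_replacement(columns, 2):
        terms.append(f"SUM({a}*{b}) AS sum_{a}_{b}")
    # Cross sums between each predictor and the outcome: SUM(A*score), ...
    for col in columns:
        terms.append(f"SUM({col}*{outcome}) AS sum_{col}_{outcome}")
    return "SELECT\n  " + ",\n  ".join(terms) + f"\nFROM {table}"


query = build_sums_query("ABCDEFGHI")
print(query)
```

For nine predictors this produces 65 aggregate expressions (1 count, 10 plain sums, 45 pairwise products, 9 products with the outcome), all computed in one pass over the table.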

But why not store the results in a table? That suits a database better.

# Each INSERT adds one row of aggregates for one (X, Y) pair of columns. 
for X in $variables do 
    for Y in $variables do 
     query = 'INSERT INTO regression 
      SELECT 
      "'+$X+'" AS X, 
      "'+$Y+'" AS Y, 
      COUNT(*) AS N, 
      AVG('+$X+') AS meanX, 
      SUM('+$X+') AS sumX, 
      SUM('+$X+'*'+$X+') AS squareX, 
      AVG('+$Y+') AS meanY, 
      SUM('+$Y+') AS sumY, 
      SUM('+$Y+'*'+$Y+') AS squareY, 
      SUM('+$X+'*'+$Y+') AS sumXY 
      FROM measurements'; 
     db_execute(query); 
    done 
done 

Put separate indexes on the X and Y columns.
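
Once the pairwise sums are available, the regression itself is a small linear system: the coefficients b satisfy the normal equations (X'X) b = X'y, and every entry of X'X and X'y is one of those sums, with a constant column of ones added for the intercept. The following is a minimal sketch in plain Python that solves the system by Gaussian elimination with partial pivoting. The function names and the toy data are hypothetical, the sums are computed in memory here rather than read from the database, and on real data a numerical library would likely be a sturdier choice.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_from_rows(rows, outcomes):
    """Least-squares fit; returns [intercept, b1, b2, ...].

    The nested sums below are the values that SUM(x_i * x_j) and
    SUM(x_i * y) would return from the database.
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1 models the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, outcomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2, so the fit should recover it.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
outcomes = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_from_rows(rows, outcomes)
print([round(c, 6) for c in coefficients])
```

With nine predictors the system is only 10 by 10, so solving it costs next to nothing compared with the single pass over ~80,000 rows that produces the sums.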