2016-11-18

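PLS regression coefficients in MATLAB and C# (Accord.NET)

I am trying to perform partial least squares (PLS) regression analysis in C#. The PLS implementation in MATLAB uses the SIMPLS algorithm, which returns beta (the matrix of regression coefficients).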

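  • I don't understand why the coefficient matrices differ between the two cases. Is there a mistake in the way I pass the inputs to the C# version?

  • The inputs are identical in both cases and are taken from the paper referenced here.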

Minimal working example

MATLAB: the following uses the small example by Hervé Abdi (Hervé Abdi, Partial Least Squares Regression). Reference: PDF

clear all; 
clc; 
inputs = [7, 7, 13, 7; 4, 3, 14, 7; 10, 5, 12, 5; 16, 7, 11, 3; 13, 3, 10, 3]; 
outputs = [14, 7, 8; 10, 7, 6; 8, 5, 5; 2, 4,7; 6, 2, 4]; 
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(inputs,outputs, 1); 
disp 'beta' 
beta 
disp 'beta size' 
size(beta) 
yfit = [ones(size(inputs,1),1) inputs]*beta; 
residuals = outputs - yfit; 

% stem(residuals) 
% xlabel('Observation'); 
% ylabel('Residual'); 

beta = 

    1.0484e+01 6.1899e+00 6.2841e+00 
    -6.3488e-01 -3.0405e-01 -7.2608e-02 
    2.1949e-02 1.0512e-02 2.5102e-03 
    1.9226e-01 9.2078e-02 2.1988e-02 
    2.8948e-01 1.3864e-01 3.3107e-02 

Accord.NET:

double[][] inputs = new double[][] 
    { 
     //  Wine | Price | Sugar | Alcohol | Acidity 
     new double[] { 7,  7,  13,  7 }, 
     new double[] { 4,  3,  14,  7 }, 
     new double[] { 10,  5,  12,  5 }, 
     new double[] { 16,  7,  11,  3 }, 
     new double[] { 13,  3,  10,  3 }, 
    }; 

double[][] outputs = new double[][] 
    { 
     //    Wine | Hedonic | Goes with meat | Goes with dessert 
     new double[] {   14,   7,     8 }, 
     new double[] {   10,   7,     6 }, 
     new double[] {   8,   5,     5 }, 
     new double[] {   2,   4,     7 }, 
     new double[] {   6,   2,     4 }, 
    }; 

var pls = new PartialLeastSquaresAnalysis() 
     { 
      Method = AnalysisMethod.Center, 
      Algorithm = PartialLeastSquaresAlgorithm.NIPALS 
     }; 

var regression = pls.Learn(inputs, outputs); 

double[][] coeffs = regression.Weights; 
>> 
-1.69811320754717 -0.0566037735849056 0.0707547169811322 
1.27358490566038 0.29245283018868  0.571933962264151 
-4     1     0.5 
1.17924528301887 0.122641509433962 0.159198113207547 

Answer


I think there are at least three differences between the way the MATLAB and Accord.NET versions of PLS are being called.

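  1. As you said, MATLAB is using SIMPLS. However, Accord.NET is being told to use NIPALS.

  2. The MATLAB version is called as plsregress(inputs, outputs, 1), which means the regression is computed with only 1 latent component of the PLS, but your Accord.NET code is not instructed to do the same.

  3. Accord.NET returns a MultivariateLinearRegression object, which holds both a weight matrix and an intercept vector, whereas MATLAB returns the intercepts as the first row of the coefficient matrix beta.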
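The layout difference in point 3 can be written out explicitly. As a sketch, let $W$ stand for the Accord.NET weight matrix (4 × 3 here, one row per input variable, as the 4 × 3 output in the question suggests) and $b$ for the intercept vector (length 3). Both symbols are introduced here only for illustration. The MATLAB line `yfit = [ones(size(inputs,1),1) inputs]*beta` then corresponds to:

```latex
\hat{Y} = \begin{bmatrix} \mathbf{1} & X \end{bmatrix} B,
\qquad
B = \begin{bmatrix} b^{\top} \\ W \end{bmatrix} \in \mathbb{R}^{5 \times 3}
```

So the first row of MATLAB's beta holds the intercepts and the remaining four rows hold the weights, which is why the two results cannot be compared entry by entry until they are stacked the same way.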

Once all of these are taken into account, it is possible to reproduce exactly the same results as the MATLAB version:

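// Namespaces needed for the analysis class and the matrix extension methods
using Accord.Math;
using Accord.Statistics.Analysis;

double[][] inputs = new double[][] 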
{ 
    //  Wine | Price | Sugar | Alcohol | Acidity 
    new double[] { 7,  7,  13,  7 }, 
    new double[] { 4,  3,  14,  7 }, 
    new double[] { 10,  5,  12,  5 }, 
    new double[] { 16,  7,  11,  3 }, 
    new double[] { 13,  3,  10,  3 }, 
}; 

double[][] outputs = new double[][] 
{ 
    //    Wine | Hedonic | Goes with meat | Goes with dessert 
    new double[] {   14,   7,     8 }, 
    new double[] {   10,   7,     6 }, 
    new double[] {   8,   5,     5 }, 
    new double[] {   2,   4,     7 }, 
    new double[] {   6,   2,     4 }, 
}; 

// Create the Partial Least Squares Analysis 
var pls = new PartialLeastSquaresAnalysis() 
{ 
    Method = AnalysisMethod.Center, 
    Algorithm = PartialLeastSquaresAlgorithm.SIMPLS, // First change: use SIMPLS 
}; 

// Learn the analysis 
pls.Learn(inputs, outputs); 

// Second change: Use just 1 latent factor/component 
var regression = pls.CreateRegression(factors: 1); 

// Third change: present results as in MATLAB 
double[][] w = regression.Weights.Transpose(); 
double[] b = regression.Intercepts; 

// Add the intercepts as the first column of the matrix of 
// weights and transpose it as in the way MATLAB presents it 
double[][] coeffs = (w.InsertColumn(b, index: 0)).Transpose(); 

// Show results in MATLAB format 
string str = coeffs.ToOctave(); 

With these changes, the coeffs matrix above should become

[ 10.4844779770616 6.18986077674717 6.28413863347486 ; 
    -0.634878923091644 -0.304054829845448 -0.0726082626993539 ; 
    0.0219492754418065 0.0105118991463605 0.00251024045589416 ; 
    0.192261724966225 0.0920775662006966 0.0219881135215502 ; 
    0.289484835410222 0.13863944631343 0.033107085796122 ]
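To confirm that the coefficients are being read in the MATLAB layout, the `yfit` and `residuals` lines from the MATLAB script can be mirrored in plain C#. The following is a minimal sketch that uses no Accord.NET calls at all. The class name `BetaCheck` is hypothetical, and the `beta` values are simply copied from the result above:

```csharp
using System;

class BetaCheck // hypothetical name, for illustration only
{
    static void Main()
    {
        // Same data as in the question.
        double[][] inputs =
        {
            new double[] { 7, 7, 13, 7 },
            new double[] { 4, 3, 14, 7 },
            new double[] { 10, 5, 12, 5 },
            new double[] { 16, 7, 11, 3 },
            new double[] { 13, 3, 10, 3 },
        };
        double[][] outputs =
        {
            new double[] { 14, 7, 8 },
            new double[] { 10, 7, 6 },
            new double[] { 8, 5, 5 },
            new double[] { 2, 4, 7 },
            new double[] { 6, 2, 4 },
        };

        // Coefficients copied from the result above, in MATLAB layout:
        // row 0 holds the intercepts, rows 1..4 hold one row per input variable.
        double[][] beta =
        {
            new double[] { 10.4844779770616, 6.18986077674717, 6.28413863347486 },
            new double[] { -0.634878923091644, -0.304054829845448, -0.0726082626993539 },
            new double[] { 0.0219492754418065, 0.0105118991463605, 0.00251024045589416 },
            new double[] { 0.192261724966225, 0.0920775662006966, 0.0219881135215502 },
            new double[] { 0.289484835410222, 0.13863944631343, 0.033107085796122 },
        };

        int m = beta[0].Length; // number of output variables
        for (int i = 0; i < inputs.Length; i++)
        {
            var fit = new double[m];
            var residual = new double[m];
            for (int k = 0; k < m; k++)
            {
                // Equivalent to [1 x] * beta: intercept first, then the weighted inputs.
                fit[k] = beta[0][k];
                for (int j = 0; j < inputs[i].Length; j++)
                    fit[k] += inputs[i][j] * beta[j + 1][k];
                residual[k] = outputs[i][k] - fit[k];
            }
            Console.WriteLine("fit: {0} | residual: {1}",
                string.Join(", ", Array.ConvertAll(fit, v => v.ToString("F4"))),
                string.Join(", ", Array.ConvertAll(residual, v => v.ToString("F4"))));
        }
    }
}
```

The printed residuals can be compared with the `residuals` variable in MATLAB. If they disagree, the most likely cause is that the intercepts and weights were stacked in a different order.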