2013-07-03 42 views
16

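I have two arrays of double values and want to compute the correlation coefficient (a single double value, just like the CORREL function in MS Excel). Is there a simple one-line solution in C#?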

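I have already found a math library called Meta Numerics. According to this SO question, it should do the job. Here is the documentation for the Meta Numerics correlation method, which I do not understand.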

Could someone please provide a simple code snippet or example showing how to use the library?

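Note: In the end, I had to use one of the custom implementations. But if anyone reading this question knows a good, well-documented C# math library/framework that does this, please don't hesitate to post a link in an answer.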

+1

This might also help you: http://www.codeproject.com/Articles/8750/A-computational-statistics and here is code for the correlation coefficient: http://www.functionx.com/vcsharp/applications/lcc.htm – terrybozzio

+0

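There is a library from http://ta-lib.org/ that has a "CORREL" function. It is very easy to use and gives you the same results as Excel. Unlike Excel, it returns an array of results rather than a single value. –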

Answers

25

You can keep the values in two separate lists, paired by index, and use a simple Zip:

var fitResult = new FitResult(); 
var values1 = new List<int>(); 
var values2 = new List<int>(); 

var correls = values1.Zip(values2, (v1, v2) => 
             fitResult.CorrelationCoefficient(v1, v2)); 

The second way is to write your own custom implementation (mine is not optimized for speed):

public double ComputeCoeff(double[] values1, double[] values2) 
{ 
    if (values1.Length != values2.Length)
        throw new ArgumentException("values must be the same length");

    var avg1 = values1.Average(); 
    var avg2 = values2.Average(); 

    var sum1 = values1.Zip(values2, (x1, y1) => (x1 - avg1) * (y1 - avg2)).Sum(); 

    var sumSqr1 = values1.Sum(x => Math.Pow((x - avg1), 2.0)); 
    var sumSqr2 = values2.Sum(y => Math.Pow((y - avg2), 2.0)); 

    var result = sum1/Math.Sqrt(sumSqr1 * sumSqr2); 

    return result; 
} 

Usage:

var values1 = new List<double> { 3, 2, 4, 5 ,6 }; 
var values2 = new List<double> { 9, 7, 12 ,15, 17 }; 

var result = ComputeCoeff(values1.ToArray(), values2.ToArray()); 
// 0.997054485501581 

Debug.Assert(result.ToString("F6") == "0.997054"); 
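
As a hand check on the value above, the same five-point sample can be worked through the mean-centered formula that `ComputeCoeff` implements. The sample means are 4 for the first list and 12 for the second, so the deviations are (-1, -2, 0, 1, 2) and (-3, -5, 0, 3, 5):

```latex
\begin{aligned}
\sum_i (x_i-\bar{x})(y_i-\bar{y}) &= (-1)(-3) + (-2)(-5) + 0\cdot 0 + (1)(3) + (2)(5) = 26,\\
\sum_i (x_i-\bar{x})^2 &= 1 + 4 + 0 + 1 + 4 = 10,\\
\sum_i (y_i-\bar{y})^2 &= 9 + 25 + 0 + 9 + 25 = 68,\\
r &= \frac{26}{\sqrt{10\cdot 68}} = \frac{26}{\sqrt{680}} \approx 0.997054.
\end{aligned}
```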

Another way is to use the Excel function directly:

var values1 = new List<double> { 3, 2, 4, 5 ,6 }; 
var values2 = new List<double> { 9, 7, 12 ,15, 17 }; 

// Make sure to add a reference to Microsoft.Office.Interop.Excel.dll 
// and use the namespace 

var application = new Application(); 

var worksheetFunction = application.WorksheetFunction; 

var result = worksheetFunction.Correl(values1.ToArray(), values2.ToArray()); 

Console.Write(result); // 0.997054485501581 
+0

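+1 Thanks for providing the code sample and clarifying how the library works! The problem is that it only works with ints, not with arrays of doubles. Not your fault of course, but I can't mark it as the answer. – teejay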

+0

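Yes, I didn't notice that the parameters were of type `int`. If you need to work with doubles, you will probably need to write your own extension method for it. – Romoku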

+0

If you look at the [source](http://metanumerics.codeplex.com/SourceControl/latest#Numerics/Core/Statistics/FitResult.cs) of this class, you will see that it uses a matrix to compute the correlation coefficient, so you could mimic it. – Romoku

5

If you don't want to use a third-party library, you can use the method from this post (code posted here as a backup).

double[] array1 = { 3, 2, 4, 5, 6 }; 
double[] array2 = { 9, 7, 12, 15, 17 }; 

double correl = Correlation(array1, array2); 

public double Correlation(double[] array1, double[] array2) 
{ 
    double[] array_xy = new double[array1.Length]; 
    double[] array_xp2 = new double[array1.Length]; 
    double[] array_yp2 = new double[array1.Length]; 
    for (int i = 0; i < array1.Length; i++) 
        array_xy[i] = array1[i] * array2[i]; 
    for (int i = 0; i < array1.Length; i++) 
        array_xp2[i] = Math.Pow(array1[i], 2.0); 
    for (int i = 0; i < array1.Length; i++) 
        array_yp2[i] = Math.Pow(array2[i], 2.0); 
    double sum_x = 0; 
    double sum_y = 0; 
    foreach (double n in array1) 
     sum_x += n; 
    foreach (double n in array2) 
     sum_y += n; 
    double sum_xy = 0; 
    foreach (double n in array_xy) 
     sum_xy += n; 
    double sum_xpow2 = 0; 
    foreach (double n in array_xp2) 
     sum_xpow2 += n; 
    double sum_ypow2 = 0; 
    foreach (double n in array_yp2) 
     sum_ypow2 += n; 
    double Ex2 = Math.Pow(sum_x, 2.00); 
    double Ey2 = Math.Pow(sum_y, 2.00); 

    return (array1.Length * sum_xy - sum_x * sum_y)/
    Math.Sqrt((array1.Length * sum_xpow2 - Ex2) * (array1.Length * sum_ypow2 - Ey2)); 
} 
7

To compute the Pearson product-moment correlation coefficient

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

you can use this simple code:

public static Double Correlation(Double[] Xs, Double[] Ys) { 
    Double sumX = 0; 
    Double sumX2 = 0; 
    Double sumY = 0; 
    Double sumY2 = 0; 
    Double sumXY = 0; 

    int n = Xs.Length < Ys.Length ? Xs.Length : Ys.Length; 

    for (int i = 0; i < n; ++i) { 
     Double x = Xs[i]; 
     Double y = Ys[i]; 

     sumX += x; 
     sumX2 += x * x; 
     sumY += y; 
     sumY2 += y * y; 
     sumXY += x * y; 
    } 

    Double stdX = Math.Sqrt(sumX2/n - sumX * sumX/n/n); 
    Double stdY = Math.Sqrt(sumY2/n - sumY * sumY/n/n); 
    Double covariance = (sumXY/n - sumX * sumY/n/n); 

    return covariance/stdX/stdY; 
}
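
The single-pass code above depends on an algebraic identity between the raw-sums form and the mean-centered definition of Pearson's r. The sketch below writes it out, using the shorthand S_x, S_y, S_xx, S_yy and S_xy for the five running sums (notation introduced here for illustration; it corresponds to `sumX`, `sumY`, `sumX2`, `sumY2` and `sumXY` in the code):

```latex
\begin{aligned}
r &= \frac{\dfrac{S_{xy}}{n} - \dfrac{S_x S_y}{n^2}}
          {\sqrt{\dfrac{S_{xx}}{n} - \dfrac{S_x^2}{n^2}}\;
           \sqrt{\dfrac{S_{yy}}{n} - \dfrac{S_y^2}{n^2}}}
   = \frac{n S_{xy} - S_x S_y}
          {\sqrt{\left(n S_{xx} - S_x^2\right)\left(n S_{yy} - S_y^2\right)}} \\[4pt]
  &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}},
\qquad \bar{x} = \frac{S_x}{n},\quad \bar{y} = \frac{S_y}{n}.
\end{aligned}
```

All three forms are mathematically equal. In floating point, the raw-sums forms subtract two potentially large, nearly equal numbers, so they can lose a few digits when the mean is large relative to the spread of the data. If either array is constant, the denominator is zero and the result is not a meaningful correlation.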
15

Math.NET Numerics is a well-documented math library that contains a Correlation class. It computes Pearson and Spearman rank correlations: http://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Correlation.htm

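The library is available under the very permissive MIT/X11 license. Using it to compute a correlation coefficient is very simple, as follows: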

using MathNet.Numerics.Statistics; 

... 

double correlation = Correlation.Pearson(arrayOfValues1, arrayOfValues2); 

Good luck!

+0

Thanks for the link! This may actually be the best library so far, and using the method really couldn't be easier :-) – teejay

+0

As an update, Math.NET Numerics version 3.5 added a method to the Correlation class for computing weighted Pearson correlation. –

0

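In my tests, the code posted above by @Dmitry Bychenko and @keyboardP generally produced the same correlation as Microsoft Excel across a handful of manual tests, and it requires no external libraries.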

For example, one run produced the following (the data for this run is listed at the bottom):

@Dmitry Bychenko: -0.00418479432051121

@keyboardP: -0.00418479432051131

MS Excel: -0.004184794

Here is the test harness:

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 

namespace TestCorrel { 
    class Program { 

     static void Main(string[] args) { 

      Random rand = new Random(DateTime.Now.Millisecond); 

      List<double> x = new List<double>(); 
      List<double> y = new List<double>(); 

      for (int i = 0; i < 100; i++) { 

       x.Add(rand.Next(1000) * rand.NextDouble()); 
       y.Add(rand.Next(1000) * rand.NextDouble()); 

       Console.WriteLine(x[i] + "," + y[i]); 
      } 

      Console.WriteLine("Correl1: " + Correl1(x, y)); 
      Console.WriteLine("Correl2: " + Correl2(x, y)); 
     } 

     public static double Correl1(List<double> x, List<double> y) { 

      //https://stackoverflow.com/questions/17447817/correlation-of-two-arrays-in-c-sharp 
      if (x.Count != y.Count) 
       return (double.NaN); //throw new ArgumentException("values must be the same length"); 

      double sumX = 0; 
      double sumX2 = 0; 
      double sumY = 0; 
      double sumY2 = 0; 
      double sumXY = 0; 

      int n = x.Count < y.Count ? x.Count : y.Count; 

      for (int i = 0; i < n; ++i) { 

       Double xval = x[i]; 
       Double yval = y[i]; 

       sumX += xval; 
       sumX2 += xval * xval; 
       sumY += yval; 
       sumY2 += yval * yval; 
       sumXY += xval * yval; 
      } 

      Double stdX = Math.Sqrt(sumX2/n - sumX * sumX/n/n); 
      Double stdY = Math.Sqrt(sumY2/n - sumY * sumY/n/n); 
      Double covariance = (sumXY/n - sumX * sumY/n/n); 

      return covariance/stdX/stdY; 
     } 

     public static double Correl2(List<double> x, List<double> y) { 

      double[] array_xy = new double[x.Count]; 
      double[] array_xp2 = new double[x.Count]; 
      double[] array_yp2 = new double[x.Count]; 

      for (int i = 0; i < x.Count; i++) 
       array_xy[i] = x[i] * y[i]; 
      for (int i = 0; i < x.Count; i++) 
       array_xp2[i] = Math.Pow(x[i], 2.0); 
      for (int i = 0; i < x.Count; i++) 
       array_yp2[i] = Math.Pow(y[i], 2.0); 
      double sum_x = 0; 
      double sum_y = 0; 
      foreach (double n in x) 
       sum_x += n; 
      foreach (double n in y) 
       sum_y += n; 
      double sum_xy = 0; 
      foreach (double n in array_xy) 
       sum_xy += n; 
      double sum_xpow2 = 0; 
      foreach (double n in array_xp2) 
       sum_xpow2 += n; 
      double sum_ypow2 = 0; 
      foreach (double n in array_yp2) 
       sum_ypow2 += n; 
      double Ex2 = Math.Pow(sum_x, 2.00); 
      double Ey2 = Math.Pow(sum_y, 2.00); 

      double Correl = 
      (x.Count * sum_xy - sum_x * sum_y)/
      Math.Sqrt((x.Count * sum_xpow2 - Ex2) * (x.Count * sum_ypow2 - Ey2)); 

      return (Correl); 
     } 
    } 
} 

The data for the example run above:

287.688269702572,225.610842817282 
618.9313498167,177.955550192835 
25.7778882802361,27.6549569366756 
140.847984766051,714.618547504125 
438.618761728806,533.48764902702 
481.347431274758,214.381256273194 
21.6406916848573,393.559209519792 
135.30397563209,158.419851317732 
334.314685154853,814.275162949821 
764.614904770914,50.1435267264692 
42.8179292282173,47.8631582287434 
237.216836650491,370.488416981179 
388.849658539449,134.961087643151 
305.903013161804,441.926902444068 
10.6625048679591,369.567569480076 
36.9316453891488,24.8947204607049 
2.10067253471383,491.941975629861 
7.94887068492774,573.037801189831 
341.738006353722,653.497146697015 
98.8424873439793,475.215988045193 
272.248712629196,36.1088809138671 
122.336823399801,169.158256422336 
9.32281673202422,631.076001565473 
201.118425176068,803.724831627554 
415.514343714115,64.248651454341 
227.791637123,230.512133914284 
25.3438658925443,396.854282886188 
596.238994411304,72.543763144195 
230.239735877253,933.983901697669 
796.060099040186,689.952468971234 
9.30882684202344,269.22063744125 
16.5005430148451,8.96549091859045 
536.324005148524,358.829873788557 
519.694526420764,17.3212184707267 
552.628357889423,12.5541588051962 
210.516099897454,388.57537739937 
141.341571405689,268.082028986924 
503.880356335491,753.447006912645 
515.494990213539,444.451280259737 
973.8670776076,168.922799013985 
85.7111146094795,36.3784999169309 
37.2147129193017,108.040356312432 
504.590177939548,50.3934166889607 
482.821039277511,888.984586256083 
5.52549206350255,156.717087003271 
405.833169031345,394.099059180868 
459.249365587835,11.68776424494 
429.421127440604,314.216759666901 
126.908422469584,331.907062556551 
62.1416232716952,3.19765723645578 
4.16058817699579,604.04046284223 
484.262182311277,220.177370167886 
58.6774453314382,339.09660232677 
463.482149892246,199.181594849183 
344.128297473829,268.531428258182 
0.883430369609702,209.346384477963 
77.9462970131758,255.221325168955 
583.629439312792,235.557751925922 
358.409186083083,376.046612200349 
81.2148325150902,10.7696774717279 
53.7315618049966,274.171515094196 
111.284646992239,130.174321939319 
317.280491961763,338.077288461885 
177.454564264722,7.53587801919127 
69.2239431670047,233.693477620228 
823.419546454875,0.111916855029723 
23.7174749401014,200.989081544331 
44.9598299125022,102.633862571155 
74.1602278468945,292.485449988155 
130.11182449251,23.4682153367755 
243.088760058903,335.807090202722 
13.3974915991526,436.983231269281 
73.3900805168739,252.352352472186 
592.144630201228,92.3395205570103 
57.7306153447044,47.1416798900541 
522.649018382024,584.427794722108 
15.3662010204821,60.1693953262499 
16.8335716728277,851.401980430541 
33.9869734449251,0.930781653584345 
116.66608504982,146.126050951949 
92.8896130355492,711.765618208687 
317.91980889529,322.186540377413 
44.8574470732629,209.275617858058 
751.201537871362,37.935519233316 
161.817758424588,2.83156183493862 
531.64078452142,79.1750782491523 
114.803219681048,283.106988439852 
123.472725123853,154.125248027558 
89.9276725453919,63.4626924192825 
105.623296753328,111.234188702067 
435.72981759707,23.7058234576629 
259.324810619152,69.3535200857341 
719.885234421531,381.086239833891 
24.2674900099018,198.408173349876 
57.7761600361095,146.52277489124 
77.4594609157459,710.746080866431 
636.671781979814,538.894185951396 
56.6035279932448,58.2563265684323 
485.16099039333,427.849954283261 
91.9552873247095,576.92944263617
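
Because the harness above draws fresh random data on every run, its output is hard to compare between machines. A deterministic alternative is sketched below: it restates the mean-centered and raw-sums formulas from this thread so they can be checked against the five-point sample used in an earlier answer, whose reported result is 0.997054485501581. The class name `CorrelationCheck` and its method names are hypothetical and not part of any library.

```csharp
using System;
using System.Linq;

// Hypothetical helper class (not from the thread): restates both formulas
// so they can be compared on a fixed input.
public static class CorrelationCheck
{
    // Mean-centered form:
    // sum((x - mx) * (y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
    public static double MeanCentered(double[] xs, double[] ys)
    {
        if (xs.Length != ys.Length)
            throw new ArgumentException("Arrays must have the same length.");

        double mx = xs.Average();
        double my = ys.Average();

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            double dx = xs[i] - mx;
            double dy = ys[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }

        return sxy / Math.Sqrt(sxx * syy);
    }

    // Raw-sums form:
    // (n * Sxy - Sx * Sy) / sqrt((n * Sxx - Sx^2) * (n * Syy - Sy^2))
    public static double RawSums(double[] xs, double[] ys)
    {
        if (xs.Length != ys.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int n = xs.Length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++)
        {
            sx += xs[i];
            sy += ys[i];
            sxy += xs[i] * ys[i];
            sxx += xs[i] * xs[i];
            syy += ys[i] * ys[i];
        }

        return (n * sxy - sx * sy)
             / Math.Sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }
}
```

Calling both methods with `{ 3, 2, 4, 5, 6 }` and `{ 9, 7, 12, 15, 17 }` should give approximately 0.997054 from each. As in the run reported above, any difference between the two should be confined to the last few digits.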