2013-10-06 27 views
3

How do I get the key numbers for a box plot from an array in PHP?

Say I have an array of values like this:

$values = array(48,30,97,61,34,40,51,33,1); 

And I want to get the values needed to draw a box plot, something like the following:

$box_plot_values = array(
    'lower_outlier' => 1, 
    'min'   => 8, 
    'q1'    => 32, 
    'median'   => 40, 
    'q3'    => 56, 
    'max'   => 80, 
    'higher_outlier' => 97, 
); 

How would I do this in PHP?

Answers

5
function box_plot_values($array) 
{ 
    $return = array(
     'lower_outlier' => 0, 
     'min'   => 0, 
     'q1'    => 0, 
     'median'   => 0, 
     'q3'    => 0, 
     'max'   => 0, 
     'higher_outlier' => 0, 
    ); 

    $array_count = count($array); 
    sort($array, SORT_NUMERIC); 

    $return['min']   = $array[0]; 
    $return['lower_outlier'] = $return['min']; 
    $return['max']   = $array[$array_count - 1]; 
    $return['higher_outlier'] = $return['max']; 
    $middle_index    = floor($array_count/2); 
    $return['median']   = $array[$middle_index]; // Assume an odd # of items 
    $lower_values    = array(); 
    $higher_values   = array(); 

    // If we have an even number of values, we need some special rules 
    if ($array_count % 2 == 0) 
    { 
     // Handle the even case by averaging the middle 2 items 
     $return['median'] = round(($return['median'] + $array[$middle_index - 1])/2); 

     foreach ($array as $idx => $value) 
     { 
      if ($idx < ($middle_index - 1)) $lower_values[] = $value; // We need to remove both of the values we used for the median from the lower values 
      elseif ($idx > $middle_index) $higher_values[] = $value; 
     } 
    } 
    else 
    { 
     foreach ($array as $idx => $value) 
     { 
      if ($idx < $middle_index)  $lower_values[] = $value; 
      elseif ($idx > $middle_index) $higher_values[] = $value; 
     } 
    } 

    $lower_values_count = count($lower_values); 
    $lower_middle_index = floor($lower_values_count/2); 
    $return['q1']  = $lower_values[$lower_middle_index]; 
    if ($lower_values_count % 2 == 0) 
     $return['q1'] = round(($return['q1'] + $lower_values[$lower_middle_index - 1])/2); 

    $higher_values_count = count($higher_values); 
    $higher_middle_index = floor($higher_values_count/2); 
    $return['q3']  = $higher_values[$higher_middle_index]; 
    if ($higher_values_count % 2 == 0) 
     $return['q3'] = round(($return['q3'] + $higher_values[$higher_middle_index - 1])/2); 

    // Check if min and max should be capped 
    $iqr = $return['q3'] - $return['q1']; // Calculate the interquartile range (IQR) 
    // Cap a whisker only when the extreme value lies more than one IQR away from its quartile 
    if ($return['q1'] - $return['min'] > $iqr) $return['min'] = $return['q1'] - $iqr; 
    if ($return['max'] - $return['q3'] > $iqr) $return['max'] = $return['q3'] + $iqr; 

    return $return; 
} 
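
The quartile step of the function above can also be written more compactly with array_slice. This is only a sketch, not the answer's exact method: the helper names median_sorted and quartiles are made up for illustration, the medians are not rounded, at least two values are assumed, and for an even count the two middle values stay in their halves (the answer drops both of them).

```php
<?php
// Hypothetical helper: median of an already sorted, non-empty list.
function median_sorted(array $sorted): float
{
    $n   = count($sorted);
    $mid = intdiv($n, 2);
    // Odd count: the middle item. Even count: mean of the two middle items.
    return $n % 2 ? (float) $sorted[$mid] : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

// Hypothetical helper: q1, median and q3 as medians of the lower and upper halves.
// Assumes at least two values.
function quartiles(array $values): array
{
    sort($values, SORT_NUMERIC);
    $n    = count($values);
    $half = intdiv($n, 2);
    // For an odd count the median itself is left out of both halves.
    $lower = array_slice($values, 0, $half);
    $upper = array_slice($values, $n - $half);

    return array(
        'q1'     => median_sorted($lower),
        'median' => median_sorted($values),
        'q3'     => median_sorted($upper),
    );
}

$q = quartiles(array(48, 30, 97, 61, 34, 40, 51, 33, 1));
// q1 = 31.5, median = 40, q3 = 56
```

For the array from the question this gives q1 = 31.5 where the function above returns 32, because the function rounds each averaged median.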
+0

Improvements are very welcome. – Lilleman

+1

This is really great! –

+0

+1 Excellent solution. – Johnny

1

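Lilleman's code is brilliant. I really appreciate the way he handles the median and q1/q3. If I had answered this question first, I would have dealt with odd and even counts in a harder and unnecessary way: four if branches for the four possible cases of count(values) mod 4. His way is concise and neat, and I admire his work.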

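I would like to make some improvements to max, min, higher_outlier and lower_outlier. Because q1 - 1.5 * IQR is only the lower bound, we should take the smallest value that is not below this bound as 'min'. The same goes for 'max'. Also, there may be more than one outlier. So I made some changes based on Lilleman's work. Thanks.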

function box_plot_values($array)
{
    $return = array(
        'lower_outlier'  => 0,
        'min'            => 0,
        'q1'             => 0,
        'median'         => 0,
        'q3'             => 0,
        'max'            => 0,
        'higher_outlier' => 0,
    );

    $array_count = count($array);
    sort($array, SORT_NUMERIC);

    $return['min']            = $array[0];
    $return['lower_outlier']  = array();
    $return['max']            = $array[$array_count - 1];
    $return['higher_outlier'] = array();
    $middle_index             = floor($array_count/2);
    $return['median']         = $array[$middle_index]; // Assume an odd # of items
    $lower_values             = array();
    $higher_values            = array();

    // If we have an even number of values, we need some special rules
    if ($array_count % 2 == 0)
    {
        // Handle the even case by averaging the middle 2 items
        $return['median'] = round(($return['median'] + $array[$middle_index - 1])/2);

        foreach ($array as $idx => $value)
        {
            if ($idx < ($middle_index - 1)) $lower_values[] = $value; // We need to remove both of the values we used for the median from the lower values
            elseif ($idx > $middle_index) $higher_values[] = $value;
        }
    }
    else
    {
        foreach ($array as $idx => $value)
        {
            if ($idx < $middle_index) $lower_values[] = $value;
            elseif ($idx > $middle_index) $higher_values[] = $value;
        }
    }

    $lower_values_count = count($lower_values);
    $lower_middle_index = floor($lower_values_count/2);
    $return['q1']       = $lower_values[$lower_middle_index];
    if ($lower_values_count % 2 == 0)
        $return['q1'] = round(($return['q1'] + $lower_values[$lower_middle_index - 1])/2);

    $higher_values_count = count($higher_values);
    $higher_middle_index = floor($higher_values_count/2);
    $return['q3']        = $higher_values[$higher_middle_index];
    if ($higher_values_count % 2 == 0)
        $return['q3'] = round(($return['q3'] + $higher_values[$higher_middle_index - 1])/2);

    // Calculate the interquartile range (IQR)
    $iqr = $return['q3'] - $return['q1'];

    // q1 - 1.5*IQR is only the lower bound. Compare every value in the
    // lower half to it: values below the bound are outliers, and the
    // smallest value that is not below the bound is the 'min' for the box plot.
    $return['min'] = $return['q1'] - 1.5*$iqr;
    foreach ($lower_values as $idx => $value)
    {
        if ($value < $return['min']) // the value is below the bound
        {
            // Keeping the index may seem unnecessary, but anyone who wants to
            // know which values are outliers can use it to identify them.
            $return['lower_outlier'][$idx] = $value;
        }
        else
        {
            $return['min'] = $value; // first value that is not below the bound
            break; // stop here so 'min' stays the smallest such value
        }
    }

    // q3 + 1.5*IQR is the upper bound; the same logic applies, walking down from the largest value.
    // Note: the keys of 'higher_outlier' are positions in the reversed upper half.
    $return['max'] = $return['q3'] + 1.5*$iqr;
    foreach (array_reverse($higher_values) as $idx => $value)
    {
        if ($value > $return['max'])
        {
            $return['higher_outlier'][$idx] = $value;
        }
        else
        {
            $return['max'] = $value;
            break;
        }
    }

    return $return;
}

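I hope this helps anyone who is interested in this problem. If there is a better way to find out which values are outliers, please leave me a comment. Thanks!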
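
One possible way to see which values are outliers is to lean on the fact that array_filter preserves keys, so each outlier keeps its index in the sorted array. The sketch below is an illustration only: the helper name whiskers_and_outliers is made up, and q1 and q3 are assumed to have been computed already from the same data (so at least one value lies between the two bounds).

```php
<?php
// Hypothetical helper: whisker ends and outliers for already known quartiles.
// Assumes $q1 and $q3 come from $values, so $inside below is never empty.
function whiskers_and_outliers(array $values, float $q1, float $q3): array
{
    sort($values, SORT_NUMERIC);
    $iqr   = $q3 - $q1;
    $lower = $q1 - 1.5 * $iqr; // lower bound
    $upper = $q3 + 1.5 * $iqr; // upper bound

    // array_filter() preserves keys, so every value keeps its index in the sorted array.
    $inside = array_filter($values, function ($v) use ($lower, $upper) {
        return $v >= $lower && $v <= $upper;
    });
    $low = array_filter($values, function ($v) use ($lower) {
        return $v < $lower;
    });
    $high = array_filter($values, function ($v) use ($upper) {
        return $v > $upper;
    });

    return array(
        'min'            => min($inside), // smallest value not below the lower bound
        'max'            => max($inside), // largest value not above the upper bound
        'lower_outlier'  => $low,
        'higher_outlier' => $high,
    );
}

// q1 = 32 and q3 = 56 are the quartiles from the question.
$r = whiskers_and_outliers(array(48, 30, 97, 61, 34, 40, 51, 33, 1), 32, 56);
// min = 1, max = 61, higher_outlier = array(8 => 97), lower_outlier is empty
```

With the question's data the bounds are -4 and 92, so only 97 (index 8 in the sorted array) is reported as an outlier.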

0

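I have a different solution for calculating the lower and upper whiskers. Like ShaoE's solution, it finds the smallest value that is greater than or equal to the lower bound (Q1 - 1.5 * IQR), and vice versa for the upper bound.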

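I use array_filter, which iterates over an array, passes each value to a callback function, and returns an array containing only the values for which the callback returns true (see php.net's array_filter manual). In this case, the values greater than or equal to the lower bound are returned and used as the input for min, which in turn returns the smallest of them.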

// get lower whisker 
$whiskerMin = min(array_filter($array, function($value) use($quartile1, $iqr) { 
    return $value >= $quartile1 - 1.5 * $iqr; 
})); 
// get higher whisker vice versa 
$whiskerMax = max(array_filter($array, function($value) use($quartile3, $iqr) { 
    return $value <= $quartile3 + 1.5 * $iqr; 
})); 

Note that it ignores the outliers, and I have only tested it with positive values.
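
As a quick check with negative numbers, the snippet can be run on a small made-up data set. Everything here is an assumption for illustration: the sample values are invented, and $quartile1 and $quartile3 are hard-coded as the medians of the lower and upper halves rather than computed.

```php
<?php
// Made-up sample containing negative values and one outlier on each side.
$array = array(-50, -8, -5, -3, 0, 2, 4, 6, 40);

// Assumed inputs: medians of the lower half (-50, -8, -5, -3)
// and of the upper half (2, 4, 6, 40).
$quartile1 = -6.5;
$quartile3 = 5.0;
$iqr       = $quartile3 - $quartile1; // 11.5

// Lower bound: -6.5 - 1.5 * 11.5 = -23.75
$whiskerMin = min(array_filter($array, function ($value) use ($quartile1, $iqr) {
    return $value >= $quartile1 - 1.5 * $iqr;
}));

// Upper bound: 5 + 1.5 * 11.5 = 22.25
$whiskerMax = max(array_filter($array, function ($value) use ($quartile3, $iqr) {
    return $value <= $quartile3 + 1.5 * $iqr;
}));
// $whiskerMin = -8, $whiskerMax = 6
```

Here -50 and 40 fall outside the bounds and are skipped, so the comparison logic itself does not appear to depend on the sign of the values.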