2011-12-23 19 views
0

PHP/MySQL - Analysing common sets across multiple sets

Suppose I have two tables, people and families.

families has two fields - id and name. The name field contains the family surname.

people has three fields - id, family_id and name. family_id is the id of the family that person belongs to. The name field is that person's first name.

It is basically a one-to-many relationship, with one family having many people.

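I want to get a list of sets of names, ordered by the highest occurrence of the largest set of names across families.

That probably doesn't make much sense...

To explain what I want further, we can score each set of names. The 'score' is the set size * the number of families it occurs in.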

For example, say two names, 'John' and 'Jane', both exist in three families - that set's 'score' would be 2 * 3 = 6.
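As a minimal sketch of that scoring rule (the function name `set_score` is hypothetical and the type declarations assume PHP 7 or later):

```php
<?php
// Hypothetical helper: score = number of names in the set * number of families containing all of them.
function set_score(array $names, int $familyCount): int
{
    return count($names) * $familyCount;
}

echo set_score(['John', 'Jane'], 3), "\n"; // prints 6
```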

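How can I get an array of name sets and each set's 'score', ordered by score? Here is an example result set (I've laid it out as a table, but it could be a multidimensional array in PHP) - note that this is just made up and doesn't reflect any statistical name data.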

names              | occurrences | score
Ben, Lucy          | 4           | 8
Jane, John         | 3           | 6
James, Rosie, Jack | 2           | 6
Charlie, Jane      | 2           | 4
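
For illustration, the same result set could be held as a multidimensional PHP array and ordered by score. This is only a sketch: the key names (`names`, `occurrences`, `score`) are assumptions, the data is the made-up example from the table, and the `<=>` operator needs PHP 7 or later.

```php
<?php
// Illustrative data only, copied from the example table (deliberately unsorted).
$results = [
    ['names' => ['Charlie', 'Jane'],        'occurrences' => 2],
    ['names' => ['Ben', 'Lucy'],            'occurrences' => 4],
    ['names' => ['James', 'Rosie', 'Jack'], 'occurrences' => 2],
    ['names' => ['Jane', 'John'],           'occurrences' => 3],
];

// Derive the score for each row: set size * occurrences.
foreach ($results as $i => $row) {
    $results[$i]['score'] = count($row['names']) * $row['occurrences'];
}

// Highest score first; ties are broken by occurrences, also descending.
usort($results, function (array $a, array $b): int {
    return [$b['score'], $b['occurrences']] <=> [$a['score'], $a['occurrences']];
});

foreach ($results as $row) {
    printf("%-20s | %d | %d\n", implode(', ', $row['names']), $row['occurrences'], $row['score']);
}
```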

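Just to clarify, I'm not interested in sets where:

  • The number of occurrences is 1 (obviously, that is just one family).
  • The set size is 1 (that is just a common name).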
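
Those two exclusions could be expressed as a simple filter. This is a hedged sketch: `is_interesting` and the sample candidates are hypothetical names and data, not part of any existing code.

```php
<?php
// Hypothetical filter: keep only sets with more than one name that occur in more than one family.
function is_interesting(array $names, int $occurrences): bool
{
    return count($names) > 1 && $occurrences > 1;
}

$candidates = [
    ['names' => ['Jane', 'John'],        'occurrences' => 3], // kept
    ['names' => ['Jane', 'John', 'Zed'], 'occurrences' => 1], // dropped: one family only
    ['names' => ['John'],                'occurrences' => 5], // dropped: a single name
];

// array_values() reindexes the survivors from 0.
$kept = array_values(array_filter($candidates, function (array $c): bool {
    return is_interesting($c['names'], $c['occurrences']);
}));
```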

I hope I've explained my somewhat complicated problem - if anyone needs clarification, please say so.

+0

Does the entire working set fit into RAM? I.e.: can I build the array completely in PHP? – 2011-12-23 00:27:43

+0

@EugenRieck Yes, it would. – 2011-12-23 00:31:53

Answers

1

OK, got it:

<?php 
require_once('query.lib.php'); 

$db=new database(DB_TYPE,DB_HOST,DB_USER,DB_PASS,DB_MISC); 
$qry=new query('set names utf8',$db); 

//Base query; HAVING num>1 filters out names that are in just one family 
$sql='select name, cast(group_concat(distinct family_id order by family_id) as char) as famlist, count(distinct family_id) as num from people group by name having num>1 order by num desc'; 
$qry=new query($sql,$db); 

//$qry->result is something like 
/* 
Array 
(
    [name] => Array 
     (
      [0] => cathy 
      [1] => george 
      [2] => jack 
      [3] => john 
      [4] => jane 
      [5] => winston 
      [6] => peter 
     ) 

    [famlist] => Array 
     (
      [0] => 2,4,5,6,8 
      [1] => 2,3,4,5,8 
      [2] => 1,3,5,7,8 
      [3] => 1,2,3,6,7 
      [4] => 2,4,7,8 
      [5] => 1,2,6,8 
      [6] => 1,3,6 
     ) 

    [num] => Array 
     (
      [0] => 5 
      [1] => 5 
      [2] => 5 
      [3] => 5 
      [4] => 4 
      [5] => 4 
      [6] => 3 
     ) 

) 

$qry->rows=7 
*/ 

//Initialize 
$names=$qry->result['name']; 
$rows=$qry->rows; 
$lists=array(); 
for ($i=0;$i<$rows;$i++) $lists[$i]=explode(',',$qry->result['famlist'][$i]); 

//Walk the list and populate pairs - this filters out pairs that are specific to only one family 
$tuples=array(); 
for ($i=0;$i<$rows;$i++) { 
    for ($j=$i+1;$j<$rows;$j++) { 
        $isec=array_intersect($lists[$i],$lists[$j]); 
        if (sizeof($isec)>1) { 
            //Every tuple consists of the name list, the family list, the length and the index of the latest used name 
            $tuples[]=array($names[$i].'/'.$names[$j],$isec,2,$j); 
        } 
    } 
} 

//Now walk the tuples again rolling forward, until there is nothing left to do 
//We do not use a for loop just for style 
$i=0; 
while ($i<sizeof($tuples)) { 
    $tuple=$tuples[$i]; 
    //Try to combine this tuple with all later names 
    for ($j=$tuple[3]+1;$j<$rows;$j++) { 
        $isec=array_intersect($tuple[1],$lists[$j]); 
        //Keep only sets shared by more than one family, as for the pairs 
        if (sizeof($isec)>1) $tuples[]=array($tuple[0].'/'.$names[$j],$isec,$tuple[2]+1,$j); 
    } 
    $i++; 
} 

//We have all the tuples, now we just need to extract the info and prepare to sort - some dirty trick here! 
$final=array(); 
while (sizeof($tuples)>0) { 
    $tuple=array_pop($tuples); 
    //name list is in $tuple[0] 
    $list=$tuple[0]; 
    //count is sizeof($tuple[1]) 
    $count=sizeof($tuple[1]); 
    //length is in $tuple[2] 
    $final[]=$tuple[2]*$count."\t$count\t$list"; 
} 

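//Sort by the leading score as a number (a plain rsort compares the strings, so "8..." would sort above "10..."), then output 
usort($final,function($a,$b){return (int)$b-(int)$a;}); 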
print_r($final); 
?> 

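I'm sorry, I only just realised that I used my query lib, whose source I can't share here, but from the comments you should be able to easily create the arrays as they are used in the "Initialize" section.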
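
Without the query lib, one possible way to produce the same `$names`, `$lists` and `$rows` values is to start from plain row arrays, such as those returned by PDO's `fetchAll(PDO::FETCH_ASSOC)`. This is only a sketch under that assumption: the helper name `build_lists` and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper: turn row-oriented results into the arrays used in the "Initialize" section.
// Each row is expected to have 'name' and 'famlist' (comma-separated family ids) keys.
function build_lists(array $rowsFromDb): array
{
    $names = [];
    $lists = [];
    foreach ($rowsFromDb as $row) {
        $names[] = $row['name'];
        $lists[] = explode(',', $row['famlist']);
    }
    return [$names, $lists, count($names)];
}

// Sample rows shaped like the query output shown in the code comment.
$sample = [
    ['name' => 'cathy', 'famlist' => '2,4,5,6,8', 'num' => 5],
    ['name' => 'peter', 'famlist' => '1,3,6',     'num' => 3],
];

list($names, $lists, $rows) = build_lists($sample);
```

Note that `explode()` returns the family ids as strings, which is fine for `array_intersect()` because it compares the string representations of the values.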

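Basically, what I do is start from pairs: I keep an array of the families that all names in the current name list belong to, and then intersect it with the family lists of all names that have not been tried yet.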
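
The idea can also be sketched in a self-contained form that works on an in-memory map instead of a database. This is a rough illustration rather than the exact code above: the function name `common_name_sets`, the array keys and the sample data are all assumptions, and it needs PHP 7 or later.

```php
<?php
// Sketch under stated assumptions: $families maps each name to the ids of the families it appears in.
// Returns every set of 2+ names shared by 2+ families, with its occurrences and score.
function common_name_sets(array $families): array
{
    $names = array_keys($families);
    $n = count($names);
    $queue = [];
    // Seed the queue with single names; 'last' is the index of the most recently added name.
    foreach ($names as $i => $name) {
        $queue[] = ['names' => [$name], 'fams' => $families[$name], 'last' => $i];
    }
    $found = [];
    // Grow each set by every later name, keeping only the families common to all members.
    for ($q = 0; $q < count($queue); $q++) {
        $set = $queue[$q];
        for ($j = $set['last'] + 1; $j < $n; $j++) {
            $common = array_values(array_intersect($set['fams'], $families[$names[$j]]));
            if (count($common) > 1) {
                $grown = array_merge($set['names'], [$names[$j]]);
                $queue[] = ['names' => $grown, 'fams' => $common, 'last' => $j];
                $found[] = [
                    'names' => $grown,
                    'occurrences' => count($common),
                    'score' => count($grown) * count($common),
                ];
            }
        }
    }
    // Highest score first.
    usort($found, function (array $a, array $b): int {
        return $b['score'] <=> $a['score'];
    });
    return $found;
}

// Made-up sample data: name => family ids.
$result = common_name_sets([
    'john' => [1, 2, 3],
    'jane' => [1, 2, 3],
    'ben'  => [1, 2],
    'lucy' => [4],
]);
print_r($result);
```

Like the code above, this lists every qualifying subset (for example both john/jane and john/jane/ben), not only the largest sets, and the number of candidate sets can grow quickly when many names share many families.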

0

Does this work?

SELECT 
    f.name AS 'surname', 
    GROUP_CONCAT(DISTINCT p.name ORDER BY p.name) AS 'names', 
    COUNT(DISTINCT p.name) AS 'distinct_names', 
    COUNT(p.id) AS 'occurrences', 
    COUNT(DISTINCT p.name) * COUNT(p.id) AS 'score' 
FROM 
    families f 
    LEFT JOIN people p ON (f.id = p.family_id) 
GROUP BY 
    f.id 
ORDER BY 
    f.name 
+0

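No... in my full dataset, the top name only occurs 3 times, but your query suggests there is a set of 6 names that all occur 6 times across different families! – 2011-12-23 03:35:54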