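SQL: Select the most similar product

Okay, I have a relation that stores two keys, a product Id and an attribute Id. I want to find out which product is most similar to a given product. (The attributes are actually numbers, but that made the example more confusing, so they have been changed to letters to simplify the visual representation.)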

Prod_att

Product | Attributes 
    1 | A  
    1 | B 
    1 | C 
    2 | A 
    2 | B 
    2 | D 
    3 | A 
    3 | E 
    4 | A

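Initially this seemed fairly simple: select the attributes a product has, then count how many of those attributes each other product shares. Comparing that count with the number of attributes the product has shows how similar two products are. This works for products that have many attributes relative to the products they are compared with, but problems arise when a product has very few attributes. For example, product 3 will tie with almost every other product (because A is very common).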

SELECT Product, count(Attributes) 
FROM Prod_att 
WHERE Attributes IN 
(SELECT Attributes 
FROM Prod_att 
WHERE Product = 1) 
GROUP BY Product 
;
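
The comparison described above can be sketched in a single query: for a reference product, return each other product's shared-attribute count next to its own total attribute count, so that a candidate with fewer extra attributes can be ranked higher. This is only a sketch against the Prod_att table shown above. The aliases shared_attributes and total_attributes are made up for illustration, and it assumes each (Product, Attributes) pair appears only once.

```sql
-- Sketch: compare every other product against reference product 1.
-- Assumes Prod_att(Product, Attributes) with no duplicate rows.
SELECT p.Product,
       -- attributes of this product that the reference product also has
       SUM(CASE WHEN r.Attributes IS NOT NULL THEN 1 ELSE 0 END) AS shared_attributes,
       -- all attributes of this product
       COUNT(*) AS total_attributes
FROM Prod_att p
LEFT JOIN Prod_att r
       ON r.Product = 1                 -- the reference product
      AND r.Attributes = p.Attributes
WHERE p.Product <> 1
GROUP BY p.Product
-- most shared attributes first; ties go to the product with fewer attributes
ORDER BY shared_attributes DESC, total_attributes ASC;
```

With the sample data this should list product 2 (2 shared of 3), then product 4 (1 of 1), then product 3 (1 of 2). Changing both occurrences of the reference product to 4 ranks product 3 (1 shared of 2) ahead of products 1 and 2 (1 shared of 3 each).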

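Any suggestions on how to fix this or improve my current query?
Thanks!

*Edit: Product 4 will return count() = 1 for all products. I would like to show that product 3 is more similar because it has fewer differing attributes.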

Source

2013-05-08 Crp

How do you define the minimum set of similar attributes? That can be done with a 'HAVING' clause. – 2013-05-08 16:53:50

http://stackoverflow.com/questions/384276/how-to-create-search-engines-like-google – 2013-05-08 16:54:12

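What [RDBMS](http://en.wikipedia.org/wiki/Relational_database_management_system) are you using? 'RDBMS' stands for *Relational Database Management System*. 'RDBMS is the basis for SQL', and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, etc... Can you also provide sample records of the result you want? – 2013-05-08 17:06:22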

Try this:

SELECT
    a_product_id,
    COALESCE(b_product_id, 'no_matchs_found') AS closest_product_match
FROM (
    -- Number the candidate rows within each a_product_id (MySQL user variables)
    SELECT
        *,
        @row_num := IF(@prev_value = a_product_id, @row_num + 1, 1) AS row_num,
        @prev_value := a_product_id
    FROM
        (SELECT @prev_value := 0) r
        JOIN (
            -- One row per product pair: number of shared attributes and
            -- the candidate's total number of attributes
            SELECT
                a.product_id AS a_product_id,
                b.product_id AS b_product_id,
                COUNT(DISTINCT b.Attributes),
                COUNT(DISTINCT b2.Attributes) AS total_products
            FROM
                products a
                LEFT JOIN products b ON (a.Attributes = b.Attributes AND a.product_id <> b.product_id)
                LEFT JOIN products b2 ON (b2.product_id = b.product_id)
            /* WHERE */
            /*     a.product_id = 3 */
            GROUP BY
                a.product_id,
                b.product_id
            -- Most shared attributes first; ties go to the candidate with fewer attributes
            ORDER BY
                1, 3 DESC, 4
        ) t
) t2
WHERE
    row_num = 1

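The query above returns the closest match for every product. You can supply a product_id in the innermost query (the commented-out WHERE clause) to get the result for a specific product_id. I used LEFT JOIN so that a product is still displayed even when it has no matches.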
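
If only one product is of interest, the inner query can be used on its own with the WHERE clause enabled. The following is a minimal sketch of that idea, not a replacement for the full query: it assumes MySQL (for LIMIT) and the same products(product_id, Attributes) table, and the aliases shared_attributes and candidate_total_attributes are invented here for readability.

```sql
-- Sketch: closest match for a single product (product 3 here).
SELECT a.product_id                  AS a_product_id,
       b.product_id                  AS b_product_id,
       COUNT(DISTINCT b.Attributes)  AS shared_attributes,
       COUNT(DISTINCT b2.Attributes) AS candidate_total_attributes
FROM products a
LEFT JOIN products b
       ON a.Attributes = b.Attributes
      AND a.product_id <> b.product_id
LEFT JOIN products b2
       ON b2.product_id = b.product_id
WHERE a.product_id = 3
GROUP BY a.product_id, b.product_id
-- most shared attributes first; ties go to the candidate with fewer attributes
ORDER BY shared_attributes DESC, candidate_total_attributes ASC
LIMIT 1;
```

Assuming products holds the sample data from the question, product 3 shares only A with products 1, 2 and 4, and product 4 has the fewest attributes overall, so this should return product 4 as the closest match. If the product shares nothing with any other product, the LEFT JOIN leaves a single row with a NULL b_product_id.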

SQLFIDDLE

Hope this helps.

Source

2013-05-08 18:38:25 Akash

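Great! Much more sophisticated than just comparing matching attributes. Thanks. – Crp 2013-05-09 05:46:04

Glad to know it helped :) – Akash 2013-05-09 06:18:16

Try the "Lower bound of Wilson score confidence interval for a Bernoulli parameter". It explicitly deals with the problem of statistical confidence when you have a small n. It looks like a lot of math, but it is really about the minimum amount of math you need to do this kind of thing properly. The website explains it well.

This assumes the problem can be mapped from positive/negative ratings to matching/non-matching attributes.

Here is an example with positive and negative scores and a 95% confidence level: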

SELECT widget_id, ((positive + 1.9208)/(positive + negative) - 
1.96 * SQRT((positive * negative)/(positive + negative) + 0.9604)/
(positive + negative))/(1 + 3.8416/(positive + negative)) 
AS ci_lower_bound FROM widgets WHERE positive + negative > 0 
ORDER BY ci_lower_bound DESC;
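
To apply the same formula to the attribute data, positive and negative have to be derived from Prod_att. The sketch below uses one possible mapping, which is an assumption and not something this answer specifies: positive is the number of a candidate's attributes that the reference product also has, and negative is the number of the candidate's attributes that the reference product lacks. The subquery alias s and the 1.0 multipliers (to avoid integer division on engines that would otherwise truncate) are also additions for illustration.

```sql
-- Sketch: Wilson lower bound (95%) of the share of matching attributes,
-- relative to reference product 1. The positive/negative mapping is assumed.
SELECT s.Product,
       ((s.positive + 1.9208) / (s.positive + s.negative)
        - 1.96 * SQRT((s.positive * s.negative) / (s.positive + s.negative) + 0.9604)
          / (s.positive + s.negative))
       / (1 + 3.8416 / (s.positive + s.negative)) AS ci_lower_bound
FROM (
    SELECT p.Product,
           -- candidate attributes also present on the reference product
           1.0 * SUM(CASE WHEN r.Attributes IS NOT NULL THEN 1 ELSE 0 END) AS positive,
           -- candidate attributes missing from the reference product
           1.0 * SUM(CASE WHEN r.Attributes IS NULL THEN 1 ELSE 0 END)     AS negative
    FROM Prod_att p
    LEFT JOIN Prod_att r
           ON r.Product = 1
          AND r.Attributes = p.Attributes
    WHERE p.Product <> 1
    GROUP BY p.Product
) s
ORDER BY ci_lower_bound DESC;
```

Every candidate has at least one attribute row, so positive + negative is never zero here and the original query's WHERE positive + negative > 0 guard is not needed. Other mappings (for example, also counting the reference product's unmatched attributes as negative) are equally plausible and would change the ranking.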

Source

2013-05-08 17:24:43 criticalfix

You could write a small view that gives you the total number of attributes shared between two products.

create view vw_shared_attributes as 
select a.product, 
       b.product as product_match, 
       count(*) as shared_attributes 
from your_table a 
    inner join your_table b on b.attribute = a.attribute and b.product <> a.product 
group by a.product, b.product

Then use that view to select the top match.

select product, 
       (select top 1 s.product_match 
        from vw_shared_attributes s 
        where t.product = s.product 
        order by s.shared_attributes desc) 
from your_table t 
group by product

For an example, see http://www.sqlfiddle.com/#!6/53039/1

Source

2013-05-08 17:35:02 Nate
