Compare data sets and return the best match

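In MySQL, I use a "join table" to assign tags to items. I would like to see which items have tags most similar to those of the item being viewed.

For example, suppose the item of interest has been tagged "cool", "car" and "red". I want to search for other items with these tags. I want to see items tagged "car", but items tagged both "car" and "red" should rank above items tagged only "car". Items with identical tags should be at the top of the results.

Is there some way to compare one data set (subquery) against another data set (subquery)? Or is there some trick with GROUP BY and GROUP_CONCAT() to evaluate them as comma-separated lists?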

2009-09-02 seans

It would help if you told us your table structure, so that I could be more specific.

I'm assuming you have a structure similar to this:

Table item: (id, itemname) 
1 item1 
2 item2 
3 item3 
4 item4 
5 item5 

Table tag: (id, tagname) 
1 cool 
2 red 
3 car 

Table itemtag: (id, itemid, tagid) 
1 1 2 (=item1, red) 
2 2 1 (=item2, cool) 
3 2 3 (=item2, car) 
4 3 1 (=item3, cool) 
5 3 2 (=item3, red) 
6 3 3 (=item3, car) 
7 4 3 (=item4, car) 
8 5 3 (=item5, car)
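To try the queries in this answer, the assumed structure can be loaded into a scratch database. This is only a sketch: the column types, key constraints and name lengths are my assumptions, since the question never shows the real schema.

```sql
-- Hypothetical DDL for the assumed sample structure; types are guesses.
CREATE TABLE item (
    id       INT PRIMARY KEY,
    itemname VARCHAR(50) NOT NULL
);

CREATE TABLE tag (
    id      INT PRIMARY KEY,
    tagname VARCHAR(50) NOT NULL
);

CREATE TABLE itemtag (
    id     INT PRIMARY KEY,
    itemid INT NOT NULL,
    tagid  INT NOT NULL,
    FOREIGN KEY (itemid) REFERENCES item (id),
    FOREIGN KEY (tagid)  REFERENCES tag (id)
);

INSERT INTO item (id, itemname) VALUES
    (1, 'item1'), (2, 'item2'), (3, 'item3'), (4, 'item4'), (5, 'item5');

INSERT INTO tag (id, tagname) VALUES
    (1, 'cool'), (2, 'red'), (3, 'car');

INSERT INTO itemtag (id, itemid, tagid) VALUES
    (1, 1, 2),  -- item1, red
    (2, 2, 1),  -- item2, cool
    (3, 2, 3),  -- item2, car
    (4, 3, 1),  -- item3, cool
    (5, 3, 2),  -- item3, red
    (6, 3, 3),  -- item3, car
    (7, 4, 3),  -- item4, car
    (8, 5, 3);  -- item5, car
```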

In general, my approach would be to start by counting how often each individual tag is used.

-- make a list of how often a tag was used: 
select tagid, count(*) as `tagscore` from itemtag group by tagid

This returns one row per tag, with the number of items that tag is assigned to.

In our example, that would be:

tag tagscore 
1 2   (cool, 2x) 
2 2   (red, 2x) 
3 4   (car, 4x) 


set @ItemOfInterest=2; 

select 
    item.itemname, 
    sum(TagScores.tagscore) as `totaltagscore`, 
    GROUP_CONCAT(tag.tagname) as `tags` 
from 
    itemtag 
join item on itemtag.itemid=item.id 
join tag on itemtag.tagid=tag.id 
join 
    /* join the query from above (scores per tag) */ 
    (select tagid, count(*) as `tagscore` from itemtag group by tagid) as `TagScores` 
    on `TagScores`.tagid=itemtag.tagid 
where 
    itemtag.itemid<>@ItemOfInterest and 
    /* get the taglist of the current item */ 
    itemtag.tagid in (select distinct tagid from itemtag where itemid=@ItemOfInterest) 
group by 
    item.id, item.itemname 
order by 
    totaltagscore desc

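Explanation: the query has two subqueries. One gets the list of tags of the item of interest; we only want to work with those. The other subquery generates a list of scores per tag.

So in the end, every other item that shares at least one tag with the item of interest has a list of tag scores. These scores are added up with sum(tagscore), and that number is used to order the results (highest score first).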
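To make the scoring concrete, here is a sketch that lists the ungrouped rows behind sum(tagscore) for the sample tables assumed earlier in this answer, with item2 as the item of interest. The expected rows in the comments follow from that sample data only.

```sql
SET @ItemOfInterest = 2;  -- item2 is tagged cool (1) and car (3)

-- One row per matching (item, tag) pair, before grouping.
SELECT
    itemtag.itemid,
    itemtag.tagid,
    TagScores.tagscore
FROM itemtag
JOIN (SELECT tagid, COUNT(*) AS tagscore
      FROM itemtag
      GROUP BY tagid) AS TagScores
    ON TagScores.tagid = itemtag.tagid
WHERE itemtag.itemid <> @ItemOfInterest
  AND itemtag.tagid IN (SELECT tagid FROM itemtag WHERE itemid = @ItemOfInterest)
ORDER BY itemtag.itemid, itemtag.tagid;

-- On the sample rows this yields:
--   itemid 3, tagid 1 (cool), tagscore 2
--   itemid 3, tagid 3 (car),  tagscore 4
--   itemid 4, tagid 3 (car),  tagscore 4
--   itemid 5, tagid 3 (car),  tagscore 4
-- Grouped per item, the totals are item3 = 6, item4 = 4, item5 = 4.
-- item1 is tagged only red, which item2 does not have, so it drops out.
```

Note that the score weights each shared tag by how widely that tag is used, so a frequently used tag counts for more than a rare one.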

To display the list of matching tags, I use GROUP_CONCAT.

The query will result in something like this (the data here is not the sample data above):

Item TagsScore Tags 
item3 15   red,cool,car 
item4 7   red,car 
item5 7   red 
item1 5   car 
item6 5   car

2009-09-02 23:42:13

Both replies are on the right track and got me to a short-term solution. As for how to scale this routine, I'm still looking! – seans 2009-09-03 23:00:03

How about:

SELECT post, SUM(IF(tag IN ('cool', 'cars', 'red'), 1, 0)) AS number_matching 
FROM tags 
GROUP BY post 
ORDER BY number_matching DESC

The list of terms here can be filled into the SQL by your application if you already have it handy, or it can be generated by a subquery.
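As a rough sketch of the subquery variant, assuming the same tags(post, tag) table as in the query above, with at most one row per (post, tag) pair, and a hypothetical @PostOfInterest variable holding the post being viewed:

```sql
SET @PostOfInterest = 1;  -- hypothetical id of the post being viewed

SELECT
    t.post,
    COUNT(*) AS number_matching
FROM tags AS t
WHERE t.post <> @PostOfInterest
  -- the tag list comes from the post of interest instead of being hard-coded
  AND t.tag IN (SELECT src.tag FROM tags AS src WHERE src.post = @PostOfInterest)
GROUP BY t.post
ORDER BY number_matching DESC;
```

Unlike the SUM(IF(...)) form, this filters in the WHERE clause, so posts with no matching tags are left out of the result rather than listed with a count of 0.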

2009-09-02 23:00:49 VoteyDisciple

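This will work for sorting, but you have to generate the query dynamically, because each item can have a different set of tags. The hard-coded list can be replaced with a subquery to solve this. – 2009-09-02 23:44:05

That's what I had in mind. Edited to clarify. – VoteyDisciple 2009-09-03 01:23:05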