2009-09-10

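MySQL: select most frequent value by group

How do I get the most frequently occurring category for each tag in MySQL? Ideally, I'd like to emulate an aggregate function that computes the mode of a column.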

SELECT 
    t.tag 
    , s.category 
FROM tags t 
LEFT JOIN stuff s 
USING (id) 
ORDER BY tag; 

+-------------+----------+
| tag         | category |
+-------------+----------+
| automotive  |        8 |
| ba          |        8 |
| bamboo      |        8 |
| bamboo      |        8 |
| bamboo      |        8 |
| bamboo      |        8 |
| bamboo      |        8 |
| bamboo      |       10 |
| bamboo      |        8 |
| bamboo      |        9 |
| bamboo      |        8 |
| bamboo      |       10 |
| bamboo      |        8 |
| bamboo      |        9 |
| bamboo      |        8 |
| banana tree |        8 |
| banana tree |        8 |
| banana tree |        8 |
| banana tree |        8 |
| bath        |        9 |
+-------------+----------+

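Thinking about this a few years later, a smarter me says: don't organize tags like this, it's an anti-pattern. Use a many-to-many table to define the relationship between tags and items. That said, I still wish MySQL had a MODE aggregate function. – 2012-02-12 17:31:52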

Answers

SELECT t1.* 
FROM (SELECT tag, category, COUNT(*) AS count 
     FROM tags INNER JOIN stuff USING (id) 
     GROUP BY tag, category) t1 
LEFT OUTER JOIN 
    (SELECT tag, category, COUNT(*) AS count 
     FROM tags INNER JOIN stuff USING (id) 
     GROUP BY tag, category) t2 
    ON (t1.tag = t2.tag AND (t1.count < t2.count 
     OR t1.count = t2.count AND t1.category < t2.category)) 
WHERE t2.tag IS NULL 
ORDER BY t1.count DESC; 

I agree this is kind of a lot for a single SQL query. Any use of GROUP BY inside a subquery makes me wince. You can make it look simpler by using a view:

CREATE VIEW count_per_category AS 
    SELECT tag, category, COUNT(*) AS count 
    FROM tags INNER JOIN stuff USING (id) 
    GROUP BY tag, category; 

SELECT t1.* 
FROM count_per_category t1 
LEFT OUTER JOIN count_per_category t2 
    ON (t1.tag = t2.tag AND (t1.count < t2.count 
     OR t1.count = t2.count AND t1.category < t2.category)) 
WHERE t2.tag IS NULL 
ORDER BY t1.count DESC; 

But it's basically doing the same work behind the scenes.
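To try the exclusion-join logic without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. It assumes the single-table `stuff(tag, category)` layout from the test script in a later answer instead of the `tags`/`stuff` join on `id`, renames the count column to `cnt`, and adds `t1.tag` to the ORDER BY so the output order is deterministic; all three are assumptions made only to keep the example self-contained.

```python
import sqlite3

# Sample rows from the question's result set (single-table layout assumed).
ROWS = (
    [("automotive", 8), ("ba", 8)]
    + [("bamboo", c) for c in (8, 8, 8, 8, 8, 10, 8, 9, 8, 10, 8, 9, 8)]
    + [("bananatree", 8)] * 4
    + [("bath", 9)]
)


def mode_per_tag(rows):
    """Return (tag, category, cnt) for the most frequent category of each tag."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (tag TEXT NOT NULL, category INTEGER NOT NULL)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?)", rows)
    # Per-(tag, category) counts, like the count_per_category view.
    conn.execute(
        """CREATE VIEW count_per_category AS
           SELECT tag, category, COUNT(*) AS cnt
           FROM stuff
           GROUP BY tag, category"""
    )
    # A t1 row survives only if no t2 row for the same tag has a higher count,
    # or the same count with a larger category (the tie-break).
    result = conn.execute(
        """SELECT t1.tag, t1.category, t1.cnt
           FROM count_per_category t1
           LEFT OUTER JOIN count_per_category t2
             ON t1.tag = t2.tag
            AND (t1.cnt < t2.cnt
                 OR (t1.cnt = t2.cnt AND t1.category < t2.category))
           WHERE t2.tag IS NULL
           ORDER BY t1.cnt DESC, t1.tag"""
    ).fetchall()
    conn.close()
    return result


for row in mode_per_tag(ROWS):
    print(row)
```

Because of the `t1.category < t2.category` condition, a tie on count keeps the larger category.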

You commented that you could easily do something similar in application code. So why not do that? Run the simpler query to get the count per category:

SELECT tag, category, COUNT(*) AS count 
FROM tags INNER JOIN stuff USING (id) 
GROUP BY tag, category; 

Then sort through the results in application code.
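A minimal sketch of that application-side step in Python, assuming the rows arrive as `(tag, category, count)` tuples from the simpler query. The function name `most_frequent_category` is hypothetical. Ties are broken toward the larger category so the result agrees with the `t1.category < t2.category` condition in the join above.

```python
from typing import Dict, Iterable, Tuple


def most_frequent_category(
    rows: Iterable[Tuple[str, int, int]]
) -> Dict[str, Tuple[int, int]]:
    """Map each tag to (category, count) for its most frequent category."""
    best: Dict[str, Tuple[int, int]] = {}
    for tag, category, count in rows:
        current = best.get(tag)
        # Prefer a higher count; on equal counts prefer the larger category.
        if current is None or (count, category) > (current[1], current[0]):
            best[tag] = (category, count)
    return best


# What GROUP BY tag, category returns for the question's data.
grouped = [
    ("automotive", 8, 1),
    ("ba", 8, 1),
    ("bamboo", 8, 9),
    ("bamboo", 9, 2),
    ("bamboo", 10, 2),
    ("banana tree", 8, 4),
    ("bath", 9, 1),
]
print(most_frequent_category(grouped))
```

The rows can come back in any order; a single pass keeps only the best candidate seen so far for each tag.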


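I've been working on it... it seems like it would be better to write an aggregate function MOST_FREQUENT()... I'm going to see whether that's within my skill level... – 2009-09-11 15:09:19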


Sorry, I misunderstood your schema. I took a closer look and mocked up a test database so I could be sure the query works. Try the edited version above. – 2009-09-11 15:48:49


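That seems to work, though it's a bit hard to swallow... and it has two subselects instead of one. I wish there were just a built-in aggregate function MEAN() or something :-P. I could write one in C in 5 minutes. – 2009-09-11 16:03:55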

SELECT tag, category 
FROM (
     SELECT @tag <> tag AS _new, 
       @tag := tag AS tag, 
       category, COUNT(*) AS cnt 
     FROM (
       SELECT @tag := '' 
       ) vars, 
       stuff 
     GROUP BY 
       tag, category 
     ORDER BY 
       tag, cnt DESC 
     ) q 
WHERE _new 

On your data, this returns the following:

'automotive', 8 
'ba',   8 
'bamboo',  8 
'bananatree', 8 
'bath',  9 

Here's the test script:

CREATE TABLE stuff (tag VARCHAR(20) NOT NULL, category INT NOT NULL); 

INSERT 
INTO stuff 
VALUES 
('automotive',8), 
('ba',8), 
('bamboo',8), 
('bamboo',8), 
('bamboo',8), 
('bamboo',8), 
('bamboo',8), 
('bamboo',10), 
('bamboo',8), 
('bamboo',9), 
('bamboo',8), 
('bamboo',10), 
('bamboo',8), 
('bamboo',9), 
('bamboo',8), 
('bananatree',8), 
('bananatree',8), 
('bananatree',8), 
('bananatree',8), 
('bath',9); 

(Edit: forgot the DESC in the ORDER BYs)

This is easy with LIMIT in a subquery. Does MySQL still have its restriction on LIMIT in subqueries? The example below uses PostgreSQL.

=> select tag,
          (select category from stuff z
            where z.tag = s.tag
            group by tag, category
            order by count(*) DESC limit 1) AS category,
          (select count(*) from stuff z
            where z.tag = s.tag
            group by tag, category
            order by count(*) DESC limit 1) AS num_items
   from stuff s
   group by tag;
    tag     | category | num_items
------------+----------+-----------
 ba         |        8 |         1
 automotive |        8 |         1
 bananatree |        8 |         4
 bath       |        9 |         1
 bamboo     |        8 |         9
(5 rows)

The third column is needed only if you want the count.
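The same correlated-subquery approach can be tried without a PostgreSQL server. This sketch runs it through Python's built-in `sqlite3` module on the rows from the test script above (SQLite accepts `ORDER BY ... LIMIT` inside a scalar subquery); the outer `ORDER BY tag` is an addition for a stable output order. As in the original, if two categories tie for the top count, which one the subquery returns is not specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (tag TEXT NOT NULL, category INTEGER NOT NULL)")
# Same rows as the test script above.
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?)",
    [("automotive", 8), ("ba", 8)]
    + [("bamboo", c) for c in (8, 8, 8, 8, 8, 10, 8, 9, 8, 10, 8, 9, 8)]
    + [("bananatree", 8)] * 4
    + [("bath", 9)],
)

# Each scalar subquery groups one tag's rows by category and keeps the top group.
query = """
SELECT tag,
       (SELECT category FROM stuff z
         WHERE z.tag = s.tag
         GROUP BY tag, category
         ORDER BY COUNT(*) DESC LIMIT 1) AS category,
       (SELECT COUNT(*) FROM stuff z
         WHERE z.tag = s.tag
         GROUP BY tag, category
         ORDER BY COUNT(*) DESC LIMIT 1) AS num_items
FROM stuff s
GROUP BY tag
ORDER BY tag
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```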


Here's the simpler case:

SELECT action, COUNT(action) AS ActionCount
FROM log
GROUP BY action
ORDER BY ActionCount DESC;
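For a single column, that query lists every value with its frequency, most frequent first, so the first row is the mode. The same computation in Python, assuming the `action` values have already been fetched into a list (the sample values are hypothetical):

```python
from collections import Counter

# Hypothetical action values standing in for SELECT action FROM log.
actions = ["login", "view", "view", "logout", "view", "login"]

# most_common() lists values from most to least frequent,
# like ORDER BY ActionCount DESC.
ranking = Counter(actions).most_common()
print(ranking)        # [('view', 3), ('login', 2), ('logout', 1)]
print(ranking[0][0])  # view
```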