If your table looks like this:
SELECT * from categories;
+---------+----------+
| page_id | category |
+---------+----------+
| 1 | a |
| 1 | b |
| 1 | a |
| 1 | c |
| 1 | a |
| 1 | b |
| 1 | a |
| 2 | d |
| 2 | d |
| 2 | c |
| 2 | d |
| 3 | a |
| 3 | b |
| 3 | c |
| 4 | c |
| 4 | d |
| 4 | c |
+---------+----------+
17 rows in set (0.00 sec)
then you may want to try this query:
SELECT c1.page_id, MAX(freq.total),
    (
        SELECT c2.category
        FROM categories c2
        WHERE c2.page_id = c1.page_id
        GROUP BY c2.category
        HAVING COUNT(*) = MAX(freq.total)
        LIMIT 1
    ) AS category
FROM categories c1
JOIN (
    SELECT page_id, category, COUNT(*) AS total
    FROM categories
    GROUP BY page_id, category
) freq ON (freq.page_id = c1.page_id)
GROUP BY c1.page_id;
It returns this:
+---------+-----------------+----------+
| page_id | MAX(freq.total) | category |
+---------+-----------------+----------+
| 1 | 4 | a |
| 2 | 3 | d |
| 3 | 1 | a |
| 4 | 2 | c |
+---------+-----------------+----------+
4 rows in set (0.00 sec)
Compare the result with the actual frequency distribution:
SELECT page_id, category, COUNT(*) FROM categories GROUP BY page_id, category;
+---------+----------+----------+
| page_id | category | COUNT(*) |
+---------+----------+----------+
| 1 | a | 4 |
| 1 | b | 2 |
| 1 | c | 1 |
| 2 | c | 1 |
| 2 | d | 3 |
| 3 | a | 1 |
| 3 | b | 1 |
| 3 | c | 1 |
| 4 | c | 2 |
| 4 | d | 1 |
+---------+----------+----------+
10 rows in set (0.00 sec)
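Note that for page_id = 3 there is no dominant frequency: categories a, b, and c each appear once. Because the subquery uses LIMIT 1 without an ORDER BY, this query makes no guarantee about which of the tied categories will be chosen.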
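If the tie for page_id = 3 matters, one way to make the choice predictable is to add an explicit ORDER BY before the LIMIT 1. Below is a minimal sketch of that idea, not the query above: it assumes SQLite (through Python's sqlite3 module) rather than MySQL, and the helper names build_demo_db and most_frequent_categories are made up for illustration. Ties are broken alphabetically by category, which is an arbitrary choice.

```python
import sqlite3

# The 17 sample rows from the table shown above.
ROWS = [
    (1, "a"), (1, "b"), (1, "a"), (1, "c"), (1, "a"), (1, "b"), (1, "a"),
    (2, "d"), (2, "d"), (2, "c"), (2, "d"),
    (3, "a"), (3, "b"), (3, "c"),
    (4, "c"), (4, "d"), (4, "c"),
]

# For each page, pick the category with the highest count.
# ORDER BY ... c.category ASC breaks ties alphabetically (an assumption,
# any stable rule would do), so LIMIT 1 always returns the same row.
QUERY = """
SELECT p.page_id,
       (
           SELECT c.category
           FROM categories c
           WHERE c.page_id = p.page_id
           GROUP BY c.category
           ORDER BY COUNT(*) DESC, c.category ASC
           LIMIT 1
       ) AS category
FROM (SELECT DISTINCT page_id FROM categories) p
ORDER BY p.page_id
"""


def build_demo_db():
    """Create an in-memory SQLite database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (page_id INTEGER, category TEXT)")
    conn.executemany("INSERT INTO categories VALUES (?, ?)", ROWS)
    return conn


def most_frequent_categories(conn):
    """Return a list of (page_id, category) tuples, one per page."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for page_id, category in most_frequent_categories(build_demo_db()):
        print(page_id, category)
```

With this ordering, page_id = 3 resolves to category a on every run. The same ORDER BY ... LIMIT 1 pattern should carry over to MySQL, though it has not been tested there.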
It might help to see the actual table structure you are using. – JYelton 2010-05-28 00:01:47