2012-03-13 48 views
2

I have a query that gets the top 'n' users who commented on a specific keyword. I want the top 'n' results for every keyword.

SELECT `user` , COUNT(*) AS magnitude 
FROM `results` 
WHERE `keyword` = "economy" 
GROUP BY `user` 
ORDER BY magnitude DESC 
LIMIT 5 

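I have about 6000 keywords and would like to run this query so that I get the top 'n' users for every keyword we have data for. Any help is appreciated.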


What error are you facing? – Teja 2012-03-13 20:59:28

Answers

3

Since you didn't give the schema for results, I'll assume it is this or something very similar (possibly with extra columns):

create table results (
    id int primary key, 
    user int, 
    foreign key (user) references <some_other_table>(id), 
    keyword varchar(<30>) 
); 

Step 1: Aggregate by keyword/user as in your example query, but across all keywords:

create view user_keyword as (
    select 
    keyword, 
    user, 
    count(*) as magnitude 
    from results 
    group by keyword, user 
); 

Step 2: Rank each user within each keyword group (note the use of a subquery to rank the rows):

create view keyword_user_ranked as (
    select 
    keyword, 
    user, 
    magnitude, 
    (select count(*) 
    from user_keyword 
    where l.keyword = keyword and magnitude >= l.magnitude 
    ) as rank 
    from 
    user_keyword l 
); 

Step 3: Select only the rows whose rank is less than or equal to some number:

select * 
from keyword_user_ranked 
where rank <= 3; 
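The three steps can be tried end to end with a small sketch. This one is an illustration under assumptions, not the answer's own setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, drops the foreign key, and names the rank column `rnk` (a hypothetical alias, chosen because `RANK` is a reserved word in MySQL 8.0). The rows are the ones from the answer's example table.

```python
import sqlite3

# (user, keyword) pairs copied from the answer's example table, ids 1..20.
ROWS = [
    (1, "mysql"), (1, "mysql"), (2, "mysql"), (1, "query"), (2, "query"),
    (2, "query"), (2, "query"), (1, "table"), (2, "table"), (1, "table"),
    (3, "table"), (3, "mysql"), (3, "query"), (2, "mysql"), (1, "mysql"),
    (1, "mysql"), (3, "query"), (4, "mysql"), (4, "mysql"), (5, "mysql"),
]


def top_n_per_keyword(n):
    """Return (keyword, user, magnitude, rnk) rows with rnk <= n."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute(
        "CREATE TABLE results (id INTEGER PRIMARY KEY, user INT, keyword TEXT)"
    )
    conn.executemany("INSERT INTO results (user, keyword) VALUES (?, ?)", ROWS)

    # Step 1: count comments per (keyword, user) pair.
    conn.execute("""
        CREATE VIEW user_keyword AS
        SELECT keyword, user, COUNT(*) AS magnitude
        FROM results
        GROUP BY keyword, user
    """)

    # Step 2: rank = number of users in the same keyword with >= magnitude.
    conn.execute("""
        CREATE VIEW keyword_user_ranked AS
        SELECT l.keyword, l.user, l.magnitude,
               (SELECT COUNT(*)
                FROM user_keyword r
                WHERE r.keyword = l.keyword
                  AND r.magnitude >= l.magnitude) AS rnk
        FROM user_keyword l
    """)

    # Step 3: keep only the rows at or above the cut-off rank.
    rows = conn.execute(
        "SELECT keyword, user, magnitude, rnk FROM keyword_user_ranked "
        "WHERE rnk <= ? ORDER BY keyword, rnk, user",
        (n,),
    ).fetchall()
    conn.close()
    return rows


print(top_n_per_keyword(2))
```

With `n = 2` this returns one row for mysql, two for query and one for table, because tied users share the lower rank and can fall below the cut-off.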

Example:

Base data used:

mysql> select * from results; 
+----+------+---------+ 
| id | user | keyword | 
+----+------+---------+ 
| 1 | 1 | mysql | 
| 2 | 1 | mysql | 
| 3 | 2 | mysql | 
| 4 | 1 | query | 
| 5 | 2 | query | 
| 6 | 2 | query | 
| 7 | 2 | query | 
| 8 | 1 | table | 
| 9 | 2 | table | 
| 10 | 1 | table | 
| 11 | 3 | table | 
| 12 | 3 | mysql | 
| 13 | 3 | query | 
| 14 | 2 | mysql | 
| 15 | 1 | mysql | 
| 16 | 1 | mysql | 
| 17 | 3 | query | 
| 18 | 4 | mysql | 
| 19 | 4 | mysql | 
| 20 | 5 | mysql | 
+----+------+---------+ 

Grouped by keyword and user:

mysql> select * from user_keyword order by keyword, magnitude desc; 
+---------+------+-----------+ 
| keyword | user | magnitude | 
+---------+------+-----------+ 
| mysql | 1 |   4 | 
| mysql | 2 |   2 | 
| mysql | 4 |   2 | 
| mysql | 3 |   1 | 
| mysql | 5 |   1 | 
| query | 2 |   3 | 
| query | 3 |   2 | 
| query | 1 |   1 | 
| table | 1 |   2 | 
| table | 2 |   1 | 
| table | 3 |   1 | 
+---------+------+-----------+ 

Users ranked within keywords:

mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc; 
+---------+------+-----------+------+ 
| keyword | user | magnitude | rank | 
+---------+------+-----------+------+ 
| mysql | 1 |   4 | 1 | 
| query | 2 |   3 | 1 | 
| query | 3 |   2 | 2 | 
| table | 1 |   2 | 1 | 
+---------+------+-----------+------+ 

mysql> select * from keyword_user_ranked order by keyword, rank asc; 
+---------+------+-----------+------+ 
| keyword | user | magnitude | rank | 
+---------+------+-----------+------+ 
| mysql | 1 |   4 | 1 | 
| mysql | 2 |   2 | 3 | 
| mysql | 4 |   2 | 3 | 
| mysql | 3 |   1 | 5 | 
| mysql | 5 |   1 | 5 | 
| query | 2 |   3 | 1 | 
| query | 3 |   2 | 2 | 
| query | 1 |   1 | 3 | 
| table | 1 |   2 | 1 | 
| table | 3 |   1 | 3 | 
| table | 2 |   1 | 3 | 
+---------+------+-----------+------+ 

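The first of the two results above keeps only the top 2 from each keyword.

Note what happens when there are ties (see users 2 and 4 for the keyword 'mysql' in the example): all tied users get the 'last' rank, i.e. if the second and third users are tied, both are assigned rank 3.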
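If tied users should instead share the 'first' rank of their group, one possible variation is to count only strictly larger magnitudes and add 1. The sketch below compares both rules on the 'mysql' counts from the example. It is not part of the original answer: it runs on SQLite through Python's sqlite3 module, and the column names `last_rank` and `first_rank` are made up for illustration.

```python
import sqlite3

# (user, magnitude) for the keyword 'mysql', taken from the example output.
COUNTS = [(1, 4), (2, 2), (4, 2), (3, 1), (5, 1)]


def tie_ranks():
    """Return (user, magnitude, last_rank, first_rank) for each user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_keyword (user INT, magnitude INT)")
    conn.executemany("INSERT INTO user_keyword VALUES (?, ?)", COUNTS)
    rows = conn.execute("""
        SELECT l.user, l.magnitude,
               -- rule used in the answer: ties get the last rank of the group
               (SELECT COUNT(*) FROM user_keyword r
                WHERE r.magnitude >= l.magnitude) AS last_rank,
               -- variation: ties get the first rank of the group
               (SELECT COUNT(*) FROM user_keyword r
                WHERE r.magnitude > l.magnitude) + 1 AS first_rank
        FROM user_keyword l
        ORDER BY l.magnitude DESC, l.user
    """).fetchall()
    conn.close()
    return rows


for row in tie_ranks():
    print(row)
```

Under the variation, users 2 and 4 both get rank 2, so a filter of rank <= 2 would return three users for 'mysql' instead of one.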


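Performance: adding an index on the keyword and user columns will help. I ran a similar query against a 600000-row table in which the two columns had 4000 and 1300 distinct values. You can add the index like this: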

alter table results add index keyword_user (keyword, user); 

In my case, the query time dropped from about 6 seconds to about 2 seconds.
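One way to confirm that an index like this is actually picked up is to compare the query plan before and after creating it. The sketch below does that on SQLite through Python's sqlite3 module, which is an assumption made for the sake of a runnable example. MySQL's `EXPLAIN` output looks different and its optimizer may choose differently, so treat this only as an illustration of the check.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable step descriptions of SQLite's query plan."""
    # The description is the last column of every EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, user INT, keyword TEXT)"
)

# The aggregation from step 1.
QUERY = "SELECT keyword, user, COUNT(*) FROM results GROUP BY keyword, user"

before = plan_details(conn, QUERY)

# SQLite spelling of: alter table results add index keyword_user (keyword, user);
conn.execute("CREATE INDEX keyword_user ON results (keyword, user)")

after = plan_details(conn, QUERY)

print("before:", before)
print("after: ", after)
```

The exact wording of the plan depends on the SQLite version, but the index name should only appear in the second plan.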


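Matt - nice, simple approach, but I think that with the amount of data involved in my case, resource usage would be too high. Many thanks for the suggestion, though; I have since created another procedure, derived from this approach, that works well. – WAUS 2012-03-16 01:16:00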


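@WAUS - if you are able to measure it and are willing to share, I'd be interested to see the performance difference between the two approaches. – 2012-03-16 13:54:15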

0

You can use a pattern like this (from Within-group quotas (Top N per group)):

SELECT tmp.ID, tmp.entrydate 
FROM ( 
    SELECT 
    ID, entrydate, 
    IF(@prev <> ID, @rownum := 1, @rownum := @rownum+1) AS rank, 
    @prev := ID 
    FROM test t 
    JOIN (SELECT @rownum := NULL, @prev := 0) AS r 
    ORDER BY t.ID 
) AS tmp 
WHERE tmp.rank <= 2 
ORDER BY ID, entrydate; 
+------+------------+ 
| ID | entrydate | 
+------+------------+ 
| 1 | 2007-05-01 | 
| 1 | 2007-05-02 | 
| 2 | 2007-06-03 | 
| 2 | 2007-06-04 | 
| 3 | 2007-07-01 | 
| 3 | 2007-07-02 | 
+------+------------+ 
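On engines that support window functions (MySQL 8.0 and later, SQLite 3.25 and later), the same per-group numbering can be written without user variables. The sketch below is an alternative technique, not the pattern quoted above. It assumes the aggregated counts from the first answer are available in a `user_keyword` table, breaks ties by user id (a hypothetical choice), and runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

# (keyword, user, magnitude) rows matching part of the first answer's output.
COUNTS = [
    ("mysql", 1, 4), ("mysql", 2, 2), ("mysql", 4, 2), ("mysql", 3, 1),
    ("mysql", 5, 1), ("query", 2, 3), ("query", 3, 2), ("query", 1, 1),
]


def top_n_window(n):
    """Return the first n (keyword, user, magnitude) rows per keyword."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_keyword (keyword TEXT, user INT, magnitude INT)"
    )
    conn.executemany("INSERT INTO user_keyword VALUES (?, ?, ?)", COUNTS)
    rows = conn.execute("""
        SELECT keyword, user, magnitude
        FROM (
            SELECT keyword, user, magnitude,
                   -- restart numbering for every keyword; ties broken by user
                   ROW_NUMBER() OVER (
                       PARTITION BY keyword
                       ORDER BY magnitude DESC, user
                   ) AS rn
            FROM user_keyword
        ) AS numbered
        WHERE rn <= ?
        ORDER BY keyword, rn
    """, (n,)).fetchall()
    conn.close()
    return rows


print(top_n_window(2))
```

Unlike the counting subquery in the first answer, `ROW_NUMBER()` always returns exactly n rows per keyword when at least n exist, so tied users are cut off arbitrarily unless the ordering includes a tie-breaker.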

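Venk - there is no error in the query I posted; I just need it to run for all keywords in the table, not only for one specified keyword. Efficiency matters, because this will return 'n' x 6000 results. – WAUS 2012-03-13 21:28:54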
