2016-12-05
0

MySQL: Bayesian ranking by star rating

Say I have two tables: businesses, and reviews of those businesses.

The businesses table:

+----+-------+ 
| id | title | 
+----+-------+ 

The reviews table:

+----+-------------+---------+------+ 
| id | business_id | message | rate | 
+----+-------------+---------+------+ 

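Each review has a rate (1 to 5 stars).
I want to sort the businesses by their review ratings using Bayesian ranking, on the condition that a business has at least 2 reviews.

Here is my query: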
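
For reference, the score computed by the query in this question can be written as a weighted average. The symbol names are my own; the constants (a prior weight of 2 and a prior mean of 4) are read off the query, so treat this as a restatement rather than a definition:

```latex
\[
\text{score} = \frac{n}{n+m}\,\bar{r} + \frac{m}{n+m}\,C,
\qquad m = 2,\quad C = 4
\]
% n       = number of reviews for the business (COUNT(r.rate))
% \bar{r} = average rate for the business (AVG(r.rate))
% Example: n = 3, \bar{r} = 5 gives (3/5)(5) + (2/5)(4) = 4.6
```

A business with few reviews is pulled towards 4, and the pull weakens as the review count grows.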

SELECT b.id,
(SELECT COUNT(r.rate) as rr FROM reviews r WHERE r.business_id = b.id) as rr,
(SELECT
     ((COUNT(r.rate)/(COUNT(r.rate) + 2)) * AVG(r.rate) +
     (2 /(COUNT(r.rate) + 2)) * 4)
    FROM reviews r where r.business_id = b.id AND rr > 2
) as score
FROM businesses b
order by score desc
LIMIT 4

This outputs:

+------+----+------------+
| id   | rr | score      |
+------+----+------------+
| 992  | 14 | 4.31250000 |
| 237  |  3 | 4.2000000  |
| 19   |  5 | 4.0000000  |
| 1009 | 12 | 3.9285142  |
+------+----+------------+

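I have two questions:

  1. As you can see, in ((COUNT(r.rate)/(COUNT(r.rate) + 2)) * AVG(r.rate) + (2 /(COUNT(r.rate) + 2)) * 4) FROM reviews r where r.business_id = b.id AND rr > 2) some functions, such as COUNT and AVG, are called more than once. Are they evaluated once behind the scenes, with the result perhaps cached, or are they evaluated on every call?

  2. Is there an equivalent but more optimized query?

Thanks in advance.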

+0

Do you even get the 'right' answer? I don't think 'rr' should be visible to the second subquery. –

Answer

1

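I would hope that MySQL optimizes the repeated COUNT calls, but I am not certain.

However, you can rearrange your query to join against a subquery. That way you do not execute 2 subqueries for every row.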

SELECT b.id,
       sub0.rr,
       sub0.score
FROM businesses b
INNER JOIN
(
    SELECT r.business_id,
           COUNT(r.rate) AS rr,
           ((COUNT(r.rate)/(COUNT(r.rate) + 2)) * AVG(r.rate) + (2 /(COUNT(r.rate) + 2)) * 4) AS score
    FROM reviews r
    GROUP BY r.business_id
    HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4

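Note that the results here differ very slightly: this query excludes businesses with 2 or fewer reviews, whereas your query would still return them, but with a score of NULL. Also note that the score expression needs a * before AVG(r.rate) and before the 4; without those operators it is a syntax error.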

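Using the idea above, you can recode it so that the subquery returns just the count and the average rate, and the score is calculated using only those returned columns.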

SELECT b.id,
       sub0.rr,
       ((sub0.rr/(sub0.rr + 2)) * sub0.arr + (2 /(sub0.rr + 2)) * 4) AS score
FROM businesses b
INNER JOIN
(
    SELECT r.business_id,
           COUNT(r.rate) AS rr,
           AVG(r.rate) AS arr
    FROM reviews r
    GROUP BY r.business_id
    HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
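
To see the join-against-subquery shape run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. Several things in it are assumptions made for illustration: the sample rows are made up, SQLite performs integer division on integers (so the expression multiplies by 1.0 and uses 2.0, which MySQL's / does not need), and the HAVING clause repeats COUNT(r.rate) instead of relying on the rr alias.

```python
import sqlite3

# In-memory database with hypothetical sample data; table and column
# names follow the question, the rows themselves are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE businesses (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        business_id INTEGER,
        message TEXT,
        rate INTEGER
    );
    INSERT INTO businesses (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO reviews (business_id, message, rate) VALUES
        (1, '', 5), (1, '', 5), (1, '', 5),
        (2, '', 4), (2, '', 4), (2, '', 4), (2, '', 4),
        (3, '', 5), (3, '', 5);
""")

# Same shape as the second query above: aggregate once per business in
# the derived table, then compute the score from its columns.
# "* 1.0" and "2.0" force floating-point division in SQLite.
QUERY = """
SELECT b.id,
       sub0.rr,
       (sub0.rr * 1.0 / (sub0.rr + 2)) * sub0.arr
         + (2.0 / (sub0.rr + 2)) * 4 AS score
FROM businesses b
INNER JOIN (
    SELECT r.business_id,
           COUNT(r.rate) AS rr,
           AVG(r.rate)   AS arr
    FROM reviews r
    GROUP BY r.business_id
    HAVING COUNT(r.rate) > 2
) sub0 ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
"""

rows = conn.execute(QUERY).fetchall()
for business_id, rr, score in rows:
    print(business_id, rr, round(score, 4))
```

With this sample data, business 3 has only 2 reviews and so never appears in the result, while businesses 1 and 2 come back ordered by score.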
+0

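Thanks for your reply. I tried to run your second query, but it fails at line 12 with 'b.id unknown' in the WHERE clause of the inner select. So I changed that. https://codetidy.com/9750/, but the subquery fetches every business with rr > 2 – Pars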

+1

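@Pars - Oops, fixed. The subquery will fetch all the data with a count greater than 2, but that data is then joined to the businesses table and the calculation is performed to work out the score. So the join against the subquery excludes businesses with 2 or fewer reviews, and the ORDER/LIMIT in the main query limits the result to 4 returned rows. – Kickstart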