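I think what you describe sounds right. First select the top 3 items by count: group by item and sort by count descending. Then, from that set, select the top 1 sorted by count ascending. Keep in mind that I'm not 100% familiar with HiveSQL, but this SQL should be fairly close to what you need: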
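SELECT TOP 1 itemName
FROM (
    -- Top 3 items by number of purchases
    SELECT TOP 3 itemName, COUNT(*) AS boughtCount
    FROM MyTable
    WHERE action = 'bought'
    GROUP BY itemName
    ORDER BY boughtCount DESC
) AS TopThree  -- derived tables need an alias in MSSQL
-- Of those three, the least bought one is the third most bought overall
ORDER BY boughtCount ASC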
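Hive uses `LIMIT` rather than `TOP`, so the same idea would likely be written with `LIMIT` there. The following is only a rough sketch of that form, run against an in-memory SQLite database through Python's `sqlite3` module as a stand-in for Hive. The sample rows, the `userId` column, and the `third_most_bought` helper are made up for illustration; only `MyTable`, `itemName`, and `action` come from the query above.

```python
import sqlite3

# Hypothetical sample data (userId, itemName, action); not taken from the question.
ROWS = [
    (1, "apple", "bought"), (2, "apple", "bought"),
    (3, "apple", "bought"), (1, "apple", "bought"),   # apple: 4 purchases
    (1, "bread", "bought"), (2, "bread", "bought"),
    (3, "bread", "bought"),                           # bread: 3 purchases
    (1, "cocoa", "bought"), (2, "cocoa", "bought"),   # cocoa: 2 purchases
    (1, "dates", "bought"),                           # dates: 1 purchase
    (2, "dates", "viewed"),                           # ignored: not a purchase
]

# Same shape as the TOP-based query, with LIMIT in place of TOP.
THIRD_MOST_BOUGHT = """
SELECT itemName
FROM (
    SELECT itemName, COUNT(*) AS boughtCount
    FROM MyTable
    WHERE action = 'bought'
    GROUP BY itemName
    ORDER BY boughtCount DESC
    LIMIT 3
) AS TopThree
ORDER BY boughtCount ASC
LIMIT 1
"""


def third_most_bought(rows):
    """Load rows into a throwaway table and return the third most bought item name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (userId INTEGER, itemName TEXT, action TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    row = conn.execute(THIRD_MOST_BOUGHT).fetchone()
    conn.close()
    return row[0] if row else None


if __name__ == "__main__":
    print(third_most_bought(ROWS))  # prints "cocoa"
```

Two caveats with this approach: if fewer than three distinct items were bought, the query returns the least bought of those instead of nothing, and ties in `boughtCount` are broken arbitrarily unless a second sort key is added.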
Edit: Following the clarification in the comments:
Edit 2: This was tested and works in MSSQL; some of the syntax may need adjusting for HiveSQL.
SELECT TOP 1 itemId
FROM (
    -- Get the list of the top 3 items that have as many ItemsByUsers entries as distinct userIds
    -- in the table, group by item and sort by sum of items bought descending.
    SELECT TOP 3 itemId, SUM(boughtCount) AS totalBought
    FROM (
        -- Get a list of the most bought items by item and user
        SELECT itemId, userId, COUNT(*) AS boughtCount
        FROM MyTable
        WHERE action = 'bought'
        GROUP BY itemId, userId
    ) AS ItemCountByUser
    GROUP BY itemId
    HAVING COUNT(*) = (SELECT COUNT(*) FROM (SELECT DISTINCT userId FROM MyTable) AS UserCount)
    ORDER BY totalBought DESC
) AS MostBought
ORDER BY totalBought
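To check the "bought by every user" logic on a small data set, here is a minimal sketch that runs the same query with `LIMIT` in place of `TOP` against an in-memory SQLite database via Python's `sqlite3` module. SQLite is only a stand-in here, not Hive or MSSQL, and the sample purchases and the `third_most_bought_by_everyone` helper are hypothetical.

```python
import sqlite3

# Hypothetical purchases: itemId -> {userId: number of purchases}.
# Items 10-40 are bought by all three users; item 50 only by user 1.
PURCHASES = {
    10: {1: 2, 2: 2, 3: 2},  # total 6
    20: {1: 2, 2: 2, 3: 1},  # total 5
    30: {1: 2, 2: 1, 3: 1},  # total 4
    40: {1: 1, 2: 1, 3: 1},  # total 3
    50: {1: 7},              # total 7, but not bought by everyone
}
ROWS = [
    (user_id, item_id, "bought")
    for item_id, per_user in PURCHASES.items()
    for user_id, count in per_user.items()
    for _ in range(count)
]

QUERY = """
SELECT itemId
FROM (
    SELECT itemId, SUM(boughtCount) AS totalBought
    FROM (
        SELECT itemId, userId, COUNT(*) AS boughtCount
        FROM MyTable
        WHERE action = 'bought'
        GROUP BY itemId, userId
    ) AS ItemCountByUser
    GROUP BY itemId
    -- keep only items with one (item, user) row per distinct user in the table
    HAVING COUNT(*) = (SELECT COUNT(*) FROM (SELECT DISTINCT userId FROM MyTable) AS UserCount)
    ORDER BY totalBought DESC
    LIMIT 3
) AS MostBought
ORDER BY totalBought ASC
LIMIT 1
"""


def third_most_bought_by_everyone(rows):
    """Return the itemId with the third-highest total among items every user bought."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (userId INTEGER, itemId INTEGER, action TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row[0] if row else None


if __name__ == "__main__":
    print(third_most_bought_by_everyone(ROWS))  # prints 30
```

Item 50 has the highest total but is filtered out by the `HAVING` clause because only one user bought it, so the three candidates are items 10, 20, and 30, and the least bought of those is 30.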
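How do you define the order in which the items were purchased? Do you want to identify the "third" among all items bought by every user, or do you want to know, for each user, which is the "third" item they bought? – collapsar
You need another field containing a timestamp. This may be a duplicate of [this question](http://stackoverflow.com/questions/400712/how-to-do-equivalent-of-limit-distinct) – 4castle
@collapsar By count, i.e. the item with the third-highest count among the items that everyone bought – user3396729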