2015-10-14 252 views
0

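Group by groups of values

I'm working with a huge database in PostgreSQL. (Sorry if this isn't edited correctly; I've been trying for hours and I'm still struggling.)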

This is part of the structure of the table my query uses (user_activities), with some sample data:

+---------+----------+-----------+
| user_id | activity | operation |
+---------+----------+-----------+
| 1       | 1        | 1         |
| 1       | 1        | 1         |
| 1       | 1        | 1         |
| 2       | 1        | 2         |
| 2       | 1        | 3         |
| 3       | 1        | 3         |
| 4       | 1        | 4         |
| 4       | 1        | 4         |
| 5       | 1        | 4         |
| 5       | 1        | 5         |
| 6       | 3        | 1         |
| 6       | 3        | 1         |
| 6       | 3        | 2         |
| 7       | 3        | 3         |
| 8       | 3        | 4         |
| 8       | 3        | 5         |
+---------+----------+-----------+

And this is the output I want:

+----------------+----------+-----------+
| count(user_id) | activity | operation |
+----------------+----------+-----------+
| 4              | 1        | 1,2       |
| 6              | 1        | 3,4,5     |
| 6              | 3        | 1,2,3,4,5 |
+----------------+----------+-----------+

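I need to count user_id for each activity and group by operation values. So I need to group by activity when activity is 1 or 3 (already done with WHERE activity IN (1,3)). But I also need to group by operation. The problem is that each operation group will contain more than one value. Operation can be 1, 2, 3, 4 or 5. I want to combine 1,2 into one group and 3,4,5 into another. But that's not all...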

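If I group by operation, each activity gets 5 groups. For activity 1 I need 2 groups (the ones specified above), and for activity 3 only one group containing all the operation values.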

Is this possible?

Edit: I can't check the answers right now; I hope to tomorrow. I'll vote and reply then. Thanks for your help.

+1

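I think you should edit your question and provide sample data and the desired results (explaining what you're looking for), along with the query you have now (to help others write the query). –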

+0

@GordonLinoff OK, give me a minute, editing – AleOtero93

+0

Can you use the tablefunc extension? – dtelaroli

Answers

1

SQL Fiddle Demo

Just put together the groups you want with a CASE expression.

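WITH cte AS (
    SELECT "user_id", "activity", "operation",
           CASE
               WHEN "activity" = 1 THEN
                   CASE
                       -- activity 1 is split into operations {1,2} and {3,4,5}
                       WHEN "operation" IN (1, 2) THEN '1_first'
                       ELSE '1_second'
                   END
               -- activity 3 keeps all of its operations in one group
               WHEN "activity" = 3 THEN '3_first'
           END AS "op_group"
    FROM user_activities
    WHERE "activity" IN (1, 3)  -- other activities would otherwise fall into a NULL group
)
SELECT "activity",
       "op_group",
       count("user_id"),
       array_agg(DISTINCT "operation") AS "operation"
FROM cte
GROUP BY "activity", "op_group";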

Output

| activity | op_group | count | operation | 
|----------|----------|-------|-----------| 
|  1 | 1_first |  4 |  1,2 | 
|  1 | 1_second |  6 |  3,4,5 | 
|  3 | 3_first |  6 | 1,2,3,4,5 | 
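To check the CASE-based grouping against the question's sample data without a PostgreSQL server, here is a minimal sketch that runs essentially the same query through Python's built-in sqlite3 module. It is an approximation rather than the answerer's fiddle: SQLite has no arrays, so group_concat(DISTINCT ...) stands in for array_agg, and the helper name grouped_counts is hypothetical.

```python
import sqlite3

# Sample rows from the question's user_activities table: (user_id, activity, operation).
ROWS = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1),
    (2, 1, 2), (2, 1, 3),
    (3, 1, 3),
    (4, 1, 4), (4, 1, 4),
    (5, 1, 4), (5, 1, 5),
    (6, 3, 1), (6, 3, 1), (6, 3, 2),
    (7, 3, 3),
    (8, 3, 4), (8, 3, 5),
]

# The answer's CASE-based grouping, ported to SQLite. group_concat(DISTINCT ...)
# replaces PostgreSQL's array_agg because SQLite has no array type.
QUERY = """
WITH cte AS (
    SELECT user_id, activity, operation,
           CASE
               WHEN activity = 1 THEN
                   CASE WHEN operation IN (1, 2) THEN '1_first' ELSE '1_second' END
               WHEN activity = 3 THEN '3_first'
           END AS op_group
    FROM user_activities
    WHERE activity IN (1, 3)
)
SELECT activity, op_group, count(user_id), group_concat(DISTINCT operation)
FROM cte
GROUP BY activity, op_group
ORDER BY activity, op_group
"""


def grouped_counts(rows):
    """Load rows into an in-memory table; return (count, activity, op_group, operations)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_activities (user_id INT, activity INT, operation INT)")
    conn.executemany("INSERT INTO user_activities VALUES (?, ?, ?)", rows)
    result = []
    for activity, op_group, cnt, ops in conn.execute(QUERY):
        # group_concat does not promise an order, so sort the operation list explicitly.
        ops_sorted = ",".join(sorted(ops.split(","), key=int))
        result.append((cnt, activity, op_group, ops_sorted))
    conn.close()
    return result


if __name__ == "__main__":
    for row in grouped_counts(ROWS):
        print(row)
```

With the sample rows this yields (4, 1, '1_first', '1,2'), (6, 1, '1_second', '3,4,5') and (6, 3, '3_first', '1,2,3,4,5'), matching the desired output. Note that count(user_id) counts rows, not distinct users, which is what the desired output expects.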
+1

I fixed my answer –

+0

I have a question... is it possible to use 'WHEN "operation" IN (1,2)' instead of 'WHEN "operation" = 1 OR "operation" = 2'? – AleOtero93

+0

Yes, I updated my answer and the fiddle with that change –
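As a quick illustration of the equivalence discussed in these comments (x IN (a, b) behaves like x = a OR x = b), the sketch below uses SQLite through Python's sqlite3 with a hypothetical table t. It checks that both forms select the same rows, including a NULL operation, which neither form matches.

```python
import sqlite3


def matching_ids(predicate):
    """Return ids of rows in a small hypothetical table that satisfy the WHERE predicate.

    predicate is a trusted constant string here; never build SQL this way from user input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, operation INT)")
    # Operations 1..5 plus a NULL, which neither predicate form matches.
    conn.executemany("INSERT INTO t (operation) VALUES (?)",
                     [(1,), (2,), (3,), (4,), (5,), (None,)])
    ids = [r[0] for r in conn.execute(f"SELECT id FROM t WHERE {predicate} ORDER BY id")]
    conn.close()
    return ids


in_form = matching_ids("operation IN (1, 2)")
or_form = matching_ids("operation = 1 OR operation = 2")
print(in_form, or_form)  # both select ids 1 and 2
```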

2

Updated for your detailed spec:

SELECT COUNT(*) AS cnt, ua.activity, array_agg(DISTINCT ua.operation) AS operations
FROM user_activities ua
JOIN (  -- inline mapping: each (activity, operation) pair -> GROUP_CODE
    SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE 
) c 
ON ua.activity = c.activity and ua.operation = c.operation 
GROUP BY c.GROUP_CODE, ua.activity 

http://sqlfiddle.com/#!15/46e1f/15


Original answer

This is how I'd do it. Below I create the logic table on the fly, but you could also keep it as a table in your database and join to it.

SELECT GROUP_CODE, COUNT(*) as cnt 
FROM user_activities ua 
JOIN (
    SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE 
) c 
ON ua.activity = c.activity and ua.operation = c.operation 
GROUP BY GROUP_CODE 

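This should be quite fast. Remember that SQL is designed to work with sets (tables) and joins, and this query uses a join to carry out the logic. It's also nice because if you keep the mapping as a table, you can change the logic by changing the table; and if you add another column to select on, you can store several "logics" in the table and choose which one to use when the query runs.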
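The "keep it as a table, with another column to choose the logic" idea can be sketched as follows. This is an illustration under assumptions, not the answerer's schema: the table name operation_groups, its scheme column and the scheme names split and by_operation are all hypothetical, and SQLite (via Python's sqlite3) stands in for PostgreSQL.

```python
import sqlite3

# Sample rows from the question: (user_id, activity, operation).
ROWS = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1),
    (2, 1, 2), (2, 1, 3),
    (3, 1, 3),
    (4, 1, 4), (4, 1, 4),
    (5, 1, 4), (5, 1, 5),
    (6, 3, 1), (6, 3, 1), (6, 3, 2),
    (7, 3, 3),
    (8, 3, 4), (8, 3, 5),
]

# Hypothetical mapping table; "scheme" lets several groupings share one table.
SCHEMA = """
CREATE TABLE user_activities (user_id INT, activity INT, operation INT);
CREATE TABLE operation_groups (scheme TEXT, activity INT, operation INT, group_code INT);
"""

# Same join as the answer, plus a filter that picks one scheme at run time.
QUERY = """
SELECT c.group_code, ua.activity, COUNT(*) AS cnt
FROM user_activities ua
JOIN operation_groups c
  ON ua.activity = c.activity AND ua.operation = c.operation
WHERE c.scheme = ?
GROUP BY c.group_code, ua.activity
ORDER BY c.group_code
"""


def mapping_rows():
    """Build (scheme, activity, operation, group_code) rows for two example schemes."""
    rows = []
    for op in range(1, 6):
        # 'split': activity 1 -> {1,2} as group 1 and {3,4,5} as group 2;
        # activity 3 -> every operation in group 3 (the question's rules).
        rows.append(("split", 1, op, 1 if op <= 2 else 2))
        rows.append(("split", 3, op, 3))
        # 'by_operation': one group per (activity, operation) pair.
        for act in (1, 3):
            rows.append(("by_operation", act, op, act * 10 + op))
    return rows


def make_db():
    """Create an in-memory database holding the sample data and the mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO user_activities VALUES (?, ?, ?)", ROWS)
    conn.executemany("INSERT INTO operation_groups VALUES (?, ?, ?, ?)", mapping_rows())
    return conn


def counts_for(conn, scheme):
    """Return (group_code, activity, cnt) rows using the chosen grouping scheme."""
    return conn.execute(QUERY, (scheme,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(counts_for(db, "split"))
    print(counts_for(db, "by_operation"))
    db.close()
```

With the sample data, the split scheme gives [(1, 1, 4), (2, 1, 6), (3, 3, 6)]. Switching the bound parameter to by_operation changes the grouping without editing the SQL, which is the flexibility the answer describes.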

I've used a similar approach for weighted and personalized sorting in a dynamic UI.

2

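From what I understand, a query like this would help you. The information in the question and the comments confused me a little, so I used my best judgment to come up with a solution: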

create table test (user_id int, activity int, operation int); 
insert into test values (1,1,1), (1,1,1), (1,1,2), (2,1,3), (2,1,4), (3,3,1), (4,3,3), (4,3,5); 

select count(*), activity, array_agg(operation) 
from test 
group by activity, user_id 

Result: 
| count | activity | array_agg |
|-------|----------|-----------|
| 3     | 1        | {1,1,2}   |
| 2     | 1        | {3,4}     |
| 1     | 3        | {1}       |
| 2     | 3        | {3,5}     |

Based on the edited question, here is how I would solve it:

Table:

create table test (user_id int, activity int, operation int); 
insert into test values 
(1,1,1),(1,1,1),(1,1,1), 
(2,1,2),(2,1,3), 
(3,1,3), 
(4,1,4),(4,1,4), 
(5,1,4),(5,1,5), 
(6,3,1),(6,3,1),(6,3,2), 
(7,3,3), 
(8,3,4),(8,3,5); 

Query:

select count(*), activity, string_agg(distinct operation::VARCHAR, ',') 
from test 
where operation in (1,2) and activity = 1 
group by activity 

UNION ALL 

select count(*), activity, string_agg(distinct operation::VARCHAR, ',') 
from test 
where operation in (3,4,5) and activity = 1 
group by activity 

UNION ALL 

select count(*), activity, string_agg(distinct operation::VARCHAR, ',') 
from test 
where activity = 3 
group by activity 

Result:

| count | activity | string_agg |
|-------|----------|------------|
| 4     | 1        | 1,2        |
| 6     | 1        | 3,4,5      |
| 6     | 3        | 1,2,3,4,5  |
+0

This is wrong. If you group by activity and user ID, you get one row per user. There are 8 user IDs; yes, you are confused. – Hogan

+0

@Hogan There weren't 8 user IDs when I last looked at it. I'll try to improve the answer later today. – zedfoxus

+0

Sorry, my bad... I really need some sleep. So, my bad. – AleOtero93