2017-05-07 57 views
-1

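How to get a DISTINCT column and COUNT the occurrences of SUB DISTINCT columns

I'm having some difficulty building a file download statistics database and displaying the information.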

Table: customer_statistics

| user | product_id | file_download | date_accessed       |
-----------------------------------------------------------
| tom  | 1104       | file_1.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1010       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1077       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 1749       | file_2.pdf    | 2017-05-06 00:00:00 |
| sue  | 1284       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 1284       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 1065       | file_1.pdf    | 2017-05-06 00:00:00 |
| sue  | 1344       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 2504       | file_2.pdf    | 2017-05-06 00:00:00 |

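Based on the table above, I need to display the following:

Tom has downloaded file_3.pdf from 3 different products, but has downloaded file_3.pdf from product_id 1048 4 times.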

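Tom has also downloaded file_1.pdf from 1 product, and from product_id

Tom only once from 4 different products

Sue has downloaded file_3.pdf from 2 different products for a total of 7 downloads, but has downloaded file_3.pdf from product_id 1284 2 times.

Sue has also downloaded file_1.pdf from 1 product, only once from product_id

Sue also from 1 product from product_id

Sue has downloaded file_2.pdf and only once from 5 different products for a total of 6 downloads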

What is the best way to do this?

Do I need to restructure my table?

Thanks in advance!

+0

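Does your desired result really look like *that*? – Strawberry

+0

@Strawberry - Of course not, I just want those values. I spelled it out that way so it would be easy to understand. –

+2

Well, can you spell it out? – Strawberry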

Answers

1

Please try the following...

-- Outer columns are table-qualified: user and file_download exist on both sides of the JOIN.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,   -- downloads of this file from this product
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts   -- distinct products per user and file
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

This statement starts by using the following subquery to form a list of the unique combinations of user, file_download and product_id...

SELECT user AS user, 
     file_download AS file_download, 
     product_id AS product_id 
FROM customer_statistics 
GROUP BY user, 
     file_download, 
     product_id 
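
As a quick check of this first step, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the same GROUP BY. Using SQLite as a stand-in for MySQL, and the ORDER BY added for stable output, are assumptions of the sketch and not part of the answer.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
rows = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics ("
    "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
)
conn.executemany(
    "INSERT INTO customer_statistics VALUES (?, ?, ?, '2017-05-06 00:00:00')",
    rows,
)

# Step 1: collapse repeated downloads into unique (user, file, product) combinations.
combos = conn.execute("""
    SELECT user, file_download, product_id
    FROM customer_statistics
    GROUP BY user, file_download, product_id
    ORDER BY user, file_download, product_id
""").fetchall()

for combo in combos:
    print(combo)
```

The 13 download records collapse to 9 combinations; tom's four downloads of file_3.pdf from product 1048 appear only once.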

The results of the above subquery are then used by the following subquery to count how many distinct product_id values each user has downloaded each file from...

SELECT user AS user, 
     file_download AS file_download, 
     COUNT(product_id) AS CountOfProducts 
FROM (SELECT user AS user, 
       file_download AS file_download, 
       product_id AS product_id 
     FROM customer_statistics 
     GROUP BY user, 
       file_download, 
       product_id 
    ) AS uniqueComboFinder 
GROUP BY user, 
     file_download 
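
To see what this second step returns on the question's data, here is a small sketch, again assuming SQLite via Python's sqlite3 as a stand-in for MySQL. It also compares the nested form against COUNT(DISTINCT product_id), which is a swapped-in alternative and not the form used in this answer; on this data the two agree.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
rows = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics ("
    "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
)
conn.executemany(
    "INSERT INTO customer_statistics VALUES (?, ?, ?, '2017-05-06 00:00:00')",
    rows,
)

# Step 2 as written in the answer: count the unique combinations per user and file.
nested = conn.execute("""
    SELECT user, file_download, COUNT(product_id) AS CountOfProducts
    FROM (SELECT user, file_download, product_id
          FROM customer_statistics
          GROUP BY user, file_download, product_id) AS uniqueComboFinder
    GROUP BY user, file_download
    ORDER BY user, file_download
""").fetchall()

# Alternative (not from the answer): a single-level COUNT(DISTINCT ...).
distinct = conn.execute("""
    SELECT user, file_download, COUNT(DISTINCT product_id)
    FROM customer_statistics
    GROUP BY user, file_download
    ORDER BY user, file_download
""").fetchall()

for row in nested:
    print(row)
print("same as COUNT(DISTINCT):", nested == distinct)
```

On the sample table, tom has file_3.pdf from 3 products and file_1.pdf from 1, while sue has file_2.pdf and file_3.pdf from 2 products each and file_1.pdf from 1.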

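The resulting dataset is then joined to an instance of customer_statistics in such a way that the count of product_id values for each combination of user and file_download is effectively appended to each corresponding record in customer_statistics.

The dataset resulting from that join is then grouped by each unique combination of user, file_download and product_id, and the count of records belonging to each group (i.e. the number of times a user has downloaded a particular file from a particular product_id) is calculated.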
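
Putting the join and the final grouping together, the sketch below runs the whole statement against the question's sample rows. It assumes SQLite through Python's sqlite3 as a stand-in for MySQL. The outer column references are table-qualified because user and file_download exist on both sides of the join, and the ORDER BY is added only to make the output stable.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
rows = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics ("
    "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
)
conn.executemany(
    "INSERT INTO customer_statistics VALUES (?, ?, ?, '2017-05-06 00:00:00')",
    rows,
)

result = conn.execute("""
    SELECT customer_statistics.user,
           customer_statistics.file_download,
           customer_statistics.product_id,
           COUNT(*) AS CountPerProduct,
           CountOfProductsFinder.CountOfProducts
    FROM customer_statistics
    JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
          FROM (SELECT user, file_download, product_id
                FROM customer_statistics
                GROUP BY user, file_download, product_id) AS uniqueComboFinder
          GROUP BY user, file_download
         ) AS CountOfProductsFinder
      ON customer_statistics.user = CountOfProductsFinder.user
     AND customer_statistics.file_download = CountOfProductsFinder.file_download
    GROUP BY customer_statistics.user,
             customer_statistics.file_download,
             customer_statistics.product_id,
             CountOfProductsFinder.CountOfProducts
    ORDER BY customer_statistics.user,
             customer_statistics.file_download,
             customer_statistics.product_id
""").fetchall()

# Each row: (user, file_download, product_id, CountPerProduct, CountOfProducts)
for row in result:
    print(row)
```

The row for tom, file_3.pdf and product 1048 carries CountPerProduct 4 and CountOfProducts 3, which matches the first sentence in the question.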

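I can't remember whether MySQL requires CountOfProducts to appear in the GROUP BY. Each combination of user, file_download and product_id already determines the value of CountOfProducts, but many forms of SQL require you to GROUP BY every non-aggregated field that you select. Since grouping by CountOfProducts does no harm, I have included it in the GROUP BY clause.

If one or two rules about their structure could be clarified, the sentences shown in the question could be generated automatically.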
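
As a rough illustration of that idea, the sketch below turns result rows into sentences. The sentence template, the describe helper and the hard-coded rows (tom's rows in the shape the statement returns) are all hypothetical, since the question never pins down the exact wording rules.

```python
from itertools import groupby

# Rows shaped like the statement's output:
# (user, file_download, product_id, CountPerProduct, CountOfProducts)
rows = [
    ("tom", "file_1.pdf", 1104, 1, 1),
    ("tom", "file_3.pdf", 1010, 1, 3),
    ("tom", "file_3.pdf", 1048, 4, 3),
    ("tom", "file_3.pdf", 1077, 1, 3),
]


def describe(rows):
    """Build one sentence per (user, file) pair using an assumed template."""
    sentences = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (user, file_name), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        group = list(group)
        product_count = group[0][4]  # CountOfProducts is the same for the whole group
        details = ", ".join(f"product_id {pid} x{n}" for _, _, pid, n, _ in group)
        sentences.append(
            f"{user} downloaded {file_name} from {product_count} "
            f"different product(s): {details}."
        )
    return sentences


sentences = describe(rows)
for sentence in sentences:
    print(sentence)
```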

If you have any questions or comments, please feel free to post a comment accordingly.

Addendum

To exclude a single user from the result set, please use the following variation.

-- Outer columns are table-qualified: user and file_download exist on both sides of the JOIN.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user <> excludedUser   -- placeholder: substitute a literal or a variable
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

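I have used excludedUser here, but you can replace it with a constant value (such as 'Sam') or with a variable holding the target value.

Please note that I have added the WHERE user <> excludedUser clause to the innermost subquery. Because the results of its parent subquery are based entirely on the results of the innermost subquery, the excluded user will not be represented in the parent subquery's results. And because the excluded User value does not appear in the parent subquery's results, when the INNER JOIN part of the main statement is performed on the shared values of User, the target User will also be excluded from the joined dataset.

By adding the WHERE clause to the innermost subquery, I avoid a small amount of unnecessary processing by the middle and outer levels of the statement, which makes the overall statement slightly more efficient than if the user value were excluded at the middle or outer level.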
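
The claim that filtering in the innermost subquery removes the user from the final result can be checked with a small sketch. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, uses a bound parameter in place of excludedUser, and relies on a hypothetical QUERY template with short aliases (cs, f) so the same statement can be run with the filter at either level.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
rows = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics ("
    "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
)
conn.executemany(
    "INSERT INTO customer_statistics VALUES (?, ?, ?, '2017-05-06 00:00:00')",
    rows,
)

# The filter fragments are fixed strings; the user value itself is bound with "?".
QUERY = """
SELECT cs.user, cs.file_download, cs.product_id,
       COUNT(*) AS CountPerProduct,
       f.CountOfProducts
FROM customer_statistics AS cs
JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
      FROM (SELECT user, file_download, product_id
            FROM customer_statistics
            {inner_filter}
            GROUP BY user, file_download, product_id) AS uniqueComboFinder
      GROUP BY user, file_download) AS f
  ON cs.user = f.user AND cs.file_download = f.file_download
{outer_filter}
GROUP BY cs.user, cs.file_download, cs.product_id, f.CountOfProducts
ORDER BY cs.user, cs.file_download, cs.product_id
"""

excluded_user = "sue"

# Filter applied in the innermost subquery, as in the answer.
inner = conn.execute(
    QUERY.format(inner_filter="WHERE user <> ?", outer_filter=""),
    (excluded_user,),
).fetchall()

# Filter applied at the outer level instead, for comparison.
outer = conn.execute(
    QUERY.format(inner_filter="", outer_filter="WHERE cs.user <> ?"),
    (excluded_user,),
).fetchall()

for row in inner:
    print(row)
print("same rows either way:", inner == outer)
```

Both placements return the same rows on this data; the difference the answer describes is only in how much work the middle and outer levels have to do.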

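Similarly, if you need to exclude more than one User, you can do so either by explicitly coding their User values into the statement or by referring to a table of excluded values. For the first case use...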

-- Outer columns are table-qualified: user and file_download exist on both sides of the JOIN.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user NOT IN ('Sam', 'I', 'Am')   -- explicit list of excluded users
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

For the second case use...

-- Outer columns are table-qualified: user and file_download exist on both sides of the JOIN.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user NOT IN (SELECT user   -- excluded users are read from a table
                               FROM excludedUsers
                              )
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;
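
For the table-driven variant, the sketch below shows only the innermost subquery with its NOT IN filter, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and an excludedUsers table with a single user column (the answer names the table but not its layout, so that layout is an assumption). It also shows a standard SQL caveat: if the exclusion table ever contains a NULL, NOT IN matches no rows at all.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
rows = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics ("
    "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
)
conn.executemany(
    "INSERT INTO customer_statistics VALUES (?, ?, ?, '2017-05-06 00:00:00')",
    rows,
)

# Assumed layout for the exclusion table: one column holding user names.
conn.execute("CREATE TABLE excludedUsers (user TEXT)")
conn.execute("INSERT INTO excludedUsers VALUES ('sue')")

FILTERED = """
    SELECT user, file_download, product_id
    FROM customer_statistics
    WHERE user NOT IN (SELECT user FROM excludedUsers)
    GROUP BY user, file_download, product_id
    ORDER BY user, file_download, product_id
"""

remaining = conn.execute(FILTERED).fetchall()
print(remaining)

# Caveat: a NULL in the exclusion table makes NOT IN evaluate to unknown
# for every non-matching row, so nothing is returned.
conn.execute("INSERT INTO excludedUsers VALUES (NULL)")
with_null = conn.execute(FILTERED).fetchall()
print(with_null)
```

Declaring the exclusion column NOT NULL, or filtering NULLs out inside the subquery, avoids that surprise.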
+0

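Absolutely perfect! I love it. –

+0

Any way to limit it to 'WHERE user = sam'? –

+0

I will modify my answer to allow you to exclude '1' or more than '1' User. I will only be a few minutes. – toonice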

0

I'll give you a hint to get you going.

Start by ditching the individual download records in favor of aggregates, like so:

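-- One row per (user, file, product) with its download count.
-- The question's table has no file_id column, so file_download is used here.
CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats
SELECT user, file_download, product_id, COUNT(*) AS cnt
    FROM customer_statistics
    GROUP BY user, file_download, product_id;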

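This is just one step (which, by the way, can also serve as a subquery in a larger, more complex query). You can and should do more aggregation to get the information you need. This is not "restructuring the table".

Besides more aggregation, you will also need to think about getting the ordering right and producing subtotals.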
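
One possible next aggregation step is sketched below, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. It uses file_download because the question's table has no file_id column, and SQLite needs the AS keyword in CREATE TABLE ... AS SELECT. The column names products and total_downloads are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
rows = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics ("
    "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
)
conn.executemany(
    "INSERT INTO customer_statistics VALUES (?, ?, ?, '2017-05-06 00:00:00')",
    rows,
)

# First level: one row per (user, file, product) with its download count.
conn.execute("""
    CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats AS
    SELECT user, file_download, product_id, COUNT(*) AS cnt
    FROM customer_statistics
    GROUP BY user, file_download, product_id
""")

# Second level: per user and file, how many products and how many downloads in total.
summary = conn.execute("""
    SELECT user, file_download,
           COUNT(*) AS products,
           SUM(cnt) AS total_downloads
    FROM basic_aggregated_stats
    GROUP BY user, file_download
    ORDER BY user, file_download
""").fetchall()

for row in summary:
    print(row)
```

For subtotals, MySQL's GROUP BY ... WITH ROLLUP is one option to look at; SQLite has no equivalent, so it is not shown here.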