2017-02-09 73 views
-1

PostgreSQL - Select only 1 row per ID

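I'm working on a travel engine website and writing a complex query that matches visitors' search queries to their bookings, based on IP address, destination, and date, so that I can calculate conversion rates later.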

Problem

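I need multiple conversion rates based on a parameter (in this case utm_source, which I extract from the RequestUrl stored in the searches table). The problem is that some users perform multiple searches from different locations. Sometimes we get utm_source in the request and sometimes we don't... and of course each booking must be matched only once. See the screenshot of the query results below for a better understanding: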

[Screenshot of the query results]

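See rows 3 and 4: they have the same booking ID and so on, but different values in the "Value" column. I need to select only one of them, not both. Basically, if there is more than one row, I need to pick the one that is not 'N/A'.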

My query:

SELECT DISTINCT "B"."Id" AS "BookingId", "PQ"."IPAddress", "PQ"."To", "PQ"."SearchDate", "PQ"."Value" 
FROM 
(
    SELECT DISTINCT "IPAddress", "To", "CreatedAt"::date AS "SearchDate", COALESCE(SUBSTRING("RequestUrl", 'utm_source=([^&]*)'), 'N/A') AS "Value" 
    FROM dbo."PackageQueries" 
    WHERE "SiteId" = '<The ID>' 
    AND "CreatedAt" >= '<Start Date>' 
    AND "CreatedAt" < '<End Date>' 
) AS "PQ" 
INNER JOIN dbo."Bookings" AS "B" 
    ON "PQ"."IPAddress" = "B"."IPAddress" 
    AND "B"."To" = "PQ"."To" 
    AND "B"."BookingDate"::date = "PQ"."SearchDate" 
WHERE "B"."SiteId" = '<The ID>' 
AND "B"."BookingStatus" = 2 
AND "B"."BookingDate" >= '<Start Date>' 
AND "B"."BookingDate" < '<End Date>' 
ORDER BY "B"."Id", "PQ"."IPAddress", "PQ"."To"; 
+1

http://stackoverflow.com/questions/tagged/postgresql+greatest-n-per-group –

+0

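@a_horse_with_no_name, thanks for the link... and not so much for the downvote. :-D This is a bit more complicated than those cases. For one thing, I can't simply order by some available integer or date/time value, so I don't think it deserved a downvote, but that's OK. I've found a solution and will post my own answer in a moment... – Matt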

+0

I didn't downvote –

Answer

0

I found a solution, based on what I found here: Return rows that are max of one column in Postgresql and here: Postgres CASE in ORDER BY using an alias

My solution is as follows:

SELECT "BookingId", "IPAddress", "To", "SearchDate", "Value" 
FROM 
(
    SELECT DISTINCT 
     "B"."Id" AS "BookingId", 
     "PQ"."IPAddress", 
     "PQ"."To", 
     "PQ"."SearchDate", 
     "PQ"."Value", 
     RANK() OVER 
     (
      PARTITION BY "B"."Id" 
      ORDER BY 
      CASE 
       WHEN "PQ"."Value" = 'N/A' THEN 1 
       ELSE 0 
      END 
     ) AS "RowNumber" 
    FROM 
    (
     SELECT DISTINCT "IPAddress", "To", "CreatedAt"::date AS "SearchDate", COALESCE(SUBSTRING("RequestUrl", 'utm_source=([^&]*)'), 'N/A') AS "Value" 
     FROM dbo."PackageQueries" 
     WHERE "SiteId" = '<Site ID>' 
     AND "CreatedAt" >= '<Start Date>' 
     AND "CreatedAt" < '<End Date>' 
    ) AS "PQ" 
    INNER JOIN dbo."Bookings" AS "B" 
     ON "PQ"."IPAddress" = "B"."IPAddress" 
     AND "B"."To" = "PQ"."To" 
     AND "B"."BookingDate"::date = "PQ"."SearchDate" 
    WHERE "B"."SiteId" = '<Site ID>' 
    AND "B"."BookingStatus" = 2 
    AND "B"."BookingDate" >= '<Start Date>' 
    AND "B"."BookingDate" < '<End Date>' 
) T 
WHERE "RowNumber" = 1 
ORDER BY "BookingId", "IPAddress", "To"; 

It's a bit verbose, but it does the trick nicely. I hope it helps someone.

Edit

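That's not the end of the story: there are still some cases where I get more than one value. RANK() assigns the same rank to rows that tie in the ORDER BY, so two different non-'N/A' values for the same booking both end up with rank 1. The answer was to modify the CASE statement so that it generates a unique number for each text value. The solution can be found here: PostgreSQL - Assign integer value to string in case statement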
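
For illustration, here is a minimal sketch of one way to get exactly one row per booking. It is not the exact change described in the linked answer: it swaps RANK() for ROW_NUMBER() and adds the value itself as a second sort key, so ties are broken alphabetically. The "matches" CTE and its sample rows are hypothetical stand-ins for the joined PackageQueries/Bookings result above.

```sql
-- Hypothetical sample data standing in for the joined searches/bookings result.
WITH matches ("BookingId", "Value") AS (
    VALUES
        (1, 'N/A'),
        (1, 'google'),
        (2, 'N/A'),
        (3, 'google'),
        (3, 'newsletter')
),
ranked AS (
    SELECT
        "BookingId",
        "Value",
        -- ROW_NUMBER() never repeats a number within a partition, unlike RANK().
        ROW_NUMBER() OVER (
            PARTITION BY "BookingId"
            ORDER BY
                CASE WHEN "Value" = 'N/A' THEN 1 ELSE 0 END,  -- real sources before 'N/A'
                "Value"                                        -- deterministic tie-breaker
        ) AS "RowNumber"
    FROM matches
)
SELECT "BookingId", "Value"
FROM ranked
WHERE "RowNumber" = 1
ORDER BY "BookingId";
```

With this sample data the query returns one row per booking: (1, 'google'), (2, 'N/A') and (3, 'google'). Picking the alphabetically first source is an arbitrary choice; if some sources should take priority over others, the CASE expression could assign them explicit numbers instead.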