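As mathguy mentioned in the comments, a query like this could be simplified by changing the design. But given data in this format, it is still possible to extract a distinct set of TAG/GROUPID pairs. Here is one example approach: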
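In the first stage, we use REGEXP_COUNT to see how many TAGs each row contains. Then we generate a position number for every TAG in every row. Finally, we extract the tag at each of those positions from each row's combined TAG value.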
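To make the stages concrete before building the full query, here is a minimal sketch against DUAL. The literal 'Tag1 Tag2 Tag3' is only a stand-in for one row's TAG value, and the pattern '[^ ]+' is a simplified pattern chosen for illustration, not the one used in the queries below.

```sql
-- Illustration only: the string literal stands in for a single row's TAG value.
SELECT REGEXP_COUNT('Tag1 Tag2 Tag3', '[^ ]+')        AS TAG_COUNT,  -- how many space-separated tags
       REGEXP_SUBSTR('Tag1 Tag2 Tag3', '[^ ]+', 1, 2) AS SECOND_TAG  -- the tag at position 2
FROM DUAL;
```

The fourth argument of REGEXP_SUBSTR is the occurrence to return, which is the role the generated position number plays in the full query.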
First, create the test table:
CREATE TABLE GROUPID_TAG(
GROUPID NUMBER,
"TAG" VARCHAR2(256)
);
INSERT INTO GROUPID_TAG VALUES (1,'Tag1 Tag2');
INSERT INTO GROUPID_TAG VALUES (1,'Tag1 Tag3');
INSERT INTO GROUPID_TAG VALUES (1,'Tag1 Tag4');
INSERT INTO GROUPID_TAG VALUES (2,'Tag5 Tag6');
INSERT INTO GROUPID_TAG VALUES (2,'Tag4 Tag3');
The query below produces two columns, with a single TAG in the ONLY_ONE_TAG column (and therefore more rows per GROUPID):
WITH COUNTED_TAG AS (
SELECT GROUPID, "TAG", REGEXP_COUNT("TAG",'(^|)[^ ]{1,}') AS TAG_COUNT FROM GROUPID_TAG),
KEYED_COUNTED_TAG AS (
SELECT GROUPID, "TAG", TAG_COUNT, TAG_KEG_GENERATOR.TAG_KEY FROM COUNTED_TAG
INNER JOIN (SELECT LEVEL AS TAG_KEY FROM DUAL CONNECT BY LEVEL <= 999) TAG_KEG_GENERATOR
ON TAG_KEG_GENERATOR.TAG_KEY <= COUNTED_TAG.TAG_COUNT)
SELECT DISTINCT GROUPID, REPLACE(REGEXP_SUBSTR("TAG",'(^|)[^ ]{1,}',1,TAG_KEY),' ','') AS ONLY_ONE_TAG
FROM KEYED_COUNTED_TAG
ORDER BY 1 ASC, 2 ASC;
Running it gives:
GROUPID ONLY_ONE_TAG
1 Tag1
1 Tag2
1 Tag3
1 Tag4
2 Tag3
2 Tag4
2 Tag5
2 Tag6
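At this point the data is probably easier to work with than in its original state. But if you want to re-aggregate each GROUPID into a single row, here is an example. Starting from the previous query, we add a LISTAGG to do the aggregation: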
WITH COUNTED_TAG AS (
SELECT GROUPID, "TAG", REGEXP_COUNT("TAG",'(^|)[^ ]{1,}') AS TAG_COUNT FROM GROUPID_TAG),
KEYED_COUNTED_TAG AS (
SELECT GROUPID, "TAG", TAG_COUNT, TAG_KEG_GENERATOR.TAG_KEY FROM COUNTED_TAG
INNER JOIN (SELECT LEVEL AS TAG_KEY FROM DUAL CONNECT BY LEVEL <= 999) TAG_KEG_GENERATOR
ON TAG_KEG_GENERATOR.TAG_KEY <= COUNTED_TAG.TAG_COUNT),
DISTINCT_TAG AS(SELECT DISTINCT GROUPID, REPLACE(REGEXP_SUBSTR("TAG",'(^|)[^ ]{1,}',1,TAG_KEY),' ','') AS ONLY_ONE_TAG
FROM KEYED_COUNTED_TAG)
SELECT GROUPID, LISTAGG(ONLY_ONE_TAG,' ') WITHIN GROUP (ORDER BY ONLY_ONE_TAG ASC) AS AGGREGATED_TAG
FROM DISTINCT_TAG
GROUP BY GROUPID
ORDER BY 1 ASC;
Result:
GROUPID AGGREGATED_TAG
1 Tag1 Tag2 Tag3 Tag4
2 Tag3 Tag4 Tag5 Tag6
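As a possible shortening, and assuming Oracle 19c or later (where LISTAGG accepts DISTINCT), the separate de-duplication step could be folded into the aggregate. This is a sketch rather than a drop-in replacement: the CTE name SPLIT_TAG, the aliases G and K, and the simplified pattern '[^ ]+' are choices made here for illustration.

```sql
-- Assumes Oracle 19c+; on earlier versions LISTAGG(DISTINCT ...) is not available.
WITH SPLIT_TAG AS (
  -- One output row per (source row, tag position)
  SELECT G.GROUPID,
         REGEXP_SUBSTR(G."TAG", '[^ ]+', 1, K.TAG_KEY) AS ONLY_ONE_TAG
  FROM GROUPID_TAG G
  INNER JOIN (SELECT LEVEL AS TAG_KEY FROM DUAL CONNECT BY LEVEL <= 999) K
    ON K.TAG_KEY <= REGEXP_COUNT(G."TAG", '[^ ]+'))
SELECT GROUPID,
       -- DISTINCT removes tags repeated across rows of the same GROUPID
       LISTAGG(DISTINCT ONLY_ONE_TAG, ' ') WITHIN GROUP (ORDER BY ONLY_ONE_TAG ASC) AS AGGREGATED_TAG
FROM SPLIT_TAG
GROUP BY GROUPID
ORDER BY 1 ASC;
```

Rows whose TAG is NULL produce a NULL count, so the join condition never matches and they contribute nothing, the same as in the longer query.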
Then add a few extra tags to test with:
INSERT INTO GROUPID_TAG VALUES (1,'Wookie Hobbit @[email protected]');
INSERT INTO GROUPID_TAG VALUES (2,'HAL-9000 Thor');
And run the query again:
WITH COUNTED_TAG AS (
SELECT GROUPID, "TAG", REGEXP_COUNT("TAG",'(^|)[^ ]{1,}') AS TAG_COUNT FROM GROUPID_TAG),
KEYED_COUNTED_TAG AS (
SELECT GROUPID, "TAG", TAG_COUNT, TAG_KEG_GENERATOR.TAG_KEY FROM COUNTED_TAG
INNER JOIN (SELECT LEVEL AS TAG_KEY FROM DUAL CONNECT BY LEVEL <= 999) TAG_KEG_GENERATOR
ON TAG_KEG_GENERATOR.TAG_KEY <= COUNTED_TAG.TAG_COUNT),
DISTINCT_TAG AS(SELECT DISTINCT GROUPID, REPLACE(REGEXP_SUBSTR("TAG",'(^|)[^ ]{1,}',1,TAG_KEY),' ','') AS ONLY_ONE_TAG
FROM KEYED_COUNTED_TAG)
SELECT GROUPID, LISTAGG(ONLY_ONE_TAG,' ') WITHIN GROUP (ORDER BY ONLY_ONE_TAG ASC) AS AGGREGATED_TAG
FROM DISTINCT_TAG
GROUP BY GROUPID
ORDER BY 1 ASC;
Result:
GROUPID AGGREGATED_TAG
1 @[email protected] Hobbit Tag1 Tag2 Tag3 Tag4 Wookie
2 HAL-9000 Tag3 Tag4 Tag5 Tag6 Thor
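For comparison, here is a rough sketch of what the design change mentioned at the top might look like: one row per GROUPID and tag. The table name GROUPID_TAG_NORMALIZED and its constraint name are hypothetical, invented for this example, and only a few sample rows are loaded.

```sql
-- Hypothetical normalized table: one row per (GROUPID, TAG) pair.
CREATE TABLE GROUPID_TAG_NORMALIZED(
  GROUPID NUMBER NOT NULL,
  "TAG" VARCHAR2(256) NOT NULL,
  -- The primary key prevents duplicate tags within a group
  CONSTRAINT GROUPID_TAG_NORMALIZED_PK PRIMARY KEY (GROUPID, "TAG")
);
INSERT INTO GROUPID_TAG_NORMALIZED VALUES (1,'Tag1');
INSERT INTO GROUPID_TAG_NORMALIZED VALUES (1,'Tag2');
INSERT INTO GROUPID_TAG_NORMALIZED VALUES (2,'Tag5');
INSERT INTO GROUPID_TAG_NORMALIZED VALUES (2,'Tag6');

-- With this layout, no splitting or de-duplication is needed before aggregating.
SELECT GROUPID, LISTAGG("TAG",' ') WITHIN GROUP (ORDER BY "TAG" ASC) AS AGGREGATED_TAG
FROM GROUPID_TAG_NORMALIZED
GROUP BY GROUPID
ORDER BY 1 ASC;
```

With the data stored this way, the regular-expression and row-generator steps are no longer needed, and a group with no tags simply has no rows.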
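Bad idea all around. What does "I have" mean - is the input data (the first table) a stored table on disk? What does "result" mean - something displayed for reporting purposes? If so, then getting the report in this format may be fine, but the base data violates one of the most basic principles of relational table design. In fact, this is essentially what is called "First Normal Form". The best solution in all these cases is to normalize your data - if you can't do it in the database, then at least in your queries. – mathguy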
Jack, does your data always have two tags per row for each GroupId, or how many tags can a row have? – alexgibbs
It can be any number of tags per row. It could also be empty. Thanks. -Jack – user3595231