2017-07-08 20 views
-3

The table 'mytable' has a column named 'Description'. Match words across all rows of the same column

+----+-------------------------------+
| ID | Description                   |
+----+-------------------------------+
|  1 | My NAME is Sajid KHAN         |
|  2 | My Name is Ahmed Khan         |
|  3 | MY friend name is Salman Khan |
+----+-------------------------------+

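I need to write an Oracle SQL query/procedure/function that lists the distinct words in the column, with a count for each.

The output should be: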

+------------------+-------+
| Word             | Count |
+------------------+-------+
| MY               |     3 |
| NAME             |     3 |
| IS               |     3 |
| SAJID            |     1 |
| KHAN             |     3 |
| AHMED            |     1 |
| FRIEND           |     1 |
| SALMAN           |     1 |
+------------------+-------+

Word matching should be case-insensitive.

I am using Oracle 12.1.

+0

What have you tried so far? –

Answers

1

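Let's assume we somehow manage to split every description into separate words. So instead of a single row with ID = 1 and Description = 'My NAME is Sajid KHAN', we would have 5 rows like this:

ID | Description 
--- | ------------ 
1 | My 
1 | NAME 
1 | is 
1 | Sajid 
1 | KHAN 

With the data in this form, the count would be trivial, something like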

select Description, count(*) from data_in_new_form group by Description 

So, we use a recursive query to do the splitting.

create table mytable 
as 
select 1 as ID, 'My NAME is Sajid KHAN' as Description from dual 
union all 
select 2, 'My Name is Ahmed Khan' from dual 
union all 
select 3, 'MY friend name is Salman Khan' from dual 
union all 
select 4, 'test, punctuation! it is' from dual 
; 


with 
rec (id, str, depth, element_value) as 
(
    -- Anchor member. 
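    -- Anchor member: first word of each description (delimiter is a single space). 
    select id, upper(Description) as str, 1 as depth, REGEXP_SUBSTR(upper(Description), '(.*?)( |$)', 1, 1, NULL, 1) AS element_value 
    from mytable 
    UNION ALL 
    -- Recursive member: next word, until every space-delimited word is consumed. 
    select id, str, depth + 1, REGEXP_SUBSTR(str ,'(.*?)( |$)', 1, depth+1, NULL, 1) AS element_value 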
    from rec 
    where depth < regexp_count(str, ' ')+1 
) 
, data as (
select * from rec 
--order by id, depth 
) 
select element_value, count(*) from data 
group by element_value 
order by element_value 
; 

Note that this version does nothing about punctuation; it assumes words are separated by spaces.

Using a hierarchical query:

with rec as 
(
    SELECT id, LEVEL AS depth, 
    REGEXP_SUBSTR(upper(description) ,'(.*?)( |$)', 1, LEVEL, NULL, 1) AS element_value 
    FROM mytable 
    CONNECT BY LEVEL <= regexp_count(description, ' ')+1 
    and prior id = id 
    and prior SYS_GUID() is not null 
) 
, data as (
select * from rec 
--order by id, depth 
) 
select element_value, count(*) from data 
group by element_value 
order by 2 desc 
; 
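
Both queries above leave punctuation attached to the words, so row 4 of the test data yields 'TEST,' and 'PUNCTUATION!'. The following is only a sketch of one way around that, under the assumption that a "word" means any run of letters or digits; the POSIX class `[[:alnum:]]` is my choice, not part of the original answer, and it would split something like "don't" into two tokens.

```sql
-- Sketch only: treats every run of letters/digits as a word (an assumption).
with rec as
(
    SELECT id, LEVEL AS depth,
           REGEXP_SUBSTR(upper(description), '[[:alnum:]]+', 1, LEVEL) AS element_value
    FROM mytable
    -- One row per word found in the description.
    CONNECT BY LEVEL <= regexp_count(description, '[[:alnum:]]+')
    and prior id = id
    and prior SYS_GUID() is not null
)
select element_value, count(*) as cnt
from rec
-- Descriptions with no words at all still produce one row with a NULL value.
where element_value is not null
group by element_value
order by cnt desc, element_value
;
```

As in the hierarchical version, this relies on `id` being unique in `mytable`.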
+0

Thank you very much for the quick response. I tried this query and it works fine. –

+0

I have Oracle 10g, 11g and 12c. This query only works on 12c, not on 10g or 11g. Is there an equivalent query for 10g and 11g? –

+0

That's odd: I just tested the recursive query on an 11g DB and it works fine. Try the hierarchical version. –
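
On the version question: as far as I know, recursive `WITH` needs 11g Release 2, and `REGEXP_COUNT` and the sixth argument of `REGEXP_SUBSTR` arrived with 11g, so neither query in this answer should be expected to run on 10g. Below is a hedged sketch that sticks to features I believe 10g has. The row generator `n`, the alias names, and the upper bound of 100 words per description are assumptions made for illustration.

```sql
-- Sketch only: avoids recursive WITH, REGEXP_COUNT and 6-argument REGEXP_SUBSTR.
SELECT word, COUNT(*) AS cnt
FROM (
    SELECT REGEXP_SUBSTR(UPPER(t.description), '[^ ]+', 1, n.pos) AS word
    FROM mytable t,
         -- Assumed upper bound: no description holds more than 100 words.
         (SELECT LEVEL AS pos FROM dual CONNECT BY LEVEL <= 100) n
    -- Number of spaces + 1 is an upper bound on the number of words in the row.
    WHERE n.pos <= LENGTH(t.description) - LENGTH(REPLACE(t.description, ' ')) + 1
)
-- Consecutive spaces make the bound too high; drop the resulting NULLs.
WHERE word IS NOT NULL
GROUP BY word
ORDER BY cnt DESC;
```

Because each row is split on its own, nothing is concatenated and the length of the whole column does not matter.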

0

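This query will work.

UPDATE: another way. The words may come out in a different order. However, the most frequent words come first, as in your listing.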

SELECT word,
       COUNT(*)
FROM (
    SELECT TRIM(REGEXP_SUBSTR(Description, '[^ ]+', 1, ROWNUM)) AS Word
    FROM (
        SELECT LISTAGG(UPPER(Description), ' ') WITHIN GROUP (ORDER BY ROWNUM) AS Description
        FROM mytable
    )
    CONNECT BY LEVEL <= REGEXP_COUNT(Description, '[^ ]+')
)
GROUP BY word
ORDER BY 2 DESC;
+0

That 'LISTAGG' can raise an 'ORA-01489: result of string concatenation is too long' exception. –

+0

Thank you, this also works for me. My column is varchar(100), so my strings won't be too long. –

+0

But how many such rows are being concatenated? 40 rows like that might already be too long. –
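
To put a number on the concern above, a quick check like the following estimates how long the 'LISTAGG' result would be before running the real query. It is a sketch, assuming the default 4000-byte VARCHAR2 limit applies (that is, MAX_STRING_SIZE has not been set to EXTENDED) and that the separator is a single one-byte space, as in the answer. The alias names are made up for illustration.

```sql
-- Sketch only: estimated byte length of LISTAGG(UPPER(Description), ' ').
SELECT COUNT(*) AS row_count,
       -- Bytes of all non-NULL descriptions plus one separator between each pair.
       SUM(LENGTHB(description)) + COUNT(description) - 1 AS concatenated_bytes
FROM mytable;
```

If `concatenated_bytes` comes out above 4000, the 'LISTAGG' approach can be expected to fail with ORA-01489. For example, 40 rows of 100 single-byte characters give 40 * 100 + 39 = 4039 bytes.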
