2013-02-05

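SQL regular expression to parse text into new rows

I have a notes field that is just one large block of text. Sample data is below, as it would be inserted into the table.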

create table test_table 
(
job_number number, 
notes varchar2(4000) 
);

insert into test_table (job_number, notes)
values (12345, '1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

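I need to parse this out so that each note entry becomes a separate record (the leading 10 digits of each note are a UNIX timestamp). So if I exported it as pipe-delimited text, it should look like this: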

job_number | notes
12345 | 1022089483 notes notes notes notes
12345 | 1022094450 notes notes notes notes
12345 | 1022095218 notes notes notes notes

I really hope this makes sense. I appreciate any insight.

I assume the number of notes per row varies? Also, which version of Oracle are you on? – DazzaL

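Yes, the number of notes varies. I think we are on 8 or 9. Regular expressions are not built in, but we have created some functions to do some regex-style things. – user1588433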

Answer

There are a few ways to do this:

SQL> insert into test_table (job_number, notes)
     values (12345, '1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

1 row created.

SQL> insert into test_table (job_number, notes)
     values (12346, '1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes 1022493228 the answer is 42');

1 row created.

SQL> commit;

Commit complete.

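Note: I am using [0-9]{10} as my regular expression to identify a note (i.e. any 10-digit number is considered to be the start of a note).

First, we can take the approach of computing the maximum number of notes in any given row and then doing a Cartesian join against that many generated rows. Then we filter out each note: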

SQL> with data
     as (select job_number, notes,
                -- number of 10-digit timestamps in the row
                (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10 num_of_notes
         from test_table t)
     select job_number,
            -- slice from the l-th timestamp up to just before the next one
            substr(d.notes, regexp_instr(d.notes, '[0-9]{10}', 1, rn.l),
                   regexp_instr(d.notes||' 0000000000', '[0-9]{10}', 1, rn.l+1)
                   - regexp_instr(d.notes, '[0-9]{10}', 1, rn.l) - 1
            ) note
     from data d
          -- generate 1..max(num_of_notes) rows to join against
          cross join (select rownum l
                      from dual
                      connect by level <= (select max(num_of_notes)
                                           from data)) rn
     where rn.l <= d.num_of_notes
     order by job_number, rn.l;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.

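This is fine as long as the number of notes per row is broadly the same. The greater the variance, the worse it scales, because every row is joined against as many generated rows as the largest note count and the surplus is only filtered out afterwards.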
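To see what the two building blocks of that query do on their own, here is a minimal sketch against a string literal rather than test_table. The literal, the `demo` name and the column aliases are made up for illustration, and it assumes 10g or later, where regexp_replace and regexp_instr are built in.

```sql
-- Illustrative only: demo, s and the aliases are not part of the original schema.
with demo as (
  select '1022089483 notes 1022094450 foo 1022095218 test' as s from dual
)
select
  -- Each 10-digit match that is removed shortens the string by 10 characters,
  -- so the length difference divided by 10 is the number of notes.
  (length(s) - length(regexp_replace(s, '[0-9]{10}', null))) / 10 as num_of_notes,
  -- The second note runs from the 2nd timestamp up to (but excluding) the
  -- space before the 3rd. The appended ' 0000000000' is a sentinel so that
  -- the last note also has a "next" timestamp to stop at.
  substr(s,
         regexp_instr(s, '[0-9]{10}', 1, 2),
         regexp_instr(s || ' 0000000000', '[0-9]{10}', 1, 3)
           - regexp_instr(s, '[0-9]{10}', 1, 2) - 1) as second_note
from demo;
```

For this literal, num_of_notes comes out as 3 and second_note as '1022094450 foo'. Keep in mind that any other 10-digit run inside the note text would also be counted as the start of a note.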

In 11g (Release 2 onward), we can use a recursive factored subquery to do the same thing as above, but without the extra looping:

SQL> with data (job_number, notes, note, num_of_notes, iter)
     as (-- anchor member: the first note of every row
         select job_number, notes,
                substr(notes, regexp_instr(notes, '[0-9]{10}', 1, 1),
                       regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, 2)
                       - regexp_instr(notes, '[0-9]{10}', 1, 1) - 1
                ),
                (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10 num_of_notes,
                1
         from test_table
         union all
         -- recursive member: the next note, for as long as one exists
         select job_number, notes,
                substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
                       regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
                       - regexp_instr(notes, '[0-9]{10}', 1, iter+1) - 1
                ),
                num_of_notes, iter + 1
         from data
         where substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
                      regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
                      - regexp_instr(notes, '[0-9]{10}', 1, iter+1) - 1
               ) is not null
     )
     select job_number, note
     from data
     order by job_number, iter;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.
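Since num_of_notes is already carried through every iteration, the recursion could arguably stop on a simple counter comparison instead of recomputing the substring in the where clause. The following is only a sketch of that variation, not something that was run against the data above. It assumes the same test_table and 11g Release 2 or later.

```sql
-- Variation on the recursive query: stop when iter reaches num_of_notes.
-- Assumption: the goal is the same set of notes per job_number as above.
with data (job_number, notes, note, num_of_notes, iter)
as (-- anchor member: first note of every row
    select job_number, notes,
           substr(notes, regexp_instr(notes, '[0-9]{10}', 1, 1),
                  regexp_instr(notes || ' 0000000000', '[0-9]{10}', 1, 2)
                  - regexp_instr(notes, '[0-9]{10}', 1, 1) - 1),
           (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10,
           1
    from test_table
    union all
    -- recursive member: note number iter + 1
    select job_number, notes,
           substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter + 1),
                  regexp_instr(notes || ' 0000000000', '[0-9]{10}', 1, iter + 2)
                  - regexp_instr(notes, '[0-9]{10}', 1, iter + 1) - 1),
           num_of_notes, iter + 1
    from data
    where iter < num_of_notes  -- no further note once the count is reached
)
select job_number, note
from data
order by job_number, iter;
```

The trade-off is that this version trusts the count expression, whereas the original stops when no further substring can be extracted.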

Or, from 10g onward, we can use the MODEL clause to generate the rows:

SQL> with data as (select job_number, notes,
                          (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10 num_of_notes
                   from test_table)
     select job_number, note
     from data
     model
       partition by (job_number)
       dimension by (1 as i)
       measures (notes, num_of_notes, cast(null as varchar2(4000)) note)
       rules
       (
         -- create one cell per note; cv(i) is the current note index
         note[for i from 1 to num_of_notes[1] increment 1]
           = substr(notes[1],
                    regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)),
                    regexp_instr(notes[1]||' 0000000000', '[0-9]{10}', 1, cv(i)+1)
                    - regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)) - 1
             )
       )
     order by job_number, i;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

Thank you very much for your help. – user1588433