SQL Statement for reconciliation with different operators

This is closely related to the question "SQL Statement for Reconciliation", but more complex.

Given the schema below:

create table TBL1 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp); 
create table TBL2 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp); 
create table TBL_RESULT (ID varchar2(100) primary key not null, TBL1_ID varchar2(100), TBL2_ID varchar2(100)); 

create unique index UK_TBL_RESULT_TBL1_ID on TBL_RESULT(TBL1_ID); 
create unique index UK_TBL_RESULT_TBL2_ID on TBL_RESULT(TBL2_ID); 

insert into TBL1 VALUES('1', to_date('01/26/2012 20:00:00', 'mm/dd/yyyy hh24:mi:ss')); 
insert into TBL1 VALUES('2', to_date('01/26/2012 20:05:00', 'mm/dd/yyyy hh24:mi:ss')); 

insert into TBL2 VALUES('3', to_date('01/26/2012 19:59:00', 'mm/dd/yyyy hh24:mi:ss')); 
insert into TBL2 VALUES('4', to_date('01/26/2012 20:04:00', 'mm/dd/yyyy hh24:mi:ss'));

Our current query:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
SELECT rawtohex(sys_guid()),t1.id,t2.id 
FROM 
(SELECT t1.match_criteria,t1.id, row_number() OVER (PARTITION BY t1.match_criteria ORDER BY t1.id) rn 
FROM tbl1 t1) t1, 
(SELECT t2.match_criteria,t2.id, row_number() OVER (PARTITION BY t2.match_criteria ORDER BY t2.id) rn 
FROM tbl2 t2) t2 
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440) 
AND t1.rn=t2.rn

Its output:

| ID  | TBL1_ID | TBL2_ID |
| '1' | '1'     | '3'     |
| '2' | '1'     | '4'     |
| '3' | '2'     | '3'     |
| '4' | '2'     | '4'     |

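As you can see, the result violates the unique constraints (duplicate TBL1_ID / duplicate TBL2_ID). This is because:

RN is always 1 for every record (so the RN values are always equal).
The records are all within 10 minutes of each other.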
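
The two reasons above can be reproduced outside the database. The following is a minimal Python sketch, not the original query: it assumes the four sample rows from the question, models `row_number() OVER (PARTITION BY match_criteria ORDER BY id)` with a hypothetical helper `row_numbers_per_timestamp`, and applies the same 10-minute window. Because every timestamp is distinct, each partition holds one row, so the row-number test filters nothing.

```python
from datetime import datetime, timedelta
from itertools import product

WINDOW = timedelta(minutes=10)  # same tolerance as 10/1440 of a day

# Sample rows from the question: id -> match_criteria
tbl1 = {"1": datetime(2012, 1, 26, 20, 0), "2": datetime(2012, 1, 26, 20, 5)}
tbl2 = {"3": datetime(2012, 1, 26, 19, 59), "4": datetime(2012, 1, 26, 20, 4)}


def row_numbers_per_timestamp(rows):
    """Model row_number() OVER (PARTITION BY match_criteria ORDER BY id)."""
    counters = {}
    numbered = {}
    for row_id in sorted(rows):
        ts = rows[row_id]
        counters[ts] = counters.get(ts, 0) + 1
        numbered[row_id] = counters[ts]
    return numbered


rn1 = row_numbers_per_timestamp(tbl1)
rn2 = row_numbers_per_timestamp(tbl2)

# Same join condition as the query: inside the window AND equal row numbers.
pairs = [
    (id1, id2)
    for id1, id2 in product(sorted(tbl1), sorted(tbl2))
    if abs(tbl1[id1] - tbl2[id2]) <= WINDOW and rn1[id1] == rn2[id2]
]

print(rn1, rn2)  # {'1': 1, '2': 1} {'3': 1, '4': 1}
print(pairs)     # [('1', '3'), ('1', '4'), ('2', '3'), ('2', '4')]
```

All four combinations survive, which matches the four-row output shown above.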

The output we expect looks like the table below:

| ID  | TBL1_ID | TBL2_ID |
| '1' | '1'     | '4'     |
| '2' | '2'     | '3'     |

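Note 1: It doesn't matter if '1' is matched with '3', but then '2' should be matched with '4' to comply with the constraints, and only as long as T1.MATCH_CRITERIA is within 10 minutes of T2.MATCH_CRITERIA.

Note 2: We have 1 million records inserted from TBL1 and another 1 million records inserted from TBL2. A sequential insert using PL/SQL is therefore unacceptable unless it can run really fast (less than 15 minutes).

Note 3: Unmatched data should be eliminated. Unbalanced data is also expected.

Note 4: We are not restricted to executing just 1 query. A finite series of queries will do.

Source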

2012-01-26 John

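What happens if there are rows in T1 that cannot be matched to rows in T2 (or vice versa)? Do you eliminate that data? Or do you expect to end up with output where 'TBL1_ID' or 'TBL2_ID' is NULL? –

By the way, your test data uses 'MM' twice in the format mask. The second time you mean 'MI'. It's a common mistake. – APC

@JustinCave, eliminate the data. – John

Your query generates a cross join because your business rules provide no mechanism for linking one record in T1 with one record in T2. Given that this is obviously a toy example, it is hard for us to suggest anything other than something very simple: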

(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria,t1.id) rn 
.... 
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria,t2.id) rn

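This simply matches the first row in the T1 result set with the first row in the T2 result set, the second row in the T1 result set with the second row in the T2 result set, and so on.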

SQL> INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
SELECT seq_tbl_result.nextval,t1.id,t2.id
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria, t1.id) rn
FROM tbl1 t1) t1,
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria, t2.id) rn
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn
/

2 rows created.

SQL> select * from tbl_result
/

ID     TBL1_I TBL2_I
------ ------ ------
9      1      3
10     2      4

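This may not be what you want. In that case you need to explain more about your data and the rules that decide what links to what. For instance, is there some pattern that would give you an anchor? Also, when I rule the world, people who use VARCHAR2(100) columns to hold numeric IDs will be shot.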
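
To make the behaviour of this rank-based pairing concrete, here is a small Python sketch rather than the Oracle statement itself. The helper names `pair_by_global_rank` and `at` are hypothetical, and the second data set is an assumption chosen to show a case where two rows share a timestamp but sit at different ranks.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # same tolerance as 10/1440 of a day


def pair_by_global_rank(tbl1, tbl2):
    """Pair the n-th row of tbl1 with the n-th row of tbl2, both ordered by
    (match_criteria, id), keeping a pair only if it is inside the window."""
    left = sorted(tbl1.items(), key=lambda kv: (kv[1], kv[0]))
    right = sorted(tbl2.items(), key=lambda kv: (kv[1], kv[0]))
    return [
        (id1, id2)
        for (id1, ts1), (id2, ts2) in zip(left, right)
        if abs(ts1 - ts2) <= WINDOW
    ]


def at(hour, minute):
    """Build a timestamp on 2012-01-26 (helper for the examples only)."""
    return datetime(2012, 1, 26, hour, minute)


# Sample rows from the question: ranks line up, so both pairs survive.
question_data = pair_by_global_rank(
    {"1": at(20, 0), "2": at(20, 5)},
    {"3": at(19, 59), "4": at(20, 4)},
)
print(question_data)  # [('1', '3'), ('2', '4')]

# Assumed shifted data: '2' and '3' share 02:00 but have different ranks.
shifted_data = pair_by_global_rank(
    {"1": at(1, 0), "2": at(2, 0)},
    {"3": at(2, 0), "4": at(3, 0)},
)
print(shifted_data)  # []
```

The second call returns nothing even though one exact match exists, which illustrates why pairing by position alone depends on the two tables being aligned.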

Source

2012-01-26 15:15:04 APC

The aside alone would warrant a +1 :) –

This is only the data. The query will not work given this data: INSERT INTO TBL1 VALUES('1', to_date('01/26/2012 01:00:00', 'mm/dd/yyyy hh24:mi:ss')); INSERT INTO TBL1 VALUES('2', to_date('01/26/2012 02:00:00', 'mm/dd/yyyy hh24:mi:ss')); INSERT INTO TBL2 VALUES('3', to_date('01/26/2012 02:00:00', 'mm/dd/yyyy hh24:mi:ss')); INSERT INTO TBL2 VALUES('4', to_date('01/26/2012 03:00:00', 'mm/dd/yyyy hh24:mi:ss')); – John

I think this could work:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
select seq_tbl_result.nextval, 
tt1.id, tt2.id 
from (select id, v, row_number() over(partition by v order by id) rn 
from (select distinct t1.id, 
case 
when (t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440)) then 
1 
else 
2 
end v 
from tbl1 t1, tbl2 t2 
where t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440))) tt1, 
(select id, v, row_number() over(partition by v order by id) rn 
from (select distinct t2.id, 
case 
when (t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440)) then 
1 
else 
2 
end v 
from tbl1 t1, tbl2 t2 
where t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440))) tt2 
where tt1.v = tt2.v 
and tt1.rn = tt2.rn

Source

2012-01-26 15:59:45

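Hi, I think v is always 1. Anyway, I tried mixing up the data. TBL1 data (ordered by ID): 2 AM, then 10 records at 1-minute intervals starting from 1:01 AM. TBL2 data (ordered by ID): 10 records at 1-minute intervals starting from 1:01 AM, then 2 AM. Result: TBL1's 2 AM was matched with TBL2's 1:01 AM data. But I think you are close... – John

Another scenario: TBL1 has 3 rows: 01:00 AM, 01:05 AM, 02:00 AM. TBL2 has only 2 rows: 01:00 AM, 02:00 AM. TBL1's 01:00 AM will be matched with TBL2's 01:00 AM, but TBL1's 01:05 AM will be matched with TBL2's 02:00 AM, and TBL1's 02:00 AM will be left unmatched. – John

You are right, v should somehow depend on t2.match_criteria –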
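
The scenarios in these comments suggest that each pairing has to re-check the 10-minute window rather than rely on position alone. One way to express that rule procedurally is a greedy scan over both tables sorted by timestamp. The Python sketch below is an illustration under stated assumptions, not a solution proposed in this thread and not a set-based SQL answer: `greedy_match` is a hypothetical name, and the IDs in the usage example are invented for the three-row scenario described above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # same tolerance as 10/1440 of a day


def greedy_match(tbl1, tbl2, window=WINDOW):
    """Walk both tables in timestamp order and pair rows one-to-one,
    skipping any row that is too early to match what remains."""
    left = sorted(tbl1.items(), key=lambda kv: (kv[1], kv[0]))
    right = sorted(tbl2.items(), key=lambda kv: (kv[1], kv[0]))
    pairs = []
    i = j = 0
    while i < len(left) and j < len(right):
        (id1, ts1), (id2, ts2) = left[i], right[j]
        if ts1 < ts2 - window:
            i += 1  # tbl1 row is earlier than every remaining tbl2 window
        elif ts1 > ts2 + window:
            j += 1  # tbl2 row is earlier than every remaining tbl1 window
        else:
            pairs.append((id1, id2))  # inside the window: consume both rows
            i += 1
            j += 1
    return pairs


def at(hour, minute):
    """Build a timestamp on 2012-01-26 (helper for the example only)."""
    return datetime(2012, 1, 26, hour, minute)


# Three-row scenario from the comment above, with assumed IDs.
scenario = greedy_match(
    {"1": at(1, 0), "2": at(1, 5), "3": at(2, 0)},
    {"4": at(1, 0), "5": at(2, 0)},
)
print(scenario)  # [('1', '4'), ('3', '5')]
```

Here the 01:05 row is dropped as unmatched and the two 02:00 rows are paired, and each ID appears at most once, so the unique indexes on TBL_RESULT would not be violated. After sorting, the scan is a single pass over both tables; whether a PL/SQL version of it would fit the 15-minute limit for two million rows is untested.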
