SQL Problem: Merge record history tables
A colleague asked me if I could help solve a problem in SQL, and I was grateful because I love solving problems in SQL! So we sat down, built a test suite, and worked together to solve it.
I will first present the problem and invite you to have a go at solving it for yourself; tomorrow I will post the solution we came up with. I think our solution is reasonably simple and elegant, but I’m certain there are other solutions as well, some of which may be more elegant or efficient.
I have two “record history” tables, from different data sources, that record the history of changes to two sets of columns describing the same entity. Here are the example tables:
T1 (deptid, from_date, to_date, data1) T2 (deptid, from_date, to_date, data2)So T1 records the changes in the column “data1” over time for each deptid; and T2 records the changes in the column “data2” over time for each deptid.
Assumptions/Constraints on T1 and T2:
- All columns are not Null.
- All dates are just dates (no time values).
- All date ranges are valid (i.e. from_date <- to_date).
- Within each table, all date ranges for a particular deptid are gap-free and have no overlaps.
We want to produce a query that returns the following structure:
T3 (deptid, from_date, to_date, data1, data2)Where the dates for a particular deptid overlap between T1 and T2, we want to see the corresponding data1 and data2 values. Where the dates do not overlap (e.g. where we have a value for data1 in T1 but not in T2), we want to see the corresponding data1 but NULL for data2; and vice versa.
Below is the setup and test cases you can use if you wish. Alternatively, you can use the SQL Fiddle here.
create table t1 (deptid number not null ,from_date date not null ,to_date date not null ,data1 varchar2(30) not null ,primary key (deptid, from_date) ); create table t2 (deptid number not null ,from_date date not null ,to_date date not null ,data2 varchar2(30) not null ,primary key (deptid, from_date) ); --Test Case 1 (4001) --T1 |-------------||-------------| --T2 |-------------||-------------||-------------| insert into t1 values (4001, date'2015-01-01', date'2015-01-31', 'jan data 1'); insert into t1 values (4001, date'2015-02-01', date'2015-02-28', 'feb data 1'); insert into t2 values (4001, date'2015-01-15', date'2015-02-14', 'jan-feb data 2'); insert into t2 values (4001, date'2015-02-15', date'2015-03-14', 'feb-mar data 2'); insert into t2 values (4001, date'2015-03-15', date'2015-04-14', 'mar-apr data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4001 01-JAN-2015 14-JAN-2015 jan data 1 --4001 15-JAN-2015 31-JAN-2015 jan data 1 jan-feb data 2 --4001 01-FEB-2015 14-FEB-2015 feb data 1 jan-feb data 2 --4001 15-FEB-2015 28-FEB-2015 feb data 1 feb-mar data 2 --4001 01-MAR-2015 14-MAR-2015 feb-mar data 2 --4001 15-MAR-2015 14-APR-2015 mar-apr data 2 --Test Case 2 (4002) --T1 |-------------||-------------||-------------| --T2 |-------------||-------------| insert into t1 values (4002, date'2015-01-15', date'2015-02-14', 'jan-feb data 1'); insert into t1 values (4002, date'2015-02-15', date'2015-03-14', 'feb-mar data 1'); insert into t1 values (4002, date'2015-03-15', date'2015-04-14', 'mar-apr data 1'); insert into t2 values (4002, date'2015-01-01', date'2015-01-31', 'jan data 2'); insert into t2 values (4002, date'2015-02-01', date'2015-02-28', 'feb data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4002 01-JAN-2015 14-JAN-2015 jan data 2 --4002 15-JAN-2015 31-JAN-2015 jan-feb data 1 jan data 2 --4002 01-FEB-2015 14-FEB-2015 jan-feb data 1 feb data 2 --4002 15-FEB-2015 28-FEB-2015 feb-mar data 1 feb data 2 --4002 01-MAR-2015 14-MAR-2015 feb-mar data 1 --4002 15-MAR-2015 14-APR-2015 mar-apr data 1 --Test Case 3 (4003) --T1 |-------------| --T2 |-------------| insert into t1 values (4003, date'2015-01-01', date'2015-01-31', 'jan data 1'); insert into t2 values (4003, date'2015-01-01', date'2015-01-31', 'jan data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4003 01-JAN-2015 31-JAN-2015 jan data 1 jan data 2 --Test Case 4 (4004) --T1 |-------------| --T2 |-----| insert into t1 values (4004, date'2015-01-01', date'2015-01-31', 'jan data 1'); insert into t2 values (4004, date'2015-01-10', date'2015-01-20', 'jan data int 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4004 01-JAN-2015 09-JAN-2015 jan data 1 --4004 10-JAN-2015 20-JAN-2015 jan data 1 jan data int 2 --4004 21-JAN-2015 31-JAN-2015 jan data 1 --Test Case 5 (4005) --T1 |-----| --T2 |-------------| insert into t1 values (4005, date'2015-01-10', date'2015-01-20', 'jan data int 1'); insert into t2 values (4005, date'2015-01-01', date'2015-01-31', 'jan data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4005 01-JAN-2015 09-JAN-2015 jan data 2 --4005 10-JAN-2015 20-JAN-2015 jan data int 1 jan data 2 --4005 21-JAN-2015 31-JAN-2015 jan data 2 --Test Case 6 (4006) --T1 ||------------| --T2 |-------------| insert into t1 values (4006, date'2015-01-01', date'2015-01-01', 'jan data a 1'); insert into t1 values (4006, date'2015-01-02', date'2015-01-31', 'jan data b 1'); insert into t2 values (4006, date'2015-01-01', date'2015-01-31', 'jan data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4006 01-JAN-2015 01-JAN-2015 jan data a 1 jan data 2 --4006 02-JAN-2015 31-JAN-2015 jan data b 1 jan data 2 --Test Case 7 (4007) --T1 |-------------| --T2 ||------------| insert into t1 values (4007, date'2015-01-01', date'2015-01-31', 'jan data 1'); insert into t2 values (4007, date'2015-01-01', date'2015-01-01', 'jan data a 2'); insert into t2 values (4007, date'2015-01-02', date'2015-01-31', 'jan data b 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4007 01-JAN-2015 01-JAN-2015 jan data 1 jan data a 2 --4007 02-JAN-2015 31-JAN-2015 jan data 1 jan data b 2 --Test Case 8 (4008) --T1 |-------------| --T2 |---||---||---| insert into t1 values (4008, date'2015-01-01', date'2015-01-31', 'jan data 1'); insert into t2 values (4008, date'2015-01-09', date'2015-01-17', 'jan data a 2'); insert into t2 values (4008, date'2015-01-18', date'2015-01-26', 'jan data b 2'); insert into t2 values (4008, date'2015-01-27', date'2015-02-04', 'jan-feb data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4008 01-JAN-2015 08-JAN-2015 jan data 1 --4008 09-JAN-2015 17-JAN-2015 jan data 1 jan data a 2 --4008 18-JAN-2015 26-JAN-2015 jan data 1 jan data b 2 --4008 27-JAN-2015 31-JAN-2015 jan data 1 jan-feb data 2 --4008 01-FEB-2015 04-FEB-2015 jan-feb data 2 --Test Case 9 (4009) --T1 |---||---||---| --T2 |-------------| insert into t1 values (4009, date'2015-01-09', date'2015-01-17', 'jan data a 1'); insert into t1 values (4009, date'2015-01-18', date'2015-01-26', 'jan data b 1'); insert into t1 values (4009, date'2015-01-27', date'2015-02-04', 'jan-feb data 1'); insert into t2 values (4009, date'2015-01-01', date'2015-01-31', 'jan data 2'); --Expected Output: --DEPTID FROM_DATE TO_DATE DATA1 DATA2 --====== =========== =========== ============== ============== --4009 01-JAN-2015 08-JAN-2015 jan data 2 --4009 09-JAN-2015 17-JAN-2015 jan data a 1 jan data 2 --4009 18-JAN-2015 26-JAN-2015 jan data b 1 jan data 2 --4009 27-JAN-2015 31-JAN-2015 jan-feb data 1 jan data 2 --4009 01-FEB-2015 04-FEB-2015 jan-feb data 1
EDIT 15/10/2015: added Kim Berg Hansen’s test cases
UPDATE 15/10/2015: solutions (spoiler alert!)
stewashton
14 October 2015 - 3:49 pm
Note – SQL:2011 says TO_DATE should be exclusive, not inclusive. Oracle 12c agrees.
An addition to the requirement would be to merge contiguous date ranges with the same date. The “start of group” method or 12c MATCH_RECOGNIZE could help there.
Best regards,
Stew Ashton
stewashton.wordpress.com
Jeffrey Kemp
15 October 2015 - 9:32 am
Flawless: http://sqlfiddle.com/#!4/58678/1
T. Bhgoeswara Prasad
14 February 2016 - 10:21 pm
stewashton,
I am new to oracle development . I need your help get the below result . Please help me.
We have two tables A, B . In A we have two columns one is FROM_DATE and TO_ DATE.
that is
FROM_DATE TO_ DATE
10-NOV-2015 31-DEC-9999
In B table
id Valid_from Valid_to
1 10-NOV-2015 31-DEC-2015
2 11-NOV-2016 31-DEC-2016
The dates should be in B table as between A table dates . That exists already. Here the requirement is we have missed some dates between the A table from date and to date. In B table we have from 10-NOV_2015 to 31-DEC-2015 after that we have missed 1-1-2016 to 10-nov-2016. Then we have dates in b table from 11-NOV-2016 to 31-DEC-2016 . Again we have missed the dates after 31-DEC-2016 . that is from 1-1-2017 to 31-DEC-9999. The final expected out should be like below. Please help me how to fetch this.
id Valid_from Valid_to
1 10-NOV-2015 31-DEC-2015
2 01-01-2016 10-NOV-2016 ——> Please how to fetch this date.
3 11-NOV-2016 31-DEC-2016
4 01-01-2017 31-DEC-9999 ——–> please how to fetch this date.
Jeffrey Kemp
15 February 2016 - 8:47 am
Hi T. Bhgoeswara Prasad,
Well, I’m not stewashton 🙂 but this might help – I had to fiddle with the dates somewhat because of the “31/12/9999” date:
http://livesql.oracle.com/apex/livesql/s/cvkntrdbeh68d5kl0jphm2trq
Kim Berg Hansen
14 October 2015 - 8:12 pm
It’s not elegant, on the contrary rather ugly, but it’ll work with a single scan of both tables 😉
I added an extra test case because some of the things I tried I was uncertain whether it would handle multiple overlaps:
And I get this output:
As far as I can tell it gives the expected output in all cases.
Jeffrey Kemp
15 October 2015 - 8:24 am
Thanks for the additional test case, it revealed an issue with my version!
Jeffrey Kemp
15 October 2015 - 9:34 am
I’ve modified it a bit to work in 11g, and it works; see what you think: http://sqlfiddle.com/#!4/58678/2
Kim Berg Hansen
15 October 2015 - 3:22 pm
Yup, looks like good backport 😉
Only 12c specific issue in my version is, that I use LATERAL to correlate the CONNECT BY row generator and make it only generate as many rows as needed (1 or 3). Your method of filtering the rowgen in join works fine too, and it’ll work from version 10.