SQL Problem Solved: Merge History Tables
Yesterday I published a SQL problem (that I I had solved – or at least, thought I had solved!) and invited readers to submit their own solutions.
SPOILER ALERT: if you haven’t had a chance to have a go yourself, go back, don’t read the comments (yet), but see if you can solve it yourself.
I was very pleased to see two excellent submissions that were both correct as far as I could tell, and are (in my opinion) sufficiently simple, elegant and efficient. Note that this was not a competition and I will not be ranking or rating or anything like that. If you need to solve a similar problem, I would suggest you try all of these solutions out and pick the one that seems to work the best for you.
Solution #1
Stew Ashton supplied the first solution which was very simple and elegant:
with dates as ( select distinct deptid, dte from ( select deptid, from_date, to_date+1 to_date from t1 union all select deptid, from_date, to_date+1 to_date from t2 ) unpivot(dte for col in(from_date, to_date)) ) select * from ( select distinct a.deptid, dte from_date, lead(dte) over(partition by a.deptid order by dte) - 1 to_date, data1, data2 from dates a left join t1 b on a.deptid = b.deptid and a.dte between b.from_date and b.to_date left join t2 c on a.deptid = c.deptid and a.dte between c.from_date and c.to_date ) where data1 is not null or data2 is not null order by 1,2;
Stew’s solution does a quick pass over the two tables and generates a “date dimension” table in memory; it then does left joins to each table to generate each row. It may have a disadvantage due to the requirement to make two passes over the tables, which might be mitigated by suitable indexing. As pointed out by Matthias Rogel, this solution can be easily generalised to more than two tables, which is a plus.
Solution #2
The second solution was provided by Kim Berg Hansen which was a bit longer but in my opinion, still simple and elegant:
with v1 as ( select t.deptid , case r.l when 1 then t.from_date when 2 then date '0001-01-01' when 3 then t.max_to + 1 end from_date , case r.l when 1 then t.to_date when 2 then t.min_from -1 when 3 then date '9999-01-01' end to_date , case r.l when 1 then t.data1 end data1 from ( select deptid , from_date , to_date , data1 , row_number() over (partition by deptid order by from_date) rn , min(from_date) over (partition by deptid) min_from , max(to_date) over (partition by deptid) max_to from t1 ) t, lateral( select level l from dual connect by level <= case t.rn when 1 then 3 else 1 end ) r ), v2 as ( select t.deptid , case r.l when 1 then t.from_date when 2 then date '0001-01-01' when 3 then t.max_to + 1 end from_date , case r.l when 1 then t.to_date when 2 then t.min_from -1 when 3 then date '9999-01-01' end to_date , case r.l when 1 then t.data2 end data2 from ( select deptid , from_date , to_date , data2 , row_number() over (partition by deptid order by from_date) rn , min(from_date) over (partition by deptid) min_from , max(to_date) over (partition by deptid) max_to from t2 ) t, lateral( select level l from dual connect by level <= case t.rn when 1 then 3 else 1 end ) r ) select v1.deptid , greatest(v1.from_date, v2.from_date) from_date , least(v1.to_date, v2.to_date) to_date , v1.data1 , v2.data2 from v1 join v2 on v2.deptid = v1.deptid and v2.to_date >= v1.from_date and v2.from_date <= v1.to_date where v2.data2 is not null or v1.data1 is not null order by v1.deptid , greatest(v1.from_date, v2.from_date)
Since it takes advantage of 12c’s new LATERAL feature, I wasn’t able to test it on my system, but I was able to test it on Oracle’s new Live SQL and it passed with flying colours. I backported it to 11g and it worked just fine on my system as well. An advantage of Kim’s solution is that it should only make one pass over each of the two tables.
Oracle Live SQL: merge-history-kimberghansen (this is the 12c version – Oracle login required)
SQL Fiddle (this is my attempt to backport to 11g – you can judge for yourself if I was faithful to the original)
Solution #3
When my colleague came to me with this problem, I started by creating a test suite including as many test cases as we could think of. Developing and testing the solution against these test cases proved invaluable and allowed a number of ideas to be tested and thrown away. Big thanks go to Kim Berg Hansen for providing an additional test case, which revealed a small bug in my original solution.
with q1 as ( select nvl(t1.deptid, t2.deptid) as deptid ,greatest(nvl(t1.from_date, t2.from_date), nvl(t2.from_date, t1.from_date)) as gt_from_date ,least(nvl(t1.to_date, t2.to_date), nvl(t2.to_date, t1.to_date)) as lt_to_date ,t1.from_date AS t1_from_date ,t1.to_date AS t1_to_date ,t2.from_date AS t2_from_date ,t2.to_date AS t2_to_date ,min(greatest(t1.from_date, t2.from_date)) over (partition by t1.deptid) - 1 as fr_to_date ,max(least(t1.to_date, t2.to_date)) over (partition by t2.deptid) + 1 as lr_from_date ,t1.data1 ,t2.data2 ,row_number() over (partition by t1.deptid, t2.deptid order by t1.from_date) as rn_fr1 ,row_number() over (partition by t1.deptid, t2.deptid order by t2.from_date) as rn_fr2 ,row_number() over (partition by t1.deptid, t2.deptid order by t1.to_date desc) as rn_lr1 ,row_number() over (partition by t1.deptid, t2.deptid order by t2.to_date desc) as rn_lr2 from t1 full outer join t2 on t1.deptid = t2.deptid and (t1.from_date between t2.from_date and t2.to_date or t1.to_date between t2.from_date and t2.to_date or t2.from_date between t1.from_date and t1.to_date or t2.to_date between t1.from_date and t1.to_date) order by 1, 2 ) select deptid ,gt_from_date AS from_date ,lt_to_date AS to_date ,data1 ,data2 from q1 union all select deptid ,t1_from_date as from_date ,fr_to_date AS to_date ,data1 ,'' AS data2 from q1 where fr_to_date between t1_from_date and t1_to_date and rn_fr1 = 1 union all select deptid ,t2_from_date as from_date ,fr_to_date AS to_date ,'' AS data1 ,data2 from q1 where fr_to_date between t2_from_date and t2_to_date and rn_fr2 = 1 union all select deptid ,lr_from_date as from_date ,t2_to_date AS to_date ,'' AS data1 ,data2 from q1 where lr_from_date between t2_from_date and t2_to_date and rn_lr2 = 1 union all select deptid ,lr_from_date as from_date ,t1_to_date AS to_date ,data1 ,'' AS data2 from q1 where lr_from_date between t1_from_date and t1_to_date and rn_lr1 = 1 order by deptid, from_date
This solution is probably the ugliest of the three, and though it only requires one pass over the two tables, it materializes all the data in a temp table so probably requires a fair amount of memory for large data sets.
I’m so glad I posted this problem – I received help in the form of two quite different solutions, which help to show different ways of approaching problems like this. Not only that, but it highlighted the value of test-driven development – instead of groping in the dark to come up with a quick-and-dirty solution (that might or might not work), starting with comprehensive test cases is invaluable for both developing and validating potential solutions.
Comments