Tag: SQL

Compiling views: when the FORCE fails you

Darth-Vader-selfieThe order in which your deployment scripts create views is important. This is a fact that I was reminded of when I had to fix a minor issue in the deployment of version #2 of my application recently.

Normally, you can just generate a create or replace force view script for all your views and just run it in each environment, then recompile your schema after they’re finished – and everything’s fine. However, if views depend on other views, you can run into a logical problem if you don’t create them in the order of dependency.

Software Release 1.0

create table t (id number, name varchar2(100));
create or replace force view tv_base as
select t.*, 'hello' as stat from t;
create or replace force view tv_alpha as
select t.* from tv_base t;

desc tv_alpha;
Name Null Type
---- ---- -------------
ID        NUMBER
NAME      VARCHAR2(100)
STAT      CHAR(5)

Here we have our first version of the schema, with a table and two views based on it. Let’s say that the tv_base includes some derived expressions, and tv_alpha is intended to do some joins on other tables for more detailed reporting.

Software Release 1.1

alter table t add (phone varchar2(10));
create or replace force view tv_alpha as
select t.* from tv_base t;
create or replace force view tv_base as
select t.*, 'hello' as stat from t;

Now, in the second release of the software, we added a new column to the table, and duly recompiled the views. In the development environment the view recompilation may happen multiple times (because other changes are being made to the views as well) – and nothing’s wrong. Everything works as expected.

However, when we run the deployment scripts in the Test environment, the “run all views” script has been run just once; and due to the way it was generated, the views are created in alphabetical order – so tv_alpha was recreated first, followed by tv_base. Now, when we describe the view, we see that it’s missing the new column:

desc tv_alpha;
Name Null Type
---- ---- -------------
ID        NUMBER
NAME      VARCHAR2(100)
STAT      CHAR(5)

Whoops. What’s happened, of course, is that when tv_alpha was recompiled, tv_base still hadn’t been recompiled and so it didn’t have the new column in it yet. Oracle internally defines views with SELECT * expanded to list all the columns. The view won’t gain the new column until we REPLACE the view with a new one using SELECT *. By that time, it’s too late for tv_alpha – it had already been compiled, successfully, so it doesn’t see the new column.

Lesson Learnt

What should we learn from this? Be wary of SELECT * in your views. Don’t get me wrong: they are very handy, especially during initial development of your application; but they can surprise you if not handled carefully and I would suggest it’s good practice to expand those SELECT *‘s into a discrete list of columns.

Some people would go so far as to completely outlaw SELECT *, and even views-on-views, for reasons such as the above. I’m not so dogmatic, because in my view there are some good reasons to use them in some situations.

A nice Descending Index Range Scan

I’ve been aware of some of the ways that Oracle database optimises index accesses for queries, but I’m also aware that you have to test each critical query to ensure that the expected optimisations are taking effect.

I had this simple query, the requirement of which is to get the “previous status” for a record from a journal table. Since the journal table records all inserts, updates and deletes, and this query is called immediately after an update, to get the previous status we need to query the journal for the record most recently prior to the most recent record. Since the “version_id” column is incremented for each update, we can use that as the sort order.


select status_code
from (select rownum rn, status_code
      from   xxtim_requests$jn jn
      where  jn.trq_id = :trq_id
      order by version_id desc)
where rn = 2;

The xxtim_requests$jn table has an ordinary index on (trq_id, version_id). This query is embedded in some PL/SQL with an INTO clause – so it will only fetch one record (plus a 2nd fetch to detect TOO_MANY_ROWS which we know won’t happen).

The table is relatively small (in dev it only has 6K records, and production data volumes are expected to grow very slowly) but regardless, I was pleased to find that (at least, in Oracle 12.1) it uses a nice optimisation so that it not only uses the index, it is choosing to use a Descending scan on it – which means it avoids a SORT operation, and should very quickly return the 2nd record that we desire.

index_scan_range_descending.PNG

It looks quite similar in effect to the “COUNT STOPKEY” optimisation you can see on “ROWNUM=1” queries. If this was a much larger table and this query needed to be faster or was being run more frequently, I’d probably consider appending status_code to the index in order to avoid the table access. In this case, however, I don’t think it’s necessary.

Restriction when column default is sequence.nextval

Oracle 12c introduced the ability to specify sequence.nextval as the default on a column, which is really nice – including the fact that it eliminates one of your excuses why you don’t decommission those old triggers.

Unfortunately it doesn’t work as you might expect if you use an INSERT ALL statement; it evaluates the default expression once per statement, instead of once per row.

Test case:


create sequence test_seq;

create table test_tab
( id number default test_seq.nextval primary key
, dummy varchar2(100) not null );

insert into test_tab (dummy) values ('xyz');

1 row inserted.

insert all
into test_tab (dummy) values ('abc')
into test_tab (dummy) values ('def')
select null from dual;

Error report -
SQL Error: ORA-00001: unique constraint
(SCOTT.SYS_C00123456) violated

A minor issue, usually, but something to be aware of – especially if you’re not in the habit of declaring your unique constraints to the database!

create sequence test_seq;

create table test_stupid_tab
( id number default test_seq.nextval
, dummy varchar2(100) not null );

insert into test_tab (dummy) values ('xyz');

1 row inserted.

insert all
into test_tab (dummy) values ('abc')
into test_tab (dummy) values ('def')
select null from dual;

2 rows inserted.

select * from test_tab;

i dummy
= =====
1 xyz
2 abc
2 def

ADDENDUM 28/10/2016

Another similar scenario which might trip you up is where you are inserting from a UNION view:

create sequence test_seq;

create table test_tab
( id number default test_seq.nextval primary key
, dummy varchar2(100) not null
);

insert into test_tab (dummy) select 'x' from dual;
-- success

insert into test_tab (dummy)
select 'y' from dual union all select 'z' from dual;
-- fails with ORA-01400 "cannot insert NULL into id"

insert into test_tab (dummy) select c from (
select 'y' c from dual union all select 'z' from dual
);
-- success

A random string of digits

715175039885776956103287888080

I needed to generate a random string with an exact length consisting of numeric digits, that I could send in an SMS to a user as a temporary account “pin”. DBMS_RANDOM.string is unsuitable for this purpose as its supported modes all include alphabetic characters. So I used DBMS_RANDOM.value instead. I call TRUNC afterwards to lop off the decimal portion.

FUNCTION random_pin (digits IN NUMBER)
  RETURN NUMBER IS
BEGIN
  IF digits IS NULL OR digits < 1 OR digits > 39 THEN
    RAISE_APPLICATION_ERROR(-20000,'digits must be 1..39');
  END IF;

  IF digits = 1 THEN
    RETURN TRUNC( DBMS_RANDOM.value(0,10) );
  ELSE
    RETURN TRUNC( DBMS_RANDOM.value(
                    POWER(10, digits-1)
                   ,POWER(10, digits) ));
  END IF;
END random_pin;

random_pin(digits => 6);

482372

EDIT 8/1/2016: added special case for 1 digit

ADDENDUM

Because the requirements of my “pin” function was to return a value that would remain unchanged when represented as an integer, it cannot return a string of digits starting with any zeros, which is why the lowerbound for the random function is POWER(10,digits-1). This, unfortunately, makes it somewhat less than perfectly random because zeroes are less frequent – if you call this function 1000 times for a given length of digits, then counted the frequency of each digit from 0..9, you will notice that 0 has a small but significantly lower frequency than the digits 1 to 9.

To fix this, the following function returns a random string of digits, with equal chance of returning a string starting with one or more zeroes:

FUNCTION random_digits (digits IN NUMBER)
  RETURN VARCHAR2 IS
BEGIN
  IF digits IS NULL OR digits < 1 OR digits > 39 THEN
    RAISE_APPLICATION_ERROR(-20000,'digits must be 1..39');
  END IF;
 
  RETURN LPAD( TRUNC(
    DBMS_RANDOM.value(0, POWER(10, digits))
    ), digits, '0');
END random_digits;

The above functions may be tested and downloaded from Oracle Live SQL.

SQL Problem Solved: Merge History Tables

Yesterday I published a SQL problem (that I I had solved – or at least, thought I had solved!) and invited readers to submit their own solutions.

SPOILER ALERT: if you haven’t had a chance to have a go yourself, go back, don’t read the comments (yet), but see if you can solve it yourself.

I was very pleased to see two excellent submissions that were both correct as far as I could tell, and are (in my opinion) sufficiently simple, elegant and efficient. Note that this was not a competition and I will not be ranking or rating or anything like that. If you need to solve a similar problem, I would suggest you try all of these solutions out and pick the one that seems to work the best for you.

Solution #1

Stew Ashton supplied the first solution which was very simple and elegant:

with dates as (
  select distinct deptid, dte from (
    select deptid, from_date, to_date+1 to_date from t1
    union all
    select deptid, from_date, to_date+1 to_date from t2
  )
  unpivot(dte for col in(from_date, to_date))
)
select * from (
  select distinct a.deptid, dte from_date,
  lead(dte) over(partition by a.deptid order by dte) - 1 to_date,
  data1, data2 from dates a
  left join t1 b on a.deptid = b.deptid and a.dte between b.from_date and b.to_date
  left join t2 c on a.deptid = c.deptid and a.dte between c.from_date and c.to_date
)
where data1 is not null or data2 is not null
order by 1,2;

SQL Fiddle

Stew’s solution does a quick pass over the two tables and generates a “date dimension” table in memory; it then does left joins to each table to generate each row. It may have a disadvantage due to the requirement to make two passes over the tables, which might be mitigated by suitable indexing. As pointed out by Matthias Rogel, this solution can be easily generalised to more than two tables, which is a plus.

Solution #2

The second solution was provided by Kim Berg Hansen which was a bit longer but in my opinion, still simple and elegant:

with v1 as (
   select t.deptid
        , case r.l
             when 1 then t.from_date
             when 2 then date '0001-01-01'
             when 3 then t.max_to + 1
          end from_date
        , case r.l
             when 1 then t.to_date
             when 2 then t.min_from -1
             when 3 then date '9999-01-01'
          end to_date
        , case r.l
             when 1 then t.data1
          end data1
     from (
      select deptid
           , from_date
           , to_date
           , data1
           , row_number() over (partition by deptid order by from_date) rn
           , min(from_date) over (partition by deptid) min_from
           , max(to_date) over (partition by deptid) max_to
        from t1
     ) t, lateral( 
         select level l
           from dual
         connect by level <= case t.rn when 1 then 3 else 1 end
     ) r
), v2 as (
   select t.deptid
        , case r.l
             when 1 then t.from_date
             when 2 then date '0001-01-01'
             when 3 then t.max_to + 1
          end from_date
        , case r.l
             when 1 then t.to_date
             when 2 then t.min_from -1
             when 3 then date '9999-01-01'
          end to_date
        , case r.l
             when 1 then t.data2
          end data2
     from (
      select deptid
           , from_date
           , to_date
           , data2
           , row_number() over (partition by deptid order by from_date) rn
           , min(from_date) over (partition by deptid) min_from
           , max(to_date) over (partition by deptid) max_to
        from t2
     ) t, lateral( 
         select level l
           from dual
         connect by level <= case t.rn when 1 then 3 else 1 end
     ) r
)
select v1.deptid
     , greatest(v1.from_date, v2.from_date) from_date
     , least(v1.to_date, v2.to_date) to_date
     , v1.data1
     , v2.data2
  from v1
  join v2
      on v2.deptid = v1.deptid
     and v2.to_date >= v1.from_date
     and v2.from_date <= v1.to_date
 where v2.data2 is not null or v1.data1 is not null
 order by v1.deptid
        , greatest(v1.from_date, v2.from_date)

Since it takes advantage of 12c’s new LATERAL feature, I wasn’t able to test it on my system, but I was able to test it on Oracle’s new Live SQL and it passed with flying colours. I backported it to 11g and it worked just fine on my system as well. An advantage of Kim’s solution is that it should only make one pass over each of the two tables.

Oracle Live SQL: merge-history-kimberghansen (this is the 12c version – Oracle login required)

SQL Fiddle (this is my attempt to backport to 11g – you can judge for yourself if I was faithful to the original)

Solution #3

When my colleague came to me with this problem, I started by creating a test suite including as many test cases as we could think of. Developing and testing the solution against these test cases proved invaluable and allowed a number of ideas to be tested and thrown away. Big thanks go to Kim Berg Hansen for providing an additional test case, which revealed a small bug in my original solution.

with q1 as (
select nvl(t1.deptid, t2.deptid) as deptid
      ,greatest(nvl(t1.from_date, t2.from_date), nvl(t2.from_date, t1.from_date)) as gt_from_date
      ,least(nvl(t1.to_date, t2.to_date), nvl(t2.to_date, t1.to_date))            as lt_to_date
      ,t1.from_date AS t1_from_date
      ,t1.to_date   AS t1_to_date
      ,t2.from_date AS t2_from_date
      ,t2.to_date   AS t2_to_date
      ,min(greatest(t1.from_date, t2.from_date)) over (partition by t1.deptid) - 1 as fr_to_date
      ,max(least(t1.to_date, t2.to_date)) over (partition by t2.deptid)        + 1 as lr_from_date
      ,t1.data1
      ,t2.data2
      ,row_number() over (partition by t1.deptid, t2.deptid order by t1.from_date) as rn_fr1
      ,row_number() over (partition by t1.deptid, t2.deptid order by t2.from_date) as rn_fr2
      ,row_number() over (partition by t1.deptid, t2.deptid order by t1.to_date desc) as rn_lr1
      ,row_number() over (partition by t1.deptid, t2.deptid order by t2.to_date desc) as rn_lr2
from   t1
full outer join t2
on     t1.deptid = t2.deptid
and    (t1.from_date between t2.from_date and t2.to_date
        or t1.to_date between t2.from_date and t2.to_date
        or t2.from_date between t1.from_date and t1.to_date
        or t2.to_date between t1.from_date and t1.to_date)
order by 1, 2
)
select deptid
      ,gt_from_date AS from_date
      ,lt_to_date   AS to_date
      ,data1
      ,data2
from   q1
union all
select deptid
      ,t1_from_date as from_date
      ,fr_to_date AS to_date
      ,data1
      ,'' AS data2
from   q1
where  fr_to_date between t1_from_date and t1_to_date
and    rn_fr1 = 1
union all
select deptid
      ,t2_from_date as from_date
      ,fr_to_date AS to_date
      ,'' AS data1
      ,data2
from   q1
where  fr_to_date between t2_from_date and t2_to_date
and    rn_fr2 = 1
union all
select deptid
      ,lr_from_date as from_date
      ,t2_to_date AS to_date
      ,'' AS data1
      ,data2
from   q1
where  lr_from_date between t2_from_date and t2_to_date
and    rn_lr2 = 1
union all
select deptid
      ,lr_from_date as from_date
      ,t1_to_date AS to_date
      ,data1
      ,'' AS data2
from   q1
where  lr_from_date between t1_from_date and t1_to_date
and    rn_lr1 = 1
order by deptid, from_date

SQL Fiddle

This solution is probably the ugliest of the three, and though it only requires one pass over the two tables, it materializes all the data in a temp table so probably requires a fair amount of memory for large data sets.

I’m so glad I posted this problem – I received help in the form of two quite different solutions, which help to show different ways of approaching problems like this. Not only that, but it highlighted the value of test-driven development – instead of groping in the dark to come up with a quick-and-dirty solution (that might or might not work), starting with comprehensive test cases is invaluable for both developing and validating potential solutions.