One insert or multiple inserts?


A colleague asked me a trick* question today which I failed 🙂

* whether it was a “trick” question is probably in the eye of the beholder, though…

“What are the differences, if any, between the following two approaches to inserting multiple rows (assume v1 and v2 have different values):

INSERT INTO mytable (a)
SELECT :v1 FROM DUAL;
INSERT INTO mytable (a)
SELECT :v2 FROM DUAL;

OR:

INSERT INTO mytable (a)
SELECT :v1 FROM DUAL
UNION
SELECT :v2 FROM DUAL;

I quickly answered:

  1. The first approach requires one extra parse;
  2. The second approach requires a Sort Distinct (albeit of only two rows)
  3. A UNION ALL would be better, which would not require a Sort, nor would require the extra parse.

My colleague responded, there’s one very important thing I missed: Triggers! The first approach would execute the BEFORE STATEMENT and AFTER STATEMENT triggers once for each row. The second approach would only execute these triggers once.

What’s sad is that the application we’re working on has row-level logic in the BEFORE/AFTER statement triggers. If we try to optimise the code to insert more than one row in one INSERT statement, the application only runs the row-level logic for the first row inserted. Bad code! Very very bad!


Weird SQL Results

Chris Taylor raised a good question.

Here is the interesting part (to me) – Sometimes it returns 1 row, sometimes it returns more than 1 row, and sometimes it returns no rows. To my mind, “trunc(dbms_random.value(1,500)” should return a random value between 1 & 500 every time and just one value.  Is this weird, or is it just me???

Basically he’s wondering why querying a table on a randomly-chosen unique identifier does not always return exactly 1 row.

At first glance it seems nonsensical and looks like a bug, but really, it’s not – in fact, Oracle is doing exactly the right thing (anecdotally, a colleague has told me that SQL Server gets this particular behaviour wrong).

A simple test case: the inner query generates exactly 100 rows, with the numbers 1 to 100. The outer query then queries this result set, with the intention of picking one of those rows only:

select * from (
  select rownum r from dual connect by level <= 100
)
where r = trunc(dbms_random.value(1,100));

However, what happens here is that Oracle knows that dbms_random.value is not deterministic; and the predicates (WHERE clause) must be evaluated once for each row returned from the FROM clause. Since dbms_random.value is not deterministic, Oracle knows it must re-evaluate it for each row, separately – which means it is comparing a different number for each row returned.

So, for example, Oracle looks at row 1, with the value 1. It generates a random number, e.g. 12, and so the predicate evaluates to FALSE, and the row is not returned. Oracle then looks at row 2, which has the value 2. It generates a random number, e.g. 2, and so the predicate evaluates to TRUE, and the row is returned. It does this for each row until the 100th is evaluated and then the query stops.

If none of the 100 random numbers happen to coincide with the values returned from the table, then the query will return no rows.

Quiz question: what is the theoretical maximum number of rows the above query could ever return?

To fix it so that Oracle only evaluates the selection criteria once, we just need to wrap it in another subquery:

select * from (
  select rownum r from dual connect by level <= 100
)
where r = (select trunc(dbms_random.value(1,100)) from dual);

Truncated Mean in Oracle

A colleague needed to get the average from a set of data, but disregarding the top 25% and the bottom 25%. We didn’t know of any builtin Oracle function that would do this, but a review of the wikipedia page for Average yielded the technical term: truncated (or trimmed) mean. So we searched the Oracle docs and Google for ways to implement this function and didn’t come up with anything very useful. There were some SQL Server scripts which would have required two or three passes over the dataset.

After browsing through the aggregate functions documentation, I hit upon the NTILE function which I’ve used before, and realised that was the answer. The NTILE function takes a set of ordered data and divides it evenly (or as evenly as possible) between a number of buckets. In our case, we wanted to discard the top 25% and bottom 25%, so we simply divide the set into 4 buckets and discard the 1st and the 4th buckets; then take the standard average of the remainder:

SELECT AVG(mystat)
FROM (
      SELECT mystat,
             NTILE(4) OVER (ORDER BY mystat) n
      FROM (SELECT mystat
            FROM mytable)
     )
WHERE n IN (2,3);

The benefit of this query is that it only does one pass over the data, and was easily modified to partition the data set into groups; a count was also added so that the average could be taken over the entire data set for any groups that had less than 4 items.

To get the truncated mean in groups, except for groups with <4 items (for which we’ll report the average over the entire group):

SELECT mycat, AVG(mystat)
FROM (
      SELECT mycat, mystat,
             NTILE(4) OVER (PARTITION BY mycat
                            ORDER BY mystat) n,
             COUNT(1) OVER (PARTITION BY mycat) c 
      FROM (SELECT mycat, mystat
            FROM mytable)
     )
WHERE n IN (2,3) OR c < 4
GROUP BY mycat
ORDER BY mycat;

How to Change the ORDER BY When You Can’t Modify the SQL

Is it possible to change the ORDER BY on a query, if the SQL is defined in a wrapped PL/SQL package? A colleague asked me this interesting question today – and the answer was quite simply, Yes and No.

If the SQL already has an ORDER BY, then the answer is No. Fortunately, in our case, the SQL didn’t have any ORDER BY clause:

SELECT a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

The plan currently being picked by the CBO (Oracle 9.2.0.8) is:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN ix_address

For background, here are the relevant schema details:

TABLE address
(addressid NUMBER PRIMARY KEY
,locationid NUMBER NOT NULL
,typeid NUMBER NOT NULL --FOREIGN KEY to addresstype
);

INDEX idx_address (locationid, typeid, addressid);

TABLE addresstype
(typeid NUMBER PRIMARY KEY
,typecode VARCHAR2(30) NOT NULL
);

typecode is actually unique, but no constraint has been declared to the database.

Because the first step in the plan (the TABLE ACCESS on addresstype) only ever gets one row (in practice), the index range scan on ix_address causes this query to return the rows in ascending order of addressid.

The process that runs this query gets rows one at a time, does some work to “check” it against some other data, and once it finds a “match”, it stops getting more rows.

This design probably seemed like a good idea at the time, but now the business would like this process changed so that in the (rare) occurrence when more than one row would “match”, the process should favour a more recent address rather than an old one. For our purposes, “recent” may be approximated by assuming that bigger addressids are more recent than smaller addressids.

We can’t change the code without a lengthy and costly call to the application vendor; it would be better if we can change it ourselves. We cannot change the PL/SQL package in which this query is defined; so what can we do? Can we change the order in which rows are returned from this query?

The answer is yes – firstly, the query has no ORDER BY, so the order of rows is affected only by the query plan. If we change the query plan, which we can do with a stored outline, the order of rows can be changed.

So in this case, all we have to do is create a stored outline to add an INDEX_DESC hint:

SELECT /*+INDEX_DESC(a)*/ a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

This causes the following plan to be used:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN DESCENDING ix_address

With the stored outline, the query now returns the rows in descending order of addressid.

NOTE:
If you want to use this approach, it is highly dependant on being able to find a query plan that produces the ordering that you want. This may mean creating a suitable index designed strategically for this purpose, if one doesn’t exist already. Even creating indexes won’t guarantee that you’ll be able to find a plan that gets the order that you want; generally I’d say this approach will only be feasible for relatively simple queries.


Two questions about stored outlines

Out of a discussion with a colleague at work I had two questions to answer:

  1. Can a stored outline change a query – e.g. the FROM or WHERE clauses?

  2. If a stored outline hints the query to use a function-based index, and a user doesn’t have execute privilege on the function, what happens if the user runs the query?

1. Can a stored outline change a query?

The answer was simple to find – no, stored outlines only comprise hints; if the hint doesn’t make sense for a particular query, the optimiser ignores the hint just as it does if you add the hint directly to the query. For example:


SQL> CONN SYS AS SYSDBA

SQL> ALTER SYSTEM SET USE_STORED_OUTLINES = TRUE;

SQL> GRANT CREATE ANY OUTLINE TO SCOTT;

SQL> CONN SCOTT

SQL> CREATE OUTLINE testoutline1 ON
SELECT * FROM EMP WHERE EMPNO = :1;

SQL> CREATE OUTLINE testoutline2 ON
SELECT * FROM DEPT WHERE DEPTNO = :1;

SQL> CONN SYS AS SYSDBA

–swap the outlines
SQL> UPDATE OUTLN.ol$hints
SET ol_name =
CASE ol_name
WHEN ‘TESTOUTLINE1’ THEN ‘TESTOUTLINE2’
WHEN ‘TESTOUTLINE2’ THEN ‘TESTOUTLINE1’
END
WHERE ol_name IN (‘TESTOUTLINE1′,’TESTOUTLINE2’);

SQL> CONN SCOTT

SQL> SELECT * FROM EMP WHERE EMPNO = :1;

SELECT STATEMENT
TABLE ACCESS BY INDEX ROWID (EMP)
INDEX UNIQUE SCAN (PK_EMP)

2. What if a user doesn’t have execute privilege on a function used in a function-based index, but the stored outline tells them to use the index?

The scenario is where a table has an index on a column and a function. A stored outline could tell the optimiser to use this index, but a particular user doesn’t have execute privilege on the function; will the the optimiser still allow the user to query the table using the index? It should, since you shouldn’t need to execute the function to do a range scan on the first column, ignoring the function-based portion of the index.


SQL> CONN SCOTT

SQL> CREATE TABLE SCOTT.testtable
(c1 NUMBER NOT NULL
,c2 NUMBER NOT NULL
);

SQL> GRANT SELECT ON SCOTT.testtable TO LARRY;

SQL> CREATE OR REPLACE
FUNCTION SCOTT.testfunction (i NUMBER)
RETURN NUMBER DETERMINISTIC IS
BEGIN
RETURN I * 3.14159;
END;

–note: don’t grant execute on testfunction to LARRY

SQL> CREATE INDEX SCOTT.testindex
ON SCOTT.testtable (c1, SCOTT.testfunction(c2));

SQL> CONN LARRY

SQL> SELECT * FROM SCOTT.testtable
WHERE c1 = TO_NUMBER(:1);

SELECT STATEMENT
TABLE ACCESS FULL (TESTTABLE)

SQL> CREATE OUTLINE testoutline3 ON
SELECT * FROM SCOTT.testtable x
WHERE c1 = TO_NUMBER(:1)
AND SCOTT.testfunction(c2) = TO_NUMBER(:2);

SQL> CREATE OUTLINE testoutline4 ON
SELECT * FROM SCOTT.testtable x
WHERE c1 = TO_NUMBER(:1);

SQL> CONN SYS

–swap the outlines
SQL> UPDATE OUTLN.ol$hints
SET ol_name =
CASE ol_name
WHEN ‘TESTOUTLINE3’ THEN ‘TESTOUTLINE4’
WHEN ‘TESTOUTLINE4’ THEN ‘TESTOUTLINE3’
END
WHERE ol_name IN (‘TESTOUTLINE3′,’TESTOUTLINE4’);

SQL> CONN LARRY

SQL> SELECT * FROM SCOTT.testtable
WHERE c1 = TO_NUMBER(:1);

SELECT STATEMENT
TABLE ACCESS BY INDEX ROWID (TESTTABLE)
INDEX RANGE SCAN (TESTINDEX)

The answer is yes – just as Larry could have hinted his query to use the index, and it would have worked just as well (using the non-function-based part of the index), it works just the same if stored outlines are used instead. The use of stored outlines is compatible with indexes on function-based indexes and will not muck around with queries run by users who don’t have all the privileges on the functions used by those indexes.