A good entry-level interview question

I think this would make a reasonable question for would-be database developers:

What are the differences between the following statements?

UPDATE mytable SET mycolumn = NULL;
ALTER TABLE mytable MODIFY (mycolumn NULL);

If they can’t answer this correctly, they don’t get to stage two.


How to Change the ORDER BY When You Can’t Modify the SQL

Is it possible to change the ORDER BY on a query, if the SQL is defined in a wrapped PL/SQL package? A colleague asked me this interesting question today – and the answer was quite simply, Yes and No.

If the SQL already has an ORDER BY, then the answer is No. Fortunately, in our case, the SQL didn’t have any ORDER BY clause:

SELECT a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

The plan currently being picked by the CBO (Oracle 9.2.0.8) is:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN ix_address

For background, here are the relevant schema details:

TABLE address
(addressid NUMBER PRIMARY KEY
,locationid NUMBER NOT NULL
,typeid NUMBER NOT NULL --FOREIGN KEY to addresstype
);

INDEX ix_address (locationid, typeid, addressid);

TABLE addresstype
(typeid NUMBER PRIMARY KEY
,typecode VARCHAR2(30) NOT NULL
);

typecode is actually unique, but no constraint has been declared to the database.

Because the first step in the plan (the TABLE ACCESS on addresstype) only ever gets one row (in practice), the index range scan on ix_address causes this query to return the rows in ascending order of addressid.

The process that runs this query gets rows one at a time, does some work to “check” each one against some other data, and once it finds a “match”, it stops getting more rows.

This design probably seemed like a good idea at the time, but now the business would like this process changed so that in the (rare) event that more than one row would “match”, the process favours a more recent address over an old one. For our purposes, “recent” may be approximated by assuming that bigger addressids are more recent than smaller addressids.

We can’t change the code without a lengthy and costly call to the application vendor; it would be better if we can change it ourselves. We cannot change the PL/SQL package in which this query is defined; so what can we do? Can we change the order in which rows are returned from this query?

The answer is yes. The query has no ORDER BY, so the order of the rows is determined only by the query plan. If we change the query plan, which we can do with a stored outline, we change the order in which the rows come back.

So in this case, all we have to do is create a stored outline to add an INDEX_DESC hint:

SELECT /*+INDEX_DESC(a)*/ a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

This causes the following plan to be used:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN DESCENDING ix_address

With the stored outline, the query now returns the rows in descending order of addressid.

NOTE:
If you want to use this approach, it is highly dependent on being able to find a query plan that produces the ordering that you want. This may mean creating a suitable index designed strategically for this purpose, if one doesn’t exist already. Even creating indexes won’t guarantee that you’ll be able to find a plan that gets the order that you want; generally I’d say this approach will only be feasible for relatively simple queries.


Two questions about stored outlines

Out of a discussion with a colleague at work I had two questions to answer:

  1. Can a stored outline change a query – e.g. the FROM or WHERE clauses?

  2. If a stored outline hints the query to use a function-based index, and a user doesn’t have execute privilege on the function, what happens if the user runs the query?

1. Can a stored outline change a query?

The answer was simple to find – no, stored outlines only comprise hints; if the hint doesn’t make sense for a particular query, the optimiser ignores the hint just as it does if you add the hint directly to the query. For example:


SQL> CONN SYS AS SYSDBA

SQL> ALTER SYSTEM SET USE_STORED_OUTLINES = TRUE;

SQL> GRANT CREATE ANY OUTLINE TO SCOTT;

SQL> CONN SCOTT

SQL> CREATE OUTLINE testoutline1 ON
SELECT * FROM EMP WHERE EMPNO = :1;

SQL> CREATE OUTLINE testoutline2 ON
SELECT * FROM DEPT WHERE DEPTNO = :1;

SQL> CONN SYS AS SYSDBA

-- swap the outlines
SQL> UPDATE OUTLN.ol$hints
SET ol_name =
CASE ol_name
WHEN 'TESTOUTLINE1' THEN 'TESTOUTLINE2'
WHEN 'TESTOUTLINE2' THEN 'TESTOUTLINE1'
END
WHERE ol_name IN ('TESTOUTLINE1','TESTOUTLINE2');

SQL> CONN SCOTT

SQL> SELECT * FROM EMP WHERE EMPNO = :1;

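SELECT STATEMENT
   TABLE ACCESS BY INDEX ROWID (EMP)
      INDEX UNIQUE SCAN (PK_EMP)

The EMP query now picks up the hints that were captured for the DEPT query, but it is still answered from EMP via PK_EMP: hints that refer to DEPT make no sense for this query, so they are ignored.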

2. What if a user doesn’t have execute privilege on a function used in a function-based index, but the stored outline tells them to use the index?

The scenario is where a table has an index on a column and a function. A stored outline could tell the optimiser to use this index, but a particular user doesn’t have execute privilege on the function; will the optimiser still allow the user to query the table using the index? It should, since you shouldn’t need to execute the function to do a range scan on the first column, ignoring the function-based portion of the index.


SQL> CONN SCOTT

SQL> CREATE TABLE SCOTT.testtable
(c1 NUMBER NOT NULL
,c2 NUMBER NOT NULL
);

SQL> GRANT SELECT ON SCOTT.testtable TO LARRY;

SQL> CREATE OR REPLACE
FUNCTION SCOTT.testfunction (i NUMBER)
RETURN NUMBER DETERMINISTIC IS
BEGIN
RETURN i * 3.14159;
END;
/

-- note: don't grant execute on testfunction to LARRY

SQL> CREATE INDEX SCOTT.testindex
ON SCOTT.testtable (c1, SCOTT.testfunction(c2));

SQL> CONN LARRY

SQL> SELECT * FROM SCOTT.testtable
WHERE c1 = TO_NUMBER(:1);

SELECT STATEMENT
   TABLE ACCESS FULL (TESTTABLE)

SQL> CREATE OUTLINE testoutline3 ON
SELECT * FROM SCOTT.testtable x
WHERE c1 = TO_NUMBER(:1)
AND SCOTT.testfunction(c2) = TO_NUMBER(:2);

SQL> CREATE OUTLINE testoutline4 ON
SELECT * FROM SCOTT.testtable x
WHERE c1 = TO_NUMBER(:1);

SQL> CONN SYS AS SYSDBA

-- swap the outlines
SQL> UPDATE OUTLN.ol$hints
SET ol_name =
CASE ol_name
WHEN 'TESTOUTLINE3' THEN 'TESTOUTLINE4'
WHEN 'TESTOUTLINE4' THEN 'TESTOUTLINE3'
END
WHERE ol_name IN ('TESTOUTLINE3','TESTOUTLINE4');

SQL> CONN LARRY

SQL> SELECT * FROM SCOTT.testtable
WHERE c1 = TO_NUMBER(:1);

SELECT STATEMENT
   TABLE ACCESS BY INDEX ROWID (TESTTABLE)
      INDEX RANGE SCAN (TESTINDEX)

The answer is yes – just as Larry could have hinted his query to use the index, and it would have worked just as well (using the non-function-based part of the index), it works just the same if stored outlines are used instead. Stored outlines are compatible with function-based indexes and will not muck around with queries run by users who don’t have all the privileges on the functions used by those indexes.


Create Excel file from SQL query using Python

This will create a file called “emp.xls” with a nicely formatted spreadsheet based on the query specified. The header row is based on the column names, and is frozen. The column widths are set according to the max width of the data.

import cx_Oracle
from table2Excel import write_cursor_to_excel

orcl = cx_Oracle.connect('scott/tiger')
curs = orcl.cursor()
curs.execute("""
  SELECT e.ename "Employee",
         e.job "Job",
         e.hiredate "Hire Date",
         e.sal "Salary",
         e.comm "Commission",
         d.dname "Department",
         (SELECT ename FROM emp WHERE emp.empno = e.mgr) "Manager"
  FROM   emp e, dept d
  WHERE  e.deptno = d.deptno
""")
write_cursor_to_excel(curs, 'emp.xls', 'Employees')

You can get the source for table2Excel from here [table2Excel.py.txt]. Feel free to adapt it to your needs.
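If you just want to see the idea, the two layout decisions described above (header text taken from the column names, column widths taken from the widest value) can be worked out from any DB-API cursor. What follows is only a sketch and is not the contents of table2Excel: `describe_columns` is a made-up helper name, and an in-memory SQLite table with a few invented rows stands in for Oracle's emp so that the snippet runs anywhere.

```python
import sqlite3


def describe_columns(cursor):
    """Return (headers, rows, widths) for an already-executed DB-API cursor.

    Hypothetical helper for illustration; it is not part of table2Excel.
    """
    # cursor.description holds one sequence per column; item 0 is the name.
    headers = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    # Start with the header widths, then widen to fit the longest value.
    widths = [len(h) for h in headers]
    for row in rows:
        for i, value in enumerate(row):
            text = "" if value is None else str(value)
            widths[i] = max(widths[i], len(text))
    return headers, rows, widths


# Stand-in data: an in-memory SQLite table instead of Oracle's EMP.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("SMITH", "CLERK", 800), ("KING", "PRESIDENT", 5000)],
)
curs = conn.execute('SELECT ename "Employee", job "Job", sal "Salary" FROM emp')
headers, rows, widths = describe_columns(curs)
print(headers)  # ['Employee', 'Job', 'Salary']
print(widths)   # [8, 9, 6]
```

The widths are character counts; a spreadsheet library will usually want them scaled to its own column-width units.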



I Am Wizard: A Short Story

I was working happily on my laptop in the living room, kids playing on the rug, birds were singing, et cetera. All of a sudden, a blood-curdling scream erupts from the office. It was my wife.
“Jeff! Is there an “undo” function in the roster program??!?!?”
Uh oh.

(A very quick bit of background: the “roster program” is a little Apex application I wrote so my wife can manage a roster of over 100 volunteers at our local church, assigning them to a range of duties, while ensuring that they are available, are willing to perform the duty, and that their assignments do not conflict with other assignments (i.e. they normally can’t do two jobs at once).)

I open the program and look at the roster. It’s almost completely blank. Only half an hour previously it was almost completely filled. Not looking good. Rosalie runs into the room, beside herself – with good reason: even with the computer helping, it takes a lot of effort to assign all the jobs.
Her: “Didn’t you build an “undo” feature?”
Me: “No – I didn’t get around to it…” While querying the database directly and finding that yes, indeed, all the rows have nothing but NULLs, I’m preparing to console her and offer to help to rebuild it. “Do you remember what was happening just before it all disappeared?”
Her: “I selected all the dates, selected just the “Helper” jobs, then clicked the “Clear Dates” button.”
Me: “Ohhhhhhhhhh……” Disaster. I now explain that the “Clear Dates” button is intended to clear all the assignments for the dates selected, not just the ones showing on the screen. Plus, she’d selected all the dates, so it went off and merrily cleared every single assignment from the roster. “Did you happen to email any spreadsheets to anyone?” I ask in vain hope.
Her: “Yes, but only for a few jobs. I guess I can put those back in and start the rest from scratch.” says my poor wife, trudging away knowing she’ll be doing this for the next five hours. Instead of cooking dinner. This is getting worse by the minute!
Me: “Hang on! I have an idea – leave it with me.” I say, thinking, “I hope that the rollback segment is big enough…”
I run this query:

select * from roster_dates2
as of timestamp systimestamp - 0.1;

With this result:

ORA-01555: snapshot too old: rollback segment number 3
with name "_SYSSMU3$" too small

Ok, maybe a shorter time difference:

select * from roster_dates2
as of timestamp systimestamp - 0.01;

Like magic, all the roster assignments that had been NULL are showing as NOT NULL. Brilliant! So now some UPDATE wizardry…

update roster_dates2
set    (vol_id_worship_am, vol_id_worship_pm, ...)
= (select vol_id_worship_am, vol_id_worship_pm, ...
   from   roster_dates2 as of timestamp systimestamp - 0.01 x
   where  x.roster_date = roster_dates2.roster_date)
where roster_date between to_date('05-APR-2009','DD-MON-YYYY')
and add_months(sysdate, +12);

A quick query to check it hasn’t done anything drastically wrong, then commit.
Me: “Rosalie, do you want to hit the Refresh button?”
Silence.
Then, fast steps.
A big smile followed closely by my wife bursts into the room and gives me a big kiss.
Me: “Am I a wizard?”
Her: “Yes, darling, you are a wizard.”

I add some additional code to the start of the “Clear Dates” button:

RAISE_APPLICATION_ERROR(-20000,'Sorry, this function has been disabled.');

Life is good.
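A note on the numbers in those flashback queries: subtracting a plain number from systimestamp subtracts that many days, so 0.1 reaches back a little under two and a half hours, and 0.01 about a quarter of an hour. A quick sketch of the arithmetic in Python (the helper name is made up for illustration):

```python
from datetime import timedelta


def days_to_interval(days):
    """Convert a fraction of a day into a timedelta, rounded to whole seconds."""
    return timedelta(seconds=round(days * 24 * 60 * 60))


for fraction in (0.1, 0.01):
    print(fraction, "days =", days_to_interval(fraction))
# 0.1 days = 2:24:00
# 0.01 days = 0:14:24
```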


Prefix searches, Wildcard characters and LIKE

Want to run a prefix search using the LIKE operator, but your search pattern includes wildcards like _ or %? A common example is searching the v$parameter view for parameters starting with the underscore (_) character. So, why doesn’t this work:

SELECT * FROM v$parameter WHERE name LIKE '_%';

This returns ALL the rows in the view, because the “_” in the pattern matches any single character. There are all sorts of ways to get around this:

Option 1. Use TRANSLATE to modify the underscores to some other character

SELECT * FROM my_table
WHERE TRANSLATE(my_column, '_', '#') LIKE '#%';

No good if you’re after a general solution, e.g. what if the original string already has some “#”s in it? Also, even if the table has an ordinary index on the column the query cannot use it.

Option 2. Use SUBSTR so that we can use an equality condition instead of LIKE

SELECT * FROM my_table
WHERE SUBSTR(my_column,1,1) = '_';

Easy, simple. Except that an ordinary index on the column will still not be used (of course, an appropriate function-based index could be used).

Option 3. Use a regular expression

SELECT * FROM my_table
WHERE REGEXP_LIKE(my_column, '^_');

or

SELECT * FROM my_table
WHERE REGEXP_INSTR(my_column, '^_') > 0;

Rather simple, once you know how to write regular expressions. Still can’t use an ordinary index, though (although, again, a (very specific and otherwise generally useless) function-based index could be used).

Option 4. Read the manual and learn something about the LIKE operator

SELECT * FROM my_table
WHERE my_column LIKE '\_%' ESCAPE '\';

Easy, simple; and if there’s an index on the column, the optimiser may use it. No need for any silly function-based indexes or new-fangled regular expressions.
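To see the difference between the unescaped and escaped patterns without an Oracle instance to hand, here is a small sketch using Python's built-in SQLite, whose LIKE ... ESCAPE behaves the same way for this case. The table name matches the examples above, but the sample values are invented, and this says nothing about index usage, which is Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_column TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("_hidden_param",), ("processes",), ("_trace_files",), ("a_b",)],
)

# Unescaped: "_" matches any single character, so every row comes back.
unescaped = [row[0] for row in conn.execute(
    "SELECT my_column FROM my_table "
    "WHERE my_column LIKE '_%' ORDER BY my_column"
)]

# Escaped: "\_" stands for a literal underscore, so only true prefixes match.
escaped = [row[0] for row in conn.execute(
    r"SELECT my_column FROM my_table "
    r"WHERE my_column LIKE '\_%' ESCAPE '\' ORDER BY my_column"
)]

print(len(unescaped))  # 4
print(escaped)         # ['_hidden_param', '_trace_files']
```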


WITH With an IN; or, A Reason to Refactor

A work colleague needed to make a change to a report and came up against a brick wall. He knew what he wanted to express in SQL, but Oracle wouldn’t accept his syntax.

The original query used a WITH clause like this (note, I’ve removed a great deal of irrelevant detail here, the actual query had a lot of other stuff going on, but this will do to illustrate this point):

WITH q AS
(SELECT expensive_function1(:id) idpart1
,expensive_function2(:id) idpart2
FROM DUAL)
SELECT t.*
FROM q, some_table t
WHERE q.idpart1 = t.idpart1
AND q.idpart2 = t.idpart2;

This works well because the two expensive functions are run only once in the query, and the results are used to probe some_table on its compound key.

The requirements had changed, however, and my colleague needed to change it so that instead of calling the expensive functions, it now got multiple keys from another table. His first cut looked like this:

SELECT t.*
FROM some_table t
WHERE (t.idpart1, t.idpart2) IN
(SELECT idpart1, idpart2
FROM driving_table
WHERE id = :id);

This worked fine, but my colleague felt that the WITH clause should be retained, e.g.:

WITH q AS
(SELECT idpart1, idpart2
FROM driving_table
WHERE id = :id)
SELECT t.*
FROM some_table t
WHERE (t.idpart1, t.idpart2) IN q;

However, Oracle doesn’t like this and raises ORA-00920: invalid relational operator when the statement is parsed.

To my mind this alternative formulation was not required, but I was interested to know what this error message meant, and why we can’t use a WITH subquery as the target for an IN condition.

The Oracle documentation sheds some light on why this formulation is not allowed: look up the “SELECT” command in the SQL Reference 10g Release 2 (10.2), search for “WITH”, and under “subquery_factoring_clause” we read:

“Restrictions on Subquery Factoring: This clause is subject to the following restrictions:

In a compound query with set operators, you cannot use the query_name for any of the component queries, but you can use the query_name in the FROM clause of any of the component queries.”

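In other words, a WITH subquery can be referenced by name wherever a table could appear in a FROM clause. IN, however, is a condition rather than a set operator, and it expects a parenthesised expression list or subquery on its right-hand side; a bare query name is neither, hence the parse error.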

In the case of my colleague, I recommended he use this much simpler formulation, which works just fine for this report:

SELECT t.*
FROM some_table t, driving_table d
WHERE d.id = :id
AND t.idpart1 = d.idpart1
AND t.idpart2 = d.idpart2;

This shows how, when maintaining existing code, sometimes you have to go back to basics and consider whether the change to the requirements (in this case, a seemingly minor change) means that the code can be refactored to give an appreciable benefit. Caveat: this is not always the case: making large changes to existing code always carries the risk of introducing more defects.
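For what it's worth, the WITH clause can be kept if the query name is wrapped in a subquery of its own, i.e. IN (SELECT idpart1, idpart2 FROM q). There is also one thing to check before swapping IN for a join: if driving_table can hold the same key pair twice for an id, the join returns the matching some_table row twice, whereas IN returns it once. Here's a small sketch of both points. It uses Python's built-in SQLite (3.15 or later, for the row-value IN) as a stand-in for Oracle, and the payload column and sample rows are made up rather than taken from the real report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (idpart1 INTEGER, idpart2 INTEGER, payload TEXT);
    CREATE TABLE driving_table (id INTEGER, idpart1 INTEGER, idpart2 INTEGER);
    INSERT INTO some_table VALUES (1, 10, 'first'), (2, 20, 'second'), (3, 30, 'third');
    -- The key pair (1, 10) is deliberately listed twice for id 7.
    INSERT INTO driving_table VALUES (7, 1, 10), (7, 1, 10), (7, 2, 20), (8, 3, 30);
""")

# WITH clause kept: q is referenced in the FROM clause of the IN subquery.
with_in = conn.execute("""
    WITH q AS
    (SELECT idpart1, idpart2
     FROM driving_table
     WHERE id = :id)
    SELECT t.payload
    FROM some_table t
    WHERE (t.idpart1, t.idpart2) IN (SELECT idpart1, idpart2 FROM q)
    ORDER BY t.payload
""", {"id": 7}).fetchall()

# Plain join: one output row per matching driving_table row.
joined = conn.execute("""
    SELECT t.payload
    FROM some_table t, driving_table d
    WHERE d.id = :id
    AND t.idpart1 = d.idpart1
    AND t.idpart2 = d.idpart2
    ORDER BY t.payload
""", {"id": 7}).fetchall()

print(with_in)  # [('first',), ('second',)]
print(joined)   # [('first',), ('first',), ('second',)]
```

If the key pairs are unique per id in driving_table, the two forms return the same rows and the join is the simpler choice.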


Learn more about Oracle by examining obscure SQL

How many features can you use in one SQL query to solve a problem? (that is, how many necessary features do you need without deliberately obfuscating your code)

In this rather educational example we see (1) XML, (2) UTL_RAW.reverse and UTL_RAW.cast_to_varchar2, and (3) the MODEL clause: “How to convert a number from any base to any base in SQL” (Frank Zhou).

Unfortunately the query crashes with ORA-03113 end-of-file on communication channel on my database (10.2.0.3), but I won’t hold that against him 🙂

Personally, I’d implement this using a PL/SQL function, but that’s just me…

Earlier Frank showed a narrower example (Octal/Hex to Binary) using a similar technique, except using a hierarchical query and sys_connect_by_path instead of the MODEL clause.