A Good Bad Example

I learnt something new today (thanks to Steven) about TRIM – in the past I’ve used RTRIM and LTRIM if I only want to trim from the left or right ends of a string, but I was not aware (or had forgotten) that SQL’s TRIM function allows this to be specified.

Oracle’s documentation of TRIM, however, has a somewhat confusing example:

“This example trims leading zeros from the hire date of the employees in the hr schema:

SELECT employee_id,
TO_CHAR(TRIM(LEADING 0 FROM hire_date))
FROM employees
WHERE department_id = 60
ORDER BY employee_id;

EMPLOYEE_ID TO_CHAR(T
----------- ---------
103 3-JAN-90
104 21-MAY-91
105 25-JUN-97
106 5-FEB-98
107 7-FEB-99

As an example of using the TRIM function the above code is adequate; however, why does it call the TO_CHAR function at that point? After all, TO_CHAR is only useful if the datatype of its argument is not a character string – but in this case, it’s a string already – TRIM always returns a string.

Not only does the example perform a useless datatype conversion, it involves an implicit conversion – from date to string. The expression would be much better like this:

TRIM(LEADING 0 FROM TO_CHAR(hire_date))

Better – but still not perfect. Sure, we should probably specify the date format so the code is a bit more robust (what if the session has a default format of ‘MM/DD/YY’?), but we can see yet another implicit data type conversion: “LEADING 0“. The only purpose for TRIM is to remove characters. Sure, you can put in zero if you want, but it’s only going to be converted to a character, ‘0’ anyway – so why not make it obvious to the next coder who comes along:

TRIM(LEADING '0' FROM TO_CHAR(hire_date,'DD/MM/YY'))

There, much better! I think the above expression is much clearer about what it will actually do – i.e. it converts a date to a string of characters, and then removes any leading ‘0’ characters.


Truncated Mean in Oracle

A colleague needed to get the average from a set of data, but disregarding the top 25% and the bottom 25%. We didn’t know of any builtin Oracle function that would do this, but a review of the wikipedia page for Average yielded the technical term: truncated (or trimmed) mean. So we searched the Oracle docs and Google for ways to implement this function and didn’t come up with anything very useful. There were some SQL Server scripts which would have required two or three passes over the dataset.

After browsing through the aggregate functions documentation, I hit upon the NTILE function which I’ve used before, and realised that was the answer. The NTILE function takes a set of ordered data and divides it evenly (or as evenly as possible) between a number of buckets. In our case, we wanted to discard the top 25% and bottom 25%, so we simply divide the set into 4 buckets and discard the 1st and the 4th buckets; then take the standard average of the remainder:

SELECT AVG(mystat)
FROM (
      SELECT mystat,
             NTILE(4) OVER (ORDER BY mystat) n
      FROM (SELECT mystat
            FROM mytable)
     )
WHERE n IN (2,3);

The benefit of this query is that it only does one pass over the data, and was easily modified to partition the data set into groups; a count was also added so that the average could be taken over the entire data set for any groups that had less than 4 items.

To get the truncated mean in groups, except for groups with <4 items (for which we’ll report the average over the entire group):

SELECT mycat, AVG(mystat)
FROM (
      SELECT mycat, mystat,
             NTILE(4) OVER (PARTITION BY mycat
                            ORDER BY mystat) n,
             COUNT(1) OVER (PARTITION BY mycat) cĀ 
      FROM (SELECT mycat, mystat
            FROM mytable)
     )
WHERE n IN (2,3) OR c < 4
GROUP BY mycat
ORDER BY mycat;


A good entry-level interview question

I think this would make a reasonable question for would-be database developers:

What are the differences between the following statements?

UPDATE mytable SET mycolumn = NULL;
ALTER TABLE mytable MODIFY (mycolumn NULL);

If they can’t answer this correctly, they don’t get to stage two.


How to Change the ORDER BY When You Can’t Modify the SQL

Is it possible to change the ORDER BY on a query, if the SQL is defined in a wrapped PL/SQL package? A colleague asked me this interesting question today – and the answer was quite simply, Yes and No.

If the SQL already has an ORDER BY, then the answer is No. Fortunately, in our case, the SQL didn’t have any ORDER BY clause:

SELECT a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

The plan currently being picked by the CBO (Oracle 9.2.0.8) is:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN ix_address

For background, here are the relevant schema details:

TABLE address
(addressid NUMBER PRIMARY KEY
,locationid NUMBER NOT NULL
,typeid NUMBER NOT NULL --FOREIGN KEY to addresstype
);

INDEX idx_address (locationid, typeid, addressid);

TABLE addresstype
(typeid NUMBER PRIMARY KEY
,typecode VARCHAR2(30) NOT NULL
);

typecode is actually unique, but no constraint has been declared to the database.

Because the first step in the plan (the TABLE ACCESS on addresstype) only ever gets one row (in practice), the index range scan on ix_address causes this query to return the rows in ascending order of addressid.

The process that runs this query gets rows one at a time, does some work to “check” it against some other data, and once it finds a “match”, it stops getting more rows.

This design probably seemed like a good idea at the time, but now the business would like this process changed so that in the (rare) occurrence when more than one row would “match”, the process should favour a more recent address rather than an old one. For our purposes, “recent” may be approximated by assuming that bigger addressids are more recent than smaller addressids.

We can’t change the code without a lengthy and costly call to the application vendor; it would be better if we can change it ourselves. We cannot change the PL/SQL package in which this query is defined; so what can we do? Can we change the order in which rows are returned from this query?

The answer is yes – firstly, the query has no ORDER BY, so the order of rows is affected only by the query plan. If we change the query plan, which we can do with a stored outline, the order of rows can be changed.

So in this case, all we have to do is create a stored outline to add an INDEX_DESC hint:

SELECT /*+INDEX_DESC(a)*/ a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

This causes the following plan to be used:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN DESCENDING ix_address

With the stored outline, the query now returns the rows in descending order of addressid.

NOTE:
If you want to use this approach, it is highly dependant on being able to find a query plan that produces the ordering that you want. This may mean creating a suitable index designed strategically for this purpose, if one doesn’t exist already. Even creating indexes won’t guarantee that you’ll be able to find a plan that gets the order that you want; generally I’d say this approach will only be feasible for relatively simple queries.


Two questions about stored outlines

Out of a discussion with a colleague at work I had two questions to answer:

  1. Can a stored outline change a query – e.g. the FROM or WHERE clauses?

  2. If a stored outline hints the query to use a function-based index, and a user doesn’t have execute privilege on the function, what happens if the user runs the query?

1. Can a stored outline change a query?

The answer was simple to find – no, stored outlines only comprise hints; if the hint doesn’t make sense for a particular query, the optimiser ignores the hint just as it does if you add the hint directly to the query. For example:


SQL> CONN SYS AS SYSDBA

SQL> ALTER SYSTEM SET USE_STORED_OUTLINES = TRUE;

SQL> GRANT CREATE ANY OUTLINE TO SCOTT;

SQL> CONN SCOTT

SQL> CREATE OUTLINE testoutline1 ON
SELECT * FROM EMP WHERE EMPNO = :1;

SQL> CREATE OUTLINE testoutline2 ON
SELECT * FROM DEPT WHERE DEPTNO = :1;

SQL> CONN SYS AS SYSDBA

–swap the outlines
SQL> UPDATE OUTLN.ol$hints
SET ol_name =
CASE ol_name
WHEN ‘TESTOUTLINE1’ THEN ‘TESTOUTLINE2’
WHEN ‘TESTOUTLINE2’ THEN ‘TESTOUTLINE1’
END
WHERE ol_name IN (‘TESTOUTLINE1′,’TESTOUTLINE2’);

SQL> CONN SCOTT

SQL> SELECT * FROM EMP WHERE EMPNO = :1;

SELECT STATEMENT
TABLE ACCESS BY INDEX ROWID (EMP)
INDEX UNIQUE SCAN (PK_EMP)

2. What if a user doesn’t have execute privilege on a function used in a function-based index, but the stored outline tells them to use the index?

The scenario is where a table has an index on a column and a function. A stored outline could tell the optimiser to use this index, but a particular user doesn’t have execute privilege on the function; will the the optimiser still allow the user to query the table using the index? It should, since you shouldn’t need to execute the function to do a range scan on the first column, ignoring the function-based portion of the index.


SQL> CONN SCOTT

SQL> CREATE TABLE SCOTT.testtable
(c1 NUMBER NOT NULL
,c2 NUMBER NOT NULL
);

SQL> GRANT SELECT ON SCOTT.testtable TO LARRY;

SQL> CREATE OR REPLACE
FUNCTION SCOTT.testfunction (i NUMBER)
RETURN NUMBER DETERMINISTIC IS
BEGIN
RETURN I * 3.14159;
END;

–note: don’t grant execute on testfunction to LARRY

SQL> CREATE INDEX SCOTT.testindex
ON SCOTT.testtable (c1, SCOTT.testfunction(c2));

SQL> CONN LARRY

SQL> SELECT * FROM SCOTT.testtable
WHERE c1 = TO_NUMBER(:1);

SELECT STATEMENT
TABLE ACCESS FULL (TESTTABLE)

SQL> CREATE OUTLINE testoutline3 ON
SELECT * FROM SCOTT.testtable x
WHERE c1 = TO_NUMBER(:1)
AND SCOTT.testfunction(c2) = TO_NUMBER(:2);

SQL> CREATE OUTLINE testoutline4 ON
SELECT * FROM SCOTT.testtable x
WHERE c1 = TO_NUMBER(:1);

SQL> CONN SYS

–swap the outlines
SQL> UPDATE OUTLN.ol$hints
SET ol_name =
CASE ol_name
WHEN ‘TESTOUTLINE3’ THEN ‘TESTOUTLINE4’
WHEN ‘TESTOUTLINE4’ THEN ‘TESTOUTLINE3’
END
WHERE ol_name IN (‘TESTOUTLINE3′,’TESTOUTLINE4’);

SQL> CONN LARRY

SQL> SELECT * FROM SCOTT.testtable
WHERE c1 = TO_NUMBER(:1);

SELECT STATEMENT
TABLE ACCESS BY INDEX ROWID (TESTTABLE)
INDEX RANGE SCAN (TESTINDEX)

The answer is yes – just as Larry could have hinted his query to use the index, and it would have worked just as well (using the non-function-based part of the index), it works just the same if stored outlines are used instead. The use of stored outlines is compatible with indexes on function-based indexes and will not muck around with queries run by users who don’t have all the privileges on the functions used by those indexes.


Create Excel file from SQL query using Python

This will create a file called “emp.xls” with a nicely formatted spreadsheet based on the query specified. The header row is based on the column names, and is frozen. The column widths are set according to the max width of the data.

import cx_Oracle
from table2Excel import write_cursor_to_excel

orcl = cx_Oracle.connect('scott/tiger')
curs = orcl.cursor()
curs.execute("""
  SELECT e.ename "Employee",
         e.job "Job",
         e.hiredate "Hire Date",
         e.sal "Salary",
         e.comm "Commission",
         d.dname "Department",
         (SELECT ename FROM emp WHERE emp.empno = e.mgr) "Manager"
  FROM   emp e, dept d
  WHERE  e.deptno = d.deptno
""")
write_cursor_to_excel(curs, 'emp.xls', 'Employees')

You can get the source for table2Excel from here [table2Excel.py.txt]. Feel free to adapt it to your needs.




Prefix searches, Wildcard characters and LIKE

Want to run a prefix search using the LIKE operator, but your search pattern includes wildcards like _ or %? A common example is searching the v$parameter view for parameters starting with the underscore (_) character. So, why doesn’t this work:

SELECT * FROM v$parameter WHERE name LIKE '_%';

This returns ALL the rows in the view, because the “_” in the pattern matches any single character. There’s all sorts of ways to get around this:

Option 1. Use TRANSLATE to modify the underscores to some other character

SELECT * FROM my_table
WHERE TRANSLATE(my_column, '_', '#') LIKE '#%';

No good if you’re after a general solution, e.g. what if the original string already has some “#”‘s in it? Also, even if the table has an ordinary index on the column the query cannot use it.

Option 2. Use SUBSTR so that we can use an equality condition instead of LIKE

SELECT * FROM my_table
WHERE SUBSTR(my_column,1,1) = '_';

Easy, simple. Except that an ordinary index on the column will still not be used (of course, an appropriate function-based index could be used).

Option 3. Use a regular expression

SELECT * FROM my_table
WHERE REGEXP_LIKE(my_column, '^_');

or

SELECT * FROM my_table
WHERE REGEXP_INSTR(my_column, '^_') > 0

Rather simple, once you know how to write regular expressions. Still can’t use an ordinary index, though (although, again, a (very specific and otherwise generally useless) function-based index could be used).

Option 4. Read the manual and learn something about the LIKE operator

SELECT * FROM my_table
WHERE my_column LIKE '\_%' ESCAPE '\';

Easy, simple; and if there’s an index on the column, the optimiser may use it. No need for any silly function-based indexes or new-fangled regular expressions.