Prefix searches, Wildcard characters and LIKE

Want to run a prefix search using the LIKE operator, but your search pattern includes wildcards like _ or %? A common example is searching the v$parameter view for parameters starting with the underscore (_) character. So, why doesn’t this work:

SELECT * FROM v$parameter WHERE name LIKE '_%';

This returns ALL the rows in the view, because the “_” in the pattern matches any single character and the “%” matches whatever follows. There are all sorts of ways to get around this:

Option 1. Use TRANSLATE to modify the underscores to some other character

SELECT * FROM my_table
WHERE TRANSLATE(my_column, '_', '#') LIKE '#%';

No good if you’re after a general solution: what if the original string already has some “#” characters in it? Also, even if the table has an ordinary index on the column, the query cannot use it.

Option 2. Use SUBSTR so that we can use an equality condition instead of LIKE

SELECT * FROM my_table
WHERE SUBSTR(my_column,1,1) = '_';

Easy, simple. Except that an ordinary index on the column will still not be used (of course, an appropriate function-based index could be used).

Option 3. Use a regular expression

SELECT * FROM my_table
WHERE REGEXP_LIKE(my_column, '^_');

or

SELECT * FROM my_table
WHERE REGEXP_INSTR(my_column, '^_') > 0;

Rather simple, once you know how to write regular expressions. Still can’t use an ordinary index, though (although, again, a (very specific and otherwise generally useless) function-based index could be used).

Option 4. Read the manual and learn something about the LIKE operator

SELECT * FROM my_table
WHERE my_column LIKE '\_%' ESCAPE '\';

Easy, simple; and if there’s an index on the column, the optimiser may use it. No need for any silly function-based indexes or new-fangled regular expressions.
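To see the escape clause in action without touching a real table, here is a minimal sketch. The sample_names query and the three strings in it are made up for illustration; only the LIKE ... ESCAPE condition comes from Option 4.

```sql
-- Hypothetical sample data: one name starting with an underscore,
-- one ordinary name, and one starting with a literal percent sign.
WITH sample_names AS (
  SELECT '_hidden_param' AS name FROM dual UNION ALL
  SELECT 'db_block_size'         FROM dual UNION ALL
  SELECT '%odd_name'             FROM dual
)
SELECT name
FROM sample_names
-- '\_' is a literal underscore; the trailing % is still a wildcard.
WHERE name LIKE '\_%' ESCAPE '\';
```

Only the row starting with a literal underscore should come back. The same escape character works for a literal percent sign, e.g. LIKE '\%%' ESCAPE '\'.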


Add a “Who’s Online Now” box to your APEX app

Something to file under “pointless fun”…

Allow your users to feel like they’re part of a community by letting them know who else is using the app at the same time.

  1. Open APEX Application Builder, and open the page you want to add this to.
  2. Click the Create Region icon.
  3. Choose Report, then SQL Report.
  4. Enter a title, e.g. “Who’s Online Now”. Click Next.
  5. Copy the following for the SQL Query (sorry about the poor formatting):
    SELECT apex_user || ' (' || minutes_ago
           || DECODE(minutes_ago,1,' min ago)',' mins ago)')
    FROM (
      SELECT DISTINCT
             apex_user,
             FIRST_VALUE(TRUNC(seconds_ago/60))
             OVER (PARTITION BY apex_user ORDER BY seconds_ago)
             AS minutes_ago
      FROM apex_workspace_activity_log
      WHERE application_id = :APP_ID
      AND seconds_ago < 3600
      AND apex_user NOT IN (:APP_USER, 'nobody')
      ORDER BY 2);
    
  6. Click Next. Click Create Region.

You can customise the region as much as you like. For example, I use:

  • Template (under Region Definition) = “Sidebar Region”
  • Report Template (under Report Attributes) = “template: 16. One Column Unordered List”
  • Pagination Scheme = “- No Pagination Selected -”
  • Headings Type = “None”

The query is based on the APEX view apex_workspace_activity_log which is supplied with the product. It looks at all session activity within the last hour by users other than the current user, and returns a list showing their most recent activity.

The only slight problem is that it doesn’t detect when someone logs out – they’ll stay in the report for up to an hour.


WITH With an IN; or, A Reason to Refactor

A work colleague needed to make a change to a report and came up against a brick wall. He knew what he wanted to express in SQL, but Oracle wouldn’t accept his syntax.

The original query used a WITH clause like this (note, I’ve removed a great deal of irrelevant detail here, the actual query had a lot of other stuff going on, but this will do to illustrate this point):

WITH q AS
(SELECT expensive_function1(:id) idpart1
,expensive_function2(:id) idpart2
FROM DUAL)
SELECT t.*
FROM q, some_table t
WHERE q.idpart1 = t.idpart1
AND q.idpart2 = t.idpart2;

This works well because the two expensive functions are run only once in the query, and the results are used to probe some_table on its compound key.

The requirements had changed, however, and my colleague needed to change it so that instead of calling the expensive functions, it now got multiple keys from another table. His first cut looked like this:

SELECT t.*
FROM some_table t
WHERE (t.idpart1, t.idpart2) IN
(SELECT idpart1, idpart2
FROM driving_table
WHERE id = :id);

This worked fine, but my colleague felt that the WITH clause should be retained, e.g.:

WITH q AS
(SELECT idpart1, idpart2
FROM driving_table
WHERE id = :id)
SELECT t.*
FROM some_table t
WHERE (t.idpart1, t.idpart2) IN q;

However, Oracle doesn’t like this and raises ORA-00920: invalid relational operator when the statement is parsed.

To my mind this alternative formulation was not required, but I was interested to know what the error message meant, and why a WITH subquery can’t be used as the target of an IN condition.

As it is, however, the Oracle documentation explains why this formulation is not allowed – look up the “SELECT” command in the SQL Reference 10g Release 2 (10.2), search for “WITH”, and under “subquery_factoring_clause” we read:

“Restrictions on Subquery Factoring: This clause is subject to the following restrictions:

In a compound query with set operators, you cannot use the query_name for any of the component queries, but you can use the query_name in the FROM clause of any of the component queries.”

In other words, we can refer to a WITH subquery in a FROM clause, but we cannot use its name on its own as the right-hand side of IN.
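If keeping the WITH clause really matters, one variation that may be worth trying is to wrap the query name in a subquery, so that q appears in a FROM clause rather than directly after IN. This is a sketch only, reusing the table and column names from the examples above; I have not run it against the original report.

```sql
WITH q AS
 (SELECT idpart1, idpart2
  FROM driving_table
  WHERE id = :id)
SELECT t.*
FROM some_table t
-- q is referenced in the FROM clause of a subquery, not bare after IN
WHERE (t.idpart1, t.idpart2) IN
 (SELECT q.idpart1, q.idpart2 FROM q);
```

Whether this is any clearer than the plain IN (SELECT ... FROM driving_table ...) version is a matter of taste.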

In the case of my colleague, I recommended he use this much simpler formulation, which works just fine for this report:

SELECT t.*
FROM some_table t, driving_table d
WHERE d.id = :id
AND t.idpart1 = d.idpart1
AND t.idpart2 = d.idpart2;

This shows how, when maintaining existing code, sometimes you have to go back to basics and consider whether the change to the requirements (in this case, a seemingly minor change) means that the code can be refactored to give an appreciable benefit. Caveat: this is not always the case; making large changes to existing code always carries the risk of introducing more defects.


Asia Down Under

Um, could someone either
(a) tell me when Australia became part of Asia? or
(b) send Oracle University to Geography 101… 🙂

“First Time in Asia! … Melbourne… Sydney… Brisbane”


Learn more about Oracle by examining obscure SQL

How many features can you use in one SQL query to solve a problem? (that is, how many necessary features do you need without deliberately obfuscating your code)

In this rather educational example we see (1) XML, (2) UTL_RAW.reverse and UTL_RAW.cast_to_varchar2, and (3) the MODEL clause: “How to convert a number from any base to any base in SQL” (Frank Zhou).

Unfortunately the query crashes with ORA-03113 end-of-file on communication channel on my database (10.2.0.3), but I won’t hold that against him 🙂

Personally, I’d implement this using a PL/SQL function, but that’s just me…

Earlier Frank showed a narrower example (Octal/Hex to Binary) using a similar technique, except using a hierarchical query and sys_connect_by_path instead of the MODEL clause.


My APEX application asks users to log in twice

I had this problem with an APEX application I’m building, and finally found the cause this morning, so I thought I’d share it.

This particular application has some pages which are only available to authenticated users, and some pages which are visible to everyone. One nice thing about APEX is that it automatically redirects users to the Login screen if they try to navigate to a protected page.

After authentication, the user doesn’t have to login again – they can now see all pages of the application that I want them to see. This used to work fine.

Recently I noticed that sometimes I’d log in with my username and password, click on a tab, and it would ask me to log in again. In these instances it would ask only that one extra time; after that, it would be fine. At first I wrote it off as a random glitch on my home-grown server, but after a while I noticed it was happening once every day. I looked all through my application, trying to find any links that didn’t pass the &SESSION. through, but I couldn’t find any such problems. I looked at some other applications on the same server: no problems there, it was just this one application.

Just this morning I went in, and happened to notice something not quite right. Normally, when I go into an application, the URL looks something like this:

http://www.xyz.com/apex/f?p=100:1:318727495645403::NO

The site should generate the long numeric Session ID automatically. However, I noticed my URL looked like this:

http://www.xyz.com/apex/f?p=100:1:0::NO

The Session ID was zero. This is a relatively new feature of APEX which I use for my fully-public applications (i.e. ones which require no authentication), where no Session ID is required – it means users can bookmark individual pages without having a long Session ID embedded in the URL.

The cause? When I updated my index page of APEX applications, I copied another entry without thinking, and so included the “0” for the Session ID. So when I first logged in, it gave me a new session, but somewhere internally APEX still had my Session ID = 0, requiring me to Login again. After this, the internal reference to my session was updated. I don’t know if this is expected behaviour or a bug in APEX.

The fix? Remove the 0 from the initial link (e.g. now it looks like “http://www.xyz.com/apex/f?p=100:1”) – zero session IDs are only appropriate for applications that require no login at all anyway.


TOO_MANY_ROWS side effect

I used to assume that whenever a TOO_MANY_ROWS exception is raised, the target bind variables would be left untouched. Until today I’d never written any code that relies on the bind variables being in any particular state when a TOO_MANY_ROWS exception is raised, so I was surprised by what I found.

For example, given the code below, I would expect the dbms_output to indicate that v is null:

CREATE PROCEDURE proc (v OUT NUMBER) IS
BEGIN
   SELECT 1 INTO v FROM all_objects;
EXCEPTION
   WHEN TOO_MANY_ROWS THEN
      dbms_output.put_line
         ('TOO MANY ROWS: v='
          || v);
END;
/
 

DECLARE
   v NUMBER;
BEGIN
   proc(v);
   dbms_output.put_line('AFTER: v=' || v);
END;
/

TOO MANY ROWS: v=1
AFTER: v=1

What appears to happen is that the out bind variables will be assigned values from the first row returned from the query; then when a second row is found, the TOO_MANY_ROWS exception is raised.

According to the documentation (emphasis added):

“By default, a SELECT INTO statement must return only one row. Otherwise, PL/SQL raises the predefined exception TOO_MANY_ROWS and the values of the variables in the INTO clause are undefined. Make sure your WHERE clause is specific enough to only match one row.”

(Oracle Database PL/SQL User’s Guide and Reference (10gR2): SELECT INTO Statement)

So it appears my original stance (don’t assume anything about the variables’ state after TOO_MANY_ROWS is raised) was correct. Lesson learned: beware of performing a SELECT INTO directly on the OUT parameters of your procedure!
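One way to act on that lesson is to select into a local variable and copy it to the OUT parameter only after the SELECT has succeeded. This is a minimal sketch; the names proc_safe and l_v are made up for illustration.

```sql
CREATE OR REPLACE PROCEDURE proc_safe (v OUT NUMBER) IS
   l_v NUMBER;  -- local buffer; the OUT parameter is not touched by the SELECT
BEGIN
   SELECT 1 INTO l_v FROM all_objects;
   v := l_v;    -- only reached if exactly one row was returned
EXCEPTION
   WHEN TOO_MANY_ROWS THEN
      dbms_output.put_line
         ('TOO MANY ROWS: v='
          || NVL(TO_CHAR(v), 'NULL'));
END proc_safe;
/

DECLARE
   v NUMBER;
BEGIN
   proc_safe(v);
   dbms_output.put_line('AFTER: v=' || NVL(TO_CHAR(v), 'NULL'));
END;
/
```

Since the assignment to v is skipped when the exception is raised, the caller should see v as NULL rather than whatever the first fetched row happened to contain. The local variable itself is still in an undefined state at that point, so it shouldn’t be relied on either.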


LIKE with wildcard (%) at start CAN use an index

A tester for our client raised a problem where searches like ‘ABC%’ would perform satisfactorily, but searches like ‘%ABC’ would time out. Of course, the reason is that the index on the column cannot be used if a wildcard is at the start of the string, so a full table scan is always performed.

I think we’re going to accept the slow response for now, but a possible solution that we might consider in future is to use a function-based index on the REVERSE() function. Its drawback is that it requires modifying the code.

First, some test data:

CREATE TABLE jka AS
SELECT ROWNUM id, dbms_random.string('x',10) v
FROM dual CONNECT BY LEVEL <= 10000;

Now, create an ordinary index:

CREATE INDEX jka_normal ON jka (v);

BEGIN
  dbms_stats.gather_table_stats
    (USER
    ,'JKA'
    ,estimate_percent=>100
    ,cascade=>TRUE);
END;
/

This query can use a range scan on jka_normal:

SELECT * FROM jka WHERE v LIKE 'ABC%';

But this query will use a full table scan (can’t use the index):

SELECT * FROM jka WHERE v LIKE '%ABC';

Now, create a function-based index (not to be confused with a REVERSE INDEX):

CREATE INDEX jka_reverse ON jka(REVERSE(v));

BEGIN
  dbms_stats.gather_table_stats
    (USER
    ,'JKA'
    ,estimate_percent=>100
    ,cascade=>TRUE);
END;
/

This query can use a range scan on jka_reverse:

SELECT * FROM jka WHERE REVERSE(v) LIKE REVERSE('%ABC');
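To check which access path the optimiser picks on your own system, the plan can be displayed with DBMS_XPLAN. This is a sketch using the jka test table from above; the search string is written already reversed ('CBA%' is what REVERSE('%ABC') evaluates to), and the plan you get will depend on your data and statistics.

```sql
EXPLAIN PLAN FOR
SELECT * FROM jka WHERE REVERSE(v) LIKE 'CBA%';

-- Show the plan that was just explained
SELECT * FROM TABLE(dbms_xplan.display);
```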

Edit: looks like Richard Foote beat me to it, in a discussion of reverse indexes (which unfortunately don’t contribute anything useful to this particular problem). I did a quick search on Google without results, but Foote’s article must not have been indexed yet, or was too far down the list.


Will Oracle use my regexp function-based index?

It was mentioned at the AUSOUG conference by one of the speakers that he couldn’t get the database to use a function-based index based on the regexp functions. I thought this was a little strange so decided to try for myself.


SQL> select * from v$version;
Oracle Database 10g Express Edition Release 10.2.0.1.0

Create a test table and gather stats on it:

SQL> create table testtable as
select rownum rn,
dbms_random.string('a',10) string10,
rpad('x',2000,'x') padding
from all_objects
where rownum <= 5000;

SQL> exec dbms_stats.gather_table_stats(user,
'TESTTABLE', cascade=>TRUE);

This is our test query using regexp_instr:

SQL> explain plan for
select string10, regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 243 | 475K| 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 243 | 475K| 465

1 - filter( REGEXP_INSTR ("STRING10",'XE')>0)
- dynamic sampling used for this statement

How many rows are actually returned by this query?

SQL> select string10, regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

STRING10   REGEXP_INSTR(STRING10,'XE')
---------- ---------------------------
fwXEKwoDhG 3
rSmdOXEkeu 6
WXEbrrjXcW 2
qWEThNXEBO 7
XEQtOwjOCW 1
MRXEoAicUQ 3
IpECtZbjXE 9
LdmXjyePXE 9
gXEaiGrjSX 2
PoqtvdGcXE 9
ZunAgePXXE 9
evXEFhBpzX 3
ZxkXstXEwJ 7
yVpjHzXECY 7
FkaPMpXEgR 7
RuXUnXEQSO 6
OooRCjXXEK 8
XzceiWPXEr 8
XEZlpdNMhG 1
jJjzfXEdDw 6
CowyXEuHDm 5

21 rows selected

Hmmm… there should be a better way.
Let’s try a simple function-based index to speed things up:

SQL> create index idx_regexp on testtable
(regexp_instr(string10, 'XE'));

Will this query use the index?

SQL> explain plan for
select string10, regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 4444 | 48884 | 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 4444 | 48884 | 465

1 - filter( REGEXP_INSTR ("STRING10",'XE')>0)

Why not? The calculated cost with the index was higher than with a full table scan.

SQL> explain plan for
select /*+ index(testtable idx_regexp) */ string10, regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 4444 | 48884 | 1512
1 | TABLE ACCESS BY INDEX ROWID| TESTTABLE | 4444 | 48884 | 1512
* 2 | INDEX RANGE SCAN | IDX_REGEXP | 4444 | | 10

2 - access( REGEXP_INSTR ("STRING10",'XE')>0)

Why was the cost higher? Let’s look at what would be in that index…

SQL> select regexp_instr(string10, 'XE') from testtable;

REGEXP_INSTR(STRING10,'XE')
---------------------------
0
0
0
0
0
1
0
0
0
0
… (lots of zeroes and the occasional positive integer) …
0
0
0

5000 rows selected

The index has an entry for each row in the table. There is no histogram on it, so the optimiser doesn’t know that most of the index is zeroes.
What if we just want stuff from the index?

SQL> explain plan for
select regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 4444 | 48884 | 4
* 1 | INDEX FAST FULL SCAN| IDX_REGEXP | 4444 | 48884 | 4

1 - filter( REGEXP_INSTR ("STRING10",'XE')>0)

Yes, that uses the index. So how do we get the rest of the data from the table? Let’s try something else.

SQL> drop index idx_regexp;

We can take advantage of the fact that NULLs are not stored in an ordinary (B-tree) index by converting any zeroes (i.e. the regular expression didn’t match) to NULL:

SQL> create index idx_regexp_better on testtable
(CASE WHEN regexp_instr(string10, 'XE') > 0
THEN regexp_instr(string10, 'XE')
ELSE NULL END);

SQL> exec dbms_stats.gather_table_stats(user, 'TESTTABLE', cascade=>TRUE);

Will our query use the new improved index?

SQL> explain plan for
select string10, regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 250 | 2750 | 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 250 | 2750 | 465

1 - filter( REGEXP_INSTR ("STRING10",'XE')>0)

No. Why? Because it can’t use the index, even if we try to force it with a hint. The optimiser doesn’t know it can modify my expression into the one that was used to build the index.

SQL> explain plan for
select /*+ index(testtable idx_regexp_better)*/
string10, regexp_instr(string10, 'XE')
from testtable
where regexp_instr(string10, 'XE') > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 250 | 2750 | 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 250 | 2750 | 465

1 - filter( REGEXP_INSTR ("STRING10",'XE')>0)

What if we copy the expression from the index into the query? Yes, that’s better.

SQL> explain plan for
select string10, regexp_instr(string10, 'XE')
from testtable
where CASE WHEN regexp_instr(string10, 'XE') > 0
THEN regexp_instr(string10, 'XE')
ELSE NULL END > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 21 | 231 | 22
1 | TABLE ACCESS BY INDEX ROWID| TESTTABLE | 21 | 231 | 22
* 2 | INDEX RANGE SCAN | IDX_REGEXP_BETTER | 21 | | 1

2 - access(CASE WHEN REGEXP_INSTR ("STRING10",'XE')>0 THEN REGEXP_INSTR("STRING10",'XE') ELSE NULL END >0)

Brilliant!
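Since the query has to repeat the index expression exactly, one option is to keep that expression in a single place by hiding it behind a view. This is a sketch only; the view name testtable_xe_v and the column alias xe_pos are made up, and I’d check the plan to confirm the optimiser still matches the expression to idx_regexp_better after merging the view.

```sql
CREATE OR REPLACE VIEW testtable_xe_v AS
SELECT t.rn,
       t.string10,
       t.padding,
       -- same expression as in idx_regexp_better
       CASE WHEN regexp_instr(t.string10, 'XE') > 0
            THEN regexp_instr(t.string10, 'XE')
            ELSE NULL END AS xe_pos
FROM testtable t;

EXPLAIN PLAN FOR
SELECT string10, xe_pos
FROM testtable_xe_v
WHERE xe_pos > 0;

SELECT * FROM TABLE(dbms_xplan.display);
```

Queries against the view then only need to mention xe_pos, and if the index expression ever changes, the view is the only other place that has to change with it.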


Easy way to crash an Oracle 10.2 session

This query crashes the session when it’s run or parsed.
I haven’t checked if this appears in other versions or not.


SQL> conn scott/tiger@orcl
Connected.

SQL> select * from v$version;
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for Solaris: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

SQL> explain plan for with q as
2 (select 'x' d from dual
3 union all
4 select 'y' from dual)
5 select d from q
6 union all
7 select d from q
8 order by dbms_random.value;
explain plan for with q as (select 'x' d from dual
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

The crashes happened when I added the “order by dbms_random.value”. The problem seems to be in the optimiser since I get the same results whether I run the query or just explain it. Possibly something to do with the optimiser wrongly assuming the random function is deterministic?

There’s a simple workaround, so I’m not too worried. I just change it to this and it works fine:


with q as
(select 'x' d from dual
union all
select 'y' from dual)
select d, dbms_random.value o from q
union all
select d, dbms_random.value o from q
order by o;