Will Oracle use my regexp function-based index?

It was mentioned at the AUSOUG conference by one of the speakers that he couldn’t get the database to use a function-based index based on the regexp functions. I thought this was a little strange so decided to try for myself.


SQL> select * from v$version;
Oracle Database 10g Express Edition Release 10.2.0.1.0

Create a test table and gather stats on it:

SQL> create table testtable as
select rownum rn,
dbms_random.string(‘a’,10) string10,
rpad(‘x’,2000,’x’) padding
from all_objects
where rownum <= 5000;

SQL> exec dbms_stats.gather_table_stats(user,
‘TESTTABLE’, cascade=>TRUE);

This is our test query using regexp_instr:

SQL> explain plan for
select string10, regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 243 | 475K| 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 243 | 475K| 465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)
– dynamic sampling used for this statement

How many rows are actually returned by this query?

SQL> select string10, regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

STRING10 REGEXP_INSTR(STRING10,’XE’)
———- —————————
fwXEKwoDhG 3
rSmdOXEkeu 6
WXEbrrjXcW 2
qWEThNXEBO 7
XEQtOwjOCW 1
MRXEoAicUQ 3
IpECtZbjXE 9
LdmXjyePXE 9
gXEaiGrjSX 2
PoqtvdGcXE 9
ZunAgePXXE 9
evXEFhBpzX 3
ZxkXstXEwJ 7
yVpjHzXECY 7
FkaPMpXEgR 7
RuXUnXEQSO 6
OooRCjXXEK 8
XzceiWPXEr 8
XEZlpdNMhG 1
jJjzfXEdDw 6
CowyXEuHDm 5

21 rows selected

Hmmm… there should be a better way.
Let’s try a simple function-based index to speed things up:

SQL> create index idx_regexp on testtable
(regexp_instr(string10, ‘XE’));

Will this query use the index?

SQL> explain plan for
select string10, regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 4444 | 48884 | 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 4444 | 48884 | 465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)

Why not? The calculated cost with the index was higher than with a full table scan.

SQL> explain plan for
select /*+ index(testtable idx_regexp) */ string10, regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 4444 | 48884 | 1512
1 | TABLE ACCESS BY INDEX ROWID| TESTTABLE | 4444 | 48884 | 1512
* 2 | INDEX RANGE SCAN | IDX_REGEXP | 4444 | | 10

2 – access( REGEXP_INSTR (“STRING10”,’XE’)>0)

Why was the cost higher? Let’s look at what would be in that index…

SQL> select regexp_instr(string10, ‘XE’) from testtable;

REGEXP_INSTR(STRING10,’XE’)
—————————
0
0
0
0
0
1
0
0
0
0
… (lots of zeroes and the occasional positive integer) …
0
0
0

5000 rows selected

The index has a row for each block in the table. No histogram on it, so it doesn’t know that most of the index is zeroes.
What if we just want stuff from the index?

SQL> explain plan for
select regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 4444 | 48884 | 4
* 1 | INDEX FAST FULL SCAN| IDX_REGEXP | 4444 | 48884 | 4

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)

Yes, that uses the index. So how do we get the rest of the data from the table? Let’s try something else.

SQL> drop index idx_regexp;

We can take advantage of the fact that NULLS are not stored in an index by converting any zeroes (i.e. the regular expression didn’t match) to NULL:

SQL> create index idx_regexp_better on testtable
(CASE WHEN regexp_instr(string10, ‘XE’) > 0
THEN regexp_instr(string10, ‘XE’)
ELSE NULL END);

SQL> exec dbms_stats.gather_table_stats(user, ‘TESTTABLE’, cascade=>TRUE);

Will our query use the new improved index?

SQL> explain plan for
select string10, regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 250 | 2750 | 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 250 | 2750 | 465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)

No. Why? Because it can’t use the index, even if we try to force it with a hint. The optimiser doesn’t know it can modify my expression into the one that was used to build the index.

SQL> explain plan for
select /*+ index(testtable idx_regexp_better)*/
string10, regexp_instr(string10, ‘XE’)
from testtable
where regexp_instr(string10, ‘XE’) > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 250 | 2750 | 465
* 1 | TABLE ACCESS FULL| TESTTABLE | 250 | 2750 | 465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)

What if we copy the expression from the index into the query? Yes, that’s better.

SQL> explain plan for
select string10, regexp_instr(string10, ‘XE’)
from testtable
where CASE WHEN regexp_instr(string10, ‘XE’) > 0
THEN regexp_instr(string10, ‘XE’)
ELSE NULL END > 0;

Id | Operation | Name | Rows | Bytes | Cost
0 | SELECT STATEMENT | | 21 | 231 | 22
1 | TABLE ACCESS BY INDEX ROWID| TESTTABLE | 21 | 231 | 22
* 2 | INDEX RANGE SCAN | IDX_REGEXP_BETTER | 21 | | 1

2 – access(CASE WHEN REGEXP_INSTR (“STRING10”,’XE’)>0 THEN REGEXP_INSTR(“STRING10”,’XE’) ELSE NULL END >0)

Brilliant!


Easy way to crash a Oracle 10.2 session

This query crashes the session when it’s run or parsed.
I haven’t checked if this appears in other versions or not.


SQL> conn scott/tiger@orcl
Connected.

SQL> select * from v$version;
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bi
PL/SQL Release 10.2.0.3.0 – Production
CORE 10.2.0.3.0 Production
TNS for Solaris: Version 10.2.0.3.0 – Production
NLSRTL Version 10.2.0.3.0 – Production

SQL> explain plan for with q as
2 (select ‘x’ d from dual
3 union all
4 select ‘y’ from dual)
5 select d from q
6 union all
7 select d from q
8 order by dbms_random.value;
explain plan for with q as (select ‘x’ d from dual
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

The crashes happened when I added the “order by dbms_random.value”. The problem seems to be in the optimiser since I get the same results whether I run the query or just explain it. Possibly something to do with the optimiser wrongly assuming the random function is deterministic?

There’s a simple workaround, so I’m not too worried. I just change it to this and it works fine:


with q as
(select ‘x’ d from dual
union all
select ‘y’ from dual)
select d, dbms_random.value o from q
union all
select d, dbms_random.value o from q
order by o;


AUSOUG Conference 2007 Day 2

I enjoyed today’s programme very much. John Garmany gave a very easy-to-understand intro to the world of Regular Expressions. Luie Matthee spoke about virtualised Oracle 10g instances on Red Hat Enterprise Linux 5, most of which went over my head but I’m learning.

Apparently my red Red Hat hat is in the mail, but I did get a Toad toad.

It croaks when it is dropped or bashed on something. It amused my 2.5 and 0.5 year olds for hours, which is something I can’t say for all the $50+ toys they’ve gotten over the years.

Unfortunately I missed Chris’ talk on load testing web applications, I’ll have to ask my colleagues who attended about that one. I was particularly interested to hear Duncan Mills talk about how to move Oracle to SOA, and I wasn’t disappointed. The surprise is how easy it is to consume web services even with Forms 6i, and the new event triggers in Forms 11 sound very exciting (i.e. no more polling the database to simulate asynchronous processing) – there’ll probably be some new triggers like when-event-raised and when-custom-event which can be used similarly to when-timer-expired, but perhaps without so much network overhead.

The day ended with a rather friendly debate over development tools. Chris Muir started well with good-natured jabs at Forms (hasn’t that gone the way of Cobol? and why are there hardly any talks on Forms anymore?), Apex (voted “Best Toy of 2007”), and .Net (who cares about MS at an Oracle conference anyway?). Scott Wesley pointed out the large installed base of Forms, it’s a mature product, no-one needs to talk about it because there aren’t any problems with it anymore. Penny Cookson extolled the simple power that is available to the Apex developer. David Shields made a valiant attempt to influence a roomful of Oracle developers to give MS a chance, but I think the odds were stacked against him 🙂 Some probing questions from the audience prompted discussion of the various tools’ virtues for newbie developers, for integration with SOA, and for maintaining legacy apps. In the end I think it was decided that the winner is the “best tool for the job at hand”, in other words, everyone won (except perhaps for .Net 😉

My thanks to the AUSOUG committee for their hard work and dedication to bring off another excellent conference, and good luck in Melbourne!


Not an OpenWorld post: AUSOUG Conference Day 1

A great start to the Perth conference, great to catch up with old and not-so-old friends.

Listened to Tim Armitage from Oracle talk about database independence and why it’s just not needed due to the range of features Oracle provides for no extra cost.

Snuck into Penny Cookson’s demo of an Apex application she’s been working on for UWA. Quite a few tricks I hadn’t heard of, some of which will solve some niggling issues I’ve had on my applications. One simple one is the use of an application item to hold the previous page, to support reverse navigation from a page which has multiple entry points.

This afternoon will be time to stretch my knowledge a bit: Collin Klepfer on “Database Growth: Problems and Solutions”, and Gilbert Standen on “JVM Cutover for Oracle Forms” which I’m sure will mostly go over my head but I’ll pick up something about the replacement for JInitiator.

I suspect there’s a bit of “OpenWorld Envy” going on. It seems Penny wasn’t able to make it to SF and in her talk pointed out the fact that most blog references to Chris Muir at OW were about beer, specifically the consumption of it at various pubs.

I’m sure this tension will make for a spectacular show tomorrow evening with “The Great Oracle Development Tools Debate” featuring Penny and Chris, as well as David Shields, Vinod Patel (a current colleague of mine) and Scott Wesley (a former colleague of mine).


Faster APEX using mod_expires

This is a followup on my earlier series on setting up APEX on Linux. In it I described how I used Apache web server in front of APEX, instead of accessing APEX directly via OWA (this was so that I could use the same port to serve ordinary web pages and files via HTTP).

A consequence of that set up is that all my APEX pages are a little slow to load up because none of the images are being cached on the client. To solve this I added the following lines to my httpd.conf:

<LocationMatch /i>
   ExpiresActive on
   ExpiresDefault "access plus 1 month"
</LocationMatch>

<LocationMatch /apex>
   ExpiresActive on
   ExpiresDefault "access"
   ExpiresByType image/gif "access plus 1 month"
   ExpiresByType image/jpeg "access plus 1 month"
   ExpiresByType image/bmp "access plus 1 month"
</LocationMatch>

My location “/i” points to the images used by APEX. Since there are only images in there I’ve told it to expire everything (ExpiresDefault) after a month.

The location “/apex” is for the actual APEX pages, including APEX application images which are not in /i/. I’ve set the default to “access” (in other words, don’t cache them), but added exceptions for the various image types in use to cache for a month.