Will Oracle use my regexp function-based index?

It was mentioned at the AUSOUG conference by one of the speakers that he couldn’t get the database to use a function-based index based on the regexp functions. I thought this was a little strange so decided to try for myself.

SQL> select * from v$version; Oracle Database 10g Express Edition Release 10.2.0.1.0
Create a test table and gather stats on it:
SQL> create table testtable as select rownum rn, dbms_random.string(‘a’,10) string10, rpad(‘x’,2000,’x’) padding from all_objects where rownum <= 5000;

SQL> exec dbms_stats.gather_table_stats(user, ‘TESTTABLE’, cascade=>TRUE);
This is our test query using regexp_instr:
SQL> explain plan for select string10, regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


 Id  | Operation         | Name      | Rows  | Bytes | Cost
   0 | SELECT STATEMENT  |           |   243 |   475K|   465
*  1 |  TABLE ACCESS FULL| TESTTABLE |   243 |   475K|   465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0) – dynamic sampling used for this statement
How many rows are actually returned by this query?
SQL> select string10, regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


STRING10   REGEXP_INSTR(STRING10,’XE’)
———- —————————
fwXEKwoDhG 3
rSmdOXEkeu 6
WXEbrrjXcW 2
qWEThNXEBO 7
XEQtOwjOCW 1
MRXEoAicUQ 3                           
IpECtZbjXE 9                           
LdmXjyePXE 9                           
gXEaiGrjSX 2                           
PoqtvdGcXE 9                           
ZunAgePXXE 9                           
evXEFhBpzX 3                           
ZxkXstXEwJ 7                           
yVpjHzXECY 7                           
FkaPMpXEgR 7                           
RuXUnXEQSO 6                           
OooRCjXXEK 8                           
XzceiWPXEr 8                           
XEZlpdNMhG 1                           
jJjzfXEdDw 6
CowyXEuHDm 5

21 rows selected
Hmmm… there should be a better way.
Let’s try a simple function-based index to speed things up:
SQL> create index idx_regexp on testtable (regexp_instr(string10, ‘XE’));
Will this query use the index?
SQL> explain plan for select string10, regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


 Id  | Operation         | Name      | Rows  | Bytes | Cost
   0 | SELECT STATEMENT  |           |  4444 | 48884 |   465
*  1 |  TABLE ACCESS FULL| TESTTABLE |  4444 | 48884 |   465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)
Why not? The calculated cost with the index was higher than with a full table scan.
SQL> explain plan for select /*+ index(testtable idx_regexp) */ string10, regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


 Id  | Operation                   | Name       | Rows  | Bytes | Cost
   0 | SELECT STATEMENT            |            |  4444 | 48884 |  1512
   1 |  TABLE ACCESS BY INDEX ROWID| TESTTABLE  |  4444 | 48884 |  1512
*  2 |   INDEX RANGE SCAN          | IDX_REGEXP |  4444 |       |    10

2 – access( REGEXP_INSTR (“STRING10”,’XE’)>0)
Why was the cost higher? Let’s look at what would be in that index…
SQL> select regexp_instr(string10, ‘XE’) from testtable;


REGEXP_INSTR(STRING10,’XE’) 
————————— 
0
0                           
0                           
0                           
0                           
1
0                           
0                           
0                           
0                           
… (lots of zeroes and the occasional positive integer) …
0                           
0                           
0

5000 rows selected
The index has a row for each block in the table. No histogram on it, so it doesn’t know that most of the index is zeroes.
What if we just want stuff from the index?
SQL> explain plan for select regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


 Id  | Operation            | Name       | Rows  | Bytes | Cost
   0 | SELECT STATEMENT     |            |  4444 | 48884 |     4
*  1 |  INDEX FAST FULL SCAN| IDX_REGEXP |  4444 | 48884 |     4

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)
Yes, that uses the index. So how do we get the rest of the data from the table? Let’s try something else.
SQL> drop index idx_regexp;
We can take advantage of the fact that NULLS are not stored in an index by converting any zeroes (i.e. the regular expression didn’t match) to NULL:
SQL> create index idx_regexp_better on testtable (CASE WHEN regexp_instr(string10, ‘XE’) > 0 THEN regexp_instr(string10, ‘XE’) ELSE NULL END);

SQL> exec dbms_stats.gather_table_stats(user, ‘TESTTABLE’, cascade=>TRUE);
Will our query use the new improved index?
SQL> explain plan for select string10, regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


 Id  | Operation         | Name      | Rows  | Bytes | Cost
   0 | SELECT STATEMENT  |           |   250 |  2750 |   465
*  1 |  TABLE ACCESS FULL| TESTTABLE |   250 |  2750 |   465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)
No. Why? Because it can’t use the index, even if we try to force it with a hint. The optimiser doesn’t know it can modify my expression into the one that was used to build the index.
SQL> explain plan for select /*+ index(testtable idx_regexp_better)*/ string10, regexp_instr(string10, ‘XE’) from testtable where regexp_instr(string10, ‘XE’) > 0;


 Id  | Operation         | Name      | Rows  | Bytes | Cost
   0 | SELECT STATEMENT  |           |   250 |  2750 |   465
*  1 |  TABLE ACCESS FULL| TESTTABLE |   250 |  2750 |   465

1 – filter( REGEXP_INSTR (“STRING10”,’XE’)>0)
What if we copy the expression from the index into the query? Yes, that’s better.
SQL> explain plan for select string10, regexp_instr(string10, ‘XE’) from testtable where CASE WHEN regexp_instr(string10, ‘XE’) > 0 THEN regexp_instr(string10, ‘XE’) ELSE NULL END > 0;


 Id  | Operation                   | Name              | Rows  | Bytes | Cost
   0 | SELECT STATEMENT            |                   |    21 |   231 |    22
   1 |  TABLE ACCESS BY INDEX ROWID| TESTTABLE         |    21 |   231 |    22
*  2 |   INDEX RANGE SCAN          | IDX_REGEXP_BETTER |    21 |       |     1

2 – access(CASE WHEN REGEXP_INSTR (“STRING10”,’XE’)>0 THEN REGEXP_INSTR(“STRING10”,’XE’) ELSE NULL END >0)
Brilliant!

Jeffrey Kemp 22 November 2007

SQL / tips-&-tricks /

About Jeffrey Kemp

Application Designer & Developer at Oracle specialising in Oracle APEX (Application Express), Oracle SQL and PL/SQL. Oracle ACE Alumni ♠️. Piano player, father, husband, Christian. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Will Oracle use my regexp function-based index?

About Jeffrey Kemp

Leave a Reply Cancel reply