Simplify your SQL – Avoid Unnecessary Joins

Simplify your SQL – Avoid Unnecessary Joins

I can’t count how many times I’ve seen SQL like the following:

SELECT b.some_column
FROM a, b
WHERE a.id = :p_id
AND b.id = a.id;

(where table a is unique on a.id, and table b has a referential integrity constraint to a, i.e. b.id references a.id)

I’ve seen many variations on this theme, where tables are unnecessarily referenced in a query. In this case, the query should have been formulated as:

SELECT b.some_column
FROM b
WHERE b.id = :p_id;

To be fair, this can often be blamed on maintenance by multiple programmers over time; maybe at some point some data from table a was needed, and only later removed; or we have a cut-and-paste artifact by a less experienced programmer.

The only problem with this is code simplicity (generally, the simpler the code, the easier it is to understand it, and therefore to maintain it); this will not usually be a performance issue. On Oracle, as long as you have a unique constraint on a.id, the optimiser will generally infer that :p_id can be used to directly access table b, and will then probably do a quick index unique scan on the a.id index (technically, the optimiser should know it doesn’t even have to do this, because of the foreign key constraint on b.id -> a.id and the unique constraint on a.id).

[Aside: Of course, if those constraints did NOT exist, then the two queries above would be logically different, and only one or the other would be correct for a given situation. The programmer must in either case know why they are or are not querying any particular table.]

Here’s a simple example using the SCOTT sample schema (run on Oracle 10.1.0.2, but should be pretty much the same on any version). I’ll issue two queries that are logically equivalent; the first one will unnecessarily access DEPT; the second one will access EMP directly.

==========================================================

SQL> select emp.ename from emp, dept where emp.deptno = dept.deptno and dept.deptno = 10;

ENAME
———-
CLARK
KING
MILLER

Elapsed: 00:00:00.01

Execution Plan
———————————————————-
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=5 Bytes=60)
1 0 NESTED LOOPS (Cost=3 Card=5 Bytes=60)
2 1 INDEX (UNIQUE SCAN) OF ‘PK_DEPT’ (INDEX (UNIQUE)) (Cost=0 Card=1 Bytes=3)
3 1 TABLE ACCESS (FULL) OF ‘EMP’ (TABLE) (Cost=3 Card=5 Bytes=45)

Statistics
———————————————————-
0 recursive calls
0 db block gets
9 consistent gets
0 physical reads
0 redo size
442 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

SQL> select emp.ename from emp where emp.deptno = 10;

ENAME
———-
CLARK
KING
MILLER

Elapsed: 00:00:00.01

Execution Plan
———————————————————-
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=5 Bytes=45)
1 0 TABLE ACCESS (FULL) OF ‘EMP’ (TABLE) (Cost=3 Card=5 Bytes=45)

Statistics
———————————————————-
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
442 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

==========================================================

As you can see, the second query is much simpler. The performance is pretty much identical.

What I haven’t included above is the predicates for the plans; if I had you’d see that the optimiser does actually apply the “deptno = 10” predicate directly to EMP (thanks to Connor for pointing this out).

In these simple queries it may not seem much of an issue; but for queries involving many tables with many predicates, every bit of unnecessary code makes it all the harder to read, understand and maintain.

Small SAMPLE bug in Oracle 9i (ORA-30561)
” = empty string and/or NULL…

Leave a Reply

Your email address will not be published / Required fields are marked *