Simplify your SQL – Avoid Unnecessary Joins

Simplify your SQL – Avoid Unnecessary Joins

I can’t count how many times I’ve seen SQL like the following:

SELECT b.some_column
FROM a, b
WHERE a.id = :p_id
AND b.id = a.id;

(where table a is unique on a.id, and table b has a referential integrity constraint to a, i.e. b.id references a.id)

I’ve seen many variations on this theme, where tables are unnecessarily referenced in a query. In this case, the query should have been formulated as:

SELECT b.some_column
FROM b
WHERE b.id = :p_id;

To be fair, this can often be blamed on maintenance by multiple programmers over time; maybe at some point some data from table a was needed, and only later removed; or we have a cut-and-paste artifact by a less experienced programmer.

The only problem with this is code simplicity (generally, the simpler the code, the easier it is to understand it, and therefore to maintain it); this will not usually be a performance issue. On Oracle, as long as you have a unique constraint on a.id, the optimiser will generally infer that :p_id can be used to directly access table b, and will then probably do a quick index unique scan on the a.id index (technically, the optimiser should know it doesn’t even have to do this, because of the foreign key constraint on b.id -> a.id and the unique constraint on a.id).

[Aside: Of course, if those constraints did NOT exist, then the two queries above would be logically different, and only one or the other would be correct for a given situation. The programmer must in either case know why they are or are not querying any particular table.]

Here’s a simple example using the SCOTT sample schema (run on Oracle 10.1.0.2, but should be pretty much the same on any version). I’ll issue two queries that are logically equivalent; the first one will unnecessarily access DEPT; the second one will access EMP directly.

==========================================================

SQL> select emp.ename from emp, dept where emp.deptno = dept.deptno and dept.deptno = 10;

ENAME
———-
CLARK
KING
MILLER

Elapsed: 00:00:00.01

Execution Plan
———————————————————-
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=5 Bytes=60)
1 0 NESTED LOOPS (Cost=3 Card=5 Bytes=60)
2 1 INDEX (UNIQUE SCAN) OF ‘PK_DEPT’ (INDEX (UNIQUE)) (Cost=0 Card=1 Bytes=3)
3 1 TABLE ACCESS (FULL) OF ‘EMP’ (TABLE) (Cost=3 Card=5 Bytes=45)

Statistics
———————————————————-
0 recursive calls
0 db block gets
9 consistent gets
0 physical reads
0 redo size
442 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

SQL> select emp.ename from emp where emp.deptno = 10;

ENAME
———-
CLARK
KING
MILLER

Elapsed: 00:00:00.01

Execution Plan
———————————————————-
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=5 Bytes=45)
1 0 TABLE ACCESS (FULL) OF ‘EMP’ (TABLE) (Cost=3 Card=5 Bytes=45)

Statistics
———————————————————-
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
442 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

==========================================================

As you can see, the second query is much simpler. The performance is pretty much identical.

What I haven’t included above is the predicates for the plans; if I had you’d see that the optimiser does actually apply the “deptno = 10” predicate directly to EMP (thanks to Connor for pointing this out).

In these simple queries it may not seem much of an issue; but for queries involving many tables with many predicates, every bit of unnecessary code makes it all the harder to read, understand and maintain.