Small SAMPLE bug in Oracle 9i (ORA-30561)

If you’re using the SAMPLE clause in Oracle 9i, in combination with a join, be aware of this small gotcha. I found a workaround, thankfully.

select * from (select * from dual), (select * from dual sample (10));

Expected result: should return zero or one row (more or less at random)

Actual result (tested with 9.2.0.6.0): “ORA-30561: SAMPLE option not allowed in statement with multiple table references”

Workaround: put the query with the SAMPLE clause first, i.e.

select * from (select * from dual sample (10)), (select * from dual);

Note: It works fine in 10g (tested under 10.1.0.2.0) since 10g does not restrict how many SAMPLE clauses are in a query.
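For example, assuming the restriction really is lifted as described, even a query with two SAMPLE clauses should run without error in 10g:

select * from (select * from dual sample (10)), (select * from dual sample (10));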


User-named locks with DBMS_LOCK

The application I’m working on involves a number of views that provide an interface between two databases that have quite different models. The views implement inserts, updates and deletes via Instead-of triggers.

On one of these views, the update trigger needs to actually do an update and an insert on the source table: it must update the existing row to mark it as “closed”, and then insert a new row to take its place. In this way the view hides the fact that the table keeps a history of changes.
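To give a flavour of it, here is a minimal sketch of what such an instead-of-update trigger might look like. The view, table and column names are all made up for illustration:

CREATE OR REPLACE TRIGGER entity_v_instead_upd
  INSTEAD OF UPDATE ON entity_v
  FOR EACH ROW
BEGIN
  -- mark the existing row as "closed"
  UPDATE entity_history
  SET    status      = 'CLOSED',
         closed_date = SYSDATE
  WHERE  entity_id = :OLD.entity_id
  AND    status    = 'OPEN';
  -- insert a new "open" row to take its place
  INSERT INTO entity_history (entity_id, status, some_value)
  VALUES (:OLD.entity_id, 'OPEN', :NEW.some_value);
END;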

Now, a logical constraint is that there can only be one “open” record for a given entity: there can be zero or more “closed” records, but never more than one “open” record. This constraint, however, has not been implemented on the database (naughty naughty) because our code is perfect and never makes a mistake 😉
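For reference, one way such a constraint could have been declared is a unique function-based index that ignores “closed” rows. The table and column names below are made up for illustration; we didn’t actually do this.

create unique index entity_history_one_open_uk
  on entity_history (case when status = 'OPEN' then entity_id end);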

This worked perfectly well until our thorough Java guy decided to do some load testing, in which a process would hammer away inserting, updating and deleting the same row over and over again, in multiple concurrent sessions.

This had the result that we ended up seeing lots of “open” records for the entity, which violates the logical constraint described above. “That’s not allowed” I said, and soon realised that because multiple sessions were updating the same row, even a trigger-based constraint check would not solve the problem, since the triggers cannot see uncommitted changes from other sessions.

So, we need to have a lock of some sort, so that when one session wants to update an entity, other sessions cannot start updating it as well until the first one is finished. I can’t just lock the row I want to update, because after I’ve updated it, I still need the lock while I insert the new row.

The solution is to use a package supplied with Oracle called DBMS_LOCK, with which I can request “user-named locks”. This package gives a lot of options but I’m only interested in simple Exclusive locks, so I’ll also create some wrapper procedures.

Easy, right? Well, yeah, once you get past a few gotchas.

Try #1:

-- wrapper function to Request a lock; returns a lock handle
FUNCTION request_lock (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  lock_status NUMBER;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => lock_handle,
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => TRUE);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'request_lock failed: ' || lock_status);
  END IF;
  RETURN lock_handle;
END request_lock;

-- wrapper to release a lock; call with the lock handle obtained previously
PROCEDURE release_lock (lock_handle IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => lock_handle);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release_lock failed: ' || lock_status);
  END IF;
END release_lock;

--example calling code
...
DECLARE
  lock_handle VARCHAR2(128);
BEGIN
  lock_handle := request_lock('MYLOCK1');
  -- do some stuff
  release_lock(lock_handle);
END;
...

The above code “worked” OK (although I hadn’t yet tested whether it was actually getting the lock) until I started calling it from triggers on the views, when I started getting “ORA-04092: cannot commit in a trigger”. What the? I wasn’t doing any commits in my triggers. I couldn’t find any reference to the DBMS_LOCK functions doing commits in the 10g documentation, but I thought, OK, let’s make these wrappers Autonomous so they can commit in peace:

Try #2:

FUNCTION request_lock (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_status NUMBER;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => lock_handle,
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => TRUE);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'request_lock failed: ' || lock_status);
  END IF;
  RETURN lock_handle;
END request_lock;

PROCEDURE release_lock (lock_handle IN VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => lock_handle);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release_lock failed: ' || lock_status);
  END IF;
END release_lock;

No good: I got “ORA-06519: active autonomous transaction detected and rolled back”! This didn’t make much sense at the time, but I put a COMMIT in anyway. Of course, that wouldn’t work with the triggers either. Anyway, I started experimenting and soon realised that even if I got the wrappers to work, the locks weren’t actually being taken out, and I was able to request the same lock multiple times, which should have been impossible if the wrappers were working correctly.

Finally, I logged in as a user with some DBA privileges and read the specification of the SYS.DBMS_LOCK package. The one thing that jumped out at me was a little comment under ALLOCATE_UNIQUE:

“This routine will always do a commit.”

There was my answer. It was ALLOCATE_UNIQUE that was doing the commit, not REQUEST or RELEASE. And because I was setting release_on_commit => TRUE in the call to REQUEST, every time a commit occurred the lock was being released straight away.

So I changed it to release_on_commit => FALSE and moved the ALLOCATE_UNIQUE into a separate function (set up as an Autonomous Transaction). Having read a bit more, I realised I could make this API nicer for the programmers by allowing release_lock to accept the lock name, instead of requiring them to manage the lock handle. I also improved the error messages.

Try #3:

-- internal function to get a lock handle
-- (private for use by request_lock and release_lock)
FUNCTION get_handle (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  RETURN lock_handle;
END get_handle;

PROCEDURE request_lock (lock_name IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => get_handle(lock_name),
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => FALSE);
  CASE lock_status
    WHEN 0 THEN NULL;
    WHEN 2 THEN RAISE_APPLICATION_ERROR(-20000,'deadlock detected');
    WHEN 4 THEN RAISE_APPLICATION_ERROR(-20000,'lock already obtained');
    ELSE RAISE_APPLICATION_ERROR(-20000,'request lock failed: ' || lock_status);
  END CASE;
END request_lock;

-- wrapper to release a lock
PROCEDURE release_lock (lock_name IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => get_handle(lock_name));
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release lock failed: ' || lock_status);
  END IF;
END release_lock;

--example calling code
...
BEGIN
  request_lock('MYLOCK1');
  -- do some stuff
  release_lock('MYLOCK1');
END;
...

This worked perfectly, even from triggers. I then tried it out with multiple sessions:

Session 1:

exec request_lock('JEFF');

Session 2:

exec request_lock('JEFF');

At this point, session 2 just hung – it was waiting for Session 1 to release the lock.

Session 1:

exec release_lock('JEFF');

Now session 2 returned and had the lock. The same thing happened if I just closed down session 1.

If I changed request_lock to set the timeout to some value (e.g. 3 seconds), session 2 would wait for that amount of time, then raise my error message “request lock failed: 1” (1 is the status code returned when a lock request times out).
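If timeouts were to become a supported mode of use, a variant along these lines would report the timeout explicitly. This is just a sketch; the added timeout_secs parameter is not part of the wrappers above.

PROCEDURE request_lock (lock_name IN VARCHAR2, timeout_secs IN NUMBER) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => get_handle(lock_name),
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => timeout_secs, -- give up after this many seconds
    release_on_commit => FALSE);
  CASE lock_status
    WHEN 0 THEN NULL;
    WHEN 1 THEN RAISE_APPLICATION_ERROR(-20000,'request lock timed out');
    WHEN 2 THEN RAISE_APPLICATION_ERROR(-20000,'deadlock detected');
    WHEN 4 THEN RAISE_APPLICATION_ERROR(-20000,'lock already obtained');
    ELSE RAISE_APPLICATION_ERROR(-20000,'request lock failed: ' || lock_status);
  END CASE;
END request_lock;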

I tried it with a deadlock situation, with timeout = MAXWAIT:

Session 1:

exec request_lock('MYLOCKA');

Session 2:

exec request_lock('MYLOCKB');

Session 1:

exec request_lock('MYLOCKB');

At this point session 1 hangs as it waits for session 2 to release the lock.

Session 2:

exec request_lock('MYLOCKA');

This is a deadlock. Both sessions are waiting for a lock held by the other session. After a few seconds Oracle automatically detects the deadlock, picks one session (apparently at random) and returns lock status 2 to it (so my code raises the error message “deadlock detected”). The other session stayed hung, however – it was still waiting, but there was no longer a deadlock.

I tried the above scenario again, but with timeout = 5 seconds. If I executed the request_lock calls quickly enough, I got “deadlock detected” on one session, and the other session eventually timed out.

So, everything works as expected, and the Java guys can hammer away at my views with impunity.
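To tie it back to the original problem, the instead-of-update triggers end up bracketing the update-and-insert with these wrappers, roughly as sketched below. This is only an illustration: the view and lock names are made up (including the assumption that the lock name incorporates the entity id), and it assumes the wrappers are visible to the trigger.

CREATE OR REPLACE TRIGGER entity_v_instead_upd
  INSTEAD OF UPDATE ON entity_v
  FOR EACH ROW
BEGIN
  -- serialise all changes to this entity across sessions
  request_lock('ENTITY_' || :OLD.entity_id);
  -- ... mark the existing "open" row as closed and insert its replacement ...
  release_lock('ENTITY_' || :OLD.entity_id);
END;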

UPDATE for Oracle 12.1 and later:

The get_handle function no longer needs to be autonomous, since DBMS_LOCK now provides an autonomous variant of ALLOCATE_UNIQUE:

FUNCTION get_handle (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE_AUTONOMOUS (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  RETURN lock_handle;
END get_handle;

https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_LOCK.html#GUID-0D6C0622-05CA-4259-8059-330D512E1574

(thanks to Kim Berg Hansen for the update)


Instead-of-Delete trigger problem

In software development, quite often the problem you encounter has virtually nothing to do with the thing you’re trying to do.

I’m creating some views which will provide an interface to a database. The views will have Instead-of triggers so that all DML will be passed to my own (packaged) procedures.

I’ve got a particular view which is a join between a base table, filtered by some criteria, and a code/description mapping view. I’ve implemented instead-of-insert, update and delete triggers that do the appropriate DML on the base table.

Inserts and updates work fine, but when deleting from the view I’m getting “TM contention” wait events. When deleting across a database link the session gets “ORA-02049: timeout: distributed transaction waiting for lock” after a long delay. If the delete is issued within the same database as the view, the session waits indefinitely for the lock.

I had a look at the view and noticed that it is a key-preserved view, so I theorised that maybe the session was locking the row before running my instead-of-delete trigger, which then tries to delete the row that has already been locked. That didn’t really make sense, though. When I removed the instead-of-delete trigger, however, I got “ORA-01752: cannot delete from view without exactly one key-preserved table”. So I need the trigger to handle the delete.

Looked up the Oracle doc “Instance Tuning Using Performance Views” about “TM enqueue”: “The most common reason for waits on TM locks tend to involve foreign key constraints where the constrained columns are not indexed. Index the foreign key columns to avoid this problem.” I saw this ref come up in a search early on but I initially dismissed it because my problem wasn’t related to performance. I’m glad I took a second look.

Righty-o. I looked at the table I was deleting from, and lo and behold, there was another table with a fk to my table and no index on the fk column (mind you, there was an index that contained that column along with another column, but obviously the database wasn’t going to use that one to avoid the table lock).
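Incidentally, a simplified query like the following can help hunt these down. It is only a sketch: it handles single-column foreign keys only, and only checks whether the constrained column leads some index on the child table.

select c.table_name, cc.column_name, c.constraint_name
from   user_constraints c
       join user_cons_columns cc on cc.constraint_name = c.constraint_name
where  c.constraint_type = 'R'
and    not exists (
         select null
         from   user_ind_columns ic
         where  ic.table_name = cc.table_name
         and    ic.column_name = cc.column_name
         and    ic.column_position = 1
       );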

When I created an index on the fk column, the problem was resolved. Brilliant!
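The fix itself was just a one-liner along these lines (child_table and parent_id are placeholders for the real child table and its fk column):

create index child_table_parent_id_ix on child_table (parent_id);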

The application never needed to delete from that table before, so this had never been encountered. In this case adding the index solved the problem. It’s on a very small, rarely-used table so the index won’t have a negative impact. Took about half an hour to work out, but I learned a bit more about Oracle, so that’s good.