Avoiding Lost Updates #3: The Solution

Background: see my earlier posts Avoiding Lost Updates
and Avoiding Lost Updates #2: Updateable Views

I’ve discussed this problem with my colleagues and have come up with a number of potential solutions, one of which we believe is bulletproof.

The desired property of such a solution is to reduce, to as close to zero as possible, the probability that an update of the view will leave a row with the same “version” number. A bulletproof solution would have a probability of exactly zero.

Solution #1:

Instead of taking the SUM of all the versions, shift the versions from the outer-joined tables (e.g. by multiplying by 10000, 100000, etc).

This reduces the likelihood of getting a false negative; but it doesn’t reduce it to zero, since it would still be theoretically possible to get an update that would cause compensating deletes and updates on the outer-joined tables.
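
For illustration, using the view_ab example from “Avoiding Lost Updates #2” (below), the shifted version might look something like this. This is just a sketch; the multipliers are arbitrary, chosen only to push each outer-joined table’s version into its own “digit range”:

CREATE OR REPLACE VIEW view_ab AS
SELECT table_a.id, table_a.t, table_b.u, table_c.v,
       table_a.version
       + NVL(table_b.version, 0) * 10000
       + NVL(table_c.version, 0) * 100000000 AS version
FROM table_a
LEFT OUTER JOIN table_b ON table_a.id = table_b.id
LEFT OUTER JOIN table_c ON table_a.id = table_c.id;

A delete on table_b and an insert on table_c now change different parts of the total, so accidental cancellation becomes far less likely, although, as noted, it is still not impossible.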

Solution #2:

Use ORA_ROWSCN instead of version.

This also reduces the likelihood of a false negative; in our view, astronomically so; but with a bit of mental gymnastics we can think of a contrived scenario in which the total of ORA_ROWSCN stays the same.

Solution #3:

Cause an update on the view to always do at least one update of one of the non-outer-joined tables (even if it is updating a column to the same value); and don’t include versions from outer-joined tables in the summed version.

This means that the version will now always increase whenever an update occurs; therefore, the probability is zero, and the solution is bulletproof!
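
Using the same view_ab example from the post below, a sketch of this solution: the view takes its version from table_a alone, and the instead-of-update trigger always updates table_a (even when t is unchanged) so that table_a’s version trigger always fires. The table_b/table_c handling is as described in that post:

CREATE OR REPLACE VIEW view_ab AS
SELECT table_a.id, table_a.t, table_b.u, table_c.v, table_a.version
FROM table_a
LEFT OUTER JOIN table_b ON table_a.id = table_b.id
LEFT OUTER JOIN table_c ON table_a.id = table_c.id;

CREATE OR REPLACE TRIGGER view_ab_upd INSTEAD OF UPDATE ON view_ab FOR EACH ROW
BEGIN
  -- always update table_a, even if t is unchanged, so its version increments
  UPDATE table_a SET t = :NEW.t WHERE id = :OLD.id;
  -- handle "u" (table_b) and "v" (table_c) as described in the post below
END;
/

Any update through the view now increments table_a.version, so the view’s version can never stay the same across an update.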

Thanks to Howard and Maciej for their help.


Avoiding Lost Updates #2: Updateable Views

This is a followup to my earlier post “Avoiding Lost Updates: Protecting Data in a Multi-user environment”.

In this post I’ll talk about how this problem can be solved when you’re creating updateable views. Now, if you read the previous post, you’ll know the five options that I thought of (plus a sixth pointed out by Gary). For updateable views, if you want to protect against lost updates (in other words, “Last Update Wins” is not an acceptable solution), you can still use “Column Compare” and “Hash Compare” without any changes.

For the “Timestamp Compare”, “Version Compare” and Gary’s ORA_ROWSCN options, however, you’ll need to bring some data from the base tables up to the view’s SELECT clause.

For “Timestamp Compare” and ORA_ROWSCN, you need to get the latest of the timestamps/SCNs. You could use the GREATEST function for this (as long as ORA_ROWSCN is always greater for later transactions?).
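
For example (the table and column names here are made up), a view over two base tables that each carry a last-updated timestamp might expose the later of the two; with an outer join you also need to COALESCE the optional table’s timestamp, since GREATEST returns NULL if any of its arguments is NULL:

CREATE OR REPLACE VIEW emp_dept_v AS
SELECT e.emp_id, e.emp_name, d.dept_name,
       GREATEST(e.updated_dt, COALESCE(d.updated_dt, e.updated_dt)) AS updated_dt
FROM emp e
LEFT OUTER JOIN dept d ON d.dept_id = e.dept_id;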

It’s just as simple when you’re using “Version Compare”. A colleague of mine pointed out that all you have to do is present the SUM of the version columns from each base table that can be updated via the view. This way, if any or all of the base tables experience changes, the “version” column in the view will increment by 1 or more. Most of the time, that is!

Let’s consider an example. We have the following imaginatively named base tables with a view on them:

CREATE TABLE table_a (
id NUMBER(12) PRIMARY KEY,
t VARCHAR2(10),
version NUMBER(38) NOT NULL);

CREATE TABLE table_b (
id NUMBER(12) PRIMARY KEY,
u VARCHAR2(30) NOT NULL,
version NUMBER(38) NOT NULL );

CREATE TABLE table_c (
id NUMBER(12) PRIMARY KEY,
v VARCHAR2(10) NOT NULL,
version NUMBER(38) NOT NULL );

CREATE TRIGGER table_a_ins_upd_row BEFORE INSERT OR UPDATE ON table_a FOR EACH ROW
BEGIN
IF inserting THEN :NEW.version := 1; ELSE :NEW.version := :OLD.version + 1; END IF;
END;
/

CREATE TRIGGER table_b_ins_upd_row BEFORE INSERT OR UPDATE ON table_b FOR EACH ROW
BEGIN
IF inserting THEN :NEW.version := 1; ELSE :NEW.version := :OLD.version + 1; END IF;
END;
/

CREATE TRIGGER table_c_ins_upd_row BEFORE INSERT OR UPDATE ON table_c FOR EACH ROW
BEGIN
IF inserting THEN :NEW.version := 1; ELSE :NEW.version := :OLD.version + 1; END IF;
END;
/

CREATE VIEW view_ab AS
SELECT table_a.id, table_a.t, table_b.u, table_c.v,
       table_a.version + NVL(table_b.version, 0) + NVL(table_c.version, 0) AS version
FROM table_a
LEFT OUTER JOIN table_b ON table_a.id = table_b.id
LEFT OUTER JOIN table_c ON table_a.id = table_c.id;

The view therefore shows all rows in table_a, as well as the corresponding rows in table_b and table_c if they exist. (The NVLs are needed so that a missing row in an outer-joined table counts as version 0, rather than making the whole sum NULL.)

Assume that view_ab has instead-of-insert and -update triggers that do the following (a sketch of the update trigger follows the list):

Insert/update id and t on table_a.
If “u” is inserted or updated with a value, do the corresponding insert or update on table_b.
If “u” is cleared (set to NULL), delete the corresponding row from table_b.
And the same for “v” in relation to table_c.
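
A sketch of such an instead-of-update trigger (the insert and delete triggers would follow the same pattern; error handling is omitted, and I’m ignoring the possibility of the id itself being updated):

CREATE OR REPLACE TRIGGER view_ab_upd INSTEAD OF UPDATE ON view_ab FOR EACH ROW
BEGIN
  -- only update table_a if t has actually changed
  -- (this detail matters for the version anomaly described below)
  IF :NEW.t <> :OLD.t
     OR (:NEW.t IS NULL AND :OLD.t IS NOT NULL)
     OR (:NEW.t IS NOT NULL AND :OLD.t IS NULL) THEN
    UPDATE table_a SET t = :NEW.t WHERE id = :OLD.id;
  END IF;
  -- "u" maps to a row in table_b
  IF :NEW.u IS NULL THEN
    DELETE FROM table_b WHERE id = :OLD.id;
  ELSIF :OLD.u IS NULL THEN
    INSERT INTO table_b (id, u) VALUES (:OLD.id, :NEW.u);
  ELSE
    UPDATE table_b SET u = :NEW.u WHERE id = :OLD.id;
  END IF;
  -- "v" maps to a row in table_c in the same way
  IF :NEW.v IS NULL THEN
    DELETE FROM table_c WHERE id = :OLD.id;
  ELSIF :OLD.v IS NULL THEN
    INSERT INTO table_c (id, v) VALUES (:OLD.id, :NEW.v);
  ELSE
    UPDATE table_c SET v = :NEW.v WHERE id = :OLD.id;
  END IF;
END;
/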

If I do this:

INSERT INTO view_ab (id, t, u, v) VALUES (1, 'A', 'B', 'C');

The view should have (1, ‘A’, ‘B’, ‘C’, 3). Version = 3 because we have one row in each of the three tables each with version = 1.

Now if I instead did:

INSERT INTO view_ab (id, t, u) VALUES (1, 'A', 'B');

The view should have the row (1, ‘A’, ‘B’, null, 2). Version = 2 because we have rows in table_a and table_b with version = 1, but no row in table_c.

The only purpose of the version column is to change if any insert, update or delete occurs on any of the base tables.

What if I do the following:

UPDATE view_ab SET u = NULL, v = 'C' where id = 1;

What will the view have? It will have (1, ‘A’, null, ‘C’, 2). Hang on there – I have made a change to the row, but the version has stayed the same! Why? Because the update trigger had to delete from table_b (reducing the total version to 1) and insert into table_c (increasing the total version back to 2).

How to solve this? This is answered in my next post.


Avoiding Lost Updates: Protecting Data in a Multi-user environment

There is a particular problem with Oracle and other databases where access to data is not serialized (by default), and there are a number of ways this problem is generally dealt with. The problem crops up when you have many users, some of whom may wish to update the same data at around the same time. The process that takes place when a user wishes to update a row is:

Session #1:

  1. Read the data from the database into the client memory.
  2. Make the changes in the client memory.
  3. Write the changes from the client memory to the database.

Between steps 1 and 3, another session (#2) might read the same data and modify it. (That is, unless you’re on a particularly poorly designed database in which writers block readers; one database I helped to refactor did just that – it issued exclusive locks on rows every time they were read into a form(!) – and this was the first thing we got rid of in the new system.)

The row isn’t locked, so the other user makes their changes and clicks their “Save” button which commits the changes to the database. Meanwhile, session #1 finally gets to step 3 and saves their changes. If no checks happened, they would overwrite the changes made by session #2.

To illustrate:

  • Session #1 reads the row “Joe Bloggs”.
  • Session #2 reads the row “Joe Bloggs”.
  • Session #2 changes the row to “Jane Bloggs” and saves.
    – now, Session #1 is still looking at “Joe Bloggs” on their screen.
  • Session #1 changes the row to “Joe Jones” and saves.

The problem is that Session #1 has unwittingly undone the change that Session #2 made.

There are a number of ways of approaching this problem that I know of:

  1. “Last Update Wins” – just allow the update.
  2. “Column Compare” – all the columns to be updated are compared when updating, and if any are different to the values originally read, an error is returned (e.g. Oracle Forms does this and issues “FRM-40654: Record has been updated by another user. Re-query to see change.”).
  3. “Hash Compare” – this is a variant on the “Column Compare” method used by HTMLDB. It compares a hash of the combined data for the row.
  4. “Timestamp Compare” – the table has a “timestamp” column, which is automatically set to the last date/time when the row was inserted or most recently updated. An update will only succeed if the timestamp of the updated row is the same as the timestamp originally read.
  5. “Version Compare” – the table has a “version” column, which is set to 1 on insert and incremented on update. An update will only succeed if the version of the updated row is the same as the version originally read.

Note: none of the “Compare” approaches above necessarily requires a second query-and-lock on the row prior to the update; the best way to implement them is to add the compare as a predicate on the update. Then, if no rows are updated, the caller can tell that the update failed (because the row had been updated or deleted by another session).
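
For example, with “Version Compare” an update procedure might look something like this (emp, name and version are made-up names for illustration; the caller passes in the version it originally read):

PROCEDURE update_emp (p_emp_id IN NUMBER, p_name IN VARCHAR2, p_version IN NUMBER) IS
BEGIN
  UPDATE emp
  SET name = p_name,
      version = version + 1
  WHERE emp_id = p_emp_id
  AND version = p_version; -- the version originally read by the caller
  IF SQL%ROWCOUNT = 0 THEN
    -- no rows updated: the row was changed or deleted by another session
    RAISE_APPLICATION_ERROR(-20001, 'Record has been updated by another user.');
  END IF;
END update_emp;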

Comparison

  1. “Last Update Wins” – This approach may be acceptable for some applications.
  2. “Column Compare” – If only the updated columns are compared, this method means that two sessions can modify different columns in the same row independently.
  3. “Hash Compare” – This method stops two sessions modifying the same row, even if they’re touching different sets of columns.
  4. “Timestamp Compare” – This method requires the timestamp to have sufficient granularity so that a read by one session, followed by an update by another session, will not occur at exactly the same timestamp. Generally at least millisecond or microsecond precision is needed, although even that might not be sufficient in some situations. This method also stops two sessions modifying the same row, even if they’re touching different sets of columns.
  5. “Version Compare” – This is a favourite for Java developers and works very well. It is simple to code, computationally cheap, and doesn’t suffer from the timestamp granularity problem. This method also stops two sessions modifying the same row, even if they’re touching different sets of columns.

From the above comments you might think I prefer #5 (Version Compare) but I think it can depend on the requirements of the system you’re working with. Of course, if you’re using Forms or HTMLDB you don’t need to worry about it too much because it’s been largely taken care of for you.

In my next entry I describe how this works when you’re working with updateable views.


User-named locks with DBMS_LOCK

The application I’m working on involves a number of views that provide an interface between two databases that have quite different models. The views implement inserts, updates and deletes via Instead-of triggers.

On one of these views, the update trigger needs to actually do an update and an insert on the source table: it must update the existing row to mark it as “closed”, and then insert a new row to take its place. In this way the view hides the fact that the table keeps a history of changes.

Now, a logical constraint is that there can only be one “open” record for a given entity. Therefore, there can be zero or more “closed” records, and there must be a single “open” record. This constraint, however, has not been implemented on the database (naughty naughty) because our code is perfect and never makes a mistake 😉

This worked perfectly well until our thorough Java guy decided to do some load testing, in which a process would hammer away inserting, updating and deleting the same row over and over again, in multiple concurrent sessions.

This had the result that we ended up seeing lots of “open” records for the entity, which violates the logical constraint described above. “That’s not allowed” I said, and soon realised that, because multiple sessions are updating the same row, even a trigger-based constraint check will not solve the problem, since triggers cannot see uncommitted changes from other sessions.

So we need a lock of some sort: when one session wants to update an entity, other sessions must not be able to start updating it until the first one is finished. I can’t just lock the row I want to update, because after I’ve updated it, I still need the lock while I insert the new row.

The solution is to use a package supplied with Oracle called DBMS_LOCK, with which I can request “user-named locks”. This package gives a lot of options but I’m only interested in simple Exclusive locks, so I’ll also create some wrapper procedures.

Easy, right? Well, yeah once you get past a few gotchas.

Try #1:

-- wrapper function to Request a lock; returns a lock handle
FUNCTION request_lock (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  lock_status NUMBER;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => lock_handle,
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => TRUE);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'request_lock failed: ' || lock_status);
  END IF;
  RETURN lock_handle;
END request_lock;

-- wrapper to release a lock; call with the lock handle obtained previously
PROCEDURE release_lock (lock_handle IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => lock_handle);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release_lock failed: ' || lock_status);
  END IF;
END release_lock;

--example calling code
...
DECLARE
  lock_handle VARCHAR2(128);
BEGIN
  lock_handle := request_lock('MYLOCK1');
  -- do some stuff
  release_lock(lock_handle);
END;
...

The above code “worked” ok (although I hadn’t yet tested to see if it was actually getting the lock) until I started calling it from triggers on the views, when I started getting “ORA-04092: cannot commit in a trigger”. What the? I wasn’t doing any commits in my triggers. I couldn’t find any references to the DBMS_LOCK functions doing commits in the 10g documentation, but I thought, ok let’s make these wrappers Autonomous, so they can commit in peace:

Try #2:

FUNCTION request_lock (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_status NUMBER;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => lock_handle,
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => TRUE);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'request_lock failed: ' || lock_status);
  END IF;
  RETURN lock_handle;
END request_lock;

PROCEDURE release_lock (lock_handle IN VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => lock_handle);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release_lock failed: ' || lock_status);
  END IF;
END release_lock;

No good: I got “ORA-06519: active autonomous transaction detected and rolled back”! This didn’t make much sense at the time, but I put a COMMIT in anyway. Of course, that wouldn’t work with the triggers. Anyway, I started experimenting and soon realised that even if I got the wrappers to work, the locks weren’t actually being taken anyway, and I was able to request locks multiple times, which I expected to be impossible if the wrapper was working correctly.

Finally, I logged in as a user with some DBA privileges and read the specification of the SYS.DBMS_LOCK package. The one thing that jumped out at me was a little comment under ALLOCATE_UNIQUE:

“This routine will always do a commit.”

There was my answer. It was the ALLOCATE_UNIQUE that was doing a commit, not REQUEST or RELEASE. And, because I was setting release_on_commit => TRUE in the call to REQUEST, whenever a commit occurred the lock was released straight away.

So I set release_on_commit => FALSE, and moved the ALLOCATE_UNIQUE call into a separate function (set up as an autonomous transaction). Having read a bit more, I realised I could make the API nicer for the programmers by allowing release_lock to accept the lock name instead of requiring them to manage the lock handle. I also improved the error messages.

Try #3:

-- internal function to get a lock handle
-- (private for use by request_lock and release_lock)
FUNCTION get_handle (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  RETURN lock_handle;
END get_handle;

PROCEDURE request_lock (lock_name IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => get_handle(lock_name),
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => FALSE);
  CASE lock_status
    WHEN 0 THEN NULL;
    WHEN 2 THEN RAISE_APPLICATION_ERROR(-20000,'deadlock detected');
    WHEN 4 THEN RAISE_APPLICATION_ERROR(-20000,'lock already obtained');
    ELSE RAISE_APPLICATION_ERROR(-20000,'request lock failed: ' || lock_status);
  END CASE;
END request_lock;

-- wrapper to release a lock
PROCEDURE release_lock (lock_name IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => get_handle(lock_name));
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release lock failed: ' || lock_status);
  END IF;
END release_lock;

--example calling code
...
BEGIN
  request_lock('MYLOCK1');
  -- do some stuff
  release_lock('MYLOCK1');
END;
...

This worked perfectly, even from triggers. I even tried it out with multiple sessions:

Session 1:

exec request_lock('JEFF');

Session 2:

exec request_lock('JEFF');

At this point, session 2 just hung – it was waiting for Session 1 to release the lock.

Session 1:

exec release_lock('JEFF');

Now session 2 returned and had the lock. The same thing happened if I just closed down session 1.

If I changed request_lock to set a finite timeout (e.g. 3 seconds), session 2 would wait for that amount of time, then raise my error message “request lock failed: 1” (1 is the status code for a lock request that has timed out).
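
For reference, a variant of request_lock with a caller-specified timeout might look something like this (a sketch only, reusing get_handle from Try #3):

PROCEDURE request_lock (lock_name IN VARCHAR2, timeout_secs IN NUMBER) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => get_handle(lock_name),
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => timeout_secs, -- seconds to wait before giving up
    release_on_commit => FALSE);
  CASE lock_status
    WHEN 0 THEN NULL;
    WHEN 1 THEN RAISE_APPLICATION_ERROR(-20000,'request lock timed out');
    WHEN 2 THEN RAISE_APPLICATION_ERROR(-20000,'deadlock detected');
    WHEN 4 THEN RAISE_APPLICATION_ERROR(-20000,'lock already obtained');
    ELSE RAISE_APPLICATION_ERROR(-20000,'request lock failed: ' || lock_status);
  END CASE;
END request_lock;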

I tried it with a deadlock situation, with timeout = MAXWAIT:

Session 1:

exec request_lock('MYLOCKA');

Session 2:

exec request_lock('MYLOCKB');

Session 1:

exec request_lock('MYLOCKB');

At this point session 1 hangs as it waits for session 2 to release the lock.

Session 2:

exec request_lock('MYLOCKA');

This is a deadlock: both sessions are waiting for a lock held by the other session. After a few seconds Oracle automatically detected the deadlock and picked one session (at random, apparently); that session’s request returned lock status 2, so my code raised the error message “deadlock detected”. The other session stayed hung, however – it was still waiting for its lock, but there was no longer a deadlock.

I tried the above scenario again, but with timeout = 5 seconds. If I executed the request_locks quickly enough, I got “deadlock detected” on one session, and the other session eventually timed out.

So, everything works as expected, and the Java guys can hammer away at my views with impunity.

UPDATE for Oracle 12.1 and later:

The get_handle function no longer needs to be autonomous, since DBMS_LOCK now provides an autonomous variant of ALLOCATE_UNIQUE:

FUNCTION get_handle (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE_AUTONOMOUS (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  RETURN lock_handle;
END get_handle;

https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_LOCK.html#GUID-0D6C0622-05CA-4259-8059-330D512E1574

(thanks to Kim Berg Hansen for the update)


SQL problem

Came across this SQL problem in Eddie Awad’s blog, where he gives a solution to a problem posted to the Oracle-l mailing list.

I thought, this would be so much easier to solve if we were allowed to model the complete problem domain in the database, e.g.:

-- create the data set

SQL> create table t (id number);
Table created.

SQL> insert into t values (1);
1 row created.

SQL> insert into t values (2);
1 row created.

SQL> select * from t;
        ID
----------
         1
         2

-- create the query set

SQL> create table u (id number);
Table created.

SQL> insert into u values (1);
1 row created.

SQL> insert into u values (2);
1 row created.

SQL> insert into u values (3);
1 row created.

SQL> select * from u;
        ID
----------
         1
         2
         3

SQL> select * from t
2 where id in (select id from u)
3 and not exists (select id from u minus select id from t);
no rows selected

-- correct: the complete query set was not found in the data set

SQL> insert into t values (3);
1 row created.

SQL> select * from t
2 where id in (select id from u)
3 and not exists (select id from u minus select id from t);
        ID
----------
         1
         2
         3

-- correct: the complete query set was found in the data set

The query could be further improved by changing the “where id in ()” to a “where exists ()”.
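
i.e. something along these lines:

select * from t
where exists (select null from u where u.id = t.id)
and not exists (select id from u minus select id from t);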

To make the query table usable by multiple sessions simultaneously, it can be created as a global temporary table (and populated whenever a query is needed), or a “group” distinguisher could be added to it.
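
For example, the global temporary table version could be as simple as this (each session then sees only its own rows, cleared at commit):

create global temporary table u (id number)
on commit delete rows;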


Instead-of-Delete trigger problem

In software development, quite often the problem you encounter has virtually nothing to do with the thing you’re trying to do.

I’m creating some views which will provide an interface to a database. The views will have Instead-of triggers so that all DML will be passed to my own (packaged) procedures.

I’ve got a particular view which is a join between a base table, filtered by some criteria, and a code/description mapping view. I’ve implemented instead-of-insert, update and delete triggers that do the appropriate DML on the base table.

Inserts and updates work fine, but when deleting from the view I’m getting “TM contention” wait events. When deleting across a database link the session gets “ORA-02049: timeout: distributed transaction waiting for lock” after a long delay. If the delete is issued within the same database as the view, the session waits indefinitely for the lock.

I had a look at the view and noticed that it is a key-preserved view, so I theorised that maybe the session was locking the row before running my instead-of-delete trigger, and the trigger was then trying to delete a row that had already been locked; but that doesn’t really make sense. When I removed the instead-of-delete trigger, however, I got “ORA-01752: cannot delete from view without exactly one key-preserved table”, so I do need the trigger to handle the delete.

Looked up the Oracle doc “Instance Tuning Using Performance Views” about “TM enqueue”: “The most common reason for waits on TM locks tend to involve foreign key constraints where the constrained columns are not indexed. Index the foreign key columns to avoid this problem.” I saw this ref come up in a search early on but I initially dismissed it because my problem wasn’t related to performance. I’m glad I took a second look.

Righty-o. I looked at the table I was deleting from and, lo and behold, there was another table with a FK to my table and no index on the FK column (mind you, there was an index that contained that column along with another column, but obviously the database wasn’t going to use that to avoid the table lock).

When I created an index on the fk column, the problem was resolved. Brilliant!
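
The fix itself is a one-liner along these lines (the table and column names here are made up); the key point is that the foreign key column(s) must be the leading column(s) of the index:

CREATE INDEX child_table_parent_id_i ON child_table (parent_id);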

The application never needed to delete from that table before, so this had never been encountered. In this case adding the index solved the problem. It’s on a very small, rarely-used table so the index won’t have a negative impact. Took about half an hour to work out, but I learned a bit more about Oracle, so that’s good.


Upgraded to HTMLDB 2.0

Upgraded from 1.6. I’d already done some playing with 2.0 on htmldb.oracle.com, and already liked the little interface and management improvements.

I’m not too interested in all the bells and whistles – I just want to see first if I can produce usable maintenance forms for my preferred flavour of tables. I’ve been using the wizards to create all my pages, but have come across problems whenever the tables use natural PKs (specifically, when I try to make master-detail pages) – so far most problems have been resolved by adding a surrogate PK populated from a sequence in a trigger. That’s not what I really want to do; I’m hoping I can find a way to get HTMLDB to work with natural-PK tables.
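
That workaround is just the usual sequence-populated surrogate key, something along these lines (made-up names, 10g-era syntax):

CREATE SEQUENCE my_table_seq;

CREATE OR REPLACE TRIGGER my_table_bi BEFORE INSERT ON my_table FOR EACH ROW
BEGIN
  -- only assign a surrogate key if the caller didn't supply one
  IF :NEW.id IS NULL THEN
    SELECT my_table_seq.NEXTVAL INTO :NEW.id FROM dual;
  END IF;
END;
/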

The main disadvantage I’ve found to using surrogate keys is practical. Apart from the obvious one (I have to remember to put a unique constraint on the natural key), the surrogate keys get propagated throughout my design (especially where I’ve modelled multiple inheritance with table-per-subclass, which is my personal preference), and writing queries and certain types of constraints can become tricky.

So far my impression of HTMLDB is that it is a nice, simple product – but hopefully not so simplistic that it requires surrogate keys everywhere?

Another problem that I’ll investigate is HTMLDB’s behaviour in regards to checking for record changes prior to updates; I’ve gotten the “can’t update because the hash values differ” error quite often when I’ve based a page on an updateable view. I use updateable views a lot as a way of hiding the database complexity from the application, and I don’t want to have to (a) denormalise my design just to suit HTMLDB, or (b) try to design HTMLDB to work with each entity separately.


Experimenting with Oracle

I’ll post here results from some experiments I’m doing with Oracle. I want to explore the implications and possible benefits of mapping a truly relational logical database model directly to the physical model. In other words:

  1. Every entity mapped to a “base table”
  2. Every base table has one or more Unique keys (one arbitrarily chosen as “Primary”)
  3. No surrogate keys, except where the model doesn’t include any natural key
  4. All columns mandatory
  5. Application interfaces via fully updateable views (materialized or not)

In particular, I would like to determine how (if possible) to implement various types of table and database constraints apart from the standard PK and FK constraints.

Initial experiments indicate most of this is very possible, and with careful coding can be made to work. In fact, I’m sure some advantages over common practice will be demonstrated.

I’m doing all this on Oracle 10g Release 2 which I downloaded recently. I’ve installed it on my P4 with XP Home. For the frontend I’m using HtmlDB 1.6, although I’ll probably upgrade to HtmlDB 2.0 soon.

I should note that I don’t seriously expect this approach to have enough merit to be used in real projects – not because of any lack in the relational model of data, but because of the poor support for it in the SQL standard (and therefore in the DBMSs that implement it). I expect that to get all this to work I’ll have to write a lot of code very carefully, but that the resulting database will be very easy to write applications and queries for. For example, implementing all constraints on the database for a particular entity means that I don’t need complicated constraint-checking code in every module that works with that entity.

I expect I may have to wrestle with row locking and handling constraint violations correctly in the presence of multiple users/sessions; I don’t want to have to enforce serialized access to get this to work because of the scalability problem.

I’ve found it interesting so far, anyway.