Background: see my earlier posts Avoiding Lost Updates
and Avoiding Lost Updates #2: Updateable Views
I’ve discussed this problem with my colleagues and have come up with a number of potential solutions, one of which we believe is bulletproof.
The desired property of such a solution is to reduce the probability that an update of a view will leave the row with the same “version” number, to as close to zero as possible. A bulletproof solution would have a probability of exactly zero.
Solution #1:
Instead of taking the SUM of all the versions, shift the versions from the outer-joined tables (e.g. by multiplying by 10000, 100000, etc).
This reduces the likelihood of getting a false negative; but it doesn’t reduce it to zero, since it would still be theoretically possible to get an update that would cause compensating deletes and updates on the outer-joined tables.
Solution #2:
Use ORA_ROWSCN instead of version.
This also reduces the likelihood of a false negative; in our view, astronomically so; but with a bit of mental gymnastics we can think of a contrived scenario in which the total of ORA_ROWSCN stays the same.
Solution #3:
Cause an update on the view to always do at least one update of one of the non-outer-joined tables (even if it is updating a column to the same value); and don’t include versions from outer-joined tables in the summed version.
This means that version will now always increase whenever an update occurs; therefore, the probability is zero, and the solution is bullet-proof!
Thanks to Howard and Maciej for their help.
This is a followup to my earlier post “Avoiding Lost Updates: Protecting Data in a Multi-user environment”.
In this post I’ll talk about how this problem can be solved when you’re creating updateable views. Now, if you read the previous post, you’ll know the five options that I thought of (plus a sixth pointed out by Gary). For updateable views, if you want to protect against lost updates (in other words, “Last Update Wins” is not an acceptable solution), you can still use “Column Compare” and “Hash Compare” without any changes.
For the “Timestamp Compare”, “Version Compare” and Gary’s ORA_ROWSCN options, however, you’ll need to bring some data from the base tables up to the view’s SELECT clause.
For “Timestamp Compare” and ORA_ROWSCN, you need to get the latest of the timestamps/SCNs. You could use the GREATEST function for this (as long as ORA_ROWSCN is always greater for later transactions?).
It’s just as simple when you’re using “Version Compare”. It was pointed out by a colleague of mine that all you have to do is present the SUM of all the version columns from each base table that can be updateable via the view. This way, if any or all of the base tables experience changes, the “version” column in the view will increment by 1 or more. Most of the time, that is!
Let’s consider an example. We have the following imaginatively named base tables with a view on them:
CREATE TABLE table_a (
id NUMBER(12) PRIMARY KEY,
t VARCHAR2(10),
version NUMBER(38) NOT NULL);
CREATE TABLE table_b (
id NUMBER(12) PRIMARY KEY,
u VARCHAR2(30) NOT NULL,
version NUMBER(38) NOT NULL );
CREATE TABLE table_c (
id NUMBER(12) PRIMARY KEY,
v VARCHAR2(10) NOT NULL,
version NUMBER(38) NOT NULL );
CREATE TRIGGER table_a_ins_upd_row BEFORE INSERT OR UPDATE ON table_a FOR EACH ROW
BEGIN
IF inserting THEN :NEW.version := 1; ELSE :NEW.version := :OLD.version + 1; END IF;
END;
/
CREATE TRIGGER table_b_ins_upd_row BEFORE INSERT OR UPDATE ON table_b FOR EACH ROW
BEGIN
IF inserting THEN :NEW.version := 1; ELSE :NEW.version := :OLD.version + 1; END IF;
END;
/
CREATE TRIGGER table_c_ins_upd_row BEFORE INSERT OR UPDATE ON table_c FOR EACH ROW
BEGIN
IF inserting THEN :NEW.version := 1; ELSE :NEW.version := :OLD.version + 1; END IF;
END;
/
CREATE VIEW view_ab AS
SELECT table_a.id, table_a.t, table_b.u, table_c.v, table_a.version + table_b.version + table_c.version version
FROM table_a
LEFT OUTER JOIN table_b ON table_a.id = table_b.id
LEFT OUTER JOIN table_c ON table_a.id = table_c.id;
The view therefore shows all rows in table_a, as well as the corresponding rows in table_b and table_c if they exist.
Assume that the view_ab has instead-of-insert and update triggers that do the following:
Insert/update id and t on table_a.
If “u” is inserted or updated with a value, do the corresponding insert or update on table_b.
If “u” is cleared (set to NULL), delete it from table_b.
And the same for “v” in relation to table_c.
If I do this:
INSERT INTO view_ab (id, t, u, v) VALUES (1, 'A', 'B', 'C');
The view should have (1, ‘A’, ‘B’, ‘C’, 3). Version = 3 because we have one row in each of the three tables each with version = 1.
Now if I instead did:
INSERT INTO view_ab (id, t, u) VALUES (1, 'A', 'B');
The view should have the row (1, ‘A’, ‘B’, null, 2). Version = 2 because we have rows in table_a and table_b with version = 1, but no row in table_c.
The only purpose of the version column is to change if any insert, update or delete occurs on any of the base tables.
What if I do the following:
UPDATE view_ab SET u = NULL, v = 'C' where id = 1;
What will the view have? It will have (1, ‘A’, null, ‘C’, 2). Hang on there – I have done a change to the row, but the version has stayed the same! Why? Because the update trigger had to delete from table_b (reducing the total version to 1) and insert into table_c (increasing the total version back to 2).
How to solve this? This is answered in my next post.
There is a particular problem with Oracle and other databases where access to data is not serialized (by default), and there are a number of ways this problem is generally dealt with. The problem crops up when you have many users, some of whom may wish to update the same data at around the same time. The process that takes place when a user wishes to update a row is:
Session #1:
- Read the data from the database into the client memory.
- Make the changes in the client memory.
- Write the changes from the client memory to the database.
Between steps 1 and 2 another session (#2) might read the same data (unless you’re in a particularly poorly designed database in which writers block readers; e.g. one database I helped to refactor did just that – it issued exclusive locks on rows every time they were read into a form(!) – this was the first thing we got rid of in the new system), and modify it.
The row isn’t locked, so the other user makes their changes and clicks their “Save” button which commits the changes to the database. Meanwhile, session #1 finally gets to step 3 and saves their changes. If no checks happened, they would overwrite the changes made by session #2.
To illustrate:
- Session #1 reads the row “Joe Bloggs”.
- Session #2 reads the row “Joe Bloggs”.
- Session #2 changes the row to “Jane Bloggs” and saves.
– now, Session #1 is still looking at “Joe Bloggs” on their screen.
- Session #1 changes the row to “Joe Jones” and saves.
The problem is that Session #1 has unwittingly undone the change that Session #2 made.
There are a number of ways this problems is approached that I know of:
- “Last Update Wins” – just allow the update.
- “Column Compare” – all the columns to be updated are compared when updating, and if any are different to the values originally read, an error is returned (e.g. Oracle Forms does this and issues “FRM-40654: Record has been updated by another user. Re-query to see change.”).
- “Hash Compare” – this is a variant on the “Column Compare” method used by HTMLDB. It compares a hash of the combined data for the row.
- “Timestamp Compare” – the table has a “timestamp” column, which is automatically set to the last date/time when the row was inserted or most recently updated. An update will only succeed if the timestamp of the updated row is the same as the timestamp originally read.
- “Version Compare” – the table has a “version” column, which is set to 1 on insert and incremented on update. An update will only succeed if the version of the updated row is the same as the version originally read.
Note: all the “Compare” approaches above do not necessarily require a second query-and-lock on the row prior to the update; the best way to implement these is to add the compare as a predicate for the update – then, if no rows are updated, the caller can tell that the update failed (because the row had been updated or deleted by another session).
Comparison
- “Last Update Wins” – This approach may be acceptable for some applications.
- “Column Compare” – If only the updated columns are compared, this method means that two sessions can modify different columns in the same row independently.
- “Hash Compare” – This method stops two sessions modifying the same row, even if they’re touching different sets of columns.
- “Timestamp Compare” – This method requires the timestamp to have sufficient granularity so that a read by one session, followed by an update by another session, will not occur at exactly the same timestamp. Generally milliseconds or microseconds are needed at least, although even this granularity might not be sufficient for some situations. This method also stops two sessions modifying the same row, even if they’re touching different sets of columns.
- “Version Compare” – This is a favourite for Java developers and works very well. It is simple to code, computationally cheap, and doesn’t suffer from the timestamp granularity problem. This method also stops two sessions modifying the same row, even if they’re touching different sets of columns.
From the above comments you might think I prefer #5 (Version Compare) but I think it can depend on the requirements of the system you’re working with. Of course, if you’re using Forms or HTMLDB you don’t need to worry about it too much because it’s been largely taken care of for you.
In my next entry I describe how this works when you’re working with updateable views.
Further Reading