Tag: data-modelling

Non-Overlapping Dates Constraint

If you have a table that represents time-varying info, e.g. with From and To date/time columns, you have a few options with regards to the problem of overlapping dates:

1. Check for overlapping dates in the application layer.

2. Use an off-the-shelf product to generate the appropriate triggers, e.g. Oracle CDM*RuleFrame or Toon Koppelaars’  RuleGen.

3. Roll your own, in the database.

4. Use a different data model that can use a unique constraint.

5. Forget about it.

One reason it’s difficult is that this is an example of a cross-row constraint, i.e. one that cannot merely be checked for the current row by itself. Oracle supports a few cross-row constraints, i.e. Primary Key, Unique and Foreign Key constraints; but it doesn’t natively support arbitrary assertions, which would allow us to easily declare this sort of constraint.

The real challenge comes from the fact that Oracle is a multi-user system and concurrent sessions cannot see the uncommitted data from other sessions; so some form of serialization will be required to ensure that when one session wishes to insert/update the data for a particular entity, no other session is allowed to start working on the same entity until the first session commits (or issues a rollback).

The problem is not new; it’s been around for a long time, and tripped many a new (and old) programmer.

There are two problems with option #1 (code in the application layer): firstly, you have to repeat the code for each different type of client (e.g. you might have a Java UI on one side, as well as some batch script somewhere else); secondly, usually the programmer involved will not understand the concurrency problem mentioned above and will not take it into account.

Option #2 is probably the best, most of the time. The solution is implemented at the database level, and is more likely to work correctly and efficiently.

Option #4 (change the data model) involves not storing the From and To dates, but instead dividing up all time ranges into discrete chunks, and each row represents a single chunk of time. This solution is valid if the desired booking ranges are at predictable discrete ranges, e.g. daily. You can then use an ordinary unique constraint to ensure that each chunk of time is only booked by one entity at any one time. This is the solution described here.

Option #5 (forget about it) is also a viable option, in my opinion. Basically it entails designing the rest of the application around the fact that overlapping date ranges might exist in the table – e.g. a report might simply merge the date ranges together prior to output.

Option #3, where you implement the triggers yourself on the database, has the same advantage as Option #2, where it doesn’t matter which application the data is coming from, the constraint will hold true. However, you have to be really careful because it’s much easier to get it wrong than it is to get right, due to concurrency.

I hear you scoffing, “Triggers?!?”. I won’t comment except to refer you to this opinion, which I couldn’t say it better myself: The fourth use-case for Triggers.

There is another Option #3 using a materialized view instead of triggers; I’ll describe this alternative at the end of this post.

So, here is a small example showing how an overlapping date constraint may be implemented. It is intentionally simple to illustrate the approach: it assumes that the From and To dates cannot be NULL, and its rule for detecting overlapping dates requires that the dates not overlap at all, to the nearest second.

  1. Create the tables
    CREATE TABLE room
      (room_no NUMBER NOT NULL
      ,CONSTRAINT room_pk PRIMARY KEY (room_no)
      );
    
    CREATE TABLE room_booking
      (room_no NUMBER NOT NULL
      ,booked_from DATE NOT NULL
      ,booked_to DATE NOT NULL
      ,CONSTRAINT room_booking_pk
         PRIMARY KEY (room_no, booked_from)
      ,CONSTRAINT room_booking_fk
         FOREIGN KEY (room_no) REFERENCES room (room_no)
      );
    

  2. Create the validation trigger (note – I’ve used an Oracle 11g compound trigger here, but it could easily be rewritten for earlier versions by using two triggers + a database package):
    CREATE OR REPLACE TRIGGER room_booking_trg
      FOR INSERT OR UPDATE OF room_no, booked_from, booked_to
        ON room_booking
      COMPOUND TRIGGER
    
      TYPE room_no_array IS TABLE OF CHAR(1)
        INDEX BY BINARY_INTEGER;
    
      room_nos room_no_array;
    
    PROCEDURE lock_room (room_no IN room.room_no%TYPE) IS
      dummy room.room_no%TYPE;
    BEGIN
      SELECT r.room_no
      INTO dummy
      FROM room r
      WHERE r.room_no = lock_room.room_no
      FOR UPDATE;
    END lock_room;
    
    PROCEDURE validate_room (room_no IN room.room_no%TYPE) IS
      overlapping_booking EXCEPTION;
      dummy CHAR(1);
    BEGIN
      -- check for overlapping date/time ranges
      BEGIN
        SELECT 'X' INTO dummy
        FROM room_booking rb1
            ,room_booking rb2
        WHERE rb1.room_no = validate_room.room_no
        AND rb2.room_no = validate_room.room_no
        AND rb1.booked_from != rb2.booked_from
        AND (
             rb1.booked_from BETWEEN rb2.booked_from
                                 AND rb2.booked_to
             OR
             rb1.booked_to BETWEEN rb2.booked_from
                               AND rb2.booked_to
            )
        AND ROWNUM = 1;
        RAISE overlapping_booking;
      EXCEPTION
        WHEN NO_DATA_FOUND THEN
          -- good, no constraint violations
          NULL;
      END;
    EXCEPTION
      WHEN overlapping_booking THEN
        RAISE_APPLICATION_ERROR(-20000,
          'Overlapping booking for room #' || room_no);
    END validate_room;
    
    PROCEDURE validate_rooms IS
      room_no room.room_no%TYPE;
    BEGIN
      room_no := room_nos.FIRST;
      LOOP
        EXIT WHEN room_no IS NULL;
        validate_room (room_no);
        room_no := room_nos.NEXT(room_no);
      END LOOP;
      room_nos.DELETE;
    EXCEPTION
      WHEN OTHERS THEN
        room_nos.DELETE;
        RAISE;
    END validate_rooms;
    
    BEFORE EACH ROW IS
    BEGIN
      -- lock the header record (so other sessions
      -- can't modify the bookings for this room
      -- at the same time)
      lock_room(:NEW.room_no);
      -- remember the room_no to validate later
      room_nos(:NEW.room_no) := 'Y';
    END BEFORE EACH ROW;
    
    AFTER STATEMENT IS
    BEGIN
      validate_rooms;
    END AFTER STATEMENT;
    
    END room_booking_trg;
    /
    

That’s all you need. The trigger locks the header record for the room, so only one session can modify the bookings for a particular room at any one time. If you don’t have a table like “room” in your database that you can use for this purpose, you could use DBMS_LOCK instead (similarly to that proposed in the OTN forum discussion here).

It would not be difficult to adapt this example for alternative requirements, e.g. where the From and To dates may be NULL, or where the overlapping criteria should allow date/time ranges that coincide at their endpoints (e.g. so that the date ranges (1-Feb-2000 to 2-Feb-2000) and (2-Feb-2000 to 3-Feb-2000) would not be considered to overlap). You’d just need to modify the comparison in the query in validate_room to take these requirements into account.

Test case #1

INSERT INTO room (room_no) VALUES (101);
INSERT INTO room (room_no) VALUES (201);
INSERT INTO room (room_no) VALUES (301);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (101, DATE '2000-01-01', DATE '2000-01-02' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (101, DATE '2000-01-02', DATE '2000-01-03' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-02-01', DATE '2000-02-05' - 0.00001);

Expected: no errors

Test case #2

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-02-01', DATE '2000-02-02' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-02-02', DATE '2000-02-04' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-02-03', DATE '2000-02-05' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-02-03', DATE '2000-02-06' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-01-31', DATE '2000-02-01');

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-01-31', DATE '2000-02-02' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-01-31', DATE '2000-02-06' - 0.00001);

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (201, DATE '2000-02-05' - 0.00001, DATE '2000-02-06' - 0.00001);

UPDATE room_booking SET booked_to = '2000-01-02' - 0.00001
WHERE room_no = 101 AND booked_from = DATE '2000-01-02';

Expected: constraint violation on each statement

Test case #3

in session #1:

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (301, DATE '2000-01-01', DATE '2000-02-01' - 0.00001);

in session #2:

INSERT INTO room_booking (room_no, booked_from, booked_to)
VALUES (301, DATE '2000-01-15', DATE '2000-01-16' - 0.00001);

Expected: session #2 will wait until session #1 issues a COMMIT or ROLLBACK. If session #1 COMMITs, session #2 will then report a constraint violation. If session #2 rolls back, session #2 will complete without error.

The No-Trigger option #3: Materialized View

This is similar to a solution proposed by Rob Van Wijk. It uses a constraint on a materialized view to stop overlapping date ranges.

So, instead of the trigger, you would do something like this:

CREATE MATERIALIZED VIEW LOG ON room_booking WITH ROWID;

CREATE MATERIALIZED VIEW room_booking_date_ranges
  REFRESH FORCE ON COMMIT
  AS SELECT 'X' AS dummy
     FROM room_booking rb1
         ,room_booking rb2
     WHERE rb1.room_no = rb2.room_no
     AND rb1.booked_from != rb2.booked_from
     AND (
          rb1.booked_from BETWEEN rb2.booked_from
                              AND rb2.booked_to
          OR
          rb1.booked_to BETWEEN rb2.booked_from
                            AND rb2.booked_to
         );

ALTER TABLE room_booking_date_ranges
  ADD CONSTRAINT no_overlapping_dates_ck
  CHECK ( dummy = 'Z' );

The nice thing about this solution is that it is simpler to code, and seems more “declarative” in nature. Also, you don’t have to worry about concurrency at all.

The constraint is checked at COMMIT-time when the materialized view is refreshed; so it behaves like a deferred constraint, which may be an advantage for some scenarios.

I believe it may perform better than the trigger-based option when large volumes of data are inserted or updated; however it may perform worse than the trigger-based option when you have lots of small transactions. This is because, unfortunately, the query here cannot be a “REFRESH FAST ON COMMIT” (if you know how this could be changed into a REFRESH FAST MV, please let me know!).

 

What do you think? If you see any potential issues with the above solutions please feel free to comment.

EDIT 30/8: added some more test cases

Fabian Pascal is back and Debunking again

This book takes pride of place on my bookshelf. Highly recommended reading for anyone in the database industry.

If you haven’t seen Fabian Pascal’s blog before, it’s because he’s only just started it – but he’ll be publishing new material, as well as articles previously published at Database Debunkings, infamous for his fundamental, no-holds-barred, uncompromising take on what the Relational Model is, what it isn’t, and what that means for all professionals who design databases.

It was with sadness that I saw the site go relatively static over the past years, and to see it being revived is a fresh blast of cool air in a world that continues to be inundated by fads and misconceptions. Of particular note was the “THE VOCIFEROUS IGNORANCE HALL OF SHAME“… I’m looking forward to seeing the old vigorous debates that will no doubt be revived or rehashed.

The pure view of the Relational model of data is, perhaps, too idealistic for some – impractical for day-to-day use in a SQL-dominated world. Personally, I’ve found (although I cannot pretend to be an expert, in any sense, on this topic) that starting from a fundamentally pure model, unconstrained by physical limitations, conceived at an almost ideal, Platonic level, allows me to discover the simplest, most provably “correct” solution for any data modelling problem. At some stage I have to then “downgrade” it to a form that is convenient and pragmatic for implementation in a SQL database like Oracle; in spite of this, having that logical design in the back of my head helps to highlight potential inconsistencies or data integrity problems that must then be handled by the application.

That this situation is, in fact, not the best of all possible worlds, is something that we can all learn and learn again. Have a look, and see what you think: dbdebunk.blogspot.com.au.

Why EAV is bad, mostly; and why it is used so often

Tom Kyte offers some astute observations regarding EAV data models in general.

Interesting to see the justification for entity-attribute-value data models from their use in the design of Oracle Application Express. I’d say I’d have to agree – the site I’m at currently is chocas with name/value pairs, which works fine when querying individual bits of data for the UI, but works miserably for reporting and batch jobs.