If you have a table that represents time-varying info, e.g. with From and To date/time columns, you have a few options with regards to the problem of overlapping dates:
1. Check for overlapping dates in the application layer.
2. Use an off-the-shelf product to generate the appropriate triggers, e.g. Oracle CDM*RuleFrame or Toon Koppelaars’ RuleGen.
3. Roll your own, in the database.
4. Use a different data model that can use a unique constraint.
5. Forget about it.
One reason it’s difficult is that this is an example of a cross-row constraint, i.e. one that cannot merely be checked for the current row by itself. Oracle supports a few cross-row constraints, i.e. Primary Key, Unique and Foreign Key constraints; but it doesn’t natively support arbitrary assertions, which would allow us to easily declare this sort of constraint.
The real challenge comes from the fact that Oracle is a multi-user system and concurrent sessions cannot see the uncommitted data from other sessions; so some form of serialization will be required to ensure that when one session wishes to insert/update the data for a particular entity, no other session is allowed to start working on the same entity until the first session commits (or issues a rollback).
The problem is not new; it’s been around for a long time, and tripped many a new (and old) programmer.
There are two problems with option #1 (code in the application layer): firstly, you have to repeat the code for each different type of client (e.g. you might have a Java UI on one side, as well as some batch script somewhere else); secondly, usually the programmer involved will not understand the concurrency problem mentioned above and will not take it into account.
Option #2 is probably the best, most of the time. The solution is implemented at the database level, and is more likely to work correctly and efficiently.
Option #4 (change the data model) involves not storing the From and To dates, but instead dividing up all time ranges into discrete chunks, and each row represents a single chunk of time. This solution is valid if the desired booking ranges are at predictable discrete ranges, e.g. daily. You can then use an ordinary unique constraint to ensure that each chunk of time is only booked by one entity at any one time. This is the solution described here.
Option #5 (forget about it) is also a viable option, in my opinion. Basically it entails designing the rest of the application around the fact that overlapping date ranges might exist in the table – e.g. a report might simply merge the date ranges together prior to output.
Option #3, where you implement the triggers yourself on the database, has the same advantage as Option #2, where it doesn’t matter which application the data is coming from, the constraint will hold true. However, you have to be really careful because it’s much easier to get it wrong than it is to get right, due to concurrency.
I hear you scoffing, “Triggers?!?”. I won’t comment except to refer you to this opinion, which I couldn’t say it better myself: The fourth use-case for Triggers.
There is another Option #3 using a materialized view instead of triggers; I’ll describe this alternative at the end of this post.
So, here is a small example showing how an overlapping date constraint may be implemented. It is intentionally simple to illustrate the approach: it assumes that the From and To dates cannot be NULL, and its rule for detecting overlapping dates requires that the dates not overlap at all, to the nearest second.
1. Create the tables
CREATE TABLE room (room_no NUMBER NOT NULL ,CONSTRAINT room_pk PRIMARY KEY (room_no) ); CREATE TABLE room_booking (room_no NUMBER NOT NULL ,booked_from DATE NOT NULL ,booked_to DATE NOT NULL ,CONSTRAINT room_booking_pk PRIMARY KEY (room_no, booked_from) ,CONSTRAINT room_booking_fk FOREIGN KEY (room_no) REFERENCES room (room_no) );
2. Create the validation trigger (note – I’ve used an Oracle 11g compound trigger here, but it could easily be rewritten for earlier versions by using two triggers + a database package):
CREATE OR REPLACE TRIGGER room_booking_trg FOR INSERT OR UPDATE OF room_no, booked_from, booked_to ON room_booking COMPOUND TRIGGER TYPE room_no_array IS TABLE OF CHAR(1) INDEX BY BINARY_INTEGER; room_nos room_no_array; PROCEDURE lock_room (room_no IN room.room_no%TYPE) IS dummy room.room_no%TYPE; BEGIN SELECT r.room_no INTO dummy FROM room r WHERE r.room_no = lock_room.room_no FOR UPDATE; END lock_room; PROCEDURE validate_room (room_no IN room.room_no%TYPE) IS overlapping_booking EXCEPTION; dummy CHAR(1); BEGIN -- check for overlapping date/time ranges BEGIN SELECT 'X' INTO dummy FROM room_booking rb1 ,room_booking rb2 WHERE rb1.room_no = validate_room.room_no AND rb2.room_no = validate_room.room_no AND rb1.booked_from != rb2.booked_from AND ( rb1.booked_from BETWEEN rb2.booked_from AND rb2.booked_to OR rb1.booked_to BETWEEN rb2.booked_from AND rb2.booked_to ) AND ROWNUM = 1; RAISE overlapping_booking; EXCEPTION WHEN NO_DATA_FOUND THEN -- good, no constraint violations NULL; END; EXCEPTION WHEN overlapping_booking THEN RAISE_APPLICATION_ERROR(-20000, 'Overlapping booking for room #' || room_no); END validate_room; PROCEDURE validate_rooms IS room_no room.room_no%TYPE; BEGIN room_no := room_nos.FIRST; LOOP EXIT WHEN room_no IS NULL; validate_room (room_no); room_no := room_nos.NEXT(room_no); END LOOP; room_nos.DELETE; EXCEPTION WHEN OTHERS THEN room_nos.DELETE; RAISE; END validate_rooms; BEFORE EACH ROW IS BEGIN -- lock the header record (so other sessions -- can't modify the bookings for this room -- at the same time) lock_room(:NEW.room_no); -- remember the room_no to validate later room_nos(:NEW.room_no) := 'Y'; END BEFORE EACH ROW; AFTER STATEMENT IS BEGIN validate_rooms; END AFTER STATEMENT; END room_booking_trg; /
That’s all you need. The trigger locks the header record for the room, so only one session can modify the bookings for a particular room at any one time. If you don’t have a table like “room” in your database that you can use for this purpose, you could use DBMS_LOCK instead (similarly to that proposed in the OTN forum discussion here).
It would not be difficult to adapt this example for alternative requirements, e.g. where the From and To dates may be NULL, or where the overlapping criteria should allow date/time ranges that coincide at their endpoints (e.g. so that the date ranges (1-Feb-2000 to 2-Feb-2000) and (2-Feb-2000 to 3-Feb-2000) would not be considered to overlap). You’d just need to modify the comparison in the query in validate_room to take these requirements into account.
Test case #1
INSERT INTO room (room_no) VALUES (101); INSERT INTO room (room_no) VALUES (201); INSERT INTO room (room_no) VALUES (301); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (101, DATE '2000-01-01', DATE '2000-01-02' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (101, DATE '2000-01-02', DATE '2000-01-03' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-02-01', DATE '2000-02-05' - 0.00001);
Expected: no errors
Test case #2
INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-02-01', DATE '2000-02-02' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-02-02', DATE '2000-02-04' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-02-03', DATE '2000-02-05' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-02-03', DATE '2000-02-06' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-01-31', DATE '2000-02-01'); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-01-31', DATE '2000-02-02' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-01-31', DATE '2000-02-06' - 0.00001); INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (201, DATE '2000-02-05' - 0.00001, DATE '2000-02-06' - 0.00001); UPDATE room_booking SET booked_to = '2000-01-02' - 0.00001 WHERE room_no = 101 AND booked_from = DATE '2000-01-02';
Expected: constraint violation on each statement
Test case #3
in session #1:
INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (301, DATE '2000-01-01', DATE '2000-02-01' - 0.00001);
in session #2:
INSERT INTO room_booking (room_no, booked_from, booked_to) VALUES (301, DATE '2000-01-15', DATE '2000-01-16' - 0.00001);
Expected: session #2 will wait until session #1 issues a COMMIT or ROLLBACK. If session #1 COMMITs, session #2 will then report a constraint violation. If session #2 rolls back, session #2 will complete without error.
The No-Trigger option #3: Materialized View
This is similar to a solution proposed by Rob Van Wijk. It uses a constraint on a materialized view to stop overlapping date ranges.
So, instead of the trigger, you would do something like this:
CREATE MATERIALIZED VIEW LOG ON room_booking WITH ROWID; CREATE MATERIALIZED VIEW room_booking_date_ranges REFRESH FORCE ON COMMIT AS SELECT 'X' AS dummy FROM room_booking rb1 ,room_booking rb2 WHERE rb1.room_no = rb2.room_no AND rb1.booked_from != rb2.booked_from AND ( rb1.booked_from BETWEEN rb2.booked_from AND rb2.booked_to OR rb1.booked_to BETWEEN rb2.booked_from AND rb2.booked_to ); ALTER TABLE room_booking_date_ranges ADD CONSTRAINT no_overlapping_dates_ck CHECK ( dummy = 'Z' );
The nice thing about this solution is that it is simpler to code, and seems more “declarative” in nature. Also, you don’t have to worry about concurrency at all.
The constraint is checked at COMMIT-time when the materialized view is refreshed; so it behaves like a deferred constraint, which may be an advantage for some scenarios.
I believe it may perform better than the trigger-based option when large volumes of data are inserted or updated; however it may perform worse than the trigger-based option when you have lots of small transactions. This is because, unfortunately, the query here cannot be a “REFRESH FAST ON COMMIT” (if you know how this could be changed into a REFRESH FAST MV, please let me know!).
What do you think? If you see any potential issues with the above solutions please feel free to comment.
EDIT 30/8: added some more test cases
If you haven’t seen Fabian Pascal’s blog before, it’s because he’s only just started it – but he’ll be publishing new material, as well as articles previously published at Database Debunkings, infamous for his fundamental, no-holds-barred, uncompromising take on what the Relational Model is, what it isn’t, and what that means for all professionals who design databases.
It was with sadness that I saw the site go relatively static over the past years, and to see it being revived is a fresh blast of cool air in a world that continues to be inundated by fads and misconceptions. Of particular note was the “THE VOCIFEROUS IGNORANCE HALL OF SHAME“… I’m looking forward to seeing the old vigorous debates that will no doubt be revived or rehashed.
The pure view of the Relational model of data is, perhaps, too idealistic for some – impractical for day-to-day use in a SQL-dominated world. Personally, I’ve found (although I cannot pretend to be an expert, in any sense, on this topic) that starting from a fundamentally pure model, unconstrained by physical limitations, conceived at an almost ideal, Platonic level, allows me to discover the simplest, most provably “correct” solution for any data modelling problem. At some stage I have to then “downgrade” it to a form that is convenient and pragmatic for implementation in a SQL database like Oracle; in spite of this, having that logical design in the back of my head helps to highlight potential inconsistencies or data integrity problems that must then be handled by the application.
That this situation is, in fact, not the best of all possible worlds, is something that we can all learn and learn again. Have a look, and see what you think: dbdebunk.blogspot.com.au.
I recently saw this question on StackOverflow (“Is there any way to determine if a package has state in Oracle?”) which caught my attention.
You’re probably already aware that when a package is recompiled, any sessions that were using that package won’t even notice the change; unless that package has “state” – i.e. if the package has one or more package-level variables or constants. The current value of these variables and constants is kept in the PGA for each session; but if you recompile the package (or modify something on which the package depends), Oracle cannot know for certain whether the new version of the package needs to reset the values of the variables or not, so it errs on the side of caution and discards them. The next time the session tries to access the package in any way, Oracle will raise ORA-04068, and reset the package state. After that, the session can try again and it will work fine.
Side Note: There are a number of approaches to solving the ORA-04068 problem, some of which are given as answers to this question here. Not all of them are appropriate for every situation. Another approach not mentioned there is to avoid or minimize it – move all the package variables to a separate package, which hopefully will be invalidated less often.
It’s quite straightforward to tell whether a given package has “state” and thus has the potential for causing ORA-04068: look for any variables or constants declared in the package specification or body. If you have a lot of packages, however, you might want to get a listing of all of them. To do this, you can use the new PL/Scope feature introduced in Oracle 11g.
select object_name AS package, type, name AS variable_name from user_identifiers where object_type IN ('PACKAGE','PACKAGE BODY') and usage = 'DECLARATION' and type in ('VARIABLE','CONSTANT') and usage_context_id in ( select usage_id from user_identifiers where type = 'PACKAGE' );
If you have compiled the packages in the schema with PL/Scope on (i.e.
alter session set plscope_settings='IDENTIFIERS:ALL';), this query will list all the packages and the variables that mean they will potentially have state.
Before this question was raised, I hadn’t used PL/Scope for real; it was quite pleasing to see how easy it was to use to answer this particular question. This also illustrates a good reason why I like to hang out on Stackoverflow – it’s a great way to learn something new every day.
DATEis not a date.
DATE '2012-06-22'is not a date.
CURRENT_DATEis not a date.
SYSDATEis not a date.
TRUNC(SYSDATE)is not a date.
TO_DATE('22/06/2012','DD/MM/YYYY')is not a date.
Oracle SQL does not have a native date datatype.
Explanation: the datatype called “DATE” actually represents a Date+Time. It always has a time portion. For example, the literal expression
DATE '2012-06-22'has a time component that means midnight, to the nearest second. It is only by convention that a DATE with a time of 00:00:00 is used to represent the whole day, rather than exactly midnight; but this convention only works if the developer carefully writes his code to ensure this is true.
To an experienced Oracle developer, this is not a problem, it feels perfectly natural to say “DATE” but internally to be thinking in terms of time values. It’s easy to forget that we’re even doing this subconsciously. To an outsider (e.g. business analysts, testers, Java developers), it can be confusing – and it is often not immediately obvious that the confusion exists. It’s not until you start explaining why
SYSDATE <= DATE '2012-06-22' evaluates to FALSE, even if today is the 22nd day of the month of June in the year of our Lord 2012 that you realise they have been labouring under a false assumption: that a
DATE is a date, that it represents the full 24-hour period that a normal human would call “22 June 2012″.
If I was invited to change the name of just one thing in Oracle (and everyone was willing to make all the changes necessary to their code to accommodate my whim) it would be to change “DATE” to “DATETIME”.
I ask you: is there anything else in Oracle that confuses outsiders more often than this misnomer?
P.S. Now for a freebie: here is a summary of a number of transformations that may be done to remove the TRUNC function call around a date column (a and b are all of type DATE):
TRUNC(a) = TRUNC(b) => (a BETWEEN TRUNC(b) AND TRUNC(b)+0.99999) TRUNC(a) < TRUNC(b) => a < TRUNC(b) TRUNC(a) <= TRUNC(b) => a < TRUNC(b)+1 TRUNC(a) > TRUNC(b) => a >= TRUNC(b)+1 TRUNC(a) >= TRUNC(b) => a >= TRUNC(b) TRUNC(a) = b => (a BETWEEN b AND b+0.99999 AND b = TRUNC(b)) TRUNC(a) < b => (a < TRUNC(b)+1 AND NOT (a=TRUNC(a) AND b=TRUNC(b))) TRUNC(a) <= b => a < TRUNC(b)+1 TRUNC(a) > b => a >= TRUNC(b)+1 TRUNC(a) >= b => (a >= TRUNC(b) AND NOT (b=TRUNC(b)))
What’s the biggest clue you can give that your database is vulnerable to SQL injection? When your list of “forbidden words” looks suspiciously like a sample of SQL / PL/SQL keywords:
I notice that they haven’t forbidden BEGIN, CREATE, MERGE, or TRUNCATE …
Congressman Peters, your IT staff are doing it wrong.
1. Overly-generic parameter types
When you send an email, it will have at least one recipient, and it may have many recipients. However, no email will have more than one sender. Yet, I wrote the package procedure like this:
TYPE address_type IS RECORD (name VARCHAR2(100) ,email_address VARCHAR2(200) ); TYPE address_list_type IS TABLE OF address_type INDEX BY BINARY_INTEGER; PROCEDURE send (i_sender IN address_list_type ,i_recipients IN address_list_type ,i_subject IN VARCHAR2 ,i_message IN VARCHAR2 );
Why I didn’t have i_sender be a simple address_type, I can’t remember. Internally, the procedure only looks at i_sender(1) – if a caller were to pass in a table of more than one sender, it raises an exception.
2. Functional programming to avoid local variables
Simple is best, and there’s nothing wrong with using local variables. I wish I’d realised these facts when I wrote functions like this:
FUNCTION address (i_name IN VARCHAR2 ,i_email_address IN VARCHAR2 ) RETURN address_list_type; FUNCTION address (i_address IN address_list_type ,i_name IN VARCHAR2 ,i_email_address IN VARCHAR2 ) RETURN address_list_type;
All that so that callers can avoid *one local variable*:
EMAIL_PKG.send (i_sender => EMAIL_PKG.address('joe','email@example.com') ,i_recipients => EMAIL_PKG.address( EMAIL_PKG.address( 'jill', 'firstname.lastname@example.org') ,'bob', 'email@example.com') ,i_subject => 'hello' ,i_message => 'world' );
See what I did there with the recipients? Populating an array on the fly with just function calls. Smart eh? But rather useless, as it turns out; when we need to send multiple recipients, it’s usually populated within a loop of unknown sized, so this method doesn’t work anyway.
Go ahead – face your past and dig up some code you wrote 5 years ago or more. I think, if you don’t go “WTF!” every now and then, you probably haven’t learned anything or improved yourself in the intervening years. Just saying :)
I’m doing some research for an uncoming Oracle conference – to help, please answer this poll. Thank you!
Update: thanks to everyone who voted – it certainly looks like there is a significant number of places still running Oracle versions older than 10g. It is likely that many of these will be getting ready to upgrade or be decommissioned, but even then our 7/8i/9i database skills will be relevant for some time to come.
If you haven’t voted yet, the poll is still open.
Ever since I downloaded the Alexandria PL/SQL library, I haven’t been able to put it down. Just recently I decided I wanted to serve up a whole lot of media files directly from Amazon’s S3 simple storage service, instead of serving them from within my EC2 (elastic compute) instance. They were just wasting my linux server’s time responding to http requests.
So, I quickly wrote the following code to transfer them:
DECLARE l_blob BLOB; BEGIN /* initialise my AWS session */ ALEX.amazon_aws_auth_pkg.init ( 'yyy-my-aws-id-yyy' , 'xxx-not-telling-xxx' , p_gmt_offset => -8); FOR rec IN ( SELECT id, filename, mime_type, location FROM myfiles WHERE location = 'http://myserver/media/' ) LOOP /* read the file from its current location */ l_blob := ALEX.http_util_pkg.get_blob_from_url (rec.location || rec.filename); IF DBMS_LOB.getLength(l_blob) > 0 THEN /* upload the file to Amazon S3 */ ALEX.amazon_aws_s3_pkg.new_object ( 'mybucket' , rec.filename , l_blob , rec.mime_type , ALEX.amazon_aws_s3_pkg.g_acl_public_read); UPDATE myfiles SET location = 'https://mybucket.s3-ap-southeast-1.amazonaws.com/' WHERE id = rec.id; COMMIT; END IF; END LOOP; END;
After a short while, all the files had been copied across to my bucket on S3, and my table updated so that my web site now points people’s browsers to the new location for those files.
Of course, I could have used UTL_FILE to read the files from disk, but then I’d have to first create a directory, and write a loop to read the file in chunks into the BLOB. Why bother with all that when I can just call http_util_pkg.get_blog_from_url and get it all in one go?
That’s the trouble with powerful utilities like Alexandria: they’re too easy to use, make tasks like this trivial, and you start finding all sorts of uses for them. All of a sudden, Alexandria is your hammer, and the world is full of nails.
See also: this quick intro to using Alexandria’s API for Amazon S3.
Normally, when you see code like this in a production system, you should duck your head and run:
SELECT NVL( MAX( id ), 0 ) + 1 INTO :new_id FROM mytable;
What’s wrong with this code?
I hope the first answer that rolls off your tongue has something to do with concurrency – i.e. two sessions that run this around the same time will not see uncommitted rows from each other, and so are likely to try to insert rows with conflicting identifiers.
I hope the second answer that you might mention has to do with performance – even considering there’s a unique index on the column, this code will still need to read at least one index block to get the latest ID (assuming the query optimiser chooses to do a MIN/MAX index scan so that it doesn’t have to scan the entire index before returning a result). In a high load system this cost might be unacceptable.
Of course, the first problem (concurrency) could be solved by serializing access to the “get the next ID” function, e.g. with a DBMS_LOCK. We all know, however, that there’s no sane reason to serialize this when Oracle already provides a perfectly good mechanism for generating unique IDs, with virtually no serialization – sequences.
CREATE SEQUENCE my_id_seq; SELECT my_id_seq.NEXTVAL INTO :new_id FROM DUAL;
Sequences have the benefits of guaranteeing uniqueness, and if their “cache” setting is set appropriately, will add a negligible amount of overhead for serialization.
Problem solved. Easy, right? I bet you’re wondering why I added the word “Normally” to my first sentence in this post….
Question: When might using “SELECT MAX(id) + 1″ ever be an acceptable source of unique identifiers?
Answer: Global Temporary tables.
If I’ve inserted any rows into a global temporary table, by definition no other session can see my data, so the first consideration, concurrency, is not an issue.
Also, if I’m not expecting to ever insert many rows into my global temporary table, I can be reasonably confident that performance will not be an issue either. Plus, if I put an index on the ID column, that query will be quite inexpensive.
Conclusion: if you are using global temporary tables, you don’t have to use sequences to generate unique identifiers for them. I’m not saying you shouldn’t, of course – a sequence may be faster, and may even lead to simpler code in some cases – but in other cases you might decide to forego a sequence – one less object, with perhaps its role grants and synonyms, to deploy.
Now, of course, you have to ask yourself, why query the table at all? Why not store that latest ID in a private global variable in a package? In fact, we can create a simple package to replace the sequence, e.g.:
CREATE OR REPLACE PACKAGE my_table_pkg IS FUNCTION next_id RETURN my_table.id%TYPE; END my_table_pkg; CREATE OR REPLACE PACKAGE BODY my_table_pkg IS g_latest_id my_table.id%TYPE; FUNCTION next_id RETURN my_table.id%TYPE IS BEGIN g_latest_id := NVL(g_latest_id, 0) + 1; RETURN g_latest_id; END next_id; END my_table_pkg;
Well, now you know what to do. Whenever you need to generate a unique set of identifiers for a global temporary table, you’ve got a choice of options: sequence, package variable, or a “max(id)+1″ query.
I needed a table that could only ever have one row – if anyone tried to insert a second row they’d get an error.
CREATE UNIQUE INDEX only_one_row_allowed ON mytable (1);
INSERT INTO mytable VALUES ('x');
ORA-00001: unique constraint (SCOTT.ONLY_ONE_ROW_ALLOWED) violated