Ever since I downloaded the Alexandria PL/SQL library, I haven’t been able to put it down. Just recently I decided I wanted to serve up a whole lot of media files directly from Amazon’s S3 simple storage service, instead of serving them from within my EC2 (elastic compute) instance. They were just wasting my linux server’s time responding to http requests.
So, I quickly wrote the following code to transfer them:
/* initialise my AWS session */
, p_gmt_offset => -8);
FOR rec IN (
SELECT id, filename, mime_type, location
WHERE location = 'http://myserver/media/'
/* read the file from its current location */
l_blob := ALEX.http_util_pkg.get_blob_from_url
(rec.location || rec.filename);
IF DBMS_LOB.getLength(l_blob) > 0 THEN
/* upload the file to Amazon S3 */
SET location = 'http://mybucket.s3-ap-southeast-1.amazonaws.com/'
WHERE id = rec.id;
After a short while, all the files had been copied across to my bucket on S3, and my table updated so that my web site now points people’s browsers to the new location for those files.
Of course, I could have used UTL_FILE to read the files from disk, but then I’d have to first create a directory, and write a loop to read the file in chunks into the BLOB. Why bother with all that when I can just call http_util_pkg.get_blog_from_url and get it all in one go?
That’s the trouble with powerful utilities like Alexandria: they’re too easy to use, make tasks like this trivial, and you start finding all sorts of uses for them. All of a sudden, Alexandria is your hammer, and the world is full of nails.
See also: this quick intro to using Alexandria’s API for Amazon S3.
Normally, when you see code like this in a production system, you should duck your head and run:
SELECT NVL( MAX( id ), 0 ) + 1
What’s wrong with this code?
I hope the first answer that rolls off your tongue has something to do with concurrency – i.e. two sessions that run this around the same time will not see uncommitted rows from each other, and so are likely to try to insert rows with conflicting identifiers.
I hope the second answer that you might mention has to do with performance – even considering there’s a unique index on the column, this code will still need to read at least one index block to get the latest ID (assuming the query optimiser chooses to do a MIN/MAX index scan so that it doesn’t have to scan the entire index before returning a result). In a high load system this cost might be unacceptable.
Of course, the first problem (concurrency) could be solved by serializing access to the “get the next ID” function, e.g. with a DBMS_LOCK. We all know, however, that there’s no sane reason to serialize this when Oracle already provides a perfectly good mechanism for generating unique IDs, with virtually no serialization – sequences.
CREATE SEQUENCE my_id_seq;
SELECT my_id_seq.NEXTVAL INTO :new_id FROM DUAL;
Sequences have the benefits of guaranteeing uniqueness, and if their “cache” setting is set appropriately, will add a negligible amount of overhead for serialization.
Problem solved. Easy, right? I bet you’re wondering why I added the word “Normally” to my first sentence in this post….
Question: When might using “SELECT MAX(id) + 1” ever be an acceptable source of unique identifiers?
Answer: Global Temporary tables.
If I’ve inserted any rows into a global temporary table, by definition no other session can see my data, so the first consideration, concurrency, is not an issue.
Also, if I’m not expecting to ever insert many rows into my global temporary table, I can be reasonably confident that performance will not be an issue either. Plus, if I put an index on the ID column, that query will be quite inexpensive.
Conclusion: if you are using global temporary tables, you don’t have to use sequences to generate unique identifiers for them. I’m not saying you shouldn’t, of course – a sequence may be faster, and may even lead to simpler code in some cases – but in other cases you might decide to forego a sequence – one less object, with perhaps its role grants and synonyms, to deploy.
Now, of course, you have to ask yourself, why query the table at all? Why not store that latest ID in a private global variable in a package? In fact, we can create a simple package to replace the sequence, e.g.:
CREATE OR REPLACE PACKAGE my_table_pkg IS
FUNCTION next_id RETURN my_table.id%TYPE;
CREATE OR REPLACE PACKAGE BODY my_table_pkg IS
FUNCTION next_id RETURN my_table.id%TYPE IS
g_latest_id := NVL(g_latest_id, 0) + 1;
Well, now you know what to do. Whenever you need to generate a unique set of identifiers for a global temporary table, you’ve got a choice of options: sequence, package variable, or a “max(id)+1” query.
I needed a table that could only ever have one row – if anyone tried to insert a second row they’d get an error.
CREATE UNIQUE INDEX only_one_row_allowed ON mytable (1);
INSERT INTO mytable VALUES ('x');
ORA-00001: unique constraint (SCOTT.ONLY_ONE_ROW_ALLOWED) violated