Alexandria: May Cause Addiction

Ever since I downloaded the Alexandria PL/SQL library, I haven’t been able to put it down. Just recently I decided I wanted to serve up a whole lot of media files directly from Amazon’s S3 simple storage service, instead of serving them from within my EC2 (elastic compute) instance. They were just wasting my linux server’s time responding to http requests.

So, I quickly wrote the following code to transfer them:

DECLARE
  l_blob BLOB;
BEGIN
  /* initialise my AWS session */
  ALEX.amazon_aws_auth_pkg.init
    ( 'yyy-my-aws-id-yyy'
    , 'xxx-not-telling-xxx'
    , p_gmt_offset => -8);
  FOR rec IN (
    SELECT id, filename, mime_type, location
    FROM myfiles
    WHERE location = 'http://myserver/media/'
  ) LOOP
    /* read the file from its current location */
    l_blob := ALEX.http_util_pkg.get_blob_from_url
      (rec.location || rec.filename);
    IF DBMS_LOB.getLength(l_blob) > 0 THEN
      /* upload the file to Amazon S3 */
      ALEX.amazon_aws_s3_pkg.new_object
        ( 'mybucket'
        , rec.filename
        , l_blob
        , rec.mime_type
        , ALEX.amazon_aws_s3_pkg.g_acl_public_read);
      UPDATE myfiles
      SET location = 'http://mybucket.s3-ap-southeast-1.amazonaws.com/'
      WHERE id = rec.id;
      COMMIT;
    END IF;
  END LOOP;
END;

After a short while, all the files had been copied across to my bucket on S3, and my table updated so that my web site now points people’s browsers to the new location for those files.

Of course, I could have used UTL_FILE to read the files from disk, but then I’d have to first create a directory, and write a loop to read the file in chunks into the BLOB. Why bother with all that when I can just call http_util_pkg.get_blog_from_url and get it all in one go?

That’s the trouble with powerful utilities like Alexandria: they’re too easy to use, make tasks like this trivial, and you start finding all sorts of uses for them. All of a sudden, Alexandria is your hammer, and the world is full of nails.

See also: this quick intro to using Alexandria’s API for Amazon S3.


Generating unique identifiers with “SELECT MAX(id) + 1”

Normally, when you see code like this in a production system, you should duck your head and run:

SELECT NVL( MAX( id ), 0 ) + 1
INTO   :new_id
FROM   mytable;

What’s wrong with this code?

I hope the first answer that rolls off your tongue has something to do with concurrency – i.e. two sessions that run this around the same time will not see uncommitted rows from each other, and so are likely to try to insert rows with conflicting identifiers.

I hope the second answer that you might mention has to do with performance – even considering there’s a unique index on the column, this code will still need to read at least one index block to get the latest ID (assuming the query optimiser chooses to do a MIN/MAX index scan so that it doesn’t have to scan the entire index before returning a result). In a high load system this cost might be unacceptable.

Of course, the first problem (concurrency) could be solved by serializing access to the “get the next ID” function, e.g. with a DBMS_LOCK. We all know, however, that there’s no sane reason to serialize this when Oracle already provides a perfectly good mechanism for generating unique IDs, with virtually no serialization – sequences.

CREATE SEQUENCE my_id_seq;
SELECT my_id_seq.NEXTVAL INTO :new_id FROM DUAL;

Sequences have the benefits of guaranteeing uniqueness, and if their cache setting is set appropriately, will add a negligible amount of overhead for serialization.

Problem solved. Easy, right? I bet you’re wondering why I added the word “Normally” to my first sentence in this post….

Question: When might using “SELECT MAX(id) + 1” ever be an acceptable source of unique identifiers?

Answer: Global Temporary tables.

If I’ve inserted any rows into a global temporary table, by definition no other session can see my data, so the first consideration, concurrency, is not an issue.

Also, if I’m not expecting to ever insert many rows into my global temporary table, I can be reasonably confident that performance will not be an issue either. Plus, if I put an index on the ID column, that query will be quite inexpensive.

Conclusion: if you are using global temporary tables, you don’t have to use sequences to generate unique identifiers for them. I’m not saying you shouldn’t, of course – a sequence may be faster, and may even lead to simpler code in some cases – but in other cases you might decide to forego a sequence – one less object, with perhaps its role grants and synonyms, to deploy.

Now, of course, you have to ask yourself, why query the table at all? Why not store that latest ID in a private global variable in a package? In fact, we can create a simple package to replace the sequence, e.g.:

CREATE OR REPLACE PACKAGE my_table_pkg IS
FUNCTION next_id RETURN my_table.id%TYPE;
END my_table_pkg;
CREATE OR REPLACE PACKAGE BODY my_table_pkg IS
  g_latest_id my_table.id%TYPE;
FUNCTION next_id RETURN my_table.id%TYPE IS
  BEGIN
    g_latest_id := NVL(g_latest_id, 0) + 1;
    RETURN g_latest_id;
  END next_id;
END my_table_pkg;

Well, now you know what to do. Whenever you need to generate a unique set of identifiers for a global temporary table, you’ve got a choice of options: sequence, package variable, or a “max(id)+1” query.


Designing a PL/SQL API – BOOLEAN or CHAR?

A simple question: you’re designing an API to be implemented as a PL/SQL package, and you don’t (yet) know the full extent to which your API may be used, so you want to cover a reasonable variety of possible usage cases.

You have a function that will return a BOOLEAN – i.e. TRUE or FALSE (or perhaps NULL). Should you implement it this way, or should you return some other kind of value – e.g. a CHAR – e.g. ‘Y’ for TRUE or ‘N’ for FALSE; or how about a NUMBER – e.g. 1 for TRUE or 0 for FALSE?

This debate has raged since 2002, and probably earlier – e.g. http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6263249199595

Well, if I use a BOOLEAN, it makes the code simple and easy to understand – and callers can call my function in IF and WHILE statements without having to compare the return value to anything. However, I can’t call the function from a SQL statement, which can be annoyingly restrictive.

If I use a CHAR or NUMBER, I can now call the function from SQL, and store it in a table – but it makes the code just a little more complicated – now, the caller has to trust that I will ONLY return the values agreed on. Also, there is no way to formally restrict the values as agreed – I’d have to just document them in the package comments. I can help by adding some suitable constants in the package spec, but note that Oracle Forms cannot refer to these constants directly. Mind you, if the value is being stored in a table, a suitable CHECK constraint would be a good idea.

Perhaps a combination? Have a function that returns BOOLEAN, and add wrapper functions that converts a BOOLEAN into a ‘Y’ or ‘N’ as appropriate? That might be suitable.

Personally, though, I hate the NUMBER (1 or 0) idea for PL/SQL. That’s so C-from-the-1970’s. Who codes like that anymore?


Write REF CURSOR to file

This is just a very simple procedure that makes writing a lot of files using UTL_FILE much simpler. It can be used to easily create simple CSVs or fixed-length files.

I’ve shown it here in a package body; it’s up to you to create a package spec.

To use it, all you need to do is create a query that concatenates all the data into a single string up to 4000 characters long. If you’re writing a CSV you need to concatenate commas and quotes as appropriate.

CREATE OR REPLACE
PACKAGE BODY packagename AS
-- If no records are found in the cursor, no file is created.
PROCEDURE write_cursor_to_file
  (outputdir  IN VARCHAR2
  ,filename   IN VARCHAR2
  ,headerline IN VARCHAR2
  ,refcursor  IN SYS_REFCURSOR
  ) IS
  -- increase to make faster but use more memory;
  -- decrease to use less memory but run slower
  BATCHSIZE CONSTANT INTEGER := 100;
  TYPE varr_type IS TABLE OF VARCHAR2(4000) INDEX BY BINARY_INTEGER;
  varr varr_type;
  outf UTL_FILE.FILE_TYPE;
BEGIN
  -- note: don't open the file unless we actually get some
  -- records back from the cursor
  LOOP
    FETCH write_cursor_to_file.refcursor
    BULK COLLECT INTO varr
    LIMIT BATCHSIZE;
    EXIT WHEN varr.COUNT = 0;
    -- We have some records to write. Have we opened the file yet?
    IF NOT UTL_FILE.IS_OPEN (outf) THEN
      outf := UTL_FILE.fopen
        (file_location => write_cursor_to_file.outputdir
        ,file_name     => write_cursor_to_file.filename
        ,open_mode     => 'A'
        ,max_linesize  => 4000);
      UTL_FILE.put_line (outf, write_cursor_to_file.headerline);
    END IF;
    -- Write the batch of records to the file
    FOR i IN 1..varr.COUNT LOOP
      UTL_FILE.put_line (outf, varr(i));
    END LOOP;
  END LOOP;
  CLOSE write_cursor_to_file.refcursor;
  IF UTL_FILE.IS_OPEN (outf) THEN
    UTL_FILE.fclose (outf);
  END IF;
END write_cursor_to_file;
PROCEDURE sample IS
  OUTPUTDIR  CONSTANT VARCHAR2(100) := 'MY_DIR_NAME';
  FILENAME   CONSTANT VARCHAR2(100) := 'my_file_name.csv';
  HEADERLINE CONSTANT VARCHAR2(4000) := 'Name,Address,Date of Birth';
  refcursor SYS_REFCURSOR;
BEGIN
  OPEN refcursor FOR
    select '"' || name || '"'
        || ',"' || address || '"'
        || ',' || TO_CHAR(dob,'DD-Mon-YYYY')
    from persons;
  write_cursor_to_file
    (outputdir  => OUTPUTDIR
    ,filename   => FILENAME
    ,headerline => HEADERLINE
    ,refcursor  => refcursor
    );
END sample;
END packagename;
/

It appends to the file if it finds it. This makes it easy to write the result of several queries to the same file.

Note: if you’re on Oracle 8i or earlier, you’ll need to add a replacement for SYS_REFCURSOR, e.g. TYPE my_sys_refcursor IS REF CURSOR; either at the top of the package, or if you want to make the write_cursor_to_file procedure public, put the type definition in your package spec.


Add business days

It starts out as a fairly simple, innocent business requirement. Create a report to list records meeting some criteria, one of which is:

“List only records where today’s date is more than 35 business days after the due date from the record.”

When you delve deeper you find that querying the table with “DUE_DATE + 35 < SYSDATE” is not going to cut it – “business days” do not include weekends. You might start with something similar to this. But even that’s not good enough, because business days should not include public holidays. How do you code that?

So, here’s my solution.

1. We need to know what days are public holidays for the region. In our case this application is only applicable for a single region, so we use a simple table:

CREATE TABLE holidays (holiday_date DATE PRIMARY KEY);

We create a simple form for users to enter new holidays every year, and give someone the job of making sure it’s up-to-date every year when the public holidays are announced.

2. Create a view that lists all non-business days – i.e. list all weekends and public holidays. To retain reasonable performance, we limit our solution to dates in the years 2000 to 2050.

CREATE VIEW non_business_days AS
SELECT TO_DATE('01012000','DDMMYYYY') + ROWNUM * 7
       AS day -- Saturdays 2000 to 2050
FROM DUAL CONNECT BY LEVEL <= 2661
UNION ALL
SELECT to_date('02012000','DDMMYYYY') + ROWNUM * 7
       AS day -- Sundays 2000 to 2050
FROM DUAL CONNECT BY LEVEL <= 2661
UNION ALL
SELECT holiday_date FROM holidays;

3. Now, when we need to take a date and add x business days to it, we query this table to find all the non-business-days that are applicable, e.g.:

SELECT day
      ,COUNT(*) OVER (ORDER BY day
                      ROWS BETWEEN UNBOUNDED PRECEDING
                      AND CURRENT ROW)
       AS count_so_far
      ,(day - p_date) AS base_days
FROM   NON_BUSINESS_DAYS
WHERE  day > p_date;

If you run this query and examine each row in order of day, if you take base_days and subtract count_so_far, when the result is less than x, then base_days – count_so_far is the number of extra days we need to add to the holiday’s date to give us the answer. You’ll find this logic in the function below.

In our final solution, we’ll also need to UNION in the date parameter as well, for the case where there are no holidays between the starting date and the number of business days requested.

Here’s our function to take any date (at least, any date between 2000 and 2050) and add x business days (positive or negative):

FUNCTION add_working_days (p_date IN DATE, p_working_days IN NUMBER)
RETURN DATE IS
  l_date DATE;
BEGIN

  IF p_date IS NULL OR p_working_days IS NULL THEN
    RETURN NULL;
  END IF;

  IF p_working_days != TRUNC(p_working_days) THEN
    RAISE_APPLICATION_ERROR(-20000,
      'add_working_days: cannot handle fractional p_working_days ('
      || p_working_days || ')');
  END IF;

  IF p_working_days > 0 THEN

    SELECT MAX(day + p_working_days - (base_days - count_so_far))
    INTO l_date
    FROM (SELECT day
                ,COUNT(*) OVER (ORDER BY day
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                AND CURRENT ROW)
                 AS count_so_far
                ,(day - p_date) AS base_days
          FROM NON_BUSINESS_DAYS
          WHERE day > p_date
          UNION
          SELECT p_date, 0, 0 FROM DUAL
         )
    WHERE base_days - count_so_far < p_working_days;

  ELSIF p_working_days < 0 THEN

    SELECT MIN(day - (ABS(p_working_days) - (base_days - count_so_far)))
    INTO l_date
    FROM (SELECT day
                ,COUNT(*) OVER (ORDER BY day DESC
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                AND CURRENT ROW)
                 AS count_so_far
                ,(p_date - day) AS base_days
          FROM NON_BUSINESS_DAYS
          WHERE day < p_date
          UNION
          SELECT p_date, 0, 0 FROM DUAL
         )
    WHERE base_days - count_so_far < ABS(p_working_days);

  ELSE

    l_date := p_date;

  END IF;

  RETURN l_date;
END add_working_days;

Test cases (these are some public holidays in Western Australia):

insert into holidays values (to_date('27/12/2010','DD/MM/YYYY');
insert into holidays values (to_date('28/12/2010','DD/MM/YYYY');
insert into holidays values (to_date('03/01/2011','DD/MM/YYYY');
insert into holidays values (to_date('26/01/2011','DD/MM/YYYY');

— Expected: 06/01/2011

select cls_util.add_working_days(to_date('13/12/2010','DD/MM/YYYY')
                                ,15) from dual;

— Expected: 31/01/2011

select cls_util.add_working_days(to_date('25/01/2011','DD/MM/YYYY')
                                ,3) from dual;

— Expected: 13/12/2010

select cls_util.add_working_days(to_date('06/01/2011','DD/MM/YYYY')
                                ,-15) from dual;

— Expected: 25/01/2011

select cls_util.add_working_days(to_date('31/01/2011','DD/MM/YYYY')
                                ,-3) from dual;

My First Quiz

I was pleased to see my PL/SQL quiz question presented yesterday. It was about how PL/SQL variable names and other identifiers whose names conflict with names of tables and columns can still be referenced in SQL within PL/SQL.

It demonstrates how a local variable or parameter of a procedure or function may be referred to unambiguously, by referring to it by the name of the procedure or function, e.g.:

FUNCTION getempid (employee_id IN plch_employees.employee_id%TYPE)
   RETURN plch_employees.employee_id%TYPE
IS
   the_id   plch_employees.employee_id%TYPE;
BEGIN
   SELECT plch_employees.employee_id
     INTO getempid.the_id
     FROM plch_employees
    WHERE plch_employees.employee_id = getempid.employee_id;
   RETURN getempid.the_id;
END getempid;

(notice that the INTO and RETURN parts don’t actually need to reference the function name to refer to the local variable, but I’ve done it this way for clarity)

In the case of an anonymous block, it is necessary to supply a label for the block:

<<getempid>>
DECLARE
   employee_id   plch_employees.employee_id%TYPE := 100;
BEGIN
   SELECT plch_employees.employee_id
     INTO getempid.employee_id
     FROM plch_employees
    WHERE plch_employees.employee_id = getempid.employee_id;

   DBMS_OUTPUT.put_line (getempid.employee_id);
END getempid;

The quiz answers also explore what happens if a variable name conflicts with a column name, and the identifiers are not referenced; another answer considers the case where the block label happens to conflict with a table name.

I heartily recommend thinking about and writing your own quiz questions and submitting them for Steven to consider for the Challenge. If you do, here are some tips:

  • Focus on one topic – remove any irrelevant details
  • Build a full test case and run it (and re-run it carefully after every single change you make!)
  • Try to remove as much code as possible (without ruining it)
  • Re-read the question the next day, and imagine being a typical Challenger, who nitpicks every single statement 🙂

INSERT-VALUES vs. INSERT-SELECT-FROM-DUAL

There’s no difference between the effects of the following two statements, are there:

INSERT INTO mytable (col1, col2) VALUES ('hello','world');
INSERT INTO mytable (col1, col2) SELECT 'hello', 'world' FROM DUAL;

Well, as it turns out, it is possible for the first statement to succeed where the second statement would fail – in the presence of a suitably crafted Before Insert trigger, the second will raise “ORA-04091 table is mutating, trigger/function may not see it”:

http://oraclequirks.blogspot.com/2010/09/ora-04091-table-stringstring-is.html


To Exist or Not To Exist

An interesting discussion on the PL/SQL Challenge blog here has led to me changing my mind about “the best way” to loop through a sparse PL/SQL associative array.

Normally, if we know that an array has been filled, with no gaps in indices, we would use a simple FOR LOOP:

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
BEGIN
  SELECT x BULK COLLECT INTO a FROM mytable;
  FOR i IN a.FIRST..a.LAST LOOP
    -- process a(i)
  END LOOP;
END;

If, however, the array may be sparsely filled (i.e. there might be one or more gaps in the sequence), this was “the correct way” to loop through it:

Method A (First/Next)

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
  i BINARY_INTEGER;
BEGIN
  ...
  i := a.FIRST;
  LOOP
    EXIT WHEN i IS NULL;
    -- process a(i)
    i := a.NEXT(i);
  END LOOP;
END;

Method A takes advantage of the fact that an associative array in Oracle is implemented internally as a linked list – the fastest way to “skip over” any gaps is to call the NEXT operator on the list for a given index.

Alternatively, one could still just loop through all the indices from the first to the last index; but the problem with this approach is that if an index is not found in the array, it will raise the NO_DATA_FOUND exception. Well, Method B simply catches the exception:

Method B (Handle NDF)

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
BEGIN
  ...
  FOR i IN a.FIRST..a.LAST LOOP
    BEGIN
      -- process a(i)
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        NULL;
    END;
  END LOOP;
END;

This code effectively works the same (with one important proviso*) as Method A. The difference, however, is in terms of relative performance. This method is much faster than Method A, if the array is relatively dense. If the array is relatively sparse, Method A is faster.

* It must be remembered that the NO_DATA_FOUND exception may be raised by a number of different statements in a program: if you use code like this, you must make sure that the exception was only raised by the attempt to access a(i), and not by some other code!

A third option is to loop through as in Method B, but call the EXISTS method on the array to check if the index is found, instead of relying on the NO_DATA_FOUND exception.

Method C (EXISTS)

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
BEGIN
  ...
  FOR i IN a.FIRST..a.LAST LOOP
    IF a.EXISTS(i) THEN
      -- process a(i)
    END IF;
  END LOOP;
END;

The problem with this approach is that it effectively checks the existence of i in the array twice: once for the EXISTS check, and if found, again when actually referencing a(i). For a large array which is densely populated, depending on what processing is being done inside the loop, this could have a measurable impact on performance.

Bottom line: there is no “one right way” to loop through a sparse associative array. But there are some rules-of-thumb about performance we can take away:

  1. When the array is likely often very sparsely populated with a large index range, use Method A (First/Next).
  2. When the array is likely often very densely populated with a large number of elements, use Method B (Handle NDF). But watch how you catch the NO_DATA_FOUND exception!
  3. If you’re not sure, I’d tend towards Method A (First/Next) until performance problems are actually evident.

You probably noticed that I haven’t backed up any of these claims about performance with actual tests or results. You will find some in the comments to the afore-mentioned PL/SQL Challenge blog post; but I encourage you to log into a sandpit Oracle environment and test it yourself.


Directory File List in PL/SQL (Oracle 10g)

UTL_FILE, unfortunately, does not allow you to query the contents of a directory.

If you’re on Oracle 11g you can use the method described here: listing files with the external table preprocessor in 11g.

If you’re on Oracle 10g another option is to create a java routine. Note: this is not my code – a colleague sent the basics to me and I just modified it somewhat to suit my purpose at the time.

CREATE OR REPLACE AND RESOLVE
JAVA SOURCE NAMED "DirectoryLister" AS
import java.io.File;
import java.util.Arrays;
public class DirectoryLister
{
  public static String getFileList(String idir, String sep)
  {
    File aDirectory = new File(idir);
    File[] filesInDir = aDirectory.listFiles();
    String result = "";
    for ( int i=0; i<filesInDir.length; i++ )
    {
        if ( filesInDir[i].isFile()
             && !filesInDir[i].isHidden() )
        {
            result = result + sep + filesInDir[i].getName();
        }
    }
    return result;
  }
};
/

CREATE OR REPLACE
FUNCTION dirlist_csv
(p_dir IN VARCHAR2, p_sep IN VARCHAR2) RETURN VARCHAR2
AS LANGUAGE JAVA NAME
'DirectoryLister.getFileList
  (java.lang.String, java.lang.String)
  return String';
/

BEGIN
  DBMS_JAVA.grant_permission('USER',
    'java.io.FilePermission', '<>', 'read');
  DBMS_JAVA.grant_permission('USER',
    'SYS:java.lang.RuntimePermission',
    'writeFileDescriptor', '');
  DBMS_JAVA.grant_permission('USER',
    'SYS:java.lang.RuntimePermission',
    'readFileDescriptor', '');
END;
/

(replace “USER” in the above commands with the user name)

CREATE OR REPLACE
FUNCTION get_path (dir IN VARCHAR2) RETURN VARCHAR2 IS
  o_path ALL_DIRECTORIES.directory_path%TYPE;
BEGIN
  SELECT directory_path INTO o_path
  FROM ALL_DIRECTORIES d
  WHERE  d.directory_name = dir;
  RETURN o_path;
END get_path;

Sample script:

DECLARE
  csv VARCHAR2(32767);
  filename VARCHAR2(1000);
BEGIN
  csv := dirlist_csv(get_path('MYDIR'), ',');
  LOOP
    EXIT WHEN csv IS NULL;
    filename := SUBSTR(csv, INSTR(csv, ',', -1) + 1);
    dbms_output.put_line(filename);
    csv := SUBSTR(csv, 1, INSTR(csv, ',', -1) - 1);
  END LOOP;
END;
/

Note: If the number of files is expected to be large, it would be better to get the java program to insert the file names into a global temporary table, then query that, instead of returning the list as a CSV string.


In Oracle, ” = NULL but NULL != ”

When I get the result of my PL/SQL quiz for the day, I’m pleased when I got it right, but if I got it wrong, I’m either annoyed or overjoyed:

  1. If I disagreed with the result, I’m annoyed.
  2. If I agreed with the result, I’m overjoyed – because I learned something new, or I was reminded of something I should have remembered.

Option #2 was my experience this morning – yesterday’s quiz featured the following code snippet:

...
   EXECUTE IMMEDIATE 'update my_table set my_column = :value'
   USING NULL;
...

This was one of four other, very similar, options – and I failed to notice that this version was binding NULL directly into the statement, instead of using a variable as any normal, reasonable, rational human being would. This snippet raises PLS-00457: expressions have to be of SQL types, which in this case is due to the fact that NULL is of no particular SQL type.

If one wanted to bind a literal NULL into a statement such as the one above, you don’t necessarily need a variable:

...
   EXECUTE IMMEDIATE 'update my_table set my_column = :value'
   USING '';
...

Proving that while is NULL, NULL is not – they are not always interchangeable.

P.S. please ignore the post title – I know it is incorrect to write ” = NULL or NULL != ” – but it wasn’t meant to be code, ok?