INSERT-VALUES vs. INSERT-SELECT-FROM-DUAL

There’s no difference between the effects of the following two statements, are there:

INSERT INTO mytable (col1, col2) VALUES ('hello','world');
INSERT INTO mytable (col1, col2) SELECT 'hello', 'world' FROM DUAL;

Well, as it turns out, it is possible for the first statement to succeed where the second statement would fail – in the presence of a suitably crafted Before Insert trigger, the second will raise “ORA-04091 table is mutating, trigger/function may not see it”:

http://oraclequirks.blogspot.com/2010/09/ora-04091-table-stringstring-is.html


To Exist or Not To Exist

An interesting discussion on the PL/SQL Challenge blog here has led to me changing my mind about “the best way” to loop through a sparse PL/SQL associative array.

Normally, if we know that an array has been filled, with no gaps in indices, we would use a simple FOR LOOP:

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
BEGIN
  SELECT x BULK COLLECT INTO a FROM mytable;
  FOR i IN a.FIRST..a.LAST LOOP
    -- process a(i)
  END LOOP;
END;

If, however, the array may be sparsely filled (i.e. there might be one or more gaps in the sequence), this was “the correct way” to loop through it:

Method A (First/Next)

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
  i BINARY_INTEGER;
BEGIN
  ...
  i := a.FIRST;
  LOOP
    EXIT WHEN i IS NULL;
    -- process a(i)
    i := a.NEXT(i);
  END LOOP;
END;

Method A takes advantage of the fact that an associative array in Oracle is implemented internally as a linked list – the fastest way to “skip over” any gaps is to call the NEXT operator on the list for a given index.

Alternatively, one could still just loop through all the indices from the first to the last index; but the problem with this approach is that if an index is not found in the array, it will raise the NO_DATA_FOUND exception. Well, Method B simply catches the exception:

Method B (Handle NDF)

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
BEGIN
  ...
  FOR i IN a.FIRST..a.LAST LOOP
    BEGIN
      -- process a(i)
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        NULL;
    END;
  END LOOP;
END;

This code effectively works the same (with one important proviso*) as Method A. The difference, however, is in terms of relative performance. This method is much faster than Method A, if the array is relatively dense. If the array is relatively sparse, Method A is faster.

* It must be remembered that the NO_DATA_FOUND exception may be raised by a number of different statements in a program: if you use code like this, you must make sure that the exception was only raised by the attempt to access a(i), and not by some other code!

A third option is to loop through as in Method B, but call the EXISTS method on the array to check if the index is found, instead of relying on the NO_DATA_FOUND exception.

Method C (EXISTS)

DECLARE
  TYPE t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  a t;
BEGIN
  ...
  FOR i IN a.FIRST..a.LAST LOOP
    IF a.EXISTS(i) THEN
      -- process a(i)
    END IF;
  END LOOP;
END;

The problem with this approach is that it effectively checks the existence of i in the array twice: once for the EXISTS check, and if found, again when actually referencing a(i). For a large array which is densely populated, depending on what processing is being done inside the loop, this could have a measurable impact on performance.

Bottom line: there is no “one right way” to loop through a sparse associative array. But there are some rules-of-thumb about performance we can take away:

  1. When the array is likely often very sparsely populated with a large index range, use Method A (First/Next).
  2. When the array is likely often very densely populated with a large number of elements, use Method B (Handle NDF). But watch how you catch the NO_DATA_FOUND exception!
  3. If you’re not sure, I’d tend towards Method A (First/Next) until performance problems are actually evident.

You probably noticed that I haven’t backed up any of these claims about performance with actual tests or results. You will find some in the comments to the afore-mentioned PL/SQL Challenge blog post; but I encourage you to log into a sandpit Oracle environment and test it yourself.


Directory File List in PL/SQL (Oracle 10g)

UTL_FILE, unfortunately, does not allow you to query the contents of a directory.

If you’re on Oracle 11g you can use the method described here: listing files with the external table preprocessor in 11g.

If you’re on Oracle 10g another option is to create a java routine. Note: this is not my code – a colleague sent the basics to me and I just modified it somewhat to suit my purpose at the time.

CREATE OR REPLACE AND RESOLVE
JAVA SOURCE NAMED "DirectoryLister" AS
import java.io.File;
import java.util.Arrays;
public class DirectoryLister
{
  public static String getFileList(String idir, String sep)
  {
    File aDirectory = new File(idir);
    File[] filesInDir = aDirectory.listFiles();
    String result = "";
    for ( int i=0; i<filesInDir.length; i++ )
    {
        if ( filesInDir[i].isFile()
             && !filesInDir[i].isHidden() )
        {
            result = result + sep + filesInDir[i].getName();
        }
    }
    return result;
  }
};
/

CREATE OR REPLACE
FUNCTION dirlist_csv
(p_dir IN VARCHAR2, p_sep IN VARCHAR2) RETURN VARCHAR2
AS LANGUAGE JAVA NAME
'DirectoryLister.getFileList
  (java.lang.String, java.lang.String)
  return String';
/

BEGIN
  DBMS_JAVA.grant_permission('USER',
    'java.io.FilePermission', '<>', 'read');
  DBMS_JAVA.grant_permission('USER',
    'SYS:java.lang.RuntimePermission',
    'writeFileDescriptor', '');
  DBMS_JAVA.grant_permission('USER',
    'SYS:java.lang.RuntimePermission',
    'readFileDescriptor', '');
END;
/

(replace “USER” in the above commands with the user name)

CREATE OR REPLACE
FUNCTION get_path (dir IN VARCHAR2) RETURN VARCHAR2 IS
  o_path ALL_DIRECTORIES.directory_path%TYPE;
BEGIN
  SELECT directory_path INTO o_path
  FROM ALL_DIRECTORIES d
  WHERE  d.directory_name = dir;
  RETURN o_path;
END get_path;

Sample script:

DECLARE
  csv VARCHAR2(32767);
  filename VARCHAR2(1000);
BEGIN
  csv := dirlist_csv(get_path('MYDIR'), ',');
  LOOP
    EXIT WHEN csv IS NULL;
    filename := SUBSTR(csv, INSTR(csv, ',', -1) + 1);
    dbms_output.put_line(filename);
    csv := SUBSTR(csv, 1, INSTR(csv, ',', -1) - 1);
  END LOOP;
END;
/

Note: If the number of files is expected to be large, it would be better to get the java program to insert the file names into a global temporary table, then query that, instead of returning the list as a CSV string.


In Oracle, ” = NULL but NULL != ”

When I get the result of my PL/SQL quiz for the day, I’m pleased when I got it right, but if I got it wrong, I’m either annoyed or overjoyed:

  1. If I disagreed with the result, I’m annoyed.
  2. If I agreed with the result, I’m overjoyed – because I learned something new, or I was reminded of something I should have remembered.

Option #2 was my experience this morning – yesterday’s quiz featured the following code snippet:

...
   EXECUTE IMMEDIATE 'update my_table set my_column = :value'
   USING NULL;
...

This was one of four other, very similar, options – and I failed to notice that this version was binding NULL directly into the statement, instead of using a variable as any normal, reasonable, rational human being would. This snippet raises PLS-00457: expressions have to be of SQL types, which in this case is due to the fact that NULL is of no particular SQL type.

If one wanted to bind a literal NULL into a statement such as the one above, you don’t necessarily need a variable:

...
   EXECUTE IMMEDIATE 'update my_table set my_column = :value'
   USING '';
...

Proving that while is NULL, NULL is not – they are not always interchangeable.

P.S. please ignore the post title – I know it is incorrect to write ” = NULL or NULL != ” – but it wasn’t meant to be code, ok?


The Templating Way

Today, grasshopper, you will learn the Way of the Template. The Templating Way is the path by which complex output is produced in a harmonious fashion.

The Templating Way does not cobble a string together from bits and pieces in linear fashion.

htp.p('<HTML><HEAD><TITLE>'||:title
||'</TITLE></HEAD><BODY>'
||:body||'</BODY></HTML>');

The Templating Way separates the Template from the Substitutions; by this division is harmony achieved.

DECLARE
  template VARCHAR2(200)
  := q'[
       <HTML>
        <HEAD>
         <TITLE> #TITLE# </TITLE>
        </HEAD>
        <BODY> #BODY# </BODY>
       </HTML>
      ]';
BEGIN
  htp.p(
    REPLACE( REPLACE( template
    ,'#TITLE#', :title)
    ,'#BODY#',  :body)
    );
END;

It is efficient – each substitution expression is evaluated once and once only, even if required many times within the template.

The Templating Way makes dynamic SQL easy to write and debug. It makes bugs shallower.

SELECT REPLACE(REPLACE(REPLACE(q'[
  CREATE OR REPLACE TRIGGER #OWNER#.#TABLE#_BI
  BEFORE INSERT ON #OWNER#.#TABLE#
  FOR EACH ROW
  BEGIN
    IF :NEW.#COLUMN# IS NULL THEN
      SELECT #TABLE#_SEQ.NEXTVAL
      INTO :NEW.#COLUMN#
      FROM DUAL;
    END IF;
  END;
]', '#OWNER#', USER)
  , '#TABLE#', cc.table_name)
  , '#COLUMN#', cc.column_name) AS ddl
FROM user_constraints c, user_cons_columns cc
WHERE c.constraint_type = 'P'
AND c.constraint_name = cc.constraint_name
AND cc.column_name like '%NO';

The Templating Way is simple, but looks complex to the uninitiated. It is readable, and affords maintainability.


Quick Draw Fail

It rarely pays to answer too quickly.

Yesterday’s question at the PL/SQL Challenge was about aliases for expressions in cursor-based record types.

DECLARE
   CURSOR profitable_internet_stocks
   IS
      SELECT SUM (share_price) FROM nasdaq_listings
       WHERE profits > 0 AND sector = 'INTERNET';
   few_and_far_in_between   profitable_internet_stocks%ROWTYPE;
BEGIN
   OPEN profitable_internet_stocks;
   FETCH profitable_internet_stocks
   INTO few_and_far_in_between;
   CLOSE profitable_internet_stocks;
   /*[display_statement]*/
END;

The question asked what code, inserted where the “display_statement” comment is, would display the result of the query. Without thinking too hard about it, I ticked the fourth option as correct:

DBMS_OUTPUT.PUT_LINE (few_and_far_in_between.sum_share_price);


I had this stupid idea that Oracle would step in and magically create an alias for the column*. A millisecond after clicking “Accept”, I realised I’d got it wrong. I’d assumed that at least one of the answers were likely to be correct, which is obviously not true.

* EDIT: Gary Myers has blogged about this too and explains that Oracle does, indeed, create an alias.


TAPI vs. XAPI


I just wanted to bring attention to some very interesting discussion (that’s been going on for years now) regarding Table APIs (TAPI) versus Transactional APIs (XAPI). Some very nice answers, as well as a bit of controversy 🙂


Single-Point-of-Definition by Example

Steven Feuerstein lists seven excellent “Golden Rules” in his presentation (via Eddie Awad) and says “Don’t repeat anything. Aim for a Single Point of Definition for every aspect of your application – formulas, business rules, magic values, SQL statements.” giving the following code as exhibit A:

(source)

I’m guessing in his presentation he spoke about various things that could be done to improve this code, but they’re not in the PDF; so I’d like to give it a go myself and see how much we can improve the maintainability of this code by reducing hard-coding.

1. Type Declarations

Instead of declaring parameters and variables as NUMBER, VARCHAR2 etc, these should use the %TYPE operator so that they are automatically synchronized with the datatype from the table columns they represent:

PROCEDURE process_employee
(department_id_in IN employees.department_id%TYPE)
IS
   l_id     employees.employee_id%TYPE;
   l_salary employees.salary%TYPE;
...

l_name, however, is not based on any table column we know of at this point; so there is no %TYPE we can use for it. But bear with me, we’ll fix this later.

2. Magic Values

This one’s a no-brainer: that “10000000” is obviously a magic value that some bean-counter decided was the correct threshold for the CEO’s salary. Whatever.

You might define this as a constant defined in a global package specification, e.g.

CREATE PACKAGE employee_constant AS
ceo_salary_threshold CONSTANT employees.salary%TYPE := 10000000;
END employee_constant;

Personally, I’d suspect that the business will review and revise this number from time to time, to keep up with inflation; so we might end up needing a database table to store the current threshold, plus a date range for which the threshold applies. I’d then add an interface on top of this table so that queries and procedures don’t need to know how to get the current threshold. We can retrofit this later by changing ceo_salary_threshold into a function instead of a constant. That’s a bit beyond the scope of this exercise, however.

3. Formatting Rules

The rule about formatting an employee name as “LAST,FIRST” is duplicated in a comment and in the SELECT statement; and chances are it will be required elsewhere in the application as well. My preferred method for creating a SPOD for this sort of business rule used to be to move the implementation into a view, e.g.:

CREATE VIEW formatted_employees AS
SELECT employees.*,
       employees.last_name || ',' || employees.first_name
       AS full_name
FROM   employees;
COMMENT ON COLUMN formatted_employees.full_name
IS 'Full name: LAST COMMA FIRST (ReqDoc 123.A.47)';

This view is what I like to call a “formatting” view: it is only allowed to query one table, it contains no WHERE, GROUP BY or HAVING clauses, and it selects all the columns from the table. The view can be used almost anywhere the table may be used. It adds additional columns that format the data in various ways. If need be, we can even add INSTEAD OF triggers to handle inserts/updates on the derived columns – if the business rules make the conversion from derived-to-underlying-column well defined.

So, now we can redefine the cursor as:

CURSOR emps_in_dept_cur
IS
   SELECT employee_id, salary, full_name AS lname
     FROM formatted_employees
    WHERE department_id = department_id_in;

Notice that I don’t call the column “last_comma_first” or anything like that – that would again be hard-coding the business rule, which would then be replicated throughout the application. In Oracle 11g, however, I think it might be better to create virtual columns on the table instead:

ALTER TABLE employees ADD (
full_name VARCHAR2(100)
GENERATED ALWAYS AS (last_name || ',' || first_name) VIRTUAL
);
COMMENT ON COLUMN employees.full_name
IS 'Full name: LAST COMMA FIRST (ReqDoc 123.A.47)';
CURSOR emps_in_dept_cur
IS
   SELECT employee_id, salary, full_name AS lname
     FROM employees
    WHERE department_id = department_id_in;

The virtual column can have its own stats, and even an index if needed for querying.

Another option would be to create a function that does this formatting:

CREATE FUNCTION employee_full_name
   (last_name  IN employees.last_name%TYPE,
    first_name IN employees.first_name%TYPE)
RETURN VARCHAR2 DETERMINISTIC IS
--Full name: LAST COMMA FIRST (ReqDoc 123.A.47)
BEGIN
   RETURN last_name || ',' || first_name;
END employee_full_name;

We could call this function from the procedure or the view, but if we’re on 11g there’s no reason we can’t create a virtual column on it:

ALTER TABLE employees ADD (
full_name VARCHAR2(100)
GENERATED ALWAYS
AS (employee_full_name(last_name,first_name)) VIRTUAL
);

Another advantage to using the view or a virtual column is that we can now remove the “VARCHAR2 (100)” from the variable declaration, e.g.:

l_name   employees.full_name%TYPE;

4. Cursor Parameter

The cursor refers directly to the parameter to the procedure, which is a no-no – this couples the cursor too much with the procedure, i.e. we can’t re-use it elsewhere unless we always define a variable “department_id_in”. Instead, we should use a cursor parameter:

CURSOR emps_in_dept_cur
   (department_id_in IN employees.department_id%TYPE)
IS
   SELECT employee_id, salary, full_name AS lname
     FROM employees
    WHERE department_id = emps_in_dept_cur.department_id_in;

The addition of the context “emps_in_dept_cur.” is not strictly necessary, but it is good practice to define the scope of all variables so that unrelated changes (e.g. the addition of a column called “department_id_in”) don’t change the code.

5. Cursor Row Type

What if we need to add 10 more columns to the cursor? At the moment we’re adding one more variable for each column of the cursor, and specifying it three times (variable declaration, cursor SELECT clause, and the FETCH INTO). We can reduce this to just once by declaring a cursor row type instead:

PROCEDURE process_employee
   (department_id_in IN employees.department_id%TYPE)
IS
   CURSOR emps_in_dept_cur
      (department_id_in IN employees.department_id%TYPE)
   IS
      SELECT employee_id, salary, full_name lname
        FROM employees
       WHERE department_id = emps_in_dept_cur.department_id_in;
   TYPE emps_in_dept_cur_type IS emps_in_dept_cur%ROWTYPE;
   emp emps_in_dept_cur_type;
BEGIN
   OPEN emps_in_dept_cur;
   LOOP
      FETCH emps_in_dept_cur
      INTO emp;
...

6. Don’t COMMIT

Procedures should rarely COMMIT (there are very few exceptions to this rule, e.g. procedures declared as autonomous transactions). Transactional control should be left to the calling process – this process might need to be done along with a number of other changes elsewhere, and we would want to either COMMIT or ROLLBACK all the changes together as one transaction. What if the next procedure raised an error and we had to rollback? Our system would be left in an inconsistent state.

7. Error Package

That RAISE_APPLICATION_ERROR hard-codes an error code and an error message. What if we type the error number wrong somewhere? If the calling process handles ORA-20907 in some fashion, but we mistype it as -20908 in one procedure, the calling process will not handle it.
We could declare an exception instead, e.g. in a global package specification:

CREATE PACKAGE employee_exception AS
invalid_dept_id EXCEPTION;
PRAGMA EXCEPTION_INIT (invalid_dept_id, -20907);
END employee_exception;

Now, our exception handler can raise just the one exception:

EXCEPTION WHEN NO_DATA_FOUND THEN
   RAISE employee_exception.invalid_dept_id;

However, we’ve now lost the error message. It would be better to create an error-handling package instead:

CREATE PACKAGE employee_error AS
 invalid_error_no CONSTANT NUMBER := -20000;
 invalid_error_no_exception EXCEPTION;
 PRAGMA EXCEPTION_INIT (invalid_error_no_exception, -20000);
 invalid_dept_id CONSTANT NUMBER := -20907;
 invalid_dept_id_exception EXCEPTION;
 PRAGMA EXCEPTION_INIT (invalid_dept_id_exception, -20907);
 PROCEDURE raise_exception (error_no IN NUMBER);
END employee_exception;

CREATE PACKAGE BODY employee_error AS
 PROCEDURE raise_exception (error_no IN NUMBER) IS
 BEGIN
 CASE error_no
 WHEN invalid_dept_id
 THEN RAISE_APPLICATION_ERROR(invalid_dept_id, 'Invalid department ID');
 ELSE RAISE_APPLICATION_ERROR(invalid_error_no, 'Bug: invalid error number');
 END;
 END message;
END employee_exception;

EDIT: PRAGMA EXCEPTION_INIT only accepts literal numbers for its second parameter (or else you get PLS-00702 at compile time) – fixed accordingly

Now, our exception handler is nicely modular:

EXCEPTION WHEN NO_DATA_FOUND THEN
employee_error.raise_exception(employee_error.invalid_dept_id);

So now, our code looks like this:

PROCEDURE process_employee (department_id_in IN employees.department_id%TYPE)
IS
   CURSOR emps_in_dept_cur (department_id_in IN employees.department_id%TYPE)
   IS
      SELECT employee_id, salary, full_name lname
        FROM employees
       WHERE department_id = emps_in_dept_cur.department_id_in;
   TYPE emps_in_dept_cur_type IS emps_in_dept_cur%ROWTYPE;
   emp emps_in_dept_cur_type;
BEGIN
   OPEN emps_in_dept_cur;
   LOOP
      FETCH emps_in_dept_cur
      INTO emp;

      IF emp.salary > employee_constant.ceo_salary_threshold THEN adjust_comp_for_ceo (emp.salary);
      ELSE analyze_compensation (emp.employee_id, emp.salary, employee_constant.ceo_salary_threshold); END IF;
      EXIT WHEN emps_in_dept_cur%NOTFOUND;
   END LOOP;
EXCEPTION WHEN NO_DATA_FOUND THEN
   employee_error.raise_exception(employee_error.invalid_dept_id);
END;

One final change that one might make here is to move the SQL query right out of the procedure and use a ref cursor instead, supplied by a central “employee_cursor” package.

There are probably plenty of other changes we could make to improve the maintainability of this code further.
What do you think?


Another way to learn something new every day

The PL/SQL ChallengeMy morning routine for the last few weeks has had a small addition – just a few minutes to log on The PL/SQL Challenge at 8am (WST) and answer a quiz question.

This is a great way to test the depth and breadth of your knowledge of Oracle PL/SQL. Whether you’re a beginner in the field or have years of experience, I’m certain you’ll occasionally learn something new – plus, it’s completely free!

Some of the questions are fairly general, others are trivial; while they’re never “trick” questions, they can still trip you up if you’re not observant!

I think I have an unfair advantage, being in this timezone – early morning is when I am most alert and ready to tackle any challenge.

A couple of things I’ve learned recently:

  • UTL_FILE.putf can be used if you want to use C-style “printf” syntax, e.g. '\n'
  • SIGNTYPE is a standard subtype that can only take values of NULL, -1, 0 or 1

(I’ve learned a few other things as well, but I’ve forgotten what they were :|)


One insert or multiple inserts?


A colleague asked me a trick* question today which I failed 🙂

* whether it was a “trick” question is probably in the eye of the beholder, though…

“What are the differences, if any, between the following two approaches to inserting multiple rows (assume v1 and v2 have different values):

INSERT INTO mytable (a)
SELECT :v1 FROM DUAL;
INSERT INTO mytable (a)
SELECT :v2 FROM DUAL;

OR:

INSERT INTO mytable (a)
SELECT :v1 FROM DUAL
UNION
SELECT :v2 FROM DUAL;

I quickly answered:

  1. The first approach requires one extra parse;
  2. The second approach requires a Sort Distinct (albeit of only two rows)
  3. A UNION ALL would be better, which would not require a Sort, nor would require the extra parse.

My colleague responded, there’s one very important thing I missed: Triggers! The first approach would execute the BEFORE STATEMENT and AFTER STATEMENT triggers once for each row. The second approach would only execute these triggers once.

What’s sad is that the application we’re working on has row-level logic in the BEFORE/AFTER statement triggers. If we try to optimise the code to insert more than one row in one INSERT statement, the application only runs the row-level logic for the first row inserted. Bad code! Very very bad!