Never satisfied

So I followed the great advice here to use the new COLLECT function in Oracle 10g to solve the common string-aggregation-in-SQL problem. I chose this solution over the others because it sorts the results as well, which was a user requirement. This is because the COLLECT function optionally takes an ORDER BY clause – even though the 10g documentation forgot to mention this. The 11g documentation was updated to include the additional syntax option, as well as the fact that COLLECT supports the DISTINCT (or UNIQUE) keyword as well to remove duplicates – testing indicates that this works in 10g as well.

This means that if I define a suitable type varchar2_ntt and a suitable function ntt_to_string, I can write a query like this:

SELECT dept,
       ntt_to_string(
          CAST(
            COLLECT(ename ORDER BY ename)
          AS varchar2_ntt)
       ) AS ename_list
FROM emp
GROUP BY dept;

That works fine. But now I want to combine this with some analytic functions, like this:

SELECT DISTINCT
       dept,
       ntt_to_string(
          CAST(
            COLLECT(ename ORDER BY ename)
          AS varchar2_ntt)
       ) AS ename_list,
       FIRST_VALUE(ename)
          OVER (PARTITION BY dept
                ORDER BY sal DESC) AS topdog
FROM emp;

This doesn’t work because (a) the COLLECT requires a GROUP BY; and (b) the analytic function cannot be used along with a GROUP BY. What I’d expect to be able to do is use an analytic version of the COLLECT:

SELECT DISTINCT
       dept,
       ntt_to_string(
          CAST(
            COLLECT(ename
                    PARTITION BY dept
                    ORDER BY ename)
          AS varchar2_ntt)
       ) AS ename_list,
       FIRST_VALUE(ename)
          OVER (PARTITION BY dept
                ORDER BY sal DESC) AS topdogFROM emp;

Us SQL developers are never satisfied, are we?

At this stage my best solution for this is to use a CTE:

WITH q AS (SELECT dept, ename, sal FROM emp)
SELECT DISTINCT
       dept,
       (SELECT ntt_to_string(
                 CAST(
                   COLLECT(q2.ename ORDER BY q2.ename)
                 AS varchar2_ntt)
               )
        FROM q q2
        WHERE q2.dept = q.dept
       ) AS ename_list,
       FIRST_VALUE(ename)
          OVER (PARTITION BY dept
                ORDER BY sal DESC) AS topdog
FROM q;

UTL_FILE.fGetAttr

DECLARE
  lexists      BOOLEAN;
  lfile_len    NUMBER;
  lblocksize   NUMBER;
BEGIN
  UTL_FILE.fgetattr(
    location    => 'a',
    filename    => 'b',
    exists      => lexists,
    file_length => lfile_len,
    blocksize   => lblocksize);
END;

I was trying to use this procedure in a 9i database and kept getting:

PLS-00103: Encountered the symbol ">"...

– complaining about line 8 (the “exists” parameter). If I removed the parameter names, it worked fine. Something was wrong with that “exists” parameter name.
In the 9i and 10g documentation:

UTL_FILE.FGETATTR(
    location    IN VARCHAR2,
    filename    IN VARCHAR2,
    exists      OUT BOOLEAN,
    file_length OUT NUMBER,
    blocksize   OUT NUMBER);

In the 11g documentation:

UTL_FILE.FGETATTR(
    location    IN VARCHAR2,
    filename    IN VARCHAR2,
    fexists     OUT BOOLEAN,
    file_length OUT NUMBER,
    blocksize   OUT BINARY_INTEGER);

Ah – the parameter was actually called “fexists”. Ok. Try again:

PLS-00306: wrong number or types of arguments in call to 'FGETATTR'

Aaarrgh. Time for more googling.
According to psoug:

utl_file.fgetattr(
    location    IN  VARCHAR2,
    filename    IN  VARCHAR2,
    fexists     OUT BOOLEAN,
    file_length OUT NUMBER,
    block_size  OUT BINARY_INTEGER);

Thank goodness I’ve got access to more than just the Oracle docs!


A Good Bad Example

I learnt something new today (thanks to Steven) about TRIM – in the past I’ve used RTRIM and LTRIM if I only want to trim from the left or right ends of a string, but I was not aware (or had forgotten) that SQL’s TRIM function allows this to be specified.

Oracle’s documentation of TRIM, however, has a somewhat confusing example:

“This example trims leading zeros from the hire date of the employees in the hr schema:

SELECT employee_id,
TO_CHAR(TRIM(LEADING 0 FROM hire_date))
FROM employees
WHERE department_id = 60
ORDER BY employee_id;

EMPLOYEE_ID TO_CHAR(T
----------- ---------
103 3-JAN-90
104 21-MAY-91
105 25-JUN-97
106 5-FEB-98
107 7-FEB-99

As an example of using the TRIM function the above code is adequate; however, why does it call the TO_CHAR function at that point? After all, TO_CHAR is only useful if the datatype of its argument is not a character string – but in this case, it’s a string already – TRIM always returns a string.

Not only does the example perform a useless datatype conversion, it involves an implicit conversion – from date to string. The expression would be much better like this:

TRIM(LEADING 0 FROM TO_CHAR(hire_date))

Better – but still not perfect. Sure, we should probably specify the date format so the code is a bit more robust (what if the session has a default format of ‘MM/DD/YY’?), but we can see yet another implicit data type conversion: “LEADING 0“. The only purpose for TRIM is to remove characters. Sure, you can put in zero if you want, but it’s only going to be converted to a character, ‘0’ anyway – so why not make it obvious to the next coder who comes along:

TRIM(LEADING '0' FROM TO_CHAR(hire_date,'DD/MM/YY'))

There, much better! I think the above expression is much clearer about what it will actually do – i.e. it converts a date to a string of characters, and then removes any leading ‘0’ characters.


The exception we never knew we needed

If, by some great random cosmic chance, you are a reader of this blog, but not of Tom Kyte‘s, then you would have missed this post:

NO_DATA_NEEDED – something I learned recently

It appears to have been documented in the 9i documentation, complete with spelling error:

ORA-06548, 00000, "no more rows needed"
Cause:   The caller of a pipelined function does not
         need more rows to be produced by the pipelined
         function.
Action:  Catch the NO_DATA_NEEDED exception is an
         exception handling block.

Mind you, it’s not all that obvious since if the pipelined function does not handle the exception, nothing goes wrong – the exception is never raised by the calling SQL statement. It’s not obvious when ORA-06548 would ever be raised.

EDIT:

Looks like ORA-06548 can appear in the error stack.


Do you know the difference between USER_TAB_COLS & USER_TAB_COLUMNS?

I just noticed the difference – I was wondering why I couldn’t see the virtual columns (i.e. columns that support function-based indexes) on a table, and it was because I was querying USER_TAB_COLUMNS. Once I used USER_TAB_COLS instead there they were.

The difference is noted in the documentation (Oracle 10g Release 2 (10.2)) for USER_TAB_COLS and ALL_TAB_COLS but not for USER_TAB_COLUMNS or ALL_TAB_COLUMNS, unfortunately.


Oracle 101: The (Very) Basics

Oracle Database SQL Reference 10g Release 1 (10.1): Datatype Comparison Rules – Numeric Values

“A larger value is considered greater than a smaller one.”

“All negative numbers are less than zero and all positive numbers.”

“-1 is less than 100; -100 is less than -1.”

On first reading these statements seem obvious. The more you read them, however, the more they take on a deeper meaning, a deeper structure and beauty that transcends this, sublimates that and begins subtly but inexorably to change the way you view the world. You start to question their simplicity – are they perhaps over-simplifications of a more complex reality? Well, perhaps not.

I guess they had to include these statements for completeness, since later they get into the more complicated cases of comparison rules for dates and strings, which are not (necessarily) so obvious. For example, I haven’t come across anyone who thinks 12 April 1961 is greater than 20 July 1969, but I’m sure there are some.


Did you know that NULL is a “Boolean value”? Hmmm…

PL/SQL User’s Guide and Reference (9.2 and 10.2): “NULL Statement”

“The NULL statement and Boolean value NULL are unrelated.”

I can understand that this is to draw a distinction between “NULL” as a procedural statement and “NULL” as a literal. But why is NULL specifically identified as Boolean? So, the NULL statement is somehow related to NULL strings, NULL numbers, and NULL dates?

(I won’t even mention the problem with calling NULL a value, something which is prevalent throughout the literature, including the SQL standard.)