So I followed the great advice here to use the new COLLECT function in Oracle 10g to solve the common string-aggregation-in-SQL problem. I chose this solution over the others because it sorts the results as well, which was a user requirement. This is possible because the COLLECT function optionally takes an ORDER BY clause – even though the 10g documentation forgot to mention this. The 11g documentation was updated to include the additional syntax option, as well as the fact that COLLECT supports the DISTINCT (or UNIQUE) keyword to remove duplicates – testing indicates that both work in 10g as well.
This means that if I define a suitable type varchar2_ntt and a suitable function ntt_to_string, I can write a query like this:
SELECT dept,
ntt_to_string(
CAST(
COLLECT(ename ORDER BY ename)
AS varchar2_ntt)
) AS ename_list
FROM emp
GROUP BY dept;
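For reference, definitions along these lines would do the job – this is just a sketch; the element length, the delimiter, and the parameter names are my choices, not taken from the original post:

```sql
CREATE OR REPLACE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
/

CREATE OR REPLACE FUNCTION ntt_to_string (
  nt_in    IN varchar2_ntt,
  delim_in IN VARCHAR2 DEFAULT ','
) RETURN VARCHAR2 IS
  v_string VARCHAR2(32767);
BEGIN
  -- walk the collection in order, appending each element with a delimiter
  FOR i IN 1 .. nt_in.COUNT LOOP
    IF i > 1 THEN
      v_string := v_string || delim_in;
    END IF;
    v_string := v_string || nt_in(i);
  END LOOP;
  RETURN v_string;
END ntt_to_string;
/
```

Because COLLECT has already applied the ORDER BY, the function only needs to concatenate the elements in the order they arrive.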
That works fine. But now I want to combine this with some analytic functions, like this:
SELECT DISTINCT
dept,
ntt_to_string(
CAST(
COLLECT(ename ORDER BY ename)
AS varchar2_ntt)
) AS ename_list,
FIRST_VALUE(ename)
OVER (PARTITION BY dept
ORDER BY sal DESC) AS topdog
FROM emp;
This doesn’t work because (a) the COLLECT requires a GROUP BY; and (b) the analytic function cannot be used along with a GROUP BY. What I’d expect to be able to do is use an analytic version of the COLLECT:
SELECT DISTINCT
dept,
ntt_to_string(
CAST(
COLLECT(ename
PARTITION BY dept
ORDER BY ename)
AS varchar2_ntt)
) AS ename_list,
FIRST_VALUE(ename)
OVER (PARTITION BY dept
ORDER BY sal DESC) AS topdog
FROM emp;
We SQL developers are never satisfied, are we?
At this stage my best solution for this is to use a CTE:
WITH q AS (SELECT dept, ename, sal FROM emp)
SELECT DISTINCT
dept,
(SELECT ntt_to_string(
CAST(
COLLECT(q2.ename ORDER BY q2.ename)
AS varchar2_ntt)
)
FROM q q2
WHERE q2.dept = q.dept
) AS ename_list,
FIRST_VALUE(ename)
OVER (PARTITION BY dept
ORDER BY sal DESC) AS topdog
FROM q;
DECLARE
lexists BOOLEAN;
lfile_len NUMBER;
lblocksize NUMBER;
BEGIN
UTL_FILE.fgetattr(
location => 'a',
filename => 'b',
exists => lexists,
file_length => lfile_len,
blocksize => lblocksize);
END;
I was trying to use this procedure in a 9i database and kept getting:
PLS-00103: Encountered the symbol ">"...
– complaining about line 8 (the “exists” parameter). If I removed the parameter names, it worked fine. Something was wrong with that “exists” parameter name.
In the 9i and 10g documentation:
UTL_FILE.FGETATTR(
location IN VARCHAR2,
filename IN VARCHAR2,
exists OUT BOOLEAN,
file_length OUT NUMBER,
blocksize OUT NUMBER);
In the 11g documentation:
UTL_FILE.FGETATTR(
location IN VARCHAR2,
filename IN VARCHAR2,
fexists OUT BOOLEAN,
file_length OUT NUMBER,
blocksize OUT BINARY_INTEGER);
Ah – the parameter was actually called “fexists”. Ok. Try again:
PLS-00306: wrong number or types of arguments in call to 'FGETATTR'
Aaarrgh. Time for more googling.
According to psoug:
utl_file.fgetattr(
location IN VARCHAR2,
filename IN VARCHAR2,
fexists OUT BOOLEAN,
file_length OUT NUMBER,
block_size OUT BINARY_INTEGER);
Thank goodness I’ve got access to more than just the Oracle docs!
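For the record, a call using the psoug signature – fexists and block_size – compiles and runs (directory and file names here are placeholders):

```sql
DECLARE
  lexists    BOOLEAN;
  lfile_len  NUMBER;
  lblocksize BINARY_INTEGER;
BEGIN
  -- fexists and block_size are the parameter names that actually compile
  UTL_FILE.fgetattr(
    location    => 'a',
    filename    => 'b',
    fexists     => lexists,
    file_length => lfile_len,
    block_size  => lblocksize);
END;
```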
I learnt something new today (thanks to Steven) about TRIM – in the past I’ve used RTRIM and LTRIM if I only want to trim from the left or right ends of a string, but I was not aware (or had forgotten) that SQL’s TRIM function allows the trim side to be specified with the LEADING, TRAILING or BOTH keywords.
Oracle’s documentation of TRIM, however, has a somewhat confusing example:
“This example trims leading zeros from the hire date of the employees in the hr schema:
SELECT employee_id,
TO_CHAR(TRIM(LEADING 0 FROM hire_date))
FROM employees
WHERE department_id = 60
ORDER BY employee_id;
EMPLOYEE_ID TO_CHAR(T
----------- ---------
103 3-JAN-90
104 21-MAY-91
105 25-JUN-97
106 5-FEB-98
107 7-FEB-99”
As an example of using the TRIM function the above code is adequate; however, why does it call the TO_CHAR function at that point? After all, TO_CHAR is only useful if the datatype of its argument is not a character string – but in this case, it’s a string already – TRIM always returns a string.
Not only does the example perform a useless datatype conversion, it involves an implicit conversion – from date to string. The expression would be much better like this:
TRIM(LEADING 0 FROM TO_CHAR(hire_date))
Better – but still not perfect. Sure, we should probably specify the date format so the code is a bit more robust (what if the session has a default format of ‘MM/DD/YY’?), but we can see yet another implicit data type conversion: “LEADING 0”. The only purpose for TRIM is to remove characters. Sure, you can put in zero if you want, but it’s only going to be converted to a character, ‘0’, anyway – so why not make it obvious to the next coder who comes along:
TRIM(LEADING '0' FROM TO_CHAR(hire_date,'DD/MM/YY'))
There, much better! I think the above expression is much clearer about what it will actually do – i.e. it converts a date to a string of characters, and then removes any leading ‘0’ characters.
If, by some great random cosmic chance, you are a reader of this blog, but not of Tom Kyte’s, then you would have missed this post:
NO_DATA_NEEDED – something I learned recently
It appears to have been documented in the 9i documentation, complete with spelling error:
ORA-06548, 00000, "no more rows needed"
Cause: The caller of a pipelined function does not
need more rows to be produced by the pipelined
function.
Action: Catch the NO_DATA_NEEDED exception is an
exception handling block.
Mind you, it’s not all that obvious, since if the pipelined function does not handle the exception, nothing goes wrong – the exception is never raised by the calling SQL statement. It’s hard to see when ORA-06548 would ever actually be raised.
EDIT:
Looks like ORA-06548 can appear in the error stack.
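A minimal sketch of how a pipelined function might catch the exception – the function name and row contents are mine (NO_DATA_NEEDED is predefined from 10gR2; in earlier versions you would need PRAGMA EXCEPTION_INIT against -6548):

```sql
CREATE OR REPLACE FUNCTION first_n (n IN NUMBER)
  RETURN varchar2_ntt PIPELINED IS
BEGIN
  FOR i IN 1 .. n LOOP
    BEGIN
      PIPE ROW ('row ' || i);
    EXCEPTION
      WHEN NO_DATA_NEEDED THEN
        -- the caller (e.g. a query with ROWNUM <= 2) has stopped fetching;
        -- do any cleanup (close files, cursors, etc.) and stop quietly
        RETURN;
    END;
  END LOOP;
  RETURN;
END first_n;
/
```

Without the handler the function simply stops being called, which is why nothing goes wrong if you ignore the exception entirely.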
I just noticed the difference – I was wondering why I couldn’t see the virtual columns (i.e. the hidden columns that support function-based indexes) on a table, and it was because I was querying USER_TAB_COLUMNS. Once I used USER_TAB_COLS instead, there they were.
The difference is noted in the documentation (Oracle 10g Release 2 (10.2)) for USER_TAB_COLS and ALL_TAB_COLS but not for USER_TAB_COLUMNS or ALL_TAB_COLUMNS, unfortunately.
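A query along these lines shows the extra columns that USER_TAB_COLS reports but USER_TAB_COLUMNS filters out (the table name is just an example):

```sql
-- USER_TAB_COLS includes hidden/virtual columns; its HIDDEN_COLUMN
-- flag identifies them
SELECT column_name, hidden_column
FROM   user_tab_cols
WHERE  table_name = 'EMP'
AND    column_name NOT IN (SELECT column_name
                           FROM   user_tab_columns
                           WHERE  table_name = 'EMP');
```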
Oracle Database SQL Reference 10g Release 1 (10.1): Datatype Comparison Rules – Numeric Values
“A larger value is considered greater than a smaller one.”
“All negative numbers are less than zero and all positive numbers.”
“-1 is less than 100; -100 is less than -1.”
On first reading these statements seem obvious. The more you read them, however, the more they take on a deeper meaning, a deeper structure and beauty that transcends this, sublimates that and begins subtly but inexorably to change the way you view the world. You start to question their simplicity – are they perhaps over-simplifications of a more complex reality? Well, perhaps not.
I guess they had to include these statements for completeness, since later they get into the more complicated cases of comparison rules for dates and strings, which are not (necessarily) so obvious. For example, I haven’t come across anyone who thinks 12 April 1961 is greater than 20 July 1969, but I’m sure there are some.
PL/SQL User’s Guide and Reference (9.2 and 10.2): “NULL Statement”
“The NULL statement and Boolean value NULL are unrelated.”
I can understand that this is to draw a distinction between “NULL” as a procedural statement and “NULL” as a literal. But why is NULL specifically identified as Boolean? So, the NULL statement is somehow related to NULL strings, NULL numbers, and NULL dates?
(I won’t even mention the problem with calling NULL a value, something which is prevalent throughout the literature, including the SQL standard.)
How many times have you seen '' used for a NULL string?
Oracle 9i SQL Reference Release 2 (9.2) – “Basic Elements of Oracle SQL, 5 of 10: Nulls”
“…Oracle currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.”
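A quick demonstration of the current behaviour – the zero-length literal satisfies IS NULL, and even its LENGTH is null:

```sql
SELECT 'empty string is null' AS result
FROM   dual
WHERE  '' IS NULL
AND    LENGTH('') IS NULL;
```

If Oracle ever changed this, any code relying on '' behaving as NULL would silently change meaning – which is presumably why the documentation hedges.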