So I followed the great advice here to use the new COLLECT function in Oracle 10g to solve the common string-aggregation-in-SQL problem. I chose this solution over the others because it sorts the results as well, which was a user requirement. This is because the COLLECT function optionally takes an ORDER BY clause – even though the 10g documentation forgot to mention this. The 11g documentation was updated to include the additional syntax option, as well as the fact that COLLECT supports the DISTINCT (or UNIQUE) keyword as well to remove duplicates – testing indicates that this works in 10g as well.
This means that if I define a suitable type varchar2_ntt
and a suitable function ntt_to_string
, I can write a query like this:
SELECT dept,
ntt_to_string(
CAST(
COLLECT(ename ORDER BY ename)
AS varchar2_ntt)
) AS ename_list
FROM emp
GROUP BY dept;
That works fine. But now I want to combine this with some analytic functions, like this:
SELECT DISTINCT
dept,
ntt_to_string(
CAST(
COLLECT(ename ORDER BY ename)
AS varchar2_ntt)
) AS ename_list,
FIRST_VALUE(ename)
OVER (PARTITION BY dept
ORDER BY sal DESC) AS topdog
FROM emp;
This doesn’t work because (a) the COLLECT requires a GROUP BY; and (b) the analytic function cannot be used along with a GROUP BY. What I’d expect to be able to do is use an analytic version of the COLLECT:
SELECT DISTINCT
dept,
ntt_to_string(
CAST(
COLLECT(ename
PARTITION BY dept
ORDER BY ename)
AS varchar2_ntt)
) AS ename_list,
FIRST_VALUE(ename)
OVER (PARTITION BY dept
ORDER BY sal DESC) AS topdogFROM emp;
Us SQL developers are never satisfied, are we?
At this stage my best solution for this is to use a CTE:
WITH q AS (SELECT dept, ename, sal FROM emp)
SELECT DISTINCT
dept,
(SELECT ntt_to_string(
CAST(
COLLECT(q2.ename ORDER BY q2.ename)
AS varchar2_ntt)
)
FROM q q2
WHERE q2.dept = q.dept
) AS ename_list,
FIRST_VALUE(ename)
OVER (PARTITION BY dept
ORDER BY sal DESC) AS topdog
FROM q;
Want to purge all the recyclebins for all users in your database?
Can’t (or don’t want to) log in as each user individually to run this, one by one:
PURGE RECYCLEBIN
You could log in as SYSDBA and run:
PURGE DBA_RECYCLEBIN
But, what if you don’t have SYSDBA access? (Yes, this has happened to me, with a client-supplied VM – they told me the SYSTEM password but not the SYS password)
You can use the PURGE TABLESPACE x USER y option – run this to generate the commands to run, then run the output as a script:
SELECT DISTINCT
REPLACE(REPLACE('PURGE TABLESPACE "#TS#" USER "#OWNER#";'
,'#TS#',ts_name)
,'#OWNER#',owner)
FROM dba_recyclebin
WHERE ts_name IS NOT NULL;
PURGE TABLESPACE "USER_DATA" USER "SCOTT";
PURGE TABLESPACE "USER_DATA" USER "LARRY";
WARNING: this command will result in irreversible loss of data for all users on your database.
Tested on Oracle 10gR2.
Can you solve this problem in SQL – i.e. without resorting to a procedural solution?
SQL combine multiple identifiers to create a group id for duplicate records
“I have a data extract with three different identifiers: A, B, C
Each identifier may appear in more than one row, and each row may have one or more of these three identifiers (i.e the column is populated or null).
I want to be able to group all records that have any combination of either A, B or C in common and assign them the same group id.
Extract table showing what the eventual groups should be:
A | B | C | Group
==== ==== ==== =====
p NULL NULL 1
p r NULL 1
q NULL NULL 2
NULL r NULL 1
NULL NULL s 2
q NULL s 2
So, the input data is a table with three columns (A, B, and C), some of which are NULL. The output is a third column, “Group”, which will be assigned a number which classifies the row into a “group”. Each group will be distinct in that none of its members will have a value in A, B or C that appears in any row in any other group.
This question is fascinating because it cannot be solved, I believe, without some form of iteration. If I get the row (p) along with (q), they are in two different groups; but, if I add the row (p,q), all of a sudden my original rows are now in the same group along with the new row.
The solution will probably have to examine each row in consideration with the entire record set – an operation of O(n^2), if my understanding of CS theory is correct. I suspect a solution using at least a CTE and/or the MODEL clause will be required.
UPDATE:
An elegant solution, using a hierarchical query and Oracle’s CONNECT_BY_ROOT function, has been posted by Vincent Malgrat.
How many times have you tried something, got either an error or unexpected results, and decided what you were trying to do was not possible? Have you later on discovered someone quietly doing the impossible?
I think this phenomenon is a form of the “correlation-implies-causation” fallacy.
Unfortunately, this seems to happen too often, if the kind of questions I see quite often are any guide. A recent example is: “Why cannot I select from more than one table in Oracle?”. Here, the author seems to have followed the following thought process:
- “SELECT * FROM table1” returns some rows.
- “SELECT * FROM table1, table2” returns no rows.
- Therefore, you can’t query more than one table in one SQL statement in Oracle.
In this case, the writer had not realised that table2 had no rows in it; what complicated things somewhat was that in one session, the second query was returning rows – because he’d inserted some rows into table2 in that session but hadn’t issued a COMMIT, so those rows were not visible by other sessions.
For a person inexperienced in SQL or Oracle, this sort of mistake is forgivable; but I suspect we all make this sort of mistake quite often. I know I have!
When trying something new, it takes diligent research and testing to determine whether one’s approach is simply wrong, or if unrelated factors (e.g. getting the syntax wrong, or the environment is not set up correctly) are causing failure. This gets more tiresome and frustrating (a “gumption trap”, in Persig‘s parlance) when one was halfway through solving some other problem, and this unexpected problem gets in the way.
Sometimes you just have to go to bed and see if it becomes clearer the next day. If the problem persists, ask a question on StackOverflow!
P.S. if a Google search reveals “doing X is impossible”, ask “Why?”
Steven Feuerstein lists seven excellent “Golden Rules” in his presentation (via Eddie Awad) and says “Don’t repeat anything. Aim for a Single Point of Definition for every aspect of your application – formulas, business rules, magic values, SQL statements.” giving the following code as exhibit A:
I’m guessing in his presentation he spoke about various things that could be done to improve this code, but they’re not in the PDF; so I’d like to give it a go myself and see how much we can improve the maintainability of this code by reducing hard-coding.
1. Type Declarations
Instead of declaring parameters and variables as NUMBER, VARCHAR2 etc, these should use the %TYPE operator so that they are automatically synchronized with the datatype from the table columns they represent:
PROCEDURE process_employee
(department_id_in IN employees.department_id%TYPE)
IS
l_id employees.employee_id%TYPE;
l_salary employees.salary%TYPE;
...
l_name, however, is not based on any table column we know of at this point; so there is no %TYPE we can use for it. But bear with me, we’ll fix this later.
2. Magic Values
This one’s a no-brainer: that “10000000” is obviously a magic value that some bean-counter decided was the correct threshold for the CEO’s salary. Whatever.
You might define this as a constant defined in a global package specification, e.g.
CREATE PACKAGE employee_constant AS
ceo_salary_threshold CONSTANT employees.salary%TYPE := 10000000;
END employee_constant;
Personally, I’d suspect that the business will review and revise this number from time to time, to keep up with inflation; so we might end up needing a database table to store the current threshold, plus a date range for which the threshold applies. I’d then add an interface on top of this table so that queries and procedures don’t need to know how to get the current threshold. We can retrofit this later by changing ceo_salary_threshold into a function instead of a constant. That’s a bit beyond the scope of this exercise, however.
3. Formatting Rules
The rule about formatting an employee name as “LAST,FIRST” is duplicated in a comment and in the SELECT statement; and chances are it will be required elsewhere in the application as well. My preferred method for creating a SPOD for this sort of business rule used to be to move the implementation into a view, e.g.:
CREATE VIEW formatted_employees AS
SELECT employees.*,
employees.last_name || ',' || employees.first_name
AS full_name
FROM employees;
COMMENT ON COLUMN formatted_employees.full_name
IS 'Full name: LAST COMMA FIRST (ReqDoc 123.A.47)';
This view is what I like to call a “formatting” view: it is only allowed to query one table, it contains no WHERE, GROUP BY or HAVING clauses, and it selects all the columns from the table. The view can be used almost anywhere the table may be used. It adds additional columns that format the data in various ways. If need be, we can even add INSTEAD OF triggers to handle inserts/updates on the derived columns – if the business rules make the conversion from derived-to-underlying-column well defined.
So, now we can redefine the cursor as:
CURSOR emps_in_dept_cur
IS
SELECT employee_id, salary, full_name AS lname
FROM formatted_employees
WHERE department_id = department_id_in;
Notice that I don’t call the column “last_comma_first” or anything like that – that would again be hard-coding the business rule, which would then be replicated throughout the application. In Oracle 11g, however, I think it might be better to create virtual columns on the table instead:
ALTER TABLE employees ADD (
full_name VARCHAR2(100)
GENERATED ALWAYS AS (last_name || ',' || first_name) VIRTUAL
);
COMMENT ON COLUMN employees.full_name
IS 'Full name: LAST COMMA FIRST (ReqDoc 123.A.47)';
CURSOR emps_in_dept_cur
IS
SELECT employee_id, salary, full_name AS lname
FROM employees
WHERE department_id = department_id_in;
The virtual column can have its own stats, and even an index if needed for querying.
Another option would be to create a function that does this formatting:
CREATE FUNCTION employee_full_name
(last_name IN employees.last_name%TYPE,
first_name IN employees.first_name%TYPE)
RETURN VARCHAR2 DETERMINISTIC IS
--Full name: LAST COMMA FIRST (ReqDoc 123.A.47)
BEGIN
RETURN last_name || ',' || first_name;
END employee_full_name;
We could call this function from the procedure or the view, but if we’re on 11g there’s no reason we can’t create a virtual column on it:
ALTER TABLE employees ADD (
full_name VARCHAR2(100)
GENERATED ALWAYS
AS (employee_full_name(last_name,first_name)) VIRTUAL
);
Another advantage to using the view or a virtual column is that we can now remove the “VARCHAR2 (100)” from the variable declaration, e.g.:
l_name employees.full_name%TYPE;
4. Cursor Parameter
The cursor refers directly to the parameter to the procedure, which is a no-no – this couples the cursor too much with the procedure, i.e. we can’t re-use it elsewhere unless we always define a variable “department_id_in”. Instead, we should use a cursor parameter:
CURSOR emps_in_dept_cur
(department_id_in IN employees.department_id%TYPE)
IS
SELECT employee_id, salary, full_name AS lname
FROM employees
WHERE department_id = emps_in_dept_cur.department_id_in;
The addition of the context “emps_in_dept_cur.” is not strictly necessary, but it is good practice to define the scope of all variables so that unrelated changes (e.g. the addition of a column called “department_id_in”) don’t change the code.
5. Cursor Row Type
What if we need to add 10 more columns to the cursor? At the moment we’re adding one more variable for each column of the cursor, and specifying it three times (variable declaration, cursor SELECT clause, and the FETCH INTO). We can reduce this to just once by declaring a cursor row type instead:
PROCEDURE process_employee
(department_id_in IN employees.department_id%TYPE)
IS
CURSOR emps_in_dept_cur
(department_id_in IN employees.department_id%TYPE)
IS
SELECT employee_id, salary, full_name lname
FROM employees
WHERE department_id = emps_in_dept_cur.department_id_in;
TYPE emps_in_dept_cur_type IS emps_in_dept_cur%ROWTYPE;
emp emps_in_dept_cur_type;
BEGIN
OPEN emps_in_dept_cur;
LOOP
FETCH emps_in_dept_cur
INTO emp;
...
6. Don’t COMMIT
Procedures should rarely COMMIT (there are very few exceptions to this rule, e.g. procedures declared as autonomous transactions). Transactional control should be left to the calling process – this process might need to be done along with a number of other changes elsewhere, and we would want to either COMMIT or ROLLBACK all the changes together as one transaction. What if the next procedure raised an error and we had to rollback? Our system would be left in an inconsistent state.
7. Error Package
That RAISE_APPLICATION_ERROR hard-codes an error code and an error message. What if we type the error number wrong somewhere? If the calling process handles ORA-20907 in some fashion, but we mistype it as -20908 in one procedure, the calling process will not handle it.
We could declare an exception instead, e.g. in a global package specification:
CREATE PACKAGE employee_exception AS
invalid_dept_id EXCEPTION;
PRAGMA EXCEPTION_INIT (invalid_dept_id, -20907);
END employee_exception;
Now, our exception handler can raise just the one exception:
EXCEPTION WHEN NO_DATA_FOUND THEN
RAISE employee_exception.invalid_dept_id;
However, we’ve now lost the error message. It would be better to create an error-handling package instead:
CREATE PACKAGE employee_error AS
invalid_error_no CONSTANT NUMBER := -20000;
invalid_error_no_exception EXCEPTION;
PRAGMA EXCEPTION_INIT (invalid_error_no_exception, -20000);
invalid_dept_id CONSTANT NUMBER := -20907;
invalid_dept_id_exception EXCEPTION;
PRAGMA EXCEPTION_INIT (invalid_dept_id_exception, -20907);
PROCEDURE raise_exception (error_no IN NUMBER);
END employee_exception;
CREATE PACKAGE BODY employee_error AS
PROCEDURE raise_exception (error_no IN NUMBER) IS
BEGIN
CASE error_no
WHEN invalid_dept_id
THEN RAISE_APPLICATION_ERROR(invalid_dept_id, 'Invalid department ID');
ELSE RAISE_APPLICATION_ERROR(invalid_error_no, 'Bug: invalid error number');
END;
END message;
END employee_exception;
EDIT: PRAGMA EXCEPTION_INIT only accepts literal numbers for its second parameter (or else you get PLS-00702 at compile time) – fixed accordingly
Now, our exception handler is nicely modular:
EXCEPTION WHEN NO_DATA_FOUND THEN
employee_error.raise_exception(employee_error.invalid_dept_id);
So now, our code looks like this:
PROCEDURE process_employee (department_id_in IN employees.department_id%TYPE)
IS
CURSOR emps_in_dept_cur (department_id_in IN employees.department_id%TYPE)
IS
SELECT employee_id, salary, full_name lname
FROM employees
WHERE department_id = emps_in_dept_cur.department_id_in;
TYPE emps_in_dept_cur_type IS emps_in_dept_cur%ROWTYPE;
emp emps_in_dept_cur_type;
BEGIN
OPEN emps_in_dept_cur;
LOOP
FETCH emps_in_dept_cur
INTO emp;
IF emp.salary > employee_constant.ceo_salary_threshold THEN adjust_comp_for_ceo (emp.salary);
ELSE analyze_compensation (emp.employee_id, emp.salary, employee_constant.ceo_salary_threshold); END IF;
EXIT WHEN emps_in_dept_cur%NOTFOUND;
END LOOP;
EXCEPTION WHEN NO_DATA_FOUND THEN
employee_error.raise_exception(employee_error.invalid_dept_id);
END;
One final change that one might make here is to move the SQL query right out of the procedure and use a ref cursor instead, supplied by a central “employee_cursor” package.
There are probably plenty of other changes we could make to improve the maintainability of this code further.
What do you think?
My code worked perfectly fine in Oracle 9i, but after our upgrade to 11g all of a sudden it was failing with ORA-01481 “Invalid number format model”.
The cause? A mixed-up number format model – I’d accidentally mixed locale-specific (“.“) and locale-safe (“G“) codes in the number format. I don’t know if this is due to a bug in 9i or an enhancement in 11g.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0
- 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application
Testing options
SQL> select to_char(12345.678,'99G990.0') from dual;
select to_char(12345.678,'99G990.0') from dual
*
ERROR at line 1:
ORA-01481: invalid number format model
SQL> select to_char(12345.678,'99,990.0') from dual;
TO_CHAR(1
---------
12,345.7
SQL> select to_char(12345.678,'99G990D0') from dual;
TO_CHAR(1
---------
12,345.7
“Action: Consult the manual“… in retrospect, not of much help in this instance, unfortunately – however, these sorts of problems may usually be worked out by experimentation.