Tag: stackoverflow

A fascinating SQL problem

Can you solve this problem in SQL – i.e. without resorting to a procedural solution?

SQL combine multiple identifiers to create a group id for duplicate records

“I have a data extract with three different identifiers: A, B, C
Each identifier may appear in more than one row, and each row may have one or more of these three identifiers (i.e the column is populated or null).
I want to be able to group all records that have any combination of either A, B or C in common and assign them the same group id.
Extract table showing what the eventual groups should be:

A    | B    | C    | Group
====   ====   ====   =====
p      NULL   NULL   1
p      r      NULL   1
q      NULL   NULL   2
NULL   r      NULL   1
NULL   NULL   s      2
q      NULL   s      2

So, the input data is a table with three columns (A, B, and C), some of which are NULL. The output is a third column, “Group”, which will be assigned a number which classifies the row into a “group”. Each group will be distinct in that none of its members will have a value in A, B or C that appears in any row in any other group.

This question is fascinating because it cannot be solved, I believe, without some form of iteration. If I get the row (p) along with (q), they are in two different groups; but, if I add the row (p,q), all of a sudden my original rows are now in the same group along with the new row.

The solution will probably have to examine each row in consideration with the entire record set – an operation of O(n^2), if my understanding of CS theory is correct. I suspect a solution using at least a CTE and/or the MODEL clause will be required.

An elegant solution, using a hierarchical query and Oracle’s CONNECT_BY_ROOT function, has been posted by Vincent Malgrat.

If at first you don’t succeed… it’s impossible.

How many times have you tried something, got either an error or unexpected results, and decided what you were trying to do was not possible? Have you later on discovered someone quietly doing the impossible?

I think this phenomenon is a form of the “correlation-implies-causation” fallacy.

Unfortunately, this seems to happen too often, if the kind of questions I see quite often are any guide. A recent example is: “Why cannot I select from more than one table in Oracle?”. Here, the author seems to have followed the following thought process:

  1. “SELECT * FROM table1” returns some rows.
  2. “SELECT * FROM table1, table2” returns no rows.
  3. Therefore, you can’t query more than one table in one SQL statement in Oracle.

In this case, the writer had not realised that table2 had no rows in it; what complicated things somewhat was that in one session, the second query was returning rows – because he’d inserted some rows into table2 in that session but hadn’t issued a COMMIT, so those rows were not visible by other sessions.

For a person inexperienced in SQL or Oracle, this sort of mistake is forgivable; but I suspect we all make this sort of mistake quite often. I know I have!

When trying something new, it takes diligent research and testing to determine whether one’s approach is simply wrong, or if unrelated factors (e.g. getting the syntax wrong, or the environment is not set up correctly) are causing failure. This gets more tiresome and frustrating (a “gumption trap”, in Persig‘s parlance) when one was halfway through solving some other problem, and this unexpected problem gets in the way.

Sometimes you just have to go to bed and see if it becomes clearer the next day. If the problem persists, ask a question on StackOverflow!

P.S. if a Google search reveals “doing X is impossible”, ask “Why?”

Unit Test != Requirements

Are one or more usage examples enough to specify the requirements for something? For example:

rtrim('123000', '0'); would return '123'

No, as can be seen here: Oracle 8, SQL: RTRIM for string manipulation is not working as expected (Stackoverflow)

When I read that question I thought of TDD (Test Driven Development), something I think I should be doing more of. As said here, however, “Are tests sufficient documentation?  Very likely not, but they do form an important part of it.”

I’ve seen unit test cases used as a form of documentation. Generally they could be useful for this – to tell part of the story – but if they only consist of “enter this, expect that”, they will never be good enough to replace requirements documentation.

Footnote: How about the source code – is that sufficient as documentation? In one sense, yes – the source code is the best documentation of what the system does now. What’s lacking, however, is documentation of the business requirements – and this gap can be huge (see e.g. Agile Development and Requirements Management).

Querying for Contiguous Series

A deceptively simple question:

How to select the first continous group of rows using Oracle SQL (Stackoverflow)

The solution given by Malgrat works nicely. He generates a “gap” column which detects changes in the data, then uses a running total to restrict the results to the first “group”. I haven’t come up with a more elegant solution that doesn’t involve multiple table scans.


I’ve enjoyed reading and participating in StackOverflow for over a year now.

With the introduction of Area51, there is now the possibility of starting a new Q&A site for all things Oracle – as pointed out by Rob Van Wijk and Gary Myers. Sure there are other fora such as OTN, but they don’t have the kinds of features that make SO fun and self-moderating. SO is more like a cross between a Q&A forum and a wiki, with the addition of a democratic system of reputation points that allow good questions and answers to bubble up to the top.

If you agree, please head over to Area 51 – Oracle Databases and Follow it to voice your agreement.