Truncated Mean in Oracle

A colleague needed to get the average from a set of data, but disregarding the top 25% and the bottom 25%. We didn’t know of any builtin Oracle function that would do this, but a review of the wikipedia page for Average yielded the technical term: truncated (or trimmed) mean. So we searched the Oracle docs and Google for ways to implement this function and didn’t come up with anything very useful. There were some SQL Server scripts which would have required two or three passes over the dataset.

After browsing through the aggregate functions documentation, I hit upon the NTILE function which I’ve used before, and realised that was the answer. The NTILE function takes a set of ordered data and divides it evenly (or as evenly as possible) between a number of buckets. In our case, we wanted to discard the top 25% and bottom 25%, so we simply divide the set into 4 buckets and discard the 1st and the 4th buckets; then take the standard average of the remainder:

SELECT AVG(mystat)
FROM (
      SELECT mystat,
             NTILE(4) OVER (ORDER BY mystat) n
      FROM (SELECT mystat
            FROM mytable)
     )
WHERE n IN (2,3);

The benefit of this query is that it only does one pass over the data, and was easily modified to partition the data set into groups; a count was also added so that the average could be taken over the entire data set for any groups that had less than 4 items.

To get the truncated mean in groups, except for groups with <4 items (for which we’ll report the average over the entire group):

SELECT mycat, AVG(mystat)
FROM (
      SELECT mycat, mystat,
             NTILE(4) OVER (PARTITION BY mycat
                            ORDER BY mystat) n,
             COUNT(1) OVER (PARTITION BY mycat) c 
      FROM (SELECT mycat, mystat
            FROM mytable)
     )
WHERE n IN (2,3) OR c < 4
GROUP BY mycat
ORDER BY mycat;

Lesson learned today

Comment out all “DROP TABLE” commands in my scripts.

(I accidentally hit F5 when the focus was in the wrong window – which happened to contain a “DROP TABLE / CREATE TABLE” script – my Toad session goes and happily drops the table that I’d been gradually accumulating statistics into for the past 3 days – and no, there’s no flashback table in 9i)

At least I kept all my scripts – rerunning them all now…


How to Change the ORDER BY When You Can’t Modify the SQL

Is it possible to change the ORDER BY on a query, if the SQL is defined in a wrapped PL/SQL package? A colleague asked me this interesting question today – and the answer was quite simply, Yes and No.

If the SQL already has an ORDER BY, then the answer is No. Fortunately, in our case, the SQL didn’t have any ORDER BY clause:

SELECT a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

The plan currently being picked by the CBO (Oracle 9.2.0.8) is:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN ix_address

For background, here are the relevant schema details:

TABLE address
(addressid NUMBER PRIMARY KEY
,locationid NUMBER NOT NULL
,typeid NUMBER NOT NULL --FOREIGN KEY to addresstype
);

INDEX idx_address (locationid, typeid, addressid);

TABLE addresstype
(typeid NUMBER PRIMARY KEY
,typecode VARCHAR2(30) NOT NULL
);

typecode is actually unique, but no constraint has been declared to the database.

Because the first step in the plan (the TABLE ACCESS on addresstype) only ever gets one row (in practice), the index range scan on ix_address causes this query to return the rows in ascending order of addressid.

The process that runs this query gets rows one at a time, does some work to “check” it against some other data, and once it finds a “match”, it stops getting more rows.

This design probably seemed like a good idea at the time, but now the business would like this process changed so that in the (rare) occurrence when more than one row would “match”, the process should favour a more recent address rather than an old one. For our purposes, “recent” may be approximated by assuming that bigger addressids are more recent than smaller addressids.

We can’t change the code without a lengthy and costly call to the application vendor; it would be better if we can change it ourselves. We cannot change the PL/SQL package in which this query is defined; so what can we do? Can we change the order in which rows are returned from this query?

The answer is yes – firstly, the query has no ORDER BY, so the order of rows is affected only by the query plan. If we change the query plan, which we can do with a stored outline, the order of rows can be changed.

So in this case, all we have to do is create a stored outline to add an INDEX_DESC hint:

SELECT /*+INDEX_DESC(a)*/ a.addressid
FROM address a, addresstype t
WHERE a.locationid = :locationid
AND a.typeid = t.typeid
AND t.typecode = 'STREET';

This causes the following plan to be used:

NESTED LOOPS
   TABLE ACCESS BY ROWID addresstype
      INDEX RANGE SCAN pkx_addresstype
   INDEX RANGE SCAN DESCENDING ix_address

With the stored outline, the query now returns the rows in descending order of addressid.

NOTE:
If you want to use this approach, it is highly dependant on being able to find a query plan that produces the ordering that you want. This may mean creating a suitable index designed strategically for this purpose, if one doesn’t exist already. Even creating indexes won’t guarantee that you’ll be able to find a plan that gets the order that you want; generally I’d say this approach will only be feasible for relatively simple queries.


Create Excel file from SQL query using Python

This will create a file called “emp.xls” with a nicely formatted spreadsheet based on the query specified. The header row is based on the column names, and is frozen. The column widths are set according to the max width of the data.

import cx_Oracle
from table2Excel import write_cursor_to_excel

orcl = cx_Oracle.connect('scott/tiger')
curs = orcl.cursor()
curs.execute("""
  SELECT e.ename "Employee",
         e.job "Job",
         e.hiredate "Hire Date",
         e.sal "Salary",
         e.comm "Commission",
         d.dname "Department",
         (SELECT ename FROM emp WHERE emp.empno = e.mgr) "Manager"
  FROM   emp e, dept d
  WHERE  e.deptno = d.deptno
""")
write_cursor_to_excel(curs, 'emp.xls', 'Employees')

You can get the source for table2Excel from here [table2Excel.py.txt]. Feel free to adapt it to your needs.




Add colours to your Shuttle item


I wanted to allow users to select one or more colours from a list, and to control the order of the colours, so I’ve used a Shuttle item.

I wanted to have different background colours for each value in the list, so I started here for help. I’m not very strong with javascript (yet) but with a bit of looking around and playing I ended up with what I wanted.

I created an item on the page with the following attributes:

Display As: Shuttle

List of values definition: STATIC2:Yellow;#FFFF00, Green;#00FF00, Turquoise;#00FFFF, Pink;#FF90FF, Blue;#9090FF, Purple;#FF00FF, Red;#FF9090

Post Element Text:
<script type="text/javascript">
(function() {
for (i=0;i<$x("#CURRENT_ITEM_NAME#_2").length;i++) {
$x("#CURRENT_ITEM_NAME#_2")[i].style.backgroundColor
= $x("#CURRENT_ITEM_NAME#_2")[i].value;
}
for (i=0;i<$x("#CURRENT_ITEM_NAME#").length;i++) {
$x("#CURRENT_ITEM_NAME#")[i].style.backgroundColor
= $x("#CURRENT_ITEM_NAME#")[i].value;
}

})();
</script>

Notes:

  • the value of each item in the list is a HTML colour code. This colour code is used to set the background colour of the item in the list.
  • the shuttle item actually involves two select lists in the generated page. If the item name is P1_SHUTTLE, the generated items will be P1_SHUTTLE_2 (the left-hand list) and P1_SHUTTLE (the right-hand list). These are referenced in the javascript via #CURRENT_ITEM_NAME# and #CURRENT_ITEM_NAME#_2.
  • the $x returns the select list dom object, which supports the “length” attribute – this returns the count of items in the list
  • the select list index starts at 0 and goes up to length-1

A small problem is when the “reset” button is clicked the colours disappear. They reappear if the page is refreshed, however.


Prefix searches, Wildcard characters and LIKE

Want to run a prefix search using the LIKE operator, but your search pattern includes wildcards like _ or %? A common example is searching the v$parameter view for parameters starting with the underscore (_) character. So, why doesn’t this work:

SELECT * FROM v$parameter WHERE name LIKE '_%';

This returns ALL the rows in the view, because the “_” in the pattern matches any single character. There’s all sorts of ways to get around this:

Option 1. Use TRANSLATE to modify the underscores to some other character

SELECT * FROM my_table
WHERE TRANSLATE(my_column, '_', '#') LIKE '#%';

No good if you’re after a general solution, e.g. what if the original string already has some “#”‘s in it? Also, even if the table has an ordinary index on the column the query cannot use it.

Option 2. Use SUBSTR so that we can use an equality condition instead of LIKE

SELECT * FROM my_table
WHERE SUBSTR(my_column,1,1) = '_';

Easy, simple. Except that an ordinary index on the column will still not be used (of course, an appropriate function-based index could be used).

Option 3. Use a regular expression

SELECT * FROM my_table
WHERE REGEXP_LIKE(my_column, '^_');

or

SELECT * FROM my_table
WHERE REGEXP_INSTR(my_column, '^_') > 0

Rather simple, once you know how to write regular expressions. Still can’t use an ordinary index, though (although, again, a (very specific and otherwise generally useless) function-based index could be used).

Option 4. Read the manual and learn something about the LIKE operator

SELECT * FROM my_table
WHERE my_column LIKE '\_%' ESCAPE '\';

Easy, simple; and if there’s an index on the column, the optimiser may use it. No need for any silly function-based indexes or new-fangled regular expressions.


Add a “Who’s Online Now” box to your APEX app

Something to file under “pointless fun”…

Allow your users to feel like they’re part of a community by letting them know who else is using the app at the same time.

  1. Open APEX Application Builder, and open the page you want to add this to.
  2. Click the Create Region icon.
  3. Choose Report, then SQL Report.
  4. Enter a title, e.g. “Who’s Online Now”. Click Next.
  5. Copy the following for the SQL Query (sorry about the poor formatting):
    SELECT apex_user || ' (' || minutes_ago
           || DECODE(minutes_ago,1,' min ago)',' mins ago)')
    FROM (
      SELECT DISTINCT
             apex_user,
             FIRST_VALUE(TRUNC(seconds_ago/60))
             OVER (PARTITION BY apex_user ORDER BY seconds_ago)
             AS minutes_ago
      FROM apex_workspace_activity_log
      WHERE application_id = :APP_ID
      AND seconds_ago < 3600
      AND apex_user NOT IN (:APP_USER, 'nobody')
      ORDER BY 2);
    
  6. Click Next. Click Create Region.

You can customise the region as much as you like. For example, I use:

  • Template (under Region Definition) = “Sidebar Region”
  • Report Template (under Report Attributes) = “template: 16. One Column Unordered List”
  • Pagination Scheme = “- No Pagination Selected -“
  • Headings Type = “None”

The query is based on the APEX view apex_workspace_activity_log which is supplied with the product. It looks at all session activity within the last hour by users other than the current user, and returns a list showing their most recent activity.

The only slight problem is that it doesn’t detect when someone logs out – they’ll stay in the report for up to an hour.


LIKE with wildcard (%) at start CAN use an index

A tester for our client raised a problem where searches like ‘ABC%’ would perform satisfactorily, but searches like ‘%ABC’ would time out. Of course, the reason is that the index on the column cannot be used if a wildcard is at the start of the string, so a full table scan is always performed.

I think we’re going to accept the slow response for now, but a possible solution that we might consider in future is to use a function-based index on the REVERSE() function. Its drawback is that it requires modifying the code.

First, some test data:

CREATE TABLE jka AS
SELECT ROWNUM id, dbms_random.string('x',10) v
FROM dual CONNECT BY LEVEL <= 10000;

Now, create an ordinary index:

CREATE INDEX jka_normal ON jka (v);

BEGIN
  dbms_stats.gather_table_stats
    (USER
    ,'JKA'
    ,estimate_percent=>100
    ,cascade=>TRUE);
END;

This query can use a range scan on jka_normal:

SELECT * FROM jka WHERE v LIKE 'ABC%';

But this query will use a full table scan (can’t use the index):

SELECT * FROM jka WHERE v LIKE '%ABC';

Now, create a function-based index (not to be confused with a REVERSE INDEX):

CREATE INDEX jka_reverse ON jka(REVERSE(v));

BEGIN
  dbms_stats.gather_table_stats
    (USER
    ,'JKA'
    ,estimate_percent=>100
    ,cascade=>TRUE);
END;

This query can use a range scan on jka_reverse:

SELECT * FROM jka WHERE REVERSE(v) LIKE REVERSE('%ABC');

Edit: looks like Richard Foote beat me to it, in a discussion of reverse indexes (which unfortunately don’t contribute anything useful to this particular problem) – I did a quick search on Google without results, but Foote’s article must not have been indexed yet or too far down the list.