Truncated Mean in Oracle
A colleague needed the average of a set of data, disregarding the top 25% and the bottom 25% of the values. We didn’t know of any built-in Oracle function that would do this, but a review of the Wikipedia page for Average yielded the technical term: truncated (or trimmed) mean. So we searched the Oracle docs and Google for ways to implement this function and didn’t come up with anything very useful. There were some SQL Server scripts, but they would have required two or three passes over the data set.
After browsing through the aggregate functions documentation, I hit upon the NTILE function, which I’ve used before, and realised that was the answer. NTILE takes a set of ordered rows and divides it as evenly as possible among a given number of buckets (bucket sizes differ by at most one row). In our case, we wanted to discard the top 25% and bottom 25%, so we simply divide the set into 4 buckets, discard the 1st and 4th buckets, and take the standard average of the remainder:
SELECT AVG(mystat)
FROM (
  SELECT mystat,
         NTILE(4) OVER (ORDER BY mystat) n
  FROM (SELECT mystat FROM mytable)
)
WHERE n IN (2,3);
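To see which rows the `n IN (2,3)` filter actually keeps, it can help to mimic the bucketing outside the database. The sketch below is plain Python, not Oracle code. The helper names `ntile` and `truncated_mean` and the sample values are made up for illustration. The bucket sizing assumes NTILE's documented rule that leftover rows are handed out one each to the lowest-numbered buckets.

```python
def ntile(values, buckets):
    """Emulate NTILE(buckets) OVER (ORDER BY value).

    Returns (value, bucket_number) pairs in ascending order of value.
    Illustrative helper only; it ignores NULL handling.
    """
    ordered = sorted(values)
    # Every bucket gets `base` rows; the first `extra` buckets get one more.
    base, extra = divmod(len(ordered), buckets)
    result = []
    start = 0
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        for v in ordered[start:start + size]:
            result.append((v, b))
        start += size
    return result


def truncated_mean(values):
    """Average of the values that fall in buckets 2 and 3 out of 4."""
    kept = [v for v, b in ntile(values, 4) if b in (2, 3)]
    # SQL's AVG returns NULL when no rows survive the filter; use None here.
    return sum(kept) / len(kept) if kept else None


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(ntile(sample, 4))
print(truncated_mean(sample))  # 6.0: the mean of 4, 5, 6, 7, 8
```

With 10 rows the buckets hold 3, 3, 2 and 2 rows, so the filter drops three rows from the bottom and two from the top. The trim is exactly 25% on each side only when the row count is a multiple of 4. With a single row, everything lands in bucket 1 and nothing is left to average.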
The benefit of this query is that it makes only one pass over the data, and it was easily modified to partition the data set into groups. We also added a count so that, for any group with fewer than 4 items, the average is taken over the entire group instead.
To get the truncated mean per group, except for groups with fewer than 4 items (for which we’ll report the average over the entire group):
SELECT mycat, AVG(mystat)
FROM (
  SELECT mycat, mystat,
         NTILE(4) OVER (PARTITION BY mycat ORDER BY mystat) n,
         COUNT(1) OVER (PARTITION BY mycat) c
  FROM (SELECT mycat, mystat FROM mytable)
)
WHERE n IN (2,3) OR c < 4
GROUP BY mycat
ORDER BY mycat;
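For a quick sanity check of the grouped query without an Oracle instance, here is a rough sketch that runs essentially the same statement against an in-memory SQLite database from Python. This is an assumption-laden stand-in, not the original setup: SQLite needs version 3.25 or later for window functions, the sample rows are invented, and the redundant innermost subquery is dropped for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycat TEXT, mystat REAL)")

# Invented sample data: group 'a' has 8 rows (one outlier), group 'b' only 2.
rows = [("a", v) for v in (1, 2, 3, 4, 5, 6, 7, 100)] + [("b", 10), ("b", 20)]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

query = """
SELECT mycat, AVG(mystat)
FROM (
  SELECT mycat, mystat,
         NTILE(4) OVER (PARTITION BY mycat ORDER BY mystat) AS n,
         COUNT(1) OVER (PARTITION BY mycat) AS c
  FROM mytable
)
WHERE n IN (2, 3) OR c < 4
GROUP BY mycat
ORDER BY mycat
"""

result = conn.execute(query).fetchall()
print(result)  # [('a', 4.5), ('b', 15.0)]
```

Group 'a' splits into four buckets of two rows, so the average is taken over 3, 4, 5 and 6 and the outlier 100 is ignored. Group 'b' has fewer than 4 rows, so the `c < 4` branch keeps both rows and the plain average is reported.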
Hugo
10 May 2022 - 2:03 am
Great article. Found it very useful. Thank you