OracleOverflow?

I’ve enjoyed reading and participating in StackOverflow for over a year now.

With the introduction of Area 51, there is now the possibility of starting a new Q&A site for all things Oracle – as pointed out by Rob Van Wijk and Gary Myers. Sure, there are other fora, such as OTN, but they don’t have the kinds of features that make SO fun and self-moderating. SO is more like a cross between a Q&A forum and a wiki, with the addition of a democratic system of reputation points that allows good questions and answers to bubble up to the top.

If you agree, please head over to Area 51 – Oracle Databases and Follow it to voice your agreement.


Two sides of the same fence?

In my daily reading these two articles came fairly close together, and I have a strong feeling they are describing exactly the same project. *

From one side of the fence: A total rewrite: costly, time-consuming, but worth it?

“Our CMS was developed using Active Server Pages, and consisted of around 80,000 lines of VBScript code.”

“as we continued to develop the rest of the framework, we saw that it took a lot longer than anticipated”

From the other side: The Homegrown CMS

“If one were to create a list of adjectives to describe this monstrosity, “stable,” “reliable,” and “accurate” would be strikingly absent.”

“…nearly 200 tables (mostly imported from Microsoft Access) and not a single stored procedure. Boolean-type values were represented as a CHAR(5) field, holding possible values of ‘true’, ‘false’, or NULL…”

“There were also no indexes. Nor any primary key columns… For inserts, an ID was assigned via in-line SQL by requesting the Max(ID) field of any particular table and adding 1.”
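To see why that last pattern is so fragile, here’s a minimal sketch (the table and column names are hypothetical): two sessions can read the same MAX(ID) at the same moment and insert the same “new” ID – and with no primary key, nothing even reports the collision.

    -- The anti-pattern described above: concurrent sessions can each
    -- read the same MAX(id) and insert duplicate ids (and without an
    -- index, the MAX() itself forces a full scan on every insert).
    INSERT INTO articles (id, title)
    SELECT NVL(MAX(id), 0) + 1, 'New article'
    FROM   articles;

    -- The conventional fix: let a sequence hand out ids atomically.
    CREATE SEQUENCE articles_seq;

    INSERT INTO articles (id, title)
    VALUES (articles_seq.NEXTVAL, 'New article');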

My point of view: this sounds like a good counter-example to the (very good, but general) advice offered by Joel Spolsky: don’t rewrite from scratch.

* EDIT: Vidar Langberget has advised that these are not the same project, proving that I’m not such a great prognosticator 😐


AUSOUG Conference 2009 Day One

It was great to catch up with former colleagues at the conference today. As I go from site to site I’m gradually collecting more and more former colleagues and it’s great to hear how they’re going and what they’re working on.

Howard Ong gave a good review of Oracle’s new Data Modeller. Sounds like it’s got a lot of promise – but it’s very new and misses some features (e.g. support for functions, procedures with parameters, packages, and DDL change script generation) that I’d expect in an Oracle CASE tool.

Ray Tindall with Penny Cookson showed how a variety of JavaScript libraries (ApexLib, ExtJS, jQuery) can be integrated into APEX, and what features they provide. Personally I’d prefer to wait until they come out of the box in APEX 4.0, but for now the options available look good.

I was happy to see a good level of interest in Python at my talk after lunch. If you’re here because you want to have a peek at all the slides I skipped, you can view or download the presentation from here [Python_and_Oracle_Fast.pdf].

Unfortunately I had to leave soon after that to pick up my car from the auto electrician; tomorrow I should be able to stay all day.


AUSOUG 2009

Finally the AUSOUG 2009 conference program is out!

Looks good – I’m looking forward to a number of sessions (10-11 Nov in Perth), including the new “ODTUG Oracle ACE” stream.

I’ll be speaking again, this time doing an “Intro to Python” for Oracle’rs. If you’ve never played with this wonderful language before, I hope you’ll come along. Alternatively, if you’re keen to hear Penny talk about “Best Practices in Apex Development”, or Scott Hollows give his “History of Oracle – the early years”, I hope you’ll tell me all about it afterwards, because my talk is scheduled at the same time.

If you’re attending the Perth conference, say hooroo.


How to tell if someone’s a programmer

“A woman asks her husband, a programmer, to go shopping:
– Dear, please, go to the nearby grocery store to buy some bread. If they have eggs, buy 6.
– O.K., hun.
Twenty minutes later the husband comes back bringing 6 loaves of bread. His wife is flabbergasted:
– Dear, why on earth did you buy 6 loaves of bread?
– They had eggs.”

Source


Not just a free lunch

When I first learned to drive, all our cars were automatics – and so that’s all I learned to drive. This never posed an issue for me until I married Rosalie, who owns a manual. She tried to teach me, and I managed – kind of. It wasn’t until I had a chat with my uncle-in-law, who’s a mechanic, that it “clicked” with me. He explained to me, in basic terms, how the gear-change system works, which meant I finally had a picture in my head of what was going on “under the hood” when I pushed the clutch in. It didn’t take long before I was smoothly changing gears instead of making my passengers lurch forward and back like someone on a bucking bull.

I think that’s the way I learn things – I need to have a mental model of how a device is working internally before I can really use it effectively. To be useful the model doesn’t have to be greatly detailed or even perfectly accurate – it just has to be good enough that I can predict or guess what effect different actions will have on the equipment.

Today I enjoyed Richard Foote’s talk debunking a raft of common index-related myths, followed by Penny Cookson sharing about bind variable peeking, and Connor McDonald giving some creative demos of RAC.

Some of the material was covered in recent articles on Richard’s blog, but he clearly explained the mechanics of Oracle’s B-tree and bitmap indexes, including how index statistics affect query plans. One highlight for me was the clear explanation of the clustering factor (CF) – how it is calculated, what it means, and how to use it to diagnose why an index is not being used. I think I understand the CF much better than I did before, because Richard explained it the way my uncle-in-law explained the clutch in a car – that is, he explained how Oracle calculates it and uses it internally. He could have just said “a good CF is one that is close to the number of blocks, and a bad CF is one that is close to the number of rows”, but instead he explained how it works, which makes this rule of thumb self-evident, and a whole lot more meaningful and useful.
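In that spirit, here’s my own rough sketch of the calculation (my code, not Richard’s, and the table and column names are made up): visit the rows in index order and count how often the table block changes from one entry to the next. A result close to the number of table blocks is good; one close to the number of rows is bad.

    -- Approximate the clustering factor by hand: order the rows as the
    -- index would, and count how many times the table block number
    -- changes from one entry to the next (the first entry counts too).
    SELECT COUNT(*) AS my_clustering_factor
    FROM  (SELECT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS blk,
                  LAG(DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid))
                    OVER (ORDER BY indexed_col, rowid) AS prev_blk
           FROM   my_table
           WHERE  indexed_col IS NOT NULL)
    WHERE  prev_blk IS NULL
    OR     blk != prev_blk;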

Other highlights:

  • the mere existence of an index can change query plans dramatically, even if the index is not actually used by the query;
  • adding a ‘ ‘ (a string containing just a space) as the second column in an index on a nullable column means that all the nulls will be stored in the index (see the sketch after this list); and
  • he’s a big fan of David Bowie (who knew?)
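On that second point, the trick looks something like this (a sketch with made-up names, not Richard’s exact demo):

    -- A single-column index on a nullable column stores no entry for
    -- rows where the column is null, so it can't satisfy an IS NULL
    -- predicate. Appending a constant as a second column means no
    -- entry is ever entirely null, so every row lands in the index:
    CREATE INDEX emp_term_i ON emp (termination_date, ' ');

    -- The optimizer can now consider the index for queries like:
    SELECT * FROM emp WHERE termination_date IS NULL;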

Penny gave a quick overview of the problems of bind variable peeking, including a good explanation of why many 8i-to-9i upgrade projects experienced initial performance problems. There were some excellent tips for those of us not yet blessed with 11g on how to resolve (or at least work around) the problem of SQL plans lurking in the shared pool, intermittently messing with query performance.
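One such workaround (my sketch, not necessarily one of Penny’s tips; it assumes DBMS_SHARED_POOL is installed and at least 10.2.0.4 – on some patch levels an event must be set to enable it – and the SQL text and address/hash values below are placeholders) is to purge just the offending cursor, so its next execution is re-parsed:

    -- Find the address and hash value of the cursor with the bad plan...
    SELECT address, hash_value
    FROM   v$sqlarea
    WHERE  sql_text LIKE 'SELECT /* problem_query */%';

    -- ...then purge that one cursor from the shared pool, so the next
    -- execution is hard-parsed (and re-peeks its bind values).
    BEGIN
      DBMS_SHARED_POOL.PURGE('000000007A6CF430,3724264953', 'C');
    END;
    /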

When we returned from afternoon tea for Connor’s talk, we were confronted with a table ominously equipped with ropes, children’s toys, toilet paper and lollies, and treated to a talk which, by Connor’s standards, was a raving endorsement of RAC – by which I mean he did allow that it could very well be of benefit at some Oracle shops. Certainly some of the FUD surrounding RAC was cleared away.


Job opportunity for APEX developer

Interesting to see openings specifically for Apex developers in Perth. For example:

Oracle Developer – APEX (Applications Express)
Location: Perth
Advertiser: Vantage Recruitment
Classification: I.T. & T > Database Dev. & Admin
Description: ***Contract Opportunity for an Exciting Oracle Development Project in Perth CBD***
View this job at:
http://seek.com.au/users/apply/index.ascx?JobID=15083974&cid=jobmail

(Note: I’m not affiliated with this advertiser nor do I know anything about this ad apart from what you can read for yourself from the link above.)


Bias in Testing

I’ve been trying a number of strategies to improve the performance of a very complex form (Oracle Forms 6i) currently in development. We’ve already done a fair amount of work making the code as efficient as possible, while still being reasonably maintainable, so there doesn’t seem to be any more low-hanging fruit we can pick off easily.

One aspect of the performance is the form startup time, which is on the order of 9-17 seconds. The kinds of changes I’m testing don’t make an obviously visible difference to this time, so to work out whether they are worthwhile I have to be a bit more scientific about performance measurement. That means using statistics (disclaimer: I am not a statistician, so take what I say with a grain of salt and do your own research!). I test the form with both the original code and the modified code to see whether a particular change makes any difference to the performance.

I start the form up, wait for it to load, then run our debug tool, which tells me the times (to the nearest second) when the form started and when control was passed to the user. I record these times in a spreadsheet, close the form, and try again two or three times, recording each result. Then I take the average and write that down. Next I change back to the original (unimproved) code, run the whole test again, and compare the average results. Sometimes the average is better, sometimes worse, and sometimes the same as the original.

This is what I was doing today – I’d made a change which I thought might knock a second or two off the startup time, and so I ran my 2-phase timing test:

Improved code timings: 19s, 8s, 7s, 7s
Original code timings: 9s, 8s, 8s

What can I conclude from these data? What I did conclude was that the first timing (19s) was an outlier – it was the first time I’d run the form today so it probably had to load various caches and possibly even perform hard-parses for some of the queries. The original code had an average startup time of 8.3 seconds, and the improved code loaded in 7.3 seconds on average (discounting the outlier). My improvement saved 1 second!

Or did it? These tests were performed between 8:16 and 8:27 on a Tuesday morning. I later thought of a slightly different way of doing the code improvement, implemented it and ran the whole test again:

Improved(#2) code timings: 11s, 9s, 12s, 9s
Original code timings: 11s, 12s, 11s, 7s, 8s

The averages have risen to 10.3 and 9.8 seconds, and the “improved” code actually looks worse here! This second battery of tests was performed between 9:13 and 9:22 on the same morning. You’ll also notice that I ran the test more often this time. Why? Well, I couldn’t believe that the improved code had slowed the form to an average of 10.6 seconds, so I ran it a fourth time; and due to my increasing suspicion that these timings were more random than I’d realised, I ran the final test five times. In the end I decided the slower times must be because more developers had arrived by 9am, so the app server was busier, throwing my test results into confusion.
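Looking at the spread of the timings, not just the averages, makes the problem plain. A quick back-of-the-envelope check (just the second battery of numbers above, run through AVG and STDDEV) gives standard deviations of roughly 1.5 and 2.2 seconds – dwarfing the half-second gap between the means:

    -- Compare the means and spreads of the two sets of timings.
    WITH timings AS (
      SELECT 'improved' AS variant, 11 AS secs FROM dual UNION ALL
      SELECT 'improved',  9 FROM dual UNION ALL
      SELECT 'improved', 12 FROM dual UNION ALL
      SELECT 'improved',  9 FROM dual UNION ALL
      SELECT 'original', 11 FROM dual UNION ALL
      SELECT 'original', 12 FROM dual UNION ALL
      SELECT 'original', 11 FROM dual UNION ALL
      SELECT 'original',  7 FROM dual UNION ALL
      SELECT 'original',  8 FROM dual
    )
    SELECT   variant,
             ROUND(AVG(secs), 1)    AS mean_secs,
             ROUND(STDDEV(secs), 1) AS stddev_secs
    FROM     timings
    GROUP BY variant;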

One thing I’ll confess is that I’ve had to resist the temptation all along to include only timings that seem “reasonable” – e.g. if the form took 19 seconds to load, I’d immediately suspect some other process on the server had slowed my session down, and so be tempted not to include it in the results spreadsheet. I knew enough about statistics to know that doing this would risk introducing confirmation bias into the results, so I diligently recorded every result I observed.

After considering the apparent failure of my attempt to prove my code change made a discernible difference to the performance of the code, I started thinking a bit more about what was going on as I ran the tests and recorded the results. I realised that a more subtle form of confirmation bias had crept into my results because I didn’t decide beforehand how many tests I would run. I simply ran the test three or four times, depending on whether I was in a hurry, and if it didn’t seem to be coming up with the numbers I was expecting, I’d keep running it until it did!

Writing it out like that, it seems blatantly obvious, but when you’re in the middle of running these tests and recording results it’s very easy to slip up.

What did I learn from all this? Before running any tests, write down exactly how many tests will be run, and at what times of the day. In other words, try to eliminate irrelevant variables such as concurrent activity on the server by spreading the tests randomly throughout the day, and try to avoid confirmation bias by predicting ahead of time how many tests will be needed to reduce the impact of outliers on the average result.