Google indexes duplicate pages from my APEX site – problem solved

Problem: when Google indexes my APEX web site, it considers the following URLs to be different pages:

http://www.site.com/apex/f?p=100:1:1234567890::::
http://www.site.com/apex/f?p=100:1:0::::
http://www.site.com/apex/f?p=MYAPP:1:46346346346::::
http://www.site.com/apex/f?p=MYAPP:1:34634634636::::
http://www.site.com/apex/f?p=MYAPP:HOME:46346346346::::
http://www.site.com/apex/f?p=MYAPP:HOME:0::::

Notice how my application with ID 100 has an alias of MYAPP, and page 1 has an alias of HOME; also, more duplicates happen due to the session ID; all these URLs point to pretty much the same content, but Google indexes them all as separate pages.

Google provides two features that help webmasters solve the duplicate page problem.

Solution #1: Parameter Handling – not very useful (for us)

This solution involves telling Google which parameters to ignore when indexing URLs. This doesn’t help us with APEX, because apex only uses one parameter – “p”; if we were to tell Google to ignore the “p” parameter it would consider ALL pages in our site to be identical, which is not correct.

Solution #2: Specify Your Canonical – very useful!

Example:

<head>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head>

This works nicely for us – for any page that we want we can tell Google what URL should be the “canonical” or “official” URL for that page. We can use this in our APEX applications in a number of ways. Each has advantages and disadvantages and YMMV, and it depends on how many different kinds of pages you have and whether you want the same canonical form for all pages, or if you want it customised for individual pages.

A. Custom canonical URL for each page.

This option will probably be the most generally useful, since some pages (e.g. multi-row paged results) won’t work so well with a canonical URL, so you’ll want to specify a canonical URL for just some key pages on your site.

To do this, go to the Page editor and edit the Page Attributes, edit the HTML Header and add the following:

<link rel="canonical" href="/apex/f?p=&APP_ID.:&APP_PAGE_ID.:0"/>

  • You can add the full URL instead of a relative one if you want, but note if you do that it must be on the same domain (e.g. if your site is www.mysite.com, you can’t have a canonical URL pointing to myothersite.com). Anyway, Google don’t mind if you use relative URLs here, so that’s what I do.
  • You don’t have to use the &APP_xxx. substitution variables if you don’t want to – e.g. you could specify another application or page entirely if that makes sense for your app.
  • If your application has an alias, you could use that as the canonical URL:

    <link rel="canonical" href="/apex/f?p=&APP_ALIAS.:&APP_PAGE_ID.:0"/>
    Unfortunately, if APEX has a substitution variable for the Page Alias, I don’t know what it is. UPDATE 2017: APEX now provides the substitution variable APP_PAGE_ALIAS as of APEX 5.0.

B. Global canonical URL for all pages in an application.

This option works well if you want all the pages to have the same form of canonical URL. Because we’ll use the &APP_PAGE_ID. substitution variable, it will still correctly give the correct URL for each page in the application.

To do this, go to the Shared Components, and open Themes. Open the theme in use by your application, then find the Page themes. Next to each Page Theme is a number that indicates how many pages use that Page Theme; those are the only ones you need to edit (although there’s nothing stopping you from editing all of them if you wish).

Click the Page Theme name to edit it. In the Header definition, add the canonical link – it must be inserted after the <head> tag, and prior to the </head> tag. For example:

<html lang="&BROWSER_LANGUAGE." xmlns:htmldb="http://htmldb.oracle.com">
<head>
<title>#TITLE#</title>
<link rel="stylesheet" href="#IMAGE_PREFIX#themes/theme_16/theme_V3.css" type="text/css" />
<!--[if IE]><link rel="stylesheet" href="#IMAGE_PREFIX#themes/theme_16/ie.css" type="text/css" /><![endif]-->
<link rel="canonical" href="/apex/f?p=&APP_ALIAS.:&APP_PAGE_ID.:0"/>
#HEAD#
</head>
<body #ONLOAD#>#FORM_OPEN#

Now, it’s important to test your changes thoroughly because many syntax errors you enter will not manifest in any obvious problems when browsing the site. Open your pages and View Source – check that the header section of the HTML includes the correct <link rel="canonical" ...> tag, and ensure that the URL resolves to the same page by copying it out and pasting it into your brower’s address bar.

Once that’s done that’s it! When Google next indexes your site it should honour your canonical URLs and remove duplicate pages from its indexes.


FRM-40654 “Record has been updated by another user”

There are several reasons you might get this error in Oracle Forms, e.g. another user has modified the record before you saved it (as the error message suggests), or a table trigger has modified the record and your form’s DML Returning Value is set to No.

Another cause to chalk up is what one of my colleagues got today. He made a form with a simple table-based block, he would insert a record and save successfully, then try to modify it and consistently got FRM-40654 when he tried to save. No other user was trying to update the row, and there were no triggers on the table.

The answer? The table is an index-organised table, but the block on the form had Key Mode set to Automatic. I suspect the form is comparing the ROWID of the before-and-after change, but because the table is index-organised the ROWID is not necessarily constant. Whatever the reason, changing the Key Mode to Updateable or Non-Updateable solves the problem.


WITH With an IN; or, A Reason to Refactor

A work colleague needed to make a change to a report and came up against a brick wall. He knew what he wanted to express in SQL, but Oracle wouldn’t accept his syntax.

The original query used a WITH clause like this (note, I’ve removed a great deal of irrelevant detail here, the actual query had a lot of other stuff going on, but this will do to illustrate this point):

WITH q AS
(SELECT expensive_function1(:id) idpart1
,expensive_function2(:id) idpart2
FROM DUAL)
SELECT t.*
FROM q, some_table t
WHERE q.idpart1 = t.idpart1
AND q.idpart2 = t.idpart2;

This works well because the two expensive functions are run only once in the query, and the results are used to probe some_table on its compound key.

The requirements had changed, however, and my colleague needed to change it so that instead of calling the expensive functions, it now got multiple keys from another table. His first cut looked like this:

SELECT t.*
FROM some_table t
WHERE (t.idpart1, t.idpart2) IN
(SELECT idpart1, idpart2
FROM driving_table
WHERE id = :id);

This worked fine, but my colleague felt that the WITH clause should be retained, e.g.:

WITH q AS
(SELECT idpart1, idpart2
FROM driving_table
WHERE id = :id)
SELECT t.*
FROM some_table t
WHERE (t.idpart1, t.idpart2) IN q;

However, Oracle doesn’t like this and raises ORA-00920: invalid relational operator when the statement is parsed.

To my mind this alternative formulation was not required, but I was interested to know what this error message meant, and why can’t we use a WITH subquery as the target for an IN statement?

As it is, however, the Oracle documentation explains why this formulation is not allowed – look up the “SELECT” command in the SQL Reference 10g Release 2 (10.2), search for “WITH”, and under “subquery_factoring_clause” we read:

“Restrictions on Subquery Factoring: This clause is subject to the following restrictions:

In a compound query with set operators, you cannot use the query_name for any of the component queries, but you can use the query_name in the FROM clause of any of the component queries.”

In other words, we can refer to a WITH subquery in a FROM clause but not as part of a set operator like IN.

In the case of my colleague, I recommended he use this much simpler formulation, which works just fine for this report:

SELECT t.*
FROM some_table t, driving_table d
WHERE t.idpart1 = q.idpart1
AND t.idpart2 = q.idpart2;

This shows how, when maintaining existing code, sometimes you have to go back to basics and consider whether the change to the requirements (in this case, a seemingly minor change) means that the code can be refactored to give an appreciable benefit. Caveat: this is not always the case: making large changes to existing code always carries the risk of introducing more defects.


My APEX application asks users to log in twice

I had this problem with an APEX application I’m building, and finally found the cause this morning, so I thought I’d share it.

This particular application has some pages which are only available to authenticated users, and some pages which are visible to everyone. One nice thing about APEX is that it automatically redirects users to the Login screen if they try to navigate to a protected page.

After authentication, the user doesn’t have to login again – they can now see all pages of the application that I want them to see. This used to work fine.

Recently I noticed that sometimes I’d Login with my username and password, click on a Tab, and it would ask me to Login again. In these instances, it’d only ask me to Login just the second time – after that, it would be fine. I wrote it off as a random glitch on my home-grown server. It seemed to be random, and after a while I noticed it was happening once every day. I looked all through my application, trying to find any links that didn’t pass the &SESSION. through, but I couldn’t find any such problems. I looked at some other applications on the same server – no problems there, it was just this one application.

Just this morning I went in, and happened to notice something not quite right. Normally, when I go into an application, the URL looks something like this:

http://www.xyz.com/apex/f?p=100:1:318727495645403::NO

The site should generate the long numeric Session ID automatically. However, I noticed my URL looked like this:

http://www.xyz.com/apex/f?p=100:1:0::NO

The Session ID was zero. This is a relatively new feature of APEX which I use for my fully-public applications (i.e. ones which require no authentication), where no Session ID is required – it means users can bookmark individual pages without having a long Session ID embedded in the URL.

The cause? When I updated my index page of APEX applications, I copied another entry without thinking, and so included the “0” for the Session ID. So when I first logged in, it gave me a new session, but somewhere internally APEX still had my Session ID = 0, requiring me to Login again. After this, the internal reference to my session was updated. I don’t know if this is expected behaviour or a bug in APEX.

The fix? Remove the 0 from the initial link (e.g. now it looks like “http://www.xyz.com/apex/f?p=100:1”) – zero session IDs are only appropriate for applications that require no login at all anyway.


ORA-06502: PL/SQL: numeric or value error: Bulk bind: Error in define

I came across an inexplicable error when bulk collecting into a PL/SQL table with the NOT NULL constraint the other day. What was confusing was that the code had been passing tests for quite some time.

In the end the only thing that had changed was that a VARCHAR2 which should have been non-null happened to be NULL for one particular row in the table.

Thanks to Connor for the simple test case, listed below.

If you know what might be the cause or reason behind this error, and why it doesn’t occur for dates, I’d be interested.

This was reproduced on Oracle 10.2.0.1.0.

SQL> declare
      type t is table of number not null index by pls_integer;
      r t;
     begin
      select case when rownum < 20 then rownum else null end
      bulk collect into r from all_Objects
      where rownum <= 20;
     end;
     /
declare
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: Bulk bind: Error in define
ORA-06512: at line 5

SQL> declare
      type t is table of varchar2(80) not null index by pls_integer;
      r t;
     begin
      select case when rownum < 20 then rownum else null end
      bulk collect into r from all_Objects
      where rownum <= 20;
     end;
     /
declare
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: Bulk bind: Error in define
ORA-06512: at line 5

SQL> declare
      type t is table of date not null index by pls_integer;
      r t;
     begin
      select case when rownum < 20 then sysdate else null end
      bulk collect into r from all_Objects
      where rownum <= 20;
     end;
     /
PL/SQL procedure successfully completed.

Lost SQL Developer Connections

I upgraded Oracle SQL Developer from 1.0.14.67 to 1.0.15.27, just for the heck of it. Unfortunately, all my saved connections had disappeared! After a text search I found the connections are stored in a file called IDEConnections.xml under the folder sqldeveloper\jdev\system\oracle.onlinedb.11.0.0.37.25. I copied this across to a new folder that had been created (oracle.onlinedb.11.0.0.37.36) and bingo they’re back again.

Bonus – now I know what file to back up if I want to restore my connections later on.


Small SAMPLE bug in Oracle 9i (ORA-30561)

If you’re using the SAMPLE clause in Oracle 9i, in combination with a join, be aware of this small gotcha. I found a workaround, thankfully.

select * from (select * from dual), (select * from dual sample (10));

Expected result: should return zero or one row (more or less at random)

Actual result (tested with 9.2.0.6.0): “ORA-30561: SAMPLE option not allowed in statement with multiple table references”

Workaround: put the query with the SAMPLE clause first, i.e.

select * from (select * from dual sample (10)), (select * from dual);

Note: It works fine in 10g (tested under 10.1.0.2.0) since 10g does not restrict how many SAMPLE clauses are in a query.


User-named locks with DBMS_LOCK

The application I’m working on involves a number of views that provide an interface between two databases that have quite different models. The views implement inserts, updates and deletes via Instead-of triggers.

On one of these views, the update trigger needs to actually do an update and an insert on the source table: it must update the existing row to mark it as “closed”, and then insert a new row to take its place. In this way the view hides the fact that the table keeps a history of changes.

Now, a logical constraint is that there can only be one “open” record for a given entity. Therefore, there can be zero or more “closed” records, and there must be a single “open” record. This constraint, however, has not been implemented on the database (naughty naughty) because our code is perfect and never makes a mistake 😉

This worked perfectly well until our thorough Java guy decided to do some load testing, in which a process would hammer away inserting, updating and deleting the same row over and over again, in multiple concurrent sessions.

This had the result that we ended up seeing lots of “open” records for the entity, which violates the logical constraint described above. “That’s not allowed” I said, and soon realised that the fact that multiple sessions are updating the same row means that even a trigger-based constraint check will not solve the problem, since the triggers cannot see updates from other sessions.

So, we need to have a lock of some sort, so that when one session wants to update an entity, other sessions cannot start updating it as well until the first one is finished. I can’t just lock the row I want to update, because after I’ve updated it, I still need the lock while I insert the new row.

The solution is to use a package supplied with Oracle calls DBMS_LOCK, with which I can request “user-named locks”. This package gives a lot of options but I’m only interested in simple Exclusive locks, so I’ll also create some wrapper procedures.

Easy, right? Well, yeah once you get past a few gotchas.

Try #1:

-- wrapper function to Request a lock; returns a lock handle
FUNCTION request_lock (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  lock_status NUMBER;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => lock_handle,
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => TRUE);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'request_lock failed: ' || lock_status);
  END IF;
RETURN lock_handle;
END request_lock;

-- wrapper to release a lock; call with the lock handle obtained previously
PROCEDURE release_lock (lock_handle IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => lock_handle);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release_lock failed: ' || lock_status);
  END IF;
END release_lock;

--example calling code
...
DECLARE
  lock_handle VARCHAR2(128);
BEGIN
  lock_handle := request_lock('MYLOCK1');
  -- do some stuff
  release_lock(lock_handle);
END;
...

The above code “worked” ok (although I hadn’t yet tested to see if it was actually getting the lock) until I started calling it from triggers on the views, when I started getting “ORA-04092: cannot commit in a trigger”. What the? I wasn’t doing any commits in my triggers. I couldn’t find any references to the DBMS_LOCK functions doing commits in the 10g documentation, but I thought, ok let’s make these wrappers Autonomous, so they can commit in peace:

Try #2:

FUNCTION request_lock (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_status NUMBER;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => lock_handle,
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => TRUE);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'request_lock failed: ' || lock_status);
  END IF;
  RETURN lock_handle;
END request_lock;

PROCEDURE release_lock (lock_handle IN VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => lock_handle);
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release_lock failed: ' || lock_status);
  END IF;
END release_lock;

No good: I got “ORA-06519: active autonomous transaction detected and rolled back”! This didn’t make much sense at the time, but I put a COMMIT in anyway. But of course, this wouldn’t work with the triggers. Anyway, I started experimenting and soon realised that even if I got the wrappers to work, the locks weren’t being made anyway, and I was able to request locks multiple times, which I expected to be impossible if the wrapper was working correctly.

Finally, I logged in as a user with some DBA privileges and read the specification of the SYS.DBMS_LOCK package. The one thing that jumped out at me was a little comment under ALLOCATE_UNIQUE:

“This routine will always do a commit.”

There was my answer. It was the ALLOCATE_UNIQUE that was doing a commit, not REQUEST or RELEASE. And, because I was setting release_on_commit => TRUE in the call to REQUEST, whenever I committed the lock was being released, straight away.

So I changed release_on_commit => FALSE, and moved ALLOCATE_UNIQUE to a separate function (set up as an Autonomous Transaction). Having read a bit more I realised I could make this API nicer to the programmers by allowing the release_lock to accept the lock name instead of requiring them to manage the lock handle. I also improved the error messages.

Try #3:

-- internal function to get a lock handle
-- (private for use by request_lock and release_lock)
FUNCTION get_handle (lock_name IN VARCHAR2) RETURN VARCHAR2 IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  lock_handle VARCHAR2(128);
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE (
    lockname => lock_name,
    lockhandle => lock_handle,
    expiration_secs => 864000); -- 10 days
  RETURN lock_handle;
END get_handle;

PROCEDURE request_lock (lock_name IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.REQUEST(
    lockhandle => get_handle(lock_name),
    lockmode => DBMS_LOCK.X_MODE, -- eXclusive
    timeout => DBMS_LOCK.MAXWAIT, -- wait forever
    release_on_commit => FALSE);
  CASE lock_status
    WHEN 0 THEN NULL;
    WHEN 2 THEN RAISE_APPLICATION_ERROR(-20000,'deadlock detected');
    WHEN 4 THEN RAISE_APPLICATION_ERROR(-20000,'lock already obtained');
    ELSE RAISE_APPLICATION_ERROR(-20000,'request lock failed: ' || lock_status);
  END CASE;
END request_lock;

-- wrapper to release a lock
PROCEDURE release_lock (lock_name IN VARCHAR2) IS
  lock_status NUMBER;
BEGIN
  lock_status := DBMS_LOCK.RELEASE(
    lockhandle => get_handle(lock_name));
  IF lock_status > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,'release lock failed: ' || lock_status);
  END IF;
END release_lock;

--example calling code
...
BEGIN
  request_lock('MYLOCK1');
  -- do some stuff
  release_lock('MYLOCK1');
END;
...

This worked perfectly, even from triggers. I even tried it out with multiple sessions:

Session 1:

exec request_lock('JEFF');

Session 2:

exec request_lock('JEFF');

At this point, session 2 just hung – it was waiting for Session 1 to release the lock.

Session 1:

exec release_lock('JEFF');

Now session 2 returned and had the lock. The same thing happened if I just closed down session 1.

If I changed request_lock to set timeout to some value (e.g. 3 seconds), session 2 would wait for that amount of time, then raise my error message “request lock failed – 1” (1 is the status code when a lock request times out).

I tried it with a deadlock situation, with timeout = MAXWAIT:

Session 1:

exec request_lock('MYLOCKA');

Session 2:

exec request_lock('MYLOCKB');

Session 1:

exec request_lock('MYLOCKB');

At this point session 1 hangs as it waits for session 2 to release the lock.

Session 2:

exec request_lock('MYLOCKA');

This is a deadlock. Both sessions are waiting for a lock held by the other session. After a few seconds Oracle automatically detects the deadlock and picks one session (at random, apparently) and it returns the lock status 2 (my code raises the error message “deadlock detected”). The other session stayed hung, however – it was still waiting, but there was no deadlock.

I tried the above scenario again, but with timeout = 5 seconds. If I executed the request_locks quick enough, I got “deadlock detected” on one session, and the other session eventually timed out.

So, everything works as expected, and the Java guys can hammer away at my views with impunity.


Instead-of-Delete trigger problem

In software development, quite often the problem you encounter has virtually nothing to do with the thing you’re trying to do.

I’m creating some views which will provide an interface to a database. The views will have Instead-of triggers so that all DML will be passed to my own (packaged) procedures.

I’ve got a particular view which is a join between a base table, filtered by some criteria, and a code/description mapping view. I’ve implemented instead-of-insert, update and delete triggers that do the appropriate DML on the base table.

Inserts and updates work fine, but when deleting from the view I’m getting “TM contention” wait events. When deleting across a database link the session gets “ORA-02049: timeout: distributed transaction waiting for lock” after a long delay. If the delete is issued within the same database as the view, the session waits indefinitely for the lock.

I had a look at the view and noticed that it is a key-preserved view, so I theorised that maybe the session was locking the row before running my instead-of-delete trigger that does the delete; the trigger tries to delete the row that has already been locked. Doesn’t make sense. When I remove the instead-of-delete trigger however, I get “ORA-01752: cannot delete from view without exactly one key-preserved table”. So I need the trigger to handle the delete.

Looked up the Oracle doc “Instance Tuning Using Performance Views” about “TM enqueue”: “The most common reason for waits on TM locks tend to involve foreign key constraints where the constrained columns are not indexed. Index the foreign key columns to avoid this problem.” I saw this ref come up in a search early on but I initially dismissed it because my problem wasn’t related to performance. I’m glad I took a second look.

Righty-o. Looked at the table I was deleting from, and lo and behold there is another table with a fk to my table with no index on the fk column (mind you, there was an index that contained that column along with another column, obviously the database wasn’t going to use that to avoid the table lock).

When I created an index on the fk column, the problem was resolved. Brilliant!

The application never needed to delete from that table before, so this had never been encountered. In this case adding the index solved the problem. It’s on a very small, rarely-used table so the index won’t have a negative impact. Took about half an hour to work out, but I learned a bit more about Oracle, so that’s good.