Google indexes duplicate pages from my APEX site – problem solved

Problem: when Google indexes my APEX web site, it considers the following URLs to be different pages:

http://www.site.com/apex/f?p=100:1:1234567890::::
http://www.site.com/apex/f?p=100:1:0::::
http://www.site.com/apex/f?p=MYAPP:1:46346346346::::
http://www.site.com/apex/f?p=MYAPP:1:34634634636::::
http://www.site.com/apex/f?p=MYAPP:HOME:46346346346::::
http://www.site.com/apex/f?p=MYAPP:HOME:0::::

Notice how my application with ID 100 has an alias of MYAPP, and page 1 has an alias of HOME; also, more duplicates happen due to the session ID; all these URLs point to pretty much the same content, but Google indexes them all as separate pages.

Google provides two features that help webmasters solve the duplicate page problem.

Solution #1: Parameter Handling – not very useful (for us)

This solution involves telling Google which parameters to ignore when indexing URLs. This doesn’t help us with APEX, because apex only uses one parameter – “p”; if we were to tell Google to ignore the “p” parameter it would consider ALL pages in our site to be identical, which is not correct.

Solution #2: Specify Your Canonical – very useful!

Example:

<head>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head>

This works nicely for us – for any page that we want we can tell Google what URL should be the “canonical” or “official” URL for that page. We can use this in our APEX applications in a number of ways. Each has advantages and disadvantages and YMMV, and it depends on how many different kinds of pages you have and whether you want the same canonical form for all pages, or if you want it customised for individual pages.

A. Custom canonical URL for each page.

This option will probably be the most generally useful, since some pages (e.g. multi-row paged results) won’t work so well with a canonical URL, so you’ll want to specify a canonical URL for just some key pages on your site.

To do this, go to the Page editor and edit the Page Attributes, edit the HTML Header and add the following:

<link rel="canonical" href="/apex/f?p=&APP_ID.:&APP_PAGE_ID.:0"/>

  • You can add the full URL instead of a relative one if you want, but note if you do that it must be on the same domain (e.g. if your site is www.mysite.com, you can’t have a canonical URL pointing to myothersite.com). Anyway, Google don’t mind if you use relative URLs here, so that’s what I do.
  • You don’t have to use the &APP_xxx. substitution variables if you don’t want to – e.g. you could specify another application or page entirely if that makes sense for your app.
  • If your application has an alias, you could use that as the canonical URL:

    <link rel="canonical" href="/apex/f?p=&APP_ALIAS.:&APP_PAGE_ID.:0"/>
    Unfortunately, if APEX has a substitution variable for the Page Alias, I don’t know what it is. UPDATE 2017: APEX now provides the substitution variable APP_PAGE_ALIAS as of APEX 5.0.

B. Global canonical URL for all pages in an application.

This option works well if you want all the pages to have the same form of canonical URL. Because we’ll use the &APP_PAGE_ID. substitution variable, it will still correctly give the correct URL for each page in the application.

To do this, go to the Shared Components, and open Themes. Open the theme in use by your application, then find the Page themes. Next to each Page Theme is a number that indicates how many pages use that Page Theme; those are the only ones you need to edit (although there’s nothing stopping you from editing all of them if you wish).

Click the Page Theme name to edit it. In the Header definition, add the canonical link – it must be inserted after the <head> tag, and prior to the </head> tag. For example:

<html lang="&BROWSER_LANGUAGE." xmlns:htmldb="http://htmldb.oracle.com">
<head>
<title>#TITLE#</title>
<link rel="stylesheet" href="#IMAGE_PREFIX#themes/theme_16/theme_V3.css" type="text/css" />
<!--[if IE]><link rel="stylesheet" href="#IMAGE_PREFIX#themes/theme_16/ie.css" type="text/css" /><![endif]-->
<link rel="canonical" href="/apex/f?p=&APP_ALIAS.:&APP_PAGE_ID.:0"/>
#HEAD#
</head>
<body #ONLOAD#>#FORM_OPEN#

Now, it’s important to test your changes thoroughly because many syntax errors you enter will not manifest in any obvious problems when browsing the site. Open your pages and View Source – check that the header section of the HTML includes the correct <link rel="canonical" ...> tag, and ensure that the URL resolves to the same page by copying it out and pasting it into your brower’s address bar.

Once that’s done that’s it! When Google next indexes your site it should honour your canonical URLs and remove duplicate pages from its indexes.


AUSOUG 2009

Finally the AUSOUG 2009 conference program is out!

Looks good – I’m looking forward to a number of sessions (10-11 Nov in Perth), including the new “ODTUG Oracle ACE” stream:

  • Howard OngOracle Designer RIP? – A First Look at Oracle Data Modeler
  • Penny Cookson & Ray Tindall – Developing a Really Cute Application Using Ajax
  • Scott WesleyTrials & Tribulations of an Oracle Forms -> Apex Conversion
  • Connor McDonald SQL-ectric!
  • Tim HallPL/SQL: It’s all in the Presentation

I’ll be speaking again, this time doing an “Intro to Python” for Oracle’rs. If you’ve never played with this wonderful language before I hope you’ll come along. Alternatively, if you’re keen to hear Penny talk about “Best Practices in Apex Development”, or Scott Hollows give his “History of Oracle – the early years”, I hope you’ll tell me all about it afterwards, because my talk is scheduled at the same time.

If you’re attending the Perth conference, say hooroo.