For a long time my disaster recovery procedure for my Amazon EC2-based web site was:
- Find an Oracle AMI that has 10g XE with Apex pre-installed, and start up an instance with it.
- Create a volume from a backup snapshot and attach it to the new instance.
- Log into the instance, shut down apache and oracle, then delete all the oracle data files, apache config files, and a few other bits and pieces.
- Create symbolic links for the deleted bits and pieces (including the oracle data files) that point to the attached volume.
- Start up oracle and apache and test.
- Switch the elastic IP over to the new instance.
This procedure has been tested and retested multiple times, and came in useful once when I almost lost the site (actually, it just went unresponsive due to some general problems at Amazon, but at the time I thought it’d had gone down).
Last week I logged in to the AWS Management Console to do a routine backup-and-restore test, but discovered a problem: it couldn’t find the Oracle 10g XE AMI. Searches on the image ID and various keywords, across all the Amazon regions, returned no results. Searches on “oracle” brought back a number of options but none of them close to what I required. I enquired with Amazon and they responded that the AMIs are supplied by Oracle and had been removed. I discovered this meant that not only could I not start up an instance using one of these images, I also couldn’t point to my running instance and “start up another like this” – because this requires access to the original AMI that was used to start it.
The machine image which I was trying to find is (at least, as of today 5 Jul 2011) still referred to here: http://aws.amazon.com/amis/Oracle/1720 “Oracle Database 10g Release 2 Express Edition – 32 Bit” so I raised a question on the oracle forum (http://forums.oracle.com/forums/thread.jspa?messageID=9707298�) and sent an email to Bill Hodak at Oracle who was named in the description of the AMI. He replied he would see if he could find out what had happened to it.
At this point I was hoping that my running instance wouldn’t go down, because I didn’t know if I’d be able to restore from backup. My backup consisted solely of snapshots of just the data – the rest of the OS was supposed to be provided by the AMI.
Meanwhile, asam replied to my oracle forum thread, suggesting I create my own AMI. A bit of googling yielded this result, which proved very helpful: http://webkist.wordpress.com/2010/03/16/creating-an-amazon-ec2-ebs-ami-from-a-running-instance/ “Creating an Amazon EC2 EBS AMI from a running instance”. I followed the instructions, slightly modified as follows:
- Use AWS Management Console to create a new volume
- Attach the volume to my running instance and mount it:
# mkdir /u03 # mount -t ext3 /dev/sdf /u03
- Move everything from the old volume so that it all sits under / again instead of via symbolic links
- Synchronize the filesystem to the new volume:
# rsync -a --delete --progress -x / /u03
- When rsync has completed, fix up the devices:
# MAKEDEV -d /u03/dev -x console # MAKEDEV -d /u03/dev -x zero # MAKEDEV -d /u03/dev -x null
- Unmount the volume:
# umount /u03
- Get the EC2 X.509 cert and private key from the “Security Credentials” area under “Account” in AWS Management Console.
- Download the Amazon EC2 API tools:
- I needed java to run the API tools, so download the rpm:
jre-6u26-linux-i586.rpm – e.g. from http://www.oracle.com/technetwork/java/javase/downloads/jre-6u26-download-400751.html
- Upload the EC2 X.509 cert and private key, the Amazon EC2 API tools, and the java rpm to the instance. Unzip and install the API tools and the java rpm.
- Set up all the required environment variables (replace xxxwith the appropriate bits from the relevant file names):
# export EC2_CERT=/root/cert-xxx.pem # export EC2_PRIVATE_KEY=/root/pk-xxx.pem # export EC2_HOME=(path-to-ec2-stuff) # export JAVA_HOME=(path-to-java-stuff) # export PATH=$PATH:$EC2_HOME/bin
- Set up a symbolic link so that the EC2 tools can find java:
# ln -s (path-to-java-stuff) /usr/bin/java
- Back in the AWS Management Console, create a snapshot of the volume.
- In the instance, run this command (this is the only command you can’t do in the management console, which is what all that rigmarole about installing the API tools was all about):
# ec2-register --snapshot snap-xxx --description "my ami description" --name "my ami name" --ramdisk ari-yyy --kernel aki-zzz --region ap-southeast-1
You can get the snapshot, ramdisk and kernel identifiers from the AWS Management Console. (my instance was running in Singapore, so my region is ap-southeast-1)
- Back in AWS Management Console, I see my new AMI has been created. All I have to do now is select it, click Launch Instance, and a copy of my site is up and running.
I startup the database and see if it’s working. Unfortunately it isn’t – a bit more investigation revealed that the Oracle listener was not responding to requests. lsnrctl status reveals that it is still using the old internal IP address from the original instance – but this is a new instance with a different internal IP address.
To fix this, I edit listener.ora to correct the IP:
# cd /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin # chmod +w listener.ora # vi listener.ora
The IP address is listed as the “Private IP Address” on the instance in AWS Management Console.
# lsnrctl start
After that, it’s all working – and very soon I will have a much simpler (and hopefully somewhat less reliant on the kindness of big corporations) disaster recovery process. I just need to work out the simplest way to restore the data from backup to the new instance. I’ll probably just create a new volume from a backup snapshot, attach it to the instance, and copy all the data across.
UPDATE: With an EBS-backed volume, I can now create a new AMI from the running instance whenever I want – it takes a complete snapshot of the instance, from which I can then create new instances. So my disaster recovery procedure is much simpler than it was before :)