We started running our site(s) on Amazon’s cloud for flexibility. Unfortunately, we started when the service was very young, which made it difficult to back up our instances. We kludged together a backup script in Python, adapted from a script someone else had posted on their blog. I just searched for it and can’t even find the original any more; this was at least two years ago.
So now it’s my job to bring us up to the present. We had a problem where our server crashed during the backup process. First, the backup would eat up all the free space on our root device, and the server would get stuck at 99% CPU usage, slowing the site to a crawl. Then I would reboot the instance, only to find that MySQL couldn’t start because the device was full.
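If an on-instance backup has to keep running in the meantime, a pre-flight free-space check would at least stop it from filling the root device. Here is a minimal sketch, assuming the backup's size can be estimated by summing the file sizes under the directory being archived (an uncompressed copy). The helper names tree_size, has_room, and safe_to_back_up and the 10% reserve are hypothetical choices for illustration, not part of our original script.

```python
import os
import shutil


def tree_size(root):
    """Approximate bytes needed to copy `root` uncompressed (sums regular file sizes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            path = os.path.join(dirpath, fn)
            if os.path.islink(path):  # skip symlinks so targets aren't counted twice
                continue
            try:
                total += os.path.getsize(path)
            except OSError:  # file vanished or is unreadable; ignore it
                pass
    return total


def has_room(free_bytes, total_bytes, needed_bytes, reserve_fraction=0.10):
    """True if writing needed_bytes still leaves reserve_fraction of the disk free."""
    return free_bytes - needed_bytes >= total_bytes * reserve_fraction


def safe_to_back_up(source_dir, dest_dir="/", reserve_fraction=0.10):
    """Check the destination filesystem before starting a backup of source_dir."""
    usage = shutil.disk_usage(dest_dir)  # named tuple: total, used, free (bytes)
    return has_room(usage.free, usage.total, tree_size(source_dir), reserve_fraction)
```

Calling something like safe_to_back_up("/var/www", "/") before the archive step, and aborting when it returns False, would turn a full-disk crash into a skipped backup (the paths here are only examples).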
Luckily, Amazon offers the Relational Database Service (RDS), which provides managed, more robust, and more up-to-date database servers in the cloud. I moved our two main databases (one for the blog and one for the forum) to this service, and so far they’re running smoothly.
Now I’m trying to get rid of the last bit of vestigial, non-cloud-friendly technology we’re using: filesystem backups and automatic AMI bundling. Replacing them means building an EBS boot volume. That involves creating a new EBS volume, formatting it, copying the entire filesystem onto it, snapshotting it in the AWS console, and then creating an image from the snapshot (also in the AWS console).
I got a lot of help from this blog post: http://thewebfellas.com/blog/2010/4/19/create-an-ebs-boot-volume-from-a-running-instance
as well as this gist on GitHub: https://gist.github.com/1004950
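The console steps in that workflow can also be scripted. Below is a minimal sketch in Python, assuming the boto3 library's EC2 client is passed in as ec2. The helper names, the device names (/dev/sdf when attaching, /dev/xvdf as the kernel may see it), the mount point, the architecture, and the virtualization type are all hypothetical placeholders. None of them are taken from the linked post or gist, so adjust them to match your instance.

```python
import subprocess


def copy_root_commands(local_device="/dev/xvdf", mount_point="/mnt/ebs"):
    """Commands to run on the instance once the new volume is attached (placeholders)."""
    return [
        ["mkfs.ext4", local_device],  # format the new volume
        ["mkdir", "-p", mount_point],
        ["mount", local_device, mount_point],
        # -x keeps rsync on the root filesystem, so other mounts such as /proc,
        # /sys, and the new volume itself are not descended into.
        ["rsync", "-aHAXx", "/", mount_point + "/"],
        ["umount", mount_point],  # unmount before snapshotting
    ]


def build_ami(ec2, instance_id, zone, size_gib, name,
              attach_device="/dev/sdf", local_device="/dev/xvdf", run=None):
    """Create, fill, snapshot, and register a root volume; return the new AMI id."""
    if run is None:
        def run(argv):
            subprocess.run(argv, check=True)

    # 1. Create a volume in the instance's availability zone and attach it.
    vol_id = ec2.create_volume(AvailabilityZone=zone, Size=size_gib)["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol_id])
    ec2.attach_volume(VolumeId=vol_id, InstanceId=instance_id, Device=attach_device)
    # Wait for the volume to report "in-use"; the device node may still take
    # a moment to appear on the instance.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[vol_id])

    # 2. Format it and copy the running root filesystem onto it.
    for argv in copy_root_commands(local_device):
        run(argv)

    # 3. Snapshot the volume (done in the AWS console in the workflow above).
    snap_id = ec2.create_snapshot(VolumeId=vol_id, Description=name)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap_id])

    # 4. Register an image whose root device is that snapshot.
    image = ec2.register_image(
        Name=name,
        Architecture="x86_64",     # assumption: match your instance
        VirtualizationType="hvm",  # assumption: older instances used "paravirtual"
        RootDeviceName="/dev/sda1",
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sda1", "Ebs": {"SnapshotId": snap_id}},
        ],
    )
    return image["ImageId"]
```

The client and the command runner are parameters so the whole sequence can be dry-run with a stub before it touches a real instance. For a real run it would need AWS credentials and root on the instance being copied.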
So that’s what I’ve been up to for the past little bit here. That, and all the debugging/troubleshooting/error-deciphering/guru meditation that goes along with it.
The good news is that once I’m done, our servers will boot faster, be easier to back up, and, hopefully, be more resilient.