PostgreSQL + WAL-E + Cloudfiles = Awesome

If you’re a big PostgreSQL fan like I am, you may have heard of a tool called WAL-E. Originally developed by Heroku, WAL-E is a tool for efficiently sending PostgreSQL’s WAL (Write-Ahead Log) to the cloud. In addition to Heroku, WAL-E is now used by many companies with large PostgreSQL deployments, including Instagram.

Let’s unpack what that means. If you’ve ever set up replication with PostgreSQL, you’re probably familiar with the WAL. Essentially, there are two parts to replication and backup in PostgreSQL: the “base backup” and the WAL. A base backup is a copy of your database files that can be taken while the database is running. You might create a base backup every night, for example. The WAL is where PostgreSQL records every change as it happens. When you run normal replication, the leader streams its log to the followers as it writes it.
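
To make the relationship between the two parts concrete, here is a minimal toy sketch in Python. It is not how PostgreSQL stores anything: `ToyDatabase`, `base_backup`, and `restore` are names I made up for illustration, and the "WAL" is just a list of key/value writes. The only point is that a base backup plus the log records written after it is enough to rebuild the current state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """A toy key-value 'database' with a write-ahead log (hypothetical, not PostgreSQL's format)."""
    data: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)

    def write(self, key, value):
        # Log the change first, then apply it: the "write-ahead" part.
        self.wal.append((key, value))
        self.data[key] = value

    def base_backup(self):
        # Copy the data and remember how far into the WAL the copy reaches.
        return dict(self.data), len(self.wal)


def restore(backup, wal):
    """Rebuild state from a base backup plus the WAL records written after it."""
    data, wal_position = backup
    restored = dict(data)
    for key, value in wal[wal_position:]:
        restored[key] = value
    return restored


leader = ToyDatabase()
leader.write("a", 1)
nightly = leader.base_backup()   # taken while the "database" keeps running
leader.write("b", 2)
leader.write("a", 3)

# A follower (or a disaster recovery) starts from the backup and replays the log.
follower_state = restore(nightly, leader.wal)
assert follower_state == leader.data
```

The base backup alone would be stale, and the log alone would have to be replayed from the very beginning; together they give you a recent starting point plus everything that happened since.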

Instead of just using a simple socket to communicate, WAL-E sends these base backups and WAL files across the internet with the help of a cloud object store, like Cloudfiles (or any OpenStack Swift deployment). The advantage is that, in addition to replication, you get a durable backup of your database for disaster recovery. Further, you have effectively infinite read scalability from the archives: you can keep adding more followers without putting more stress on the leader.
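
The scalability point can be sketched with another toy model. Everything here is hypothetical: `ToyObjectStore` stands in for an object store container, the object names are invented rather than WAL-E's real layout, and the contents are plain Python values instead of real backup files. What it shows is that followers can be built entirely from the archive, so adding more of them costs the leader nothing.

```python
class ToyObjectStore:
    """Stands in for an object store container: a flat mapping of names to contents."""

    def __init__(self):
        self.objects = {}
        self.reads = 0

    def put(self, name, content):
        self.objects[name] = content

    def get(self, name):
        self.reads += 1
        return self.objects[name]

    def list_names(self, prefix):
        # Sorted so that log segments are replayed in the order they were written.
        return sorted(name for name in self.objects if name.startswith(prefix))


def bootstrap_follower(store):
    """Build a follower's state from the archive alone, never contacting the leader."""
    state = dict(store.get("base/000001"))
    for name in store.list_names("wal/"):
        for key, value in store.get(name):
            state[key] = value
    return state


store = ToyObjectStore()
# The leader archives one base backup, then each log segment as it is completed.
store.put("base/000001", {"a": 1})
store.put("wal/000001", [("b", 2)])
store.put("wal/000002", [("a", 3), ("c", 4)])

# Three followers come up; all of their reads hit the store, not the leader.
followers = [bootstrap_follower(store) for _ in range(3)]
```

In this sketch the leader only ever calls `put`. Every read that a new follower needs is served by the store, which is the property that lets you keep adding followers.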

With the help of WAL-E’s primary author, Daniel Farina, we recently added OpenStack Swift support to WAL-E. It’s not yet in a final release, but if you’re interested in checking it out, read on!
