Table of contents
- Your database will fail at 2 AM. The question is: will you care?
- Architecture overview
- The failover dance
- Storage architecture
- What's configurable
- Why not managed PaaS?
- Quick start
- Why we open-sourced this
- Go break things (safely)
Your database will fail at 2 AM. The question is: will you care?
It's 2 AM. Your primary PostgreSQL instance just died. Your phone is buzzing. You're half-asleep, SSHing into production in your underwear, trying to remember which node is the replica and whether you can safely promote it without losing data.
Or — hear me out — you stay asleep.

That's the pitch. We built a Terraform module that deploys a production-ready, self-healing PostgreSQL cluster on Oracle Cloud Infrastructure. Three nodes, automatic failover, and a load balancer smart enough to know who the leader is — all provisioned with a single terraform apply.
It's open source: github.com/obytes/oci-postgres-cluster
Architecture overview
The cluster runs three nodes: two PostgreSQL instances managed by Patroni, and one lightweight etcd witness for quorum. A Network Load Balancer sits in front and routes traffic exclusively to the current primary.
Here's what each piece does:
- Patroni is the brain. It manages PostgreSQL lifecycle — startup, replication, health monitoring, and leader election. When the primary disappears, Patroni on the replica detects it and triggers promotion.
- etcd is the vote. A distributed key-value store that holds the cluster state and provides consensus. It prevents split-brain scenarios where both nodes think they're the leader (that's the nightmare).
- The witness node is the tiebreaker. It runs etcd only — no PostgreSQL, no block volumes, minimal compute. Its sole purpose is to maintain quorum: with 3 etcd members, any 2 can form a majority (2/3). Without it, losing one node means losing quorum entirely.
- The NLB is the router. It doesn't just check if port 5432 is open. It sends HTTP GET /primary to Patroni's REST API on port 8008. Only the actual leader returns 200 OK. Replicas return 503. Dead nodes return nothing. This is the secret sauce — your application connects to one stable IP, and the NLB always knows where to send traffic (see the Terraform sketch just below).
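For the curious, here's roughly what a leadership-aware backend set looks like with the OCI Terraform provider. Treat it as a sketch: the resource names and numbers below are illustrative, not copied from the module.

resource "oci_network_load_balancer_backend_set" "postgres_primary" {
  name                     = "postgres-primary"
  network_load_balancer_id = oci_network_load_balancer_network_load_balancer.this.id # placeholder reference
  policy                   = "FIVE_TUPLE"

  # Ask Patroni who the leader is, instead of asking PostgreSQL whether it's alive.
  health_checker {
    protocol           = "HTTP"
    port               = 8008        # Patroni REST API
    url_path           = "/primary"  # 200 OK only on the current leader
    return_code        = 200
    interval_in_millis = 10000       # illustrative cadence
  }
}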
The failover dance
So what actually happens when the primary dies? Let's walk through it.

Step by step:
- Node 1 crashes. Patroni's TTL expires after ~10 seconds. The etcd lease is gone.
- Quorum check. Node 2's etcd + the witness still form a majority (2 out of 3 members). Consensus holds.
- Promotion. Patroni on Node 2 sees the leader key is vacant, acquires the lock, and promotes PostgreSQL from replica to primary. This takes ~5-15 seconds depending on WAL replay.
- Health check flips. The NLB's next health check hits Node 2's /primary endpoint — 200 OK. Node 1 is unreachable — connection refused. The NLB updates its routing table.
- Traffic reroutes. Your application's next query goes to the NLB, which sends it to Node 2. Done.
Total downtime: 15-30 seconds. No SSH. No runbooks. No 3 AM heroics.

The critical detail most PostgreSQL HA setups get wrong: using a dumb TCP health check on port 5432. That tells you PostgreSQL is running, not that it's the leader. A newly promoted replica still has port 5432 open during the transition. A deposed primary might still accept connections briefly. The Patroni HTTP API eliminates this ambiguity entirely — it's the single source of truth for leadership.
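In Terraform terms the difference is only a few lines, but it changes what "healthy" means. Both snippets below are health_checker blocks that would sit inside an NLB backend set like the one sketched earlier; they're illustrative, not lifted from the module.

# The naive check: "is something listening on 5432?"
health_checker {
  protocol = "TCP"
  port     = 5432
}

# The leadership-aware check: "are you the Patroni leader right now?"
health_checker {
  protocol    = "HTTP"
  port        = 8008
  url_path    = "/primary"
  return_code = 200
}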
Storage architecture
Each PostgreSQL node gets three dedicated block volumes, plus the boot volume. Separating these isn't just organizational — it's about I/O isolation.
- Data volume (/pgdata) — Your tables, indexes, and TOAST data. Sized generously at 1 TB by default.
- WAL volume (/pgwal) — Write-Ahead Logs on a separate volume means sequential WAL writes don't compete with random data I/O. This matters under heavy write loads.
- Backup volume (/pgbackup) — Dedicated space for pg_basebackup and WAL archival. Keeps backups from eating into data or WAL headroom.
All three support OCI KMS encryption — pass a kms_key_id and every volume is encrypted with your customer-managed key. Skip it and OCI's platform-managed encryption is used by default. All volume sizes are configurable.
The setup uses LVM under the hood (volume groups and logical volumes), and the user-data scripts detect volumes by size — making the whole process idempotent across instance restarts.
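To make that concrete, here's a sketch of what one of those dedicated volumes could look like when provisioned through the OCI provider. The resource and variable names are illustrative, and the real module also handles availability domains, volume attachments, and the LVM layout omitted here.

# Illustrative only: a dedicated WAL volume with a distinct size (so the
# user-data scripts can identify it) and optional customer-managed encryption.
resource "oci_core_volume" "pgwal" {
  compartment_id      = var.compartment_id
  availability_domain = var.availability_domain
  display_name        = "myapp-node1-pgwal"
  size_in_gbs         = 200
  kms_key_id          = var.kms_key_id # null means OCI platform-managed encryption
}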
What's configurable
This isn't a rigid template you fork and find-replace. Everything is parameterized:
- PostgreSQL version — defaults to 15, but pass any major version
- etcd version — defaults to v3.5.17
- Compute specs — separate sizing for PostgreSQL nodes and the witness (the witness defaults to 1 OCPU / 8 GB because it only runs etcd — no need for 32 GB of RAM on a tiebreaker)
- Volume sizes — data, WAL, backup, and boot volumes are all independently configurable
- PostgreSQL tuning — max_connections, shared_buffers, and effective_cache_size via a simple object variable (sketched after this list)
- KMS encryption — optional, just pass the key OCID
- Network — bring your own VCN, subnet, and reserved private IPs
- NSG rules — fully customizable ingress rules for each service port
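As a sketch of what tuning looks like from the caller's side: apart from kms_key_id, the variable names below are illustrative guesses, so check the module's variables.tf for the real interface.

module "postgres-cluster" {
  source = "github.com/obytes/oci-postgres-cluster//modules/postgres-cluster"
  # ...required arguments as in the Quick start below...

  # Illustrative variable names; consult variables.tf for the actual interface.
  postgres_version = "16"

  postgres_config = {
    max_connections      = 500
    shared_buffers       = "8GB"
    effective_cache_size = "24GB"
  }

  kms_key_id = var.kms_key_id
}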
And importantly: the module doesn't hard-code a provider configuration. It never declares its own OCI provider — it inherits whatever the caller has configured. That's not an accident; that's following Terraform best practices for reusable modules.
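Concretely, the root configuration declares and configures the OCI provider, and the module simply uses it. This is standard Terraform behavior rather than anything specific to this module:

# The caller owns the provider configuration; a module without its own
# provider block inherits the caller's default configuration automatically.
provider "oci" {
  region = "eu-frankfurt-1" # example region
}

module "postgres-cluster" {
  source = "github.com/obytes/oci-postgres-cluster//modules/postgres-cluster"
  # ...arguments as in the Quick start below...

  # Only needed if you use aliased provider configurations:
  # providers = { oci = oci.some_alias }
}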
Why not managed PaaS?

Managed databases are great — until they're not. You want to pin a specific PostgreSQL minor version? Tune wal_level and max_wal_size? Use a custom extension? Control exactly where your data lives, how it's encrypted, and when backups run?
Self-hosted PostgreSQL HA gives you the knobs. Patroni + etcd is a battle-tested pattern used by GitLab, Zalando, and plenty of companies running PostgreSQL at scale. This module packages that pattern into something you can terraform apply in minutes instead of spending a week writing cloud-init scripts.
Quick start
module "postgres-cluster" {
source = "github.com/obytes/oci-postgres-cluster//modules/postgres-cluster"
prefix = "myapp"
cluster_name = "MYAPP-POSTGRES"
compartment_id = var.compartment_id
vcn_id = var.vcn_id
subnet_id = var.subnet_id
subnet_cidr = "10.0.2.0/24"
family_shape = "VM.Standard.E4.Flex"
image_id = data.oci_core_images.oracle_linux.images[0].id
postgres_instance_specs = { ocpus = 4, memory = 32 }
reserved_private_ips = [
cidrhost("10.0.2.0/24", 175), # node1
cidrhost("10.0.2.0/24", 176), # node2
cidrhost("10.0.2.0/24", 177), # witness
]
ssh_authorized_keys_postgres = var.ssh_public_key
}
That's the minimum. For the full example with KMS encryption, NSG rules, and all the trimmings, check out examples/complete/.
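And one hypothetical way to consume the cluster, assuming the module exposes the load balancer's address as an output (the output name below is a guess, so check outputs.tf for the real one):

# Hypothetical output name; the point is that the app targets one stable address.
output "postgres_endpoint" {
  value = "postgresql://app_user@${module.postgres-cluster.nlb_ip_address}:5432/appdb"
}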
Why we open-sourced this
We built this module, battle-tested it in production, and decided the community should have it. There's no catch. No "enterprise tier" behind a paywall. No "contact sales for the HA version."

PostgreSQL high availability shouldn't be a mystery. The Patroni + etcd pattern is well-documented, but packaging it into a clean, reusable Terraform module for OCI — with proper health checks, storage separation, KMS encryption, and a witness node — takes real engineering time. We already spent that time. Now you don't have to.
Go break things (safely)
The repo is live: github.com/obytes/oci-postgres-cluster
- Star it if this saved you time
- Try it in a dev environment first (obviously)
- Open issues if you find bugs — we're actively maintaining this
- PRs are welcome — check out CONTRIBUTING.md for guidelines

Now go deploy a cluster and sleep through your next failover.