Disaster Recovery 101: Architecting Auto-Failovers for Redis and WordPress Deployments on Linode

Establishing a High-Availability Redis Cluster for WordPress

For WordPress deployments, Redis typically serves as the object cache and, in some setups, as the session store. When it goes down, cached lookups fall back to the database, so keeping it available directly affects site responsiveness. We’ll architect an auto-failover solution using Redis Sentinel, which monitors Redis instances and promotes a replica to master if the primary fails. This setup will run on Linode instances.

Redis Sentinel Configuration for Automatic Failover

Redis Sentinel is a distributed system that provides high availability for Redis. It performs three main tasks: monitoring, notification, and automatic failover. We’ll deploy at least three Sentinel instances so that failover decisions are quorum-based: no single Sentinel can declare the master down on its own and, with a quorum of two out of three, the loss of one Sentinel does not prevent the remaining two from carrying out a failover.
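The quorum arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule that a failover needs both `quorum` Sentinels agreeing that the master is down and a leader elected by a majority of all Sentinels; the function name `sentinel_failure_tolerance` is a hypothetical helper, not part of Redis.

```python
def sentinel_failure_tolerance(sentinels: int, quorum: int) -> int:
    """Return how many Sentinel processes can be lost while an automatic
    failover is still possible.

    Two conditions must hold for a failover to proceed:
      * at least `quorum` Sentinels agree that the master is down, and
      * a leader is elected with votes from a majority of all Sentinels
        (or from `quorum` Sentinels, if the quorum is larger than that).
    """
    if not 1 <= quorum <= sentinels:
        raise ValueError("quorum must be between 1 and the number of Sentinels")
    majority = sentinels // 2 + 1
    votes_needed = max(quorum, majority)
    return sentinels - votes_needed


if __name__ == "__main__":
    for n, q in [(3, 2), (3, 3), (5, 3)]:
        lost = sentinel_failure_tolerance(n, q)
        print(f"{n} Sentinels, quorum {q}: tolerates {lost} Sentinel failure(s)")
```

With three Sentinels, a quorum of 2 tolerates one lost Sentinel, while a quorum of 3 tolerates none, which is why a quorum equal to the total number of Sentinels is usually avoided.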

First, ensure you have Redis installed on your Linode instances. For this example, we’ll assume three nodes for Redis master/replica and three nodes for Sentinel. It’s common practice to co-locate Sentinel with Redis instances, but for true resilience, consider separate instances or even separate availability zones if your provider offers them. For simplicity here, we’ll assume dedicated instances for Sentinel.

On each Sentinel node (e.g., `sentinel-1`, `sentinel-2`, `sentinel-3`), create or edit the Sentinel configuration file, typically located at `/etc/redis/sentinel.conf`. Ensure the following directives are set:

Sentinel Configuration File (`sentinel.conf`)

# Sentinel configuration file

port 26379
daemonize yes
pidfile /var/run/redis/redis-sentinel.pid
logfile /var/log/redis/sentinel.log
dir /tmp

# Define the master to monitor. 'mymaster' is an arbitrary name.
# The IP and port are of the Redis master.
# The '2' is the quorum: the number of Sentinels that must agree
# that the master is down before a failover can start. With three
# Sentinels, a quorum of 2 still works when one Sentinel is lost;
# a quorum of 3 would block failover in that case.
sentinel monitor mymaster 192.168.1.100 6379 2

# Time in milliseconds that an instance (master, replica, or another
# Sentinel) must be unreachable before this Sentinel considers it
# subjectively down.
sentinel down-after-milliseconds mymaster 5000

# Failover timeout in milliseconds. Among other things, it bounds how long
# a failover attempt may run and how soon a failed attempt can be retried
# for the same master.
sentinel failover-timeout mymaster 10000

# Number of replicas that may be reconfigured to sync from the new master
# at the same time after a failover. A low value keeps more replicas
# available to serve reads while the others resynchronize.
sentinel parallel-syncs mymaster 1

# Replicas need no entries here: Sentinel discovers them automatically
# from the master's INFO output.

# Optional: password Sentinel uses to authenticate with the monitored
# master and its replicas. There is no separate per-replica password
# directive, so every Redis node must use the same requirepass.
# sentinel auth-pass mymaster YOUR_REDIS_PASSWORD

Replace `192.168.1.100` with the actual IP address of your Redis master instance. Ensure that the Redis master and its replicas are configured to accept connections from the Sentinel nodes. If your Redis instances require authentication, uncomment `sentinel auth-pass` and set the same password on the master and on every replica. Sentinel rewrites this file while it runs (to record discovered replicas and failover state), so the file must be writable by the user the Sentinel process runs as.

Redis Master and Replica Configuration

On the Redis master instance (`redis-master`), ensure your `redis.conf` allows replication and is accessible. On the replica instances (`redis-replica-1`, `redis-replica-2`, etc.), configure them to replicate from the master.

Redis Master Configuration (`redis.conf`)

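# redis.conf on redis-master
port 6379
daemonize yes
pidfile /var/run/redis/redis-server.pid
logfile /var/log/redis/redis-server.log
# Keep RDB/AOF files in a persistent directory rather than /tmp
dir /var/lib/redis
# redis.conf does not allow trailing comments, so keep them on their own line.
# Prefer the node's private IP over 0.0.0.0 for security.
bind 0.0.0.0

# If using authentication, set both directives: after a failover this node
# rejoins as a replica and needs masterauth to sync from the new master.
# requirepass YOUR_REDIS_PASSWORD
# masterauth YOUR_REDIS_PASSWORD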

Redis Replica Configuration (`redis.conf`)

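# redis.conf on redis-replica-1 (and others)
port 6379
daemonize yes
pidfile /var/run/redis/redis-server.pid
logfile /var/log/redis/redis-server.log
# Keep RDB/AOF files in a persistent directory rather than /tmp
dir /var/lib/redis
# redis.conf does not allow trailing comments, so keep them on their own line.
# Prefer the node's private IP over 0.0.0.0 for security.
bind 0.0.0.0

# Point to the master
replicaof 192.168.1.100 6379

# If using authentication (same password on every node):
# requirepass YOUR_REDIS_PASSWORD
# masterauth YOUR_REDIS_PASSWORD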

After configuring Redis and Sentinel, start the services on each node.

Starting Redis and Sentinel Services

On Redis Master and Replica nodes:

sudo systemctl start redis-server
sudo systemctl enable redis-server

On Sentinel nodes:

sudo systemctl start redis-sentinel
sudo systemctl enable redis-sentinel

Verify the status of your Sentinels and their view of the master. Connect to a Sentinel instance using `redis-cli -p 26379` and run `SENTINEL masters`. You should see `mymaster` listed with its current master and replicas.
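The same check can be scripted. The sketch below speaks the Redis wire protocol (RESP) over a plain socket and asks each Sentinel in turn for the current master using the `SENTINEL get-master-addr-by-name` command. The helper names (`encode_command`, `parse_master_addr`, `query_sentinel`, `discover_master`) are hypothetical, and the code assumes Sentinels without authentication and a reply small enough to arrive in one read; treat it as a starting point rather than a finished client.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def parse_master_addr(reply: bytes):
    """Parse the reply to SENTINEL get-master-addr-by-name.

    Returns (host, port), or None if the Sentinel does not know the master
    or answered with an error.
    """
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    # Reply layout: *2, $<len>, host, $<len>, port
    return lines[2].decode(), int(lines[4])


def query_sentinel(host, port, master_name, timeout=1.0):
    """Ask one Sentinel for the address of the named master."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(
            encode_command("SENTINEL", "get-master-addr-by-name", master_name)
        )
        return parse_master_addr(sock.recv(4096))


def discover_master(sentinels, master_name):
    """Return the master address from the first Sentinel that answers."""
    for host, port in sentinels:
        try:
            addr = query_sentinel(host, port, master_name)
        except OSError:
            continue  # Sentinel unreachable, try the next one
        if addr:
            return addr
    return None
```

For example, `discover_master([("192.168.1.200", 26379), ("192.168.1.201", 26379)], "mymaster")` would return the master's host and port as a tuple, or `None` if no Sentinel could be reached.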

WordPress Configuration for Redis HA

Your WordPress application needs to be aware of the Redis cluster and how to connect to it, especially during failover. The standard approach is to use a WordPress plugin that supports Redis Sentinel. The most common and robust option is the Redis Object Cache plugin.

When configuring the Redis Object Cache plugin, you’ll typically provide the Sentinel master name (`mymaster`) and the Sentinel host(s). The plugin will then query the Sentinels to discover the current Redis master. If a failover occurs, the plugin will automatically detect the new master through Sentinel.

Redis Object Cache Plugin Configuration

In your WordPress `wp-config.php` file, you’ll add or modify the Redis configuration constants. Constant names and Sentinel support differ between plugin versions, so confirm them against the plugin documentation; a typical Sentinel setup, which relies on the Predis client, looks like this:

/**
 * Redis Object Cache configuration.
 *
 * For Sentinel support, define WP_REDIS_SENTINEL (the master name) and
 * WP_REDIS_SERVERS (the Sentinel endpoints).
 * The plugin asks the Sentinels for the current master.
 */
define( 'WP_REDIS_CLIENT', 'predis' ); // Sentinel mode relies on the Predis client
define( 'WP_REDIS_SENTINEL', 'mymaster' );
define( 'WP_REDIS_SERVERS', [
    'tcp://192.168.1.200:26379',
    'tcp://192.168.1.201:26379',
    'tcp://192.168.1.202:26379',
] ); // IPs of your Sentinel nodes

// If your Redis master requires authentication:
// define( 'WP_REDIS_PASSWORD', 'YOUR_REDIS_PASSWORD' );

Ensure that the addresses in `WP_REDIS_SERVERS` correspond to your Sentinel instances. The plugin connects to these Sentinels to find the current master for `mymaster`. If the Redis master fails, Sentinel promotes a replica and the plugin picks up the new master the next time it queries Sentinel. Expect a short gap while the failure is detected (at least `down-after-milliseconds`, 5 seconds in this configuration), during which requests are not served from the cache.

Simulating a Redis Failover for Testing

To validate your setup, it’s crucial to simulate a failure of the Redis master. This can be done by stopping the Redis master process.

Manual Failover Trigger

# On the Redis master node
sudo systemctl stop redis-server

Observe the Sentinel logs (`/var/log/redis/sentinel.log`) on your Sentinel nodes. You should see messages indicating that the master is down, Sentinels are communicating, and a failover is being initiated. The logs will show which replica is being promoted to master.

After the failover, connect to one of the Sentinel instances and run `SENTINEL master mymaster`. The output should now show a different IP address as the master, corresponding to the promoted replica. Also, check the Redis Object Cache plugin status in your WordPress admin dashboard; it should report a successful connection to the new master.
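To put a number on the failover window, you can poll for the master address while the test runs. The sketch below is deliberately independent of how the address is obtained: `get_master` is a hypothetical callable that you supply, for example a wrapper around `redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster`. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def wait_for_new_master(get_master, old_master, timeout=60.0, interval=0.5,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll until get_master() reports an address other than old_master.

    Returns (new_master, elapsed_seconds). If the timeout expires first,
    returns (None, elapsed_seconds).
    """
    start = clock()
    while True:
        current = get_master()
        elapsed = clock() - start
        if current is not None and current != old_master:
            return current, elapsed
        if elapsed >= timeout:
            return None, elapsed
        sleep(interval)
```

Start the poll just before stopping `redis-server` on the master. The elapsed time it reports should be on the order of `down-after-milliseconds` plus the time Sentinel needs to elect a leader and promote a replica.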

Architecting for WordPress Database High Availability

While Redis handles caching and object storage, the WordPress database (typically MySQL or MariaDB) is the core data store. Ensuring its high availability is equally, if not more, critical. For WordPress, we’ll architect an auto-failover solution for the database using MariaDB Galera Cluster, which provides virtually synchronous multi-master replication. Percona XtraDB Cluster is a comparable option built on the same Galera library.

MariaDB Galera Cluster for WordPress

Galera Cluster is a virtually synchronous multi-master clustering solution for MySQL/MariaDB. All nodes in the cluster are active masters, meaning writes can be performed on any node, so there is no single point of failure for writes. At commit time, each transaction’s write set is replicated to all nodes and certified against concurrent transactions; the commit succeeds only if certification passes, and the other nodes then apply the change locally.

We’ll set up a cluster with at least three nodes for quorum and resilience. Each node will run MariaDB and the Galera replication plugin.

Galera Cluster Node Configuration (`server.cnf`)

On each node intended to be part of the cluster (e.g., `db-1`, `db-2`, `db-3`), modify the MariaDB configuration file (e.g., `/etc/mysql/mariadb.conf.d/60-galera.cnf` or `/etc/my.cnf`).

[mariadb]
# General settings
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
pid-file=/var/run/mysqld/mysqld.pid

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so # Path may vary by distribution

# Galera Cluster Configuration
wsrep_cluster_name="wp_galera_cluster"
wsrep_cluster_address="gcomm://192.168.2.10,192.168.2.11,192.168.2.12" # IPs of all cluster nodes

# Galera Node Configuration
wsrep_node_name="db-1" # Change for each node (db-2, db-3)
wsrep_node_address="192.168.2.10" # Change for each node

# Galera Synchronization and State Transfer
wsrep_sst_method=rsync # or mariabackup for larger databases
wsrep_sst_auth="sstuser:YOUR_SST_PASSWORD" # Needed for mariabackup SST; not used by rsync

# Replication and InnoDB Settings
binlog_format=ROW # Required: Galera replicates row-based events
default_storage_engine=InnoDB # Galera replicates InnoDB tables
innodb_autoinc_lock_mode=2 # Required for Galera
innodb_flush_log_at_trx_commit=0 # Optional performance trade-off, see the notes below
innodb_buffer_pool_size=1G # Adjust based on your Linode instance RAM

# Binding to specific IPs
bind-address=0.0.0.0 # Or specific IPs for security

Important Notes:

Replace `192.168.2.10`, `192.168.2.11`, `192.168.2.12` with the actual private IP addresses of your database nodes.
Ensure `wsrep_provider` points to the correct Galera library path for your OS.
Set `wsrep_node_name` and `wsrep_node_address` uniquely for each node.
Create a dedicated user for State Snapshot Transfer (SST) with a strong password.
`innodb_autoinc_lock_mode=2` is crucial for avoiding deadlocks on auto-increment columns.
`innodb_flush_log_at_trx_commit=0` improves write performance but sacrifices some durability in case of a crash (though Galera’s synchronous nature mitigates this significantly).
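Because only `wsrep_node_name` and `wsrep_node_address` differ between nodes, the per-node part of the configuration is easy to generate instead of editing by hand. The following is a minimal sketch, assuming the node names and IPs used in this article; `galera_node_settings` is a hypothetical helper, and its output is meant to be pasted into (or templated into) each node's configuration file.

```python
def galera_node_settings(nodes, cluster_name="wp_galera_cluster"):
    """Build the wsrep cluster/node settings for every member of a cluster.

    `nodes` maps node name -> private IP. The cluster name and cluster
    address are shared by all nodes; only wsrep_node_name and
    wsrep_node_address differ.
    """
    cluster_address = "gcomm://" + ",".join(nodes.values())
    settings = {}
    for name, ip in nodes.items():
        settings[name] = "\n".join([
            f'wsrep_cluster_name="{cluster_name}"',
            f'wsrep_cluster_address="{cluster_address}"',
            f'wsrep_node_name="{name}"',
            f'wsrep_node_address="{ip}"',
        ])
    return settings


if __name__ == "__main__":
    cluster = {
        "db-1": "192.168.2.10",
        "db-2": "192.168.2.11",
        "db-3": "192.168.2.12",
    }
    for node, text in galera_node_settings(cluster).items():
        print(f"# --- {node} ---\n{text}\n")
```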

Initializing the Galera Cluster

The first node to start the cluster needs to be initialized specially. On the first node (`db-1` in our example):

# On the FIRST node (db-1)
sudo galera_new_cluster
sudo systemctl enable mariadb

After the first node is up and running, start MariaDB on the subsequent nodes. They will automatically join the cluster and perform an SST if necessary.

# On subsequent nodes (db-2, db-3)
sudo systemctl start mariadb
sudo systemctl enable mariadb

Verify cluster status by connecting to any node and running:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

You should see `wsrep_cluster_size` equal to the number of nodes in your cluster, and `wsrep_local_state_comment` as `Synced` on all nodes.
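For monitoring, these checks can be combined into one health verdict per node. The sketch below assumes you have already collected the output of `SHOW GLOBAL STATUS LIKE 'wsrep_%'` into a dictionary of variable name to value (how you fetch it is up to you); `galera_node_problems` is a hypothetical helper. Besides the two variables above, it also looks at `wsrep_cluster_status` and `wsrep_ready`, which report whether the node belongs to the primary component and accepts queries.

```python
def galera_node_problems(status, expected_size):
    """Return a list of problems found in a node's wsrep status.

    `status` maps wsrep status variable names to their string values.
    An empty list means the node looks healthy.
    """
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not part of the primary component")
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not synced with the cluster")
    if status.get("wsrep_ready") != "ON":
        problems.append("node is not ready to accept queries")
    if int(status.get("wsrep_cluster_size", 0)) != expected_size:
        problems.append("cluster size differs from the expected node count")
    return problems


if __name__ == "__main__":
    sample = {
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
        "wsrep_ready": "ON",
        "wsrep_cluster_size": "3",
    }
    print(galera_node_problems(sample, expected_size=3) or "healthy")
```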

WordPress Database Connection for Galera Cluster

For WordPress to connect to the Galera cluster, you need a way to abstract the multiple nodes. A common approach is to use a load balancer or a virtual IP (VIP) that floats between the nodes. For simplicity and resilience, we’ll use HAProxy as a TCP load balancer. HAProxy will monitor the health of the MariaDB nodes and direct traffic to healthy ones.

HAProxy Configuration for MariaDB

Install HAProxy on a separate server or one of the existing nodes (though a dedicated load balancer is recommended for production). Configure HAProxy to listen on a specific port (e.g., 3306) and forward TCP connections to the Galera nodes.

# haproxy.cfg
global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    tcp
    option  tcplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000

listen mariadb_cluster
    bind *:3306
    mode tcp
    option mysql-check user haproxy_check # Use a dedicated MySQL user for health checks
    balance roundrobin
    server db-1 192.168.2.10:3306 check port 3306 inter 2000 rise 2 fall 3
    server db-2 192.168.2.11:3306 check port 3306 inter 2000 rise 2 fall 3
    server db-3 192.168.2.12:3306 check port 3306 inter 2000 rise 2 fall 3

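Create a MySQL user `haproxy_check` for the health checks. HAProxy’s `mysql-check` cannot send a password, so the user must be passwordless; restrict it to the HAProxy host and grant it no privileges beyond the default `USAGE`. Galera replicates account statements, so run this on one node only:

-- On any one MariaDB node (Galera replicates it to the others)
-- 192.168.3.50 is the IP of the HAProxy server
CREATE USER 'haproxy_check'@'192.168.3.50';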

Start HAProxy:

sudo systemctl start haproxy
sudo systemctl enable haproxy

WordPress `wp-config.php` for Galera

Update your WordPress `wp-config.php` to point to the HAProxy load balancer’s IP address and port.

/**
 * WordPress Database Configuration.
 */
define( 'DB_NAME', 'your_wordpress_db' );
define( 'DB_USER', 'your_db_user' );
define( 'DB_PASSWORD', 'your_db_password' );
define( 'DB_HOST', '192.168.3.50:3306' ); // IP of your HAProxy server and port
define( 'DB_CHARSET', 'utf8mb4' );
define( 'DB_COLLATE', '' );

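With this setup, WordPress connects to HAProxy, which distributes connections to the Galera cluster. If a MariaDB node fails, HAProxy will detect it via the `mysql-check` and stop sending traffic to it, while the remaining nodes continue to serve requests. Note that `mysql-check` only confirms that a node accepts MySQL connections; it does not verify that the node is `Synced`. Also be aware that writing to several nodes at once can produce certification conflicts, which WordPress sees as deadlock errors; if that becomes a problem, a common alternative is to send traffic to one node and mark the others as `backup` servers. If a node needs to be taken offline for maintenance, you can temporarily remove it from HAProxy’s configuration.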
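The `inter`, `rise`, and `fall` parameters decide how quickly HAProxy reacts. The sketch below is a simplified model of that behavior, not HAProxy's actual implementation: a server that is up is marked down after `fall` consecutive failed checks, and a server that is down is marked up again after `rise` consecutive successful checks. The function name `simulate_health` is hypothetical.

```python
def simulate_health(checks, rise=2, fall=3, initially_up=True):
    """Replay health-check results (True = check passed) and return the
    server state after each check, using consecutive rise/fall counts."""
    up = initially_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for passed in checks:
        if bool(passed) == up:
            streak = 0
        else:
            streak += 1
            if streak >= (fall if up else rise):
                up = not up
                streak = 0
        states.append("UP" if up else "DOWN")
    return states


if __name__ == "__main__":
    # One good check, three failures, then two good checks
    print(simulate_health([True, False, False, False, True, True]))
```

With `inter 2000 rise 2 fall 3` as configured above, this means a failed node is taken out of rotation after roughly three check intervals (about 6 seconds) and returns after roughly two successful intervals (about 4 seconds), ignoring check timeouts.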

Orchestrating WordPress Application High Availability

The final piece of the puzzle is ensuring the WordPress application servers themselves are highly available. This typically involves deploying multiple WordPress instances behind a load balancer.

Load Balancing WordPress Instances

Use a robust load balancer like Nginx or HAProxy. For WordPress, it’s crucial to configure sticky sessions (session affinity) if you’re not storing sessions externally (e.g., in Redis, which we’ve already made highly available). However, with Redis handling sessions, sticky sessions are less critical but can still improve performance by keeping a user on the same server for consecutive requests.

Nginx Load Balancer Configuration

# nginx.conf (within http block)
upstream wordpress_backend {
    ip_hash; # Basic session affinity
    server 10.0.0.10:80 weight=10 max_fails=3 fail_timeout=30s; # IP of WordPress server 1
    server 10.0.0.11:80 weight=10 max_fails=3 fail_timeout=30s; # IP of WordPress server 2
    server 10.0.0.12:80 weight=10 max_fails=3 fail_timeout=30s; # IP of WordPress server 3

    # Open-source Nginx performs passive health checks: after max_fails failed
    # attempts within fail_timeout, the server is skipped for fail_timeout.
    # Active health checks (the health_check directive) require Nginx Plus.
}

server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://wordpress_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
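To see what `ip_hash` affinity implies, consider the following illustration. Nginx uses the first three octets of an IPv4 address (or the full IPv6 address) as the hashing key; the sketch below mimics that key selection but uses SHA-256 and ignores server weights, so it is not Nginx's actual hash function and will not reproduce its backend choices. `pick_backend` is a hypothetical name.

```python
import hashlib
import ipaddress


def pick_backend(client_ip, backends):
    """Pick a backend for a client, keyed on the first three IPv4 octets
    (or the whole address for IPv6), similar in spirit to ip_hash."""
    addr = ipaddress.ip_address(client_ip)
    # For IPv4, the last octet is ignored, so a whole /24 shares one key
    key = addr.packed[:3] if addr.version == 4 else addr.packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    servers = ["10.0.0.10:80", "10.0.0.11:80", "10.0.0.12:80"]
    for ip in ["203.0.113.5", "203.0.113.200", "198.51.100.7"]:
        print(ip, "->", pick_backend(ip, servers))
```

The practical consequence is that all clients in the same /24 network, or behind the same NAT gateway, land on the same WordPress server, so traffic may be spread less evenly than with round-robin balancing.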

Ensure your WordPress application servers are configured to use the highly available Redis cluster and the Galera database cluster (via HAProxy) for their connections.

Conclusion: A Resilient WordPress Stack

By architecting high availability for Redis using Sentinel and for the database using Galera Cluster with HAProxy, and by load balancing WordPress application servers, you create a robust, fault-tolerant WordPress deployment on Linode. This multi-layered approach ensures that individual component failures are handled gracefully, minimizing downtime and maintaining application availability.