Disaster Recovery 101: Architecting Auto-Failovers for MySQL and WordPress Deployments on Linode

Establishing a High-Availability MySQL Cluster with Replication and Orchestrator

Achieving automated failover for a WordPress deployment hinges on a robust, highly available MySQL backend. We’ll architect this using MySQL replication, specifically a primary-replica setup, and then introduce Orchestrator for automated detection and promotion of replicas during failures.

Our baseline will be a primary MySQL server and at least one replica. For true high availability, we’ll aim for a minimum of two replicas. This setup ensures that even if the primary fails, a replica can be promoted to take its place with minimal data loss.

MySQL Replication Configuration

On the primary MySQL server, ensure binary logging is enabled. This is crucial for replication. Edit your my.cnf (or my.ini on Windows) file:

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON

Restart the MySQL service after making these changes. On the replica(s), configure them similarly, ensuring a unique server-id for each:

[mysqld]
server-id = 2  # Or 3, 4, etc. for other replicas
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = ON # Important for replicas to prevent accidental writes
gtid_mode = ON
enforce_gtid_consistency = ON

Restart the MySQL service on the replica(s).

Setting up Replication Users and Initial Data Sync

On the primary, create a dedicated user for replication. Replace 'replica_user' and 'your_strong_password' with secure credentials.

CREATE USER 'replica_user'@'%' IDENTIFIED BY 'your_strong_password';
GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';
FLUSH PRIVILEGES;

To initialize the replica with the primary’s data, you have several options. For minimal downtime, using mysqldump with --single-transaction and --master-data=2 is common. However, for large databases, this can be slow and lock tables. A more advanced approach involves using Percona XtraBackup or MariaDB Backup for a hot backup, which is non-blocking.

Assuming a smaller dataset or a maintenance window, here’s the mysqldump approach:

# On the primary:
mysqldump --all-databases --master-data=2 --single-transaction -u root -p > initial_dump.sql

# On the replica (after clearing existing data if necessary):
mysql -u root -p < initial_dump.sql

After the data is restored on the replica, you need to configure it to connect to the primary. Obtain the binary log file name and position from the initial_dump.sql file (look for the `CHANGE MASTER TO` statement). Alternatively, and preferably with GTID enabled, you can use GTID information.

On the replica, execute:

-- If using binlog position (less preferred with GTID):
-- CHANGE MASTER TO MASTER_HOST='primary_ip_address', MASTER_USER='replica_user', MASTER_PASSWORD='your_strong_password', MASTER_LOG_FILE='mysql-bin.XXXXXX', MASTER_LOG_POS=YYYY;

-- If using GTID (preferred):
CHANGE MASTER TO MASTER_HOST='primary_ip_address', MASTER_USER='replica_user', MASTER_PASSWORD='your_strong_password', MASTER_AUTO_POSITION=1;

Start the replication process:

START SLAVE;

Verify replication status:

SHOW SLAVE STATUS\G;

Ensure Slave_IO_Running: Yes and Slave_SQL_Running: Yes, and that Seconds_Behind_Master is low and stable.

Automating Failover with Orchestrator

Orchestrator is a powerful open-source tool for MySQL high availability and disaster recovery. It discovers your topology, detects failures, and automates failover. We’ll deploy Orchestrator on a separate host (or multiple hosts for its own HA) to manage the MySQL cluster.

Orchestrator Installation and Configuration

Install Orchestrator on a dedicated server. The installation process varies by OS; refer to the official Orchestrator documentation for specific instructions. For example, on Ubuntu/Debian:

wget https://github.com/openark/orchestrator/releases/download/v3.2.7/orchestrator_3.2.7_linux_amd64.deb
sudo dpkg -i orchestrator_3.2.7_linux_amd64.deb

Orchestrator requires a backend database to store its state. This can be MySQL itself. Configure Orchestrator by editing its configuration file, typically located at /etc/orchestrator/orchestrator.conf.json.

{
  "Debug": true,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "orchestrator_password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "orchestrator_password",
  "DiscoveryPeriodSeconds": 10,
  "PromotionHooks": [
    {
      "Name": "wordpress_db_update_hook",
      "Executions": [
        {
          "OnFailure": true,
          "OnSuccess": false,
          "Command": "/usr/local/bin/update_wordpress_db_config.sh",
          "Arguments": ["{instance.HostName}", "{instance.Port}"]
        }
      ]
    }
  ],
  "FailureDetectionPeriodBlockSeconds": 60,
  "FailureDetectionCount": 3,
  "RecoveryPeriodBlockSeconds": 300,
  "DetectClusterTopologyOrder": "discovery",
  "DetectClusterTopologyPeriodSeconds": 30,
  "SlaveLagQuery": "SELECT * FROM mysql.slave_master_info WHERE Master_Host IS NOT NULL AND Slave_SQL_Running = 'Yes' AND Slave_IO_Running = 'Yes' AND Seconds_Behind_Master > {lag_seconds}",
  "SlaveLagQueryForAllReplicas": true,
  "GlobalWriteableCells": ["cell1", "cell2"],
  "InstanceWriteableCells": {
    "primary_host:3306": ["cell1"]
  }
}

Key configuration points:

MySQLTopologyUser/Password: Credentials for Orchestrator to connect to the MySQL instances it manages. Create this user in your MySQL cluster.
MySQLOrchestratorDatabase: The database Orchestrator uses for its own state.
DiscoveryPeriodSeconds: How often Orchestrator scans the topology.
PromotionHooks: Crucial for automating application-level changes after a failover. We’ll detail the update_wordpress_db_config.sh script next.
FailureDetectionPeriodBlockSeconds, FailureDetectionCount: Parameters for detecting failures.
SlaveLagQuery: Defines how Orchestrator checks for replication lag.

Start and enable the Orchestrator service:

sudo systemctl start orchestrator
sudo systemctl enable orchestrator

Orchestrator Discovery and Initial Setup

Once Orchestrator is running, you need to tell it about your MySQL cluster. You can do this via its API or by adding instances directly in the web UI (usually accessible at http://orchestrator_host:3000).

To add an instance via the API (e.g., using curl):

curl -i -X POST "http://localhost:3000/api/discover-instance?instance=primary_ip_address:3306"
curl -i -X POST "http://localhost:3000/api/discover-instance?instance=replica1_ip_address:3306"
curl -i -X POST "http://localhost:3000/api/discover-instance?instance=replica2_ip_address:3306"

Orchestrator will then scan these instances, discover the replication topology, and present it in its web UI. It will identify the primary and its replicas.

Automating WordPress Database Configuration Updates

The most critical part of automated failover for WordPress is updating the application’s database connection details when the primary MySQL server changes. This is where the PromotionHooks in Orchestrator come into play.

The `update_wordpress_db_config.sh` Script

This script will be executed by Orchestrator whenever a failover occurs (specifically, when a replica is promoted). It needs to update the wp-config.php file on your WordPress web servers.

#!/bin/bash

# This script is executed by Orchestrator on promotion hooks.
# It updates WordPress wp-config.php with the new database host.

NEW_DB_HOST="$1"
NEW_DB_PORT="$2" # Orchestrator passes port, though WordPress typically uses default 3306

# Ensure the script is run with root privileges
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root"
   exit 1
fi

# Path to your WordPress installation(s)
# This might need to be a loop or a more sophisticated mechanism
# if you have multiple WordPress sites on different servers.
WP_CONFIG_PATH="/var/www/html/wp-config.php"

if [ -z "$NEW_DB_HOST" ]; then
  echo "Error: New database host not provided."
  exit 1
fi

echo "Updating WordPress config for new DB host: $NEW_DB_HOST"

# Backup the original wp-config.php
cp "$WP_CONFIG_PATH" "$WP_CONFIG_PATH.bak_$(date +%Y%m%d_%H%M%S)"

# Use sed to replace the DB_HOST definition.
# This assumes a standard define('DB_HOST', '...'); format.
# Be cautious with this regex; it might need adjustment based on your wp-config.php.
sed -i "s/define('DB_HOST', '[^']*');/define('DB_HOST', '$NEW_DB_HOST');/" "$WP_CONFIG_PATH"

# Verify the change (optional but recommended)
if grep -q "define('DB_HOST', '$NEW_DB_HOST');" "$WP_CONFIG_PATH"; then
  echo "Successfully updated DB_HOST in $WP_CONFIG_PATH."
  exit 0
else
  echo "Error: Failed to update DB_HOST in $WP_CONFIG_PATH."
  # Optionally restore from backup if update failed
  # cp "$WP_CONFIG_PATH.bak_latest" "$WP_CONFIG_PATH"
  exit 1
fi

Make this script executable:

sudo chmod +x /usr/local/bin/update_wordpress_db_config.sh

Important Considerations for the Script:

Multiple WordPress Instances: If you have multiple WordPress sites, this script needs to be aware of all their wp-config.php locations. You might need to pass additional arguments to the script via Orchestrator’s hooks or have a central configuration file that lists all WordPress paths.
SSH Access: The Orchestrator server needs SSH access to the WordPress web servers to update their wp-config.php files. Ensure SSH keys are set up correctly and the user running Orchestrator has the necessary permissions.
Configuration Management: For more complex environments, consider using configuration management tools like Ansible, Chef, or Puppet to manage the deployment and updates of wp-config.php across your web fleet, triggered by Orchestrator.
Database Credentials: This script only updates the DB host. WordPress database username and password are typically stored in wp-config.php as well. If these also need to change (e.g., if the promoted replica uses different credentials, which is uncommon but possible), the script must be extended.

Testing the Automated Failover

Thorough testing is paramount. Orchestrator provides a “Test Failover” button in its UI. This simulates a failure of the current primary and initiates the promotion of a replica. Observe the following:

Orchestrator Logs: Monitor Orchestrator’s logs for successful detection and promotion.
MySQL Cluster State: Verify that the promoted replica is now writable and that other replicas are following it.
WordPress Accessibility: Check if your WordPress site is accessible and functional. This is the ultimate test of your PromotionHooks script.
Replication Status: Ensure new replicas are being set up correctly to follow the new primary.

To manually trigger a failover for testing, you can stop the MySQL process on the primary server. Orchestrator should detect this, mark the instance as dead, and initiate the promotion process for one of its replicas. The choice of which replica gets promoted is based on Orchestrator’s internal logic, often favoring replicas that are most up-to-date and have the fewest replicas following them.

Advanced Considerations and Next Steps

This setup provides a solid foundation for automated MySQL failover. For production environments, consider these enhancements:

Orchestrator High Availability: Run multiple instances of Orchestrator for its own resilience.
Read Replicas: If your WordPress site has heavy read traffic, you can configure additional read-only replicas and point your application’s read queries to them. Orchestrator can help manage these as well.
Proxy Layer: Introduce a proxy like ProxySQL or MaxScale between WordPress and MySQL. These proxies can intelligently route read/write traffic, handle connection pooling, and even perform failover detection and routing themselves, potentially simplifying the application’s role.
Monitoring and Alerting: Integrate Orchestrator’s status and MySQL cluster health with your existing monitoring systems (e.g., Prometheus, Grafana, Nagios) to receive alerts on failures or potential issues.
Automated Backups: Ensure a robust, automated backup strategy is in place, independent of replication. Tools like Percona XtraBackup or cloud provider snapshots are essential for point-in-time recovery.
Network Latency: For geographically distributed deployments, network latency can impact replication lag. Consider data center proximity and network optimization.
Security: Harden all MySQL instances, Orchestrator, and web servers. Use strong passwords, firewall rules, and consider SSL for replication traffic.

By combining MySQL replication, Orchestrator for automated failover, and a well-crafted script to update application configurations, you can significantly improve the resilience and uptime of your WordPress deployments on Linode.