Disaster Recovery 101: Architecting Auto-Failovers for MySQL and WooCommerce Deployments on OVH

Establishing a Robust MySQL Replication Topology for High Availability

For any mission-critical application, particularly e-commerce platforms like WooCommerce, database availability is paramount. A single point of failure in the database layer can lead to significant revenue loss and reputational damage. This section details the architecture for a highly available MySQL setup leveraging asynchronous replication, a foundational element for automated failover.

We will configure a primary-replica topology. The primary instance handles all write operations, while one or more replica instances asynchronously replicate the primary’s binary logs. This setup allows for read scaling and, crucially, provides a standby instance ready to take over in case of primary failure.

Primary MySQL Server Configuration

On the primary MySQL server, ensure binary logging is enabled and configured appropriately. This is essential for replication. We’ll use `binlog_format=ROW` for maximum compatibility and data integrity.

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = OFF
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

After modifying the configuration, restart the MySQL service:

sudo systemctl restart mysql

Next, create a dedicated replication user on the primary server. This user will be used by the replica to connect and fetch binary logs.

CREATE USER 'replicator'@'%' IDENTIFIED BY 'your_strong_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;

Replica MySQL Server Configuration

On the replica server, configure it with a unique `server-id`. It should also have binary logging enabled if it’s intended to be promoted to a primary later or to serve as a source for further replicas. For a simple primary-replica setup, `read_only = ON` is crucial to prevent accidental writes.

[mysqld]
server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = ON
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

Restart the MySQL service on the replica:

sudo systemctl restart mysql

Establishing the Replication Channel

Before initiating replication, it’s vital to ensure the replica has a consistent snapshot of the primary’s data. The most robust method is to take a backup of the primary while locking tables briefly, or by using `mysqldump` with specific options to capture the binary log position.

On the primary, execute the following to get the current binary log file and position. This should be done just before starting the data transfer to the replica.

SHOW MASTER STATUS;

Note down the `File` and `Position` from the output. Then, on the replica, configure it to connect to the primary. Replace `primary_ip_address` and `your_strong_password` with your actual values.

CHANGE MASTER TO
  MASTER_HOST='primary_ip_address',
  MASTER_USER='replicator',
  MASTER_PASSWORD='your_strong_password',
  MASTER_LOG_FILE='mysql-bin.XXXXXX', -- From SHOW MASTER STATUS
  MASTER_LOG_POS=YYYYYY; -- From SHOW MASTER STATUS

START SLAVE;

Verify the replication status on the replica:

SHOW SLAVE STATUS\G

Ensure `Slave_IO_Running: Yes` and `Slave_SQL_Running: Yes`. Also, check `Seconds_Behind_Master` to be consistently low (ideally 0 or very close to it).

Orchestrating Automated Failover with Orchestrator

While MySQL replication provides the foundation for high availability, manual intervention is still required for failover. To achieve automated failover, we introduce Orchestrator, a powerful, open-source tool designed for MySQL topology management and automated failover.

Orchestrator Installation and Configuration

Orchestrator can be installed on a separate, highly available management node. The installation process typically involves downloading a pre-compiled binary or building from source. For this example, we’ll assume a Linux environment.

Download the latest release:

wget https://github.com/openark/orchestrator/releases/download/v3.2.7/orchestrator-3.2.7-linux-amd64.tar.gz
tar -zxvf orchestrator-3.2.7-linux-amd64.tar.gz
sudo mv orchestrator-3.2.7-linux-amd64 /usr/local/orchestrator

Orchestrator requires a backend database to store its topology information. A dedicated MySQL instance (or even a local SQLite database for testing) can be used. For production, a separate, highly available MySQL instance is recommended.

Create a configuration file, e.g., `/etc/orchestrator/orchestrator.conf.json`:

{
  "Debug": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "your_orchestrator_db_password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "DiscoveryPeriodSeconds": 10,
  "FailureDetectionPeriodSeconds": 10,
  "PromotionLagQuery": "SELECT * FROM mysql.slave_master_info WHERE Master_Host IS NOT NULL AND Slave_IO_Running = 'Yes' AND Slave_SQL_Running = 'Yes' ORDER BY Seconds_Behind_Master DESC LIMIT 1",
  "SlaveLagQuery": "SELECT * FROM mysql.slave_master_info WHERE Master_Host IS NOT NULL AND Slave_IO_Running = 'Yes' AND Slave_SQL_Running = 'Yes' ORDER BY Seconds_Behind_Master DESC LIMIT 1",
  "DetectClusterAliasConflicts": true,
  "SnapshotTopologiesOnStartup": true,
  "PostponePromotionOnClusterDiscovery": true,
  "RecoveryPeriodBlockSeconds": 300,
  "RecoveryPeriodBlockUnsuccessfulSeconds": 600,
  "PreemptivePaging": false,
  "Hooks": [
    {
      "name": "mysql-replication-lag",
      "script": "/etc/orchestrator/hooks/mysql-replication-lag.sh",
      "timeout": 60,
      "discovery": true,
      "replication-analysis": true
    },
    {
      "name": "mysql-unreachable",
      "script": "/etc/orchestrator/hooks/mysql-unreachable.sh",
      "timeout": 60,
      "unreachable": true
    }
  ],
  "GlobalUserACL": [
    "*"
  ],
  "GlobalWriteACL": [
    "*"
  ]
}

Create the Orchestrator database and user on your backend MySQL instance:

CREATE DATABASE orchestrator;
CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'your_orchestrator_db_password';
GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator'@'%';
FLUSH PRIVILEGES;

Initialize the Orchestrator database schema:

/usr/local/orchestrator/orchestrator-client -c discover -i 127.0.0.1:3306 -u orchestrator -p your_orchestrator_db_password

Now, configure Orchestrator to discover your MySQL topology. You can do this via the API or by adding entries to its configuration. For a simple setup, you can point it to one of your MySQL instances, and it will discover the rest.

# Add this to orchestrator.conf.json
"MySQLOrchestratorHost": "your_primary_mysql_ip",
"MySQLOrchestratorPort": 3306

Start the Orchestrator service. It’s recommended to run it under a systemd service for proper management.

# Create a systemd service file, e.g., /etc/systemd/system/orchestrator.service
[Unit]
Description=Orchestrator MySQL HA Manager
After=network.target

[Service]
User=orchestrator
Group=orchestrator
ExecStart=/usr/local/orchestrator/orchestrator -c /etc/orchestrator/orchestrator.conf.json
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable orchestrator
sudo systemctl start orchestrator

Configuring Orchestrator for Automated Failover

Orchestrator’s power lies in its ability to detect failures and automatically promote a replica. This is configured through its hooks and recovery settings. The `PromotionLagQuery` and `SlaveLagQuery` are critical for Orchestrator to determine the best replica to promote.

The `hooks` section in `orchestrator.conf.json` allows you to define scripts that run when specific events occur. For automated failover, we’ll focus on `mysql-unreachable` and potentially `mysql-replication-lag`.

Create a hook script for handling unreachable MySQL instances, e.g., `/etc/orchestrator/hooks/mysql-unreachable.sh`:

#!/bin/bash

# This script is executed by Orchestrator when a MySQL instance is detected as unreachable.
# It should contain logic to trigger failover or alert.

INSTANCE_KEY="$1" # e.g., "hostname:port"
INSTANCE_HOST=$(echo "$INSTANCE_KEY" | cut -d: -f1)
INSTANCE_PORT=$(echo "$INSTANCE_KEY" | cut -d: -f2)

echo "MySQL instance $INSTANCE_KEY is unreachable. Initiating automated failover..."

# Trigger Orchestrator's recovery mechanism.
# Orchestrator will automatically attempt to promote a replica if configured.
# You can also add custom logic here, e.g., sending alerts.

# Example: Triggering failover via Orchestrator API (requires orchestrator-client)
# This is a simplified example; robust error handling and idempotency are crucial.
# orchestrator-client -c discover -i $INSTANCE_HOST:$INSTANCE_PORT
# orchestrator-client -c promote-to-primary -i $INSTANCE_KEY --skip-discover

# For automated failover, Orchestrator's internal logic is usually sufficient
# if configured correctly. This hook primarily serves for alerting or custom actions.

# Example: Sending an alert (e.g., via Slack or PagerDuty)
# curl -X POST -H 'Content-type: application/json' --data '{"text":"MySQL instance '$INSTANCE_KEY' is unreachable. Automated failover initiated."}' YOUR_SLACK_WEBHOOK_URL

exit 0

Make the script executable:

sudo chmod +x /etc/orchestrator/hooks/mysql-unreachable.sh

Ensure your `orchestrator.conf.json` has the correct `RecoveryPeriodBlockSeconds` and `RecoveryPeriodBlockUnsuccessfulSeconds` set to reasonable values to prevent rapid, repeated failover attempts.

Integrating WooCommerce with the HA MySQL Cluster

WooCommerce, by default, connects to a single MySQL database. To leverage the high availability setup orchestrated by Orchestrator, we need to ensure that WooCommerce can connect to the *current* primary instance. This typically involves a load balancer or a proxy layer.

Using a Proxy Layer (ProxySQL)

ProxySQL is an excellent choice for this. It sits between your application and your MySQL servers, intelligently routing traffic. ProxySQL can be configured to monitor MySQL servers and automatically switch traffic to a new primary if the current one becomes unavailable.

Install ProxySQL on a separate node or co-located with your Orchestrator management node.

# Example for Ubuntu/Debian
sudo apt update
sudo apt install proxysql

Configure ProxySQL’s `mysql_servers` table to include your primary and replica instances. ProxySQL will use this information to monitor the servers and their roles.

-- Connect to ProxySQL's admin interface
mysql -u admin -padmin -h 127.0.0.1 -P6032

-- Add your primary server
INSERT INTO mysql_servers (hostname, hostgroup_id, port, weight) VALUES ('primary_mysql_ip', 10, 3306, 1000);

-- Add your replica server(s)
INSERT INTO mysql_servers (hostname, hostgroup_id, port, weight) VALUES ('replica_mysql_ip', 20, 3306, 1000);

-- Define hostgroups for read/write splitting and failover
-- Hostgroup 10: Primary (writes and reads)
-- Hostgroup 20: Replica (reads only, candidate for promotion)
-- Hostgroup 30: All servers (for failover detection)
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
LOAD MYSQL CACHED STATEMENTS TO RUNTIME;
LOAD MYSQL MIRRORED ACCOUNTS TO RUNTIME;
LOAD MYSQL BINLOG LOGINS TO RUNTIME;

-- Configure hostgroups for failover
-- Hostgroup 10: Primary
-- Hostgroup 20: Replica
-- Hostgroup 30: All servers (for monitoring)
-- This requires careful configuration of ProxySQL's internal monitoring and failover logic.
-- A common approach is to use ProxySQL's ability to detect server failures and
-- then use its query rules to redirect traffic.

-- Example: Configure ProxySQL to monitor servers and perform failover
-- This is a simplified example. For robust failover, consider integrating
-- ProxySQL with Orchestrator or using its native failover capabilities.

-- Ensure ProxySQL's scheduler is running to perform checks
-- SET mysql-monitor_interval=1000; -- milliseconds
-- SET mysql-monitor_connect_interval=1000;
-- SET mysql-monitor_read_only_interval=1000;

-- Configure query rules to direct writes to the primary hostgroup
-- and reads to either primary or replica.
-- This is a complex topic and depends on your specific needs.
-- For automated failover, ProxySQL needs to be aware of the *current* primary.
-- This can be achieved by:
-- 1. ProxySQL monitoring Orchestrator's actions.
-- 2. Orchestrator updating ProxySQL's configuration upon promotion.

-- A common pattern is to have Orchestrator trigger a script that updates ProxySQL.
-- Example hook in Orchestrator:
-- {
-- "name": "update-proxysql",
-- "script": "/etc/orchestrator/hooks/update-proxysql.sh",
-- "promotion": true,
-- "demotion": true
-- }

-- The update-proxysql.sh script would then:
-- 1. Get the new primary from Orchestrator's API.
-- 2. Update ProxySQL's mysql_servers table (e.g., change hostgroup_id for the new primary).
-- 3. Reload ProxySQL configuration.

-- Example of a simplified update-proxysql.sh script:
-- #!/bin/bash
-- NEW_PRIMARY_HOST="$1"
-- NEW_PRIMARY_PORT="$2"
--
-- mysql -uadmin -padmin -h127.0.0.1 -P6032 <

Update your WooCommerce `wp-config.php` to point to the ProxySQL instance's IP address and port (default is 3306).

define( 'DB_HOST', 'proxysql_ip_address:3306' );
define( 'DB_USER', 'your_woocommerce_db_user' );
define( 'DB_PASSWORD', 'your_woocommerce_db_password' );
define( 'DB_NAME', 'your_woocommerce_db_name' );

OVH Specific Considerations

When deploying on OVH, consider the following:

Network Latency: Ensure your MySQL primary, replicas, Orchestrator, and ProxySQL are deployed within the same OVH region and ideally the same availability zone to minimize latency. High latency can lead to replication lag and impact failover performance.
Instance Sizing: Choose appropriate instance types for your MySQL servers, Orchestrator, and ProxySQL. Database instances require sufficient CPU, RAM, and IOPS.
Security Groups/Firewalls: Configure OVH security groups or firewall rules to allow necessary traffic between your MySQL instances, Orchestrator, ProxySQL, and your application servers. Restrict access to the minimum required ports (e.g., 3306 for MySQL, 3000 for Orchestrator UI, 6032 for ProxySQL admin).
Managed Databases: If using OVH's managed MySQL services, investigate their built-in HA and failover capabilities. While they might simplify setup, ensure they meet your specific requirements for automation and control. Orchestrator can still be used to manage failover for managed instances if they expose the necessary replication status.
Monitoring and Alerting: Implement comprehensive monitoring for your MySQL instances, replication status, Orchestrator, and ProxySQL. OVH's monitoring tools, combined with external solutions like Prometheus/Grafana, are essential for proactive issue detection.

Testing and Validation

Thorough testing is non-negotiable. Simulate various failure scenarios:

Primary Server Crash: Stop the MySQL process on the primary. Observe Orchestrator's logs and verify that it detects the failure, promotes a replica, and updates ProxySQL (if integrated). Check WooCommerce's connectivity.
Network Partition: Simulate network issues between the primary and replicas.
Replica Failure: Stop a replica and ensure the remaining replicas continue to function and that the primary is unaffected.
Orchestrator Node Failure: If Orchestrator is deployed in a highly available manner (e.g., active-passive with a virtual IP or using its built-in clustering), test its failover.

Monitor `Seconds_Behind_Master` on replicas during and after failover. Ensure that the promotion process is swift and that WooCommerce experiences minimal downtime. Document your failover procedures and regularly review them.