Disaster Recovery 101: Architecting Auto-Failovers for MySQL and C/C++ Deployments on DigitalOcean

Establishing a Highly Available MySQL Cluster with Automated Failover

Achieving true disaster recovery for critical data stores like MySQL takes more than backups; it demands automated failover. This section details how to architect a robust, multi-node MySQL setup on DigitalOcean using Percona XtraDB Cluster (PXC), which provides synchronous, multi-primary replication. PXC's Galera replication keeps every node consistent and writable, so there is no replica to promote when a node fails. As long as the surviving nodes form a majority (the Primary Component), they keep serving reads and writes without manual intervention.

Prerequisites and Initial Setup

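Before deploying PXC, ensure you have at least three DigitalOcean Droplets to serve as your MySQL nodes. Three is the smallest cluster that can lose one node and still keep a majority. For performance and reliability, consider Droplets with dedicated CPUs and sufficient RAM. Place all nodes in the same VPC so they can communicate over private IP addresses. Allow traffic between them on TCP ports 3306 (MySQL clients), 4444 (SST), 4567 (Galera group communication) and 4568 (IST). We'll assume a basic Ubuntu 22.04 LTS setup.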
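The three-node minimum comes from Galera's quorum rule. After a crash or network partition, a group of nodes stays operational only if it holds more than half of the previous membership (this assumes the default, equal node weights). The following is a minimal sketch of that arithmetic. `has_quorum` is a hypothetical helper for illustration, not a PXC command:

```shell
#!/bin/sh
# Hypothetical helper illustrating Galera's majority rule (equal node
# weights assumed): a partition keeps quorum only if it contains
# strictly more than half of the previous cluster membership.
has_quorum() {
    survivors="$1"   # nodes that can still reach each other
    total="$2"       # cluster size before the crash or partition
    [ $((survivors * 2)) -gt "$total" ]
}

# Compare losing one node in a two-node and a three-node cluster.
for total in 2 3; do
    survivors=$((total - 1))
    if has_quorum "$survivors" "$total"; then
        echo "$total nodes, 1 lost: quorum kept"
    else
        echo "$total nodes, 1 lost: quorum lost"
    fi
done
```

In a two-node cluster, the survivor holds exactly half of the membership, which is not a majority. It therefore stops accepting writes. With three nodes, the remaining two still form a majority.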

On each Droplet, update your package lists and install necessary tools:

sudo apt update
sudo apt upgrade -y
sudo apt install -y software-properties-common wget

Installing Percona XtraDB Cluster

Add the Percona repository and install PXC on all nodes. The installation process will prompt for a root password for MySQL. Ensure this password is strong and consistent across all nodes.

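First, install the `percona-release` package, which configures Percona's APT repositories and signing key. Then enable the PXC repository (`pxc80` selects PXC 8.0):

wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
sudo percona-release setup pxc80
sudo apt update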

Then, install the PXC server and client packages:

sudo apt install -y percona-xtradb-cluster percona-xtradb-cluster-client percona-xtradb-cluster-server

Configuring the Cluster

The core of PXC configuration is MySQL's option file, and its location depends on the PXC version and packaging. On PXC 8.0 for Ubuntu it is typically `/etc/mysql/mysql.conf.d/mysqld.cnf`, while older releases used files under `/etc/mysql/percona-xtradb-cluster.conf.d/`. That file is where we define the cluster-specific parameters. For simplicity, we'll bootstrap one node as the initial seed node and then join the others.

Node 1 (Seed Node) Configuration:

Edit the node's configuration file and ensure the following settings are present or modified. Replace `NODE1_IP`, `NODE2_IP`, and `NODE3_IP` with the actual private IP addresses of your Droplets.

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid

# PXC specific settings
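wsrep_provider=/usr/lib/galera4/libgalera_smm.so # PXC 8.0 path; PXC 5.7 uses /usr/lib/galera3/libgalera_smm.so
wsrep_cluster_name="my_pxc_cluster"
wsrep_cluster_address="gcomm://NODE1_IP,NODE2_IP,NODE3_IP"
wsrep_node_name="pxc-node1"
wsrep_node_address="NODE1_IP"
wsrep_sst_method=xtrabackup-v2 # the only SST method supported by PXC 8.0
wsrep_sst_auth="sstuser:YOUR_SST_PASSWORD" # PXC 5.7 only; remove on PXC 8.0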

binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=1G # Adjust based on Droplet RAM

bind-address=0.0.0.0

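Start Node 1 in bootstrap mode, which forms a new cluster with this node as its only member. Use bootstrap mode only on this node, and only when no other node of the cluster is running:

sudo systemctl stop mysql
sudo systemctl start mysql@bootstrap.service

If you are running PXC 5.7 with `wsrep_sst_auth`, create the dedicated State Snapshot Transfer (SST) user on Node 1 now. A donor node uses it to stream a backup to a joining node. PXC 8.0 creates and manages its SST user internally, so skip this step there.

sudo mysql -u root -p <<'SQL'
CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'YOUR_SST_PASSWORD';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;
SQL

After Nodes 2 and 3 have joined, switch Node 1 to the regular service. This way, a later restart rejoins the existing cluster instead of bootstrapping a new one:

sudo systemctl stop mysql@bootstrap.service
sudo systemctl start mysql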

Node 2 and Node 3 Configuration:

On Node 2 and Node 3, the configuration is the same except for `server-id`, `wsrep_node_name`, and `wsrep_node_address`, which must be unique for each node. On every node, `wsrep_cluster_address` lists all nodes in the cluster.

[mysqld]
server-id=2 # or 3 for Node 3
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid

# PXC specific settings
wsrep_provider=/usr/lib/galera4/libgalera_smm.so # PXC 8.0 path; PXC 5.7 uses /usr/lib/galera3/libgalera_smm.so
wsrep_cluster_name="my_pxc_cluster"
wsrep_cluster_address="gcomm://NODE1_IP,NODE2_IP,NODE3_IP"
wsrep_node_name="pxc-node2" # or "pxc-node3"
wsrep_node_address="NODE2_IP" # or "NODE3_IP"
wsrep_sst_method=xtrabackup-v2 # the only SST method supported by PXC 8.0
wsrep_sst_auth="sstuser:YOUR_SST_PASSWORD" # PXC 5.7 only; remove on PXC 8.0

binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=1G # Adjust based on Droplet RAM

bind-address=0.0.0.0
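The per-node files differ only in a few identity lines, and copying them by hand invites mistakes. A small generator keeps the node-specific and cluster-wide wsrep values consistent. It is a sketch only: `render_wsrep_cnf` is a hypothetical helper, and it prints just the identity and membership lines, not the full configurations above. Merge its output into each node's file alongside the remaining settings:

```shell
#!/bin/sh
# Hypothetical helper: print the identity and membership settings for
# node number $1, given the private IPs of all three nodes ($2..$4).
render_wsrep_cnf() {
    n="$1"; ip1="$2"; ip2="$3"; ip3="$4"
    # Pick this node's own address from the list by position.
    case "$n" in
        1) self="$ip1" ;;
        2) self="$ip2" ;;
        3) self="$ip3" ;;
        *) echo "node number must be 1, 2 or 3" >&2; return 1 ;;
    esac
    cat <<EOF
[mysqld]
server-id=$n
wsrep_cluster_name="my_pxc_cluster"
wsrep_cluster_address="gcomm://$ip1,$ip2,$ip3"
wsrep_node_name="pxc-node$n"
wsrep_node_address="$self"
EOF
}

# Example: node 2 with placeholder VPC addresses.
render_wsrep_cnf 2 10.10.0.11 10.10.0.12 10.10.0.13
```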

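You don't need to create `sstuser` again on Nodes 2 and 3. Account changes made on Node 1 are replicated, and a joining node receives the `mysql` system schema as part of its snapshot. PXC 8.0 encrypts cluster traffic by default (`pxc-encrypt-cluster-traffic=ON`). On that version, first copy the `ca.pem`, `server-cert.pem`, and `server-key.pem` files from Node 1's data directory (`/var/lib/mysql`) to the same location on Nodes 2 and 3, because nodes with mismatched certificates cannot join. Then restart MySQL. PXC will contact the other members, join the cluster, and receive a full State Snapshot Transfer (SST) from a donor if necessary.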

sudo systemctl restart mysql

Verifying Cluster Status and Failover

After restarting all nodes, check the cluster status. Connect to any MySQL node and run:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

You should see `wsrep_cluster_size` equal to the number of active nodes (e.g., 3) and `wsrep_local_state_comment` as `Synced` on all nodes. Every PXC node accepts writes, so nothing needs to be promoted when a node fails. The two survivors still form a majority, keep the Primary Component, and continue serving traffic. To simulate a failure, stop the MySQL service on one node:

sudo systemctl stop mysql

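Check the status on another node: `wsrep_cluster_size` drops to 2 while the remaining nodes stay `Synced`. When you start the stopped node again, it rejoins the cluster and catches up. If the writes it missed are still in a donor's write-set cache (gcache), it uses an Incremental State Transfer (IST); otherwise it receives a full SST.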
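To make this check scriptable (for monitoring, or as the basis for a load balancer health check), you can fetch both variables in batch mode and evaluate them. The sketch below assumes the `mysql` client can authenticate non-interactively, for example through `~/.my.cnf`. `pxc_node_healthy` is a hypothetical helper that reads the tab-separated name/value rows printed by `mysql -N -B`:

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>value" rows on stdin and succeed
# only if this node is Synced and sees at least $1 cluster members.
pxc_node_healthy() {
    min_size="$1"
    awk -F'\t' -v min="$min_size" '
        $1 == "wsrep_cluster_size"        { size = $2 + 0 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(state == "Synced" && size >= min) }
    '
}

# Usage on a cluster node (requires working client credentials):
#   mysql -N -B -e "SHOW GLOBAL STATUS WHERE Variable_name IN
#       ('wsrep_cluster_size', 'wsrep_local_state_comment')" \
#     | pxc_node_healthy 3 && echo healthy
```

Choose the minimum size to match your tolerance. With `3`, the check flags the degraded two-node state from the failure test above. With `2`, it only flags a node that has lost quorum or is not synced.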

Architecting Auto-Failover for C/C++ Applications with Keepalived

Consider stateless applications, or services that keep their data elsewhere (for example, in an external database or object storage). For these, high availability is often achieved with a virtual IP (VIP) managed by a failover daemon. Keepalived is a robust, lightweight option for this. It uses the Virtual Router Redundancy Protocol (VRRP) to move a floating IP address between servers, and clients reach your application through that address.

Setting up Keepalived

Deploy Keepalived on at least two application servers (Droplets). These servers will run your C/C++ application. Ensure they have static IP addresses. We’ll configure one as MASTER and the other as BACKUP.

Install Keepalived on both servers:

sudo apt update
sudo apt install -y keepalived

Configuring Keepalived for Failover

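The primary configuration file for Keepalived is `/etc/keepalived/keepalived.conf`, where we define a VRRP instance for the floating IP. Two DigitalOcean specifics apply. First, the network does not carry multicast traffic, so VRRP advertisements must be sent unicast between the Droplets' private addresses (`unicast_src_ip` and `unicast_peer`). Second, an address you add to an interface yourself is generally not routed to the Droplet from outside it. For a publicly reachable floating address, use a DigitalOcean Reserved IP and have a `notify_master` script reassign it through the DigitalOcean API. The `192.168.1.100` address below is a placeholder.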

MASTER Server Configuration (`/etc/keepalived/keepalived.conf`):

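vrrp_script chk_app {
    script "/usr/local/bin/check_app_status.sh"
    interval 2   # Run the check every 2 seconds
    weight -60   # Subtract 60 while the check fails: 150 - 60 = 90 < BACKUP's 100
    fall 2       # 2 consecutive failures mark the check as failed
    rise 2       # 2 consecutive successes mark it healthy again
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0 # Replace with the interface on your private network
    virtual_router_id 51
    priority 150 # Higher priority for MASTER
    advert_int 1
    unicast_src_ip MASTER_PRIVATE_IP # DigitalOcean does not carry multicast VRRP
    unicast_peer {
        BACKUP_PRIVATE_IP
    }
    authentication {
        auth_type PASS
        auth_pass CHANGEME # Only the first 8 characters are used
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 # Your floating IP address
    }
    track_script {
        chk_app
    }
}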

BACKUP Server Configuration (`/etc/keepalived/keepalived.conf`):

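vrrp_script chk_app {
    script "/usr/local/bin/check_app_status.sh"
    interval 2   # Run the check every 2 seconds
    weight -60   # Subtract 60 while the check fails
    fall 2       # 2 consecutive failures mark the check as failed
    rise 2       # 2 consecutive successes mark it healthy again
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0 # Replace with the interface on your private network
    virtual_router_id 51
    priority 100 # Lower priority for BACKUP
    advert_int 1
    unicast_src_ip BACKUP_PRIVATE_IP # DigitalOcean does not carry multicast VRRP
    unicast_peer {
        MASTER_PRIVATE_IP
    }
    authentication {
        auth_type PASS
        auth_pass CHANGEME # Must match the MASTER; only 8 characters are used
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 # Your floating IP address
    }
    track_script {
        chk_app
    }
}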
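Whether a failing check actually moves the VIP depends on the sign of `weight`. Keepalived adds a positive weight to the priority while the script succeeds, and subtracts a negative weight while it fails. The sketch below applies that rule to the priorities used here (150 for MASTER, 100 for BACKUP) and compares `weight 20` with `weight -60`. `effective_priority` is a hypothetical helper, not part of Keepalived:

```shell
#!/bin/sh
# Hypothetical model of keepalived's track_script adjustment:
# a positive weight is added while the script succeeds, a negative
# weight is added (i.e. subtracted) while it fails.
effective_priority() {
    base="$1"; weight="$2"; script_ok="$3"   # script_ok: 1 = healthy, 0 = failing
    if [ "$weight" -gt 0 ] && [ "$script_ok" -eq 1 ]; then
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$script_ok" -eq 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# Does a failed check on the MASTER hand the VIP to a healthy BACKUP?
for w in 20 -60; do
    m=$(effective_priority 150 "$w" 0)
    b=$(effective_priority 100 "$w" 1)
    if [ "$m" -lt "$b" ]; then verdict="failover"; else verdict="no failover"; fi
    echo "weight $w: MASTER=$m BACKUP=$b -> $verdict"
done
```

With `weight 20`, a healthy BACKUP only reaches 120. That never beats the failing MASTER's unadjusted 150, so the VIP stays on the broken node. A negative weight whose magnitude exceeds the 50-point gap, such as -60, lets the failover happen.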

Implementing the Health Check Script

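The `check_app_status.sh` script tells Keepalived whether your application is healthy. After `fall` consecutive failures, Keepalived adjusts the node's priority by the script's `weight`: a negative weight is subtracted while the check fails, and a positive weight is added while it succeeds. If the MASTER's resulting priority drops below the BACKUP's, the BACKUP takes over the floating IP. Create this script on both servers at `/usr/local/bin/check_app_status.sh`.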

#!/bin/bash

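# Exit 0 if the application is healthy, non-zero otherwise.
# Keepalived applies the vrrp_script weight based on this exit status.

APP_PORT=8080                  # The TCP port your C/C++ application listens on
APP_NAME="your_app_executable" # Replace with your binary's name

# Check that the application process is running
if ! pgrep -f "$APP_NAME" > /dev/null; then
    exit 1
fi

# Check that a TCP socket is listening on exactly this port
# (the trailing whitespace keeps :8080 from matching :80800)
if ! ss -ltn | grep -qE ":${APP_PORT}[[:space:]]"; then
    exit 1
fi

exit 0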

Make the script executable:

sudo chmod +x /usr/local/bin/check_app_status.sh
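The process and port checks above still pass if the application is running but hung. If that matters for your service, a stricter variant can require that a TCP connection actually completes within a deadline. The following sketch assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout` are installed. `probe_tcp` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical stricter check: succeed only if a TCP connection to
# HOST:PORT is established within LIMIT seconds (default 1).
probe_tcp() {
    host="$1"; port="$2"; limit="${3:-1}"
    # bash opens a TCP socket when /dev/tcp/HOST/PORT is redirected;
    # `timeout` stops a connect attempt that hangs past the deadline.
    timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

To use it, paste the function into `check_app_status.sh` and add `probe_tcp 127.0.0.1 "$APP_PORT" 1 || exit 1` before `exit 0`. This works only if the application also accepts connections on the loopback address, for example by listening on `0.0.0.0`. Keep the limit well below the `vrrp_script` interval.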

Starting and Verifying Keepalived

Restart Keepalived on both servers:

sudo systemctl restart keepalived

Check the status of Keepalived and the VRRP instance:

sudo systemctl status keepalived
sudo ip addr show eth0 | grep "inet 192.168.1.100"
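For scripted checks, such as automating the failover test, a plain `grep` can report partial matches; a VIP of `192.168.1.10` would also match `192.168.1.100`. The sketch below performs an exact-match check over `ip -o -4 addr show` output. `holds_vip` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the IPv4 address $1 appears as an
# exact "inet ADDR/PREFIX" entry in `ip -o -4 addr show` output on stdin.
holds_vip() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), parts, "/")   # strip the /prefix length
                    if (parts[1] == vip) found = 1
                }
        }
        END { exit !found }
    '
}

# Usage:
#   ip -o -4 addr show dev eth0 | holds_vip 192.168.1.100 && echo "this node holds the VIP"
```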

On the MASTER server, you should see the floating IP address assigned. On the BACKUP server, the IP should not be present initially. To test failover, stop Keepalived on the MASTER server:

sudo systemctl stop keepalived

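Within a few seconds, the floating IP should appear on the BACKUP server, and the application running there serves traffic sent to it. For this to work, the application must already be running on the BACKUP and accept connections for the floating IP. The simplest approach is to bind to all addresses (`0.0.0.0`). If it must bind to the floating IP specifically, set the `net.ipv4.ip_nonlocal_bind=1` sysctl, because `bind()` to an address the host does not currently own fails with `EADDRNOTAVAIL`. Run the application under a process supervisor such as `systemd` or `supervisord` so it starts at boot and restarts after a crash.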
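The "few seconds" can be estimated from the VRRP timers. The sketch below is a rough estimate, assuming Keepalived's default VRRPv2 behaviour as described in RFC 3768 and the `advert_int 1` and BACKUP `priority 100` values from the configuration above. The helper names are hypothetical:

```shell
#!/bin/sh
# Rough VRRPv2 (RFC 3768) takeover timing for the BACKUP node.
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
#   Skew_Time            = (256 - Priority) / 256 seconds
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}
skew_time() {
    awk -v p="$1" 'BEGIN { printf "%.3f\n", (256 - p) / 256 }'
}

# MASTER host crashes: BACKUP waits out the full Master_Down_Interval.
echo "MASTER host lost:   ~$(master_down_interval 1 100) s"
# keepalived stopped cleanly: the MASTER sends a priority-0
# advertisement, so the BACKUP only waits Skew_Time.
echo "keepalived stopped: ~$(skew_time 100) s"
```

If a failing health check triggers the failover instead of a dead host, add roughly `fall` × `interval` (2 × 2 = 4 seconds with the settings above) before the check is declared failed. Any work done by notify scripts, such as an API call that moves an address, adds its own delay.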

Integrating with Application Deployment

Your CI/CD pipeline should deploy the C/C++ application to both servers so that either one can take over. Configure the application to accept connections for the floating IP address; listening on all addresses is the simplest way. If the application needs the MySQL cluster, point it at a stable endpoint in front of PXC rather than at a single node. One option is a floating IP managed by Keepalived; another is a load balancer such as HAProxy, which can health-check the nodes and expose separate endpoints for writes and reads. This layered approach removes single points of failure from both the application tier and its data store.