Disaster Recovery 101: Architecting Auto-Failovers for MySQL and C Deployments on DigitalOcean
Establishing a Highly Available MySQL Cluster with Automated Failover
Disaster recovery for a critical data store like MySQL requires more than backups; it requires automated failover. This section describes a three-node MySQL setup on DigitalOcean built on Percona XtraDB Cluster (PXC). PXC uses Galera’s synchronous, multi-primary replication, so every node holds the same data and can accept writes. Its group membership and quorum mechanisms let the cluster keep serving traffic when a node is lost, without a manual promotion step.
Prerequisites and Initial Setup
Before deploying PXC, ensure you have at least three DigitalOcean Droplets. These will serve as your MySQL nodes. For optimal performance and reliability, consider using Droplets with dedicated CPU and sufficient RAM. Each Droplet should have a static IP address assigned. We’ll assume a basic Ubuntu 22.04 LTS setup.
On each Droplet, update your package lists and install necessary tools:
sudo apt update
sudo apt upgrade -y
sudo apt install -y software-properties-common wget
Installing Percona XtraDB Cluster
Add the Percona repository and install PXC on all nodes. The installation process will prompt for a root password for MySQL. Ensure this password is strong and consistent across all nodes.
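First, install the `percona-release` repository package and enable the PXC repository (`pxc80` here, for the 8.0 series):
wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
sudo percona-release setup pxc80
sudo apt update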
Then, install the PXC server and client packages:
sudo apt install -y percona-xtradb-cluster percona-xtradb-cluster-client percona-xtradb-cluster-server
Configuring the Cluster
The core of PXC configuration lies in its `my.cnf` file, typically located at `/etc/mysql/my.cnf` or `/etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf`. We need to define cluster-specific parameters. For simplicity, we’ll configure one node as the initial seed node and then join the others.
Node 1 (Seed Node) Configuration:
Edit the configuration file (e.g., `/etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf`) and ensure the following settings are present or modified. Replace `NODE1_IP`, `NODE2_IP`, `NODE3_IP` with the actual private IP addresses of your Droplets.
[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
# PXC specific settings
# The library path varies by release; PXC 8.0 installs it under /usr/lib/galera4/
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_pxc_cluster"
wsrep_cluster_address="gcomm://NODE1_IP,NODE2_IP,NODE3_IP"
wsrep_node_name="pxc-node1"
wsrep_node_address="NODE1_IP"
# PXC 8.0 supports only wsrep_sst_method=xtrabackup-v2 and no longer accepts wsrep_sst_auth
wsrep_sst_method=rsync
wsrep_sst_auth="sstuser:YOUR_SST_PASSWORD"
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
# 0 favors throughput; a crash can lose up to about a second of transactions on this node
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=1G # Adjust based on Droplet RAM
bind-address=0.0.0.0
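Create a dedicated user for State Snapshot Transfer (SST) and grant it the necessary privileges. A donor node uses this account when it copies its data to a node that is joining the cluster. PXC 8.0 manages an internal SST account itself, so this step is only needed on releases that still use `wsrep_sst_auth`.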
sudo mysql -u root -p -e "CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'YOUR_SST_PASSWORD';"
sudo mysql -u root -p -e "GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';"
sudo mysql -u root -p -e "FLUSH PRIVILEGES;"
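Stop MySQL on Node 1 and start it in bootstrap mode so that it applies these changes and forms a new cluster. A plain restart would make the node look for an existing cluster at the other addresses and fail, because none is running yet.
sudo systemctl stop mysql
sudo systemctl start mysql@bootstrap.service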
Node 2 and Node 3 Configuration:
On Node 2 and Node 3, the configuration is similar, but `wsrep_node_name` and `wsrep_node_address` must be unique for each node. The `wsrep_cluster_address` should list all nodes in the cluster.
[mysqld]
server-id=2 # or 3 for Node 3
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
# PXC specific settings
# The library path varies by release; PXC 8.0 installs it under /usr/lib/galera4/
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_pxc_cluster"
wsrep_cluster_address="gcomm://NODE1_IP,NODE2_IP,NODE3_IP"
wsrep_node_name="pxc-node2" # or "pxc-node3"
wsrep_node_address="NODE2_IP" # or "NODE3_IP"
# PXC 8.0 supports only wsrep_sst_method=xtrabackup-v2 and no longer accepts wsrep_sst_auth
wsrep_sst_method=rsync
wsrep_sst_auth="sstuser:YOUR_SST_PASSWORD"
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
# 0 favors throughput; a crash can lose up to about a second of transactions on this node
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=1G # Adjust based on Droplet RAM
bind-address=0.0.0.0
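Because only a few values differ between nodes, the node-specific lines can be generated instead of edited by hand. The following is a minimal sketch, not part of Percona's tooling: `render_pxc_config` is a hypothetical helper, the `10.0.0.x` addresses are placeholder private IPs, and the shared settings from the listing above would still need to be added to the output.

```shell
#!/usr/bin/env bash
# Sketch: render the per-node PXC settings from three arguments.
# render_pxc_config is a hypothetical helper used for illustration only.

# $1 = server id (1, 2, 3), $2 = this node's private IP,
# $3 = comma-separated private IPs of all cluster members
render_pxc_config() {
    local server_id="$1" node_ip="$2" cluster_ips="$3"
    cat <<EOF
[mysqld]
server-id=${server_id}
wsrep_cluster_name="my_pxc_cluster"
wsrep_cluster_address="gcomm://${cluster_ips}"
wsrep_node_name="pxc-node${server_id}"
wsrep_node_address="${node_ip}"
EOF
}

# Example: print the node-specific settings for Node 2 (placeholder addresses).
render_pxc_config 2 10.0.0.12 "10.0.0.11,10.0.0.12,10.0.0.13"
```

Generating the file this way keeps `wsrep_cluster_address` identical on every node while guaranteeing that `server-id`, `wsrep_node_name`, and `wsrep_node_address` stay unique.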
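On Node 2 and Node 3, you do not need to create `sstuser` by hand: a joining node with an empty or outdated data directory receives a full State Snapshot Transfer from a donor, and that copy includes the user accounts created on Node 1. Join the nodes one at a time. After saving the configuration, restart MySQL; PXC will attempt to join the cluster and perform an SST if necessary.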
sudo systemctl restart mysql
Verifying Cluster Status and Failover
After restarting all nodes, check the cluster status. Connect to any MySQL node and run:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
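You should see `wsrep_cluster_size` equal to the number of active nodes (3 here) and `wsrep_local_state_comment` reported as `Synced` on every node. PXC is multi-primary, so there is no single primary to promote: if one node fails, the remaining two still form a majority, keep quorum, and continue to accept reads and writes. Clients that were connected to the failed node must reconnect to a surviving one, which is usually handled by a proxy or virtual IP in front of the cluster. To simulate a failure, stop the MySQL service on one node: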
sudo systemctl stop mysql
Check the status on another node. The `wsrep_cluster_size` will decrease. When you restart the stopped node, it will rejoin the cluster and synchronize its state.
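The manual status checks above can be folded into a small health probe. This is a sketch under stated assumptions: `pxc_node_healthy` is a hypothetical function name, it expects the tab-separated output of the `mysql` client in batch mode, and it also reads `wsrep_cluster_status`, which reports `Primary` while the node belongs to the component that holds quorum.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a PXC node is healthy from its wsrep status variables.
# pxc_node_healthy is a hypothetical helper used for illustration only.

# Reads "Variable_name<TAB>Value" lines on stdin, as produced by:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"
# $1 = minimum acceptable cluster size.
# Succeeds only if the node is Synced, in the Primary component,
# and the cluster has at least $1 members.
pxc_node_healthy() {
    local min_size="$1" name value
    local size=0 state="" status=""
    while IFS=$'\t' read -r name value; do
        case "$name" in
            wsrep_cluster_size)        size="$value" ;;
            wsrep_local_state_comment) state="$value" ;;
            wsrep_cluster_status)      status="$value" ;;
        esac
    done
    [ "$state" = "Synced" ] && [ "$status" = "Primary" ] && [ "$size" -ge "$min_size" ]
}

# Usage against a live node (credentials assumed to come from ~/.my.cnf):
#   if mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | pxc_node_healthy 2; then
#       echo "node healthy"
#   fi
```

Requiring a minimum size of 2 in a three-node cluster treats the loss of a single node as tolerable while still flagging a node that has been cut off from its peers.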
Architecting Auto-Failover for C/C++ Applications with Keepalived
For stateless applications or services that manage their own data persistence (e.g., through external databases or object storage), achieving high availability often involves a virtual IP (VIP) managed by a failover daemon. Keepalived is a robust, lightweight solution for this. It uses the Virtual Router Redundancy Protocol (VRRP) to manage a floating IP address that your application instances will bind to.
Setting up Keepalived
Deploy Keepalived on at least two application servers (Droplets). These servers will run your C/C++ application. Ensure they have static IP addresses. We’ll configure one as MASTER and the other as BACKUP.
Install Keepalived on both servers:
sudo apt update
sudo apt install -y keepalived
Configuring Keepalived for Failover
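The primary configuration file for Keepalived is `/etc/keepalived/keepalived.conf`. We’ll define a VRRP instance for the floating IP and a tracked health-check script. Two DigitalOcean specifics apply. Droplet networking does not carry multicast, so in practice each instance also needs `unicast_src_ip` and a `unicast_peer` block listing the other server's private IP. A public floating address must be a DigitalOcean Reserved IP that is reassigned through the DigitalOcean API (for example from a `notify_master` script), because announcing the address over VRRP alone does not reroute it.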
MASTER Server Configuration (`/etc/keepalived/keepalived.conf`):
vrrp_script chk_app {
    script "/usr/local/bin/check_app_status.sh"
    interval 2
    weight -60 # On failure, lower priority by 60 so MASTER (150) drops below BACKUP (100)
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0 # Replace with your primary network interface
    virtual_router_id 51
    priority 150 # Higher priority for MASTER
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecretpassword # Keepalived uses only the first 8 characters
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 # Your floating IP address
    }
    track_script {
        chk_app
    }
}
BACKUP Server Configuration (`/etc/keepalived/keepalived.conf`):
vrrp_script chk_app {
    script "/usr/local/bin/check_app_status.sh"
    interval 2
    weight -60 # On failure, lower this node's priority by 60
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0 # Replace with your primary network interface
    virtual_router_id 51
    priority 100 # Lower priority for BACKUP
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecretpassword # Keepalived uses only the first 8 characters
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 # Your floating IP address
    }
    track_script {
        chk_app
    }
}
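The interaction between `priority` and the script `weight` decides whether an application failure actually moves the floating IP, so it is worth working through the arithmetic. The sketch below models Keepalived's documented rule (a positive weight is added while the check passes, a negative weight is subtracted while it fails); `effective_priority` is a hypothetical helper, and clamping to the valid priority range is ignored for brevity.

```shell
#!/usr/bin/env bash
# Sketch: how a tracked script's weight changes a node's VRRP priority.
# effective_priority is a hypothetical helper used for illustration only.

# $1 = base priority, $2 = vrrp_script weight,
# $3 = 1 if the health check passes, 0 if it fails
effective_priority() {
    local base="$1" weight="$2" passing="$3"
    if [ "$weight" -gt 0 ] && [ "$passing" -eq 1 ]; then
        echo $((base + weight))   # positive weight: bonus while the check passes
    elif [ "$weight" -lt 0 ] && [ "$passing" -eq 0 ]; then
        echo $((base + weight))   # negative weight: penalty while the check fails
    else
        echo "$base"              # otherwise the base priority is unchanged
    fi
}

# MASTER (base 150) with a failing check versus a healthy BACKUP (base 100):
echo "weight 20:  master=$(effective_priority 150 20 0) backup=$(effective_priority 100 20 1)"
echo "weight -60: master=$(effective_priority 150 -60 0) backup=$(effective_priority 100 -60 1)"
```

With base priorities of 150 and 100, a weight of 20 leaves the failing MASTER at 150 against 120 on the BACKUP, so the floating IP stays put. A weight of -60 drops the failing MASTER to 90, below the BACKUP's 100, and the BACKUP takes over.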
Implementing the Health Check Script
The `check_app_status.sh` script determines the health of your application. Keepalived runs it every `interval` seconds and adjusts the node's priority by `weight` according to the result; a failover happens only when the adjusted priority of the MASTER drops below that of the BACKUP. Create this script on both servers at `/usr/local/bin/check_app_status.sh`.
#!/bin/bash
APP_PORT=8080 # The port your C/C++ application listens on
# Check if the application process is running
# (-f matches against the full command line; replace the placeholder name)
if ! pgrep -f "your_app_executable" > /dev/null; then
    exit 1
fi
# Check if a TCP socket is listening on exactly the expected port
# (the sport filter avoids false matches such as :80800)
if ! ss -ltnH "sport = :$APP_PORT" | grep -q .; then
    exit 1
fi
exit 0
Make the script executable:
sudo chmod +x /usr/local/bin/check_app_status.sh
Starting and Verifying Keepalived
Restart Keepalived on both servers:
sudo systemctl restart keepalived
Check the status of Keepalived and the VRRP instance:
sudo systemctl status keepalived
sudo ip addr show eth0 | grep "inet 192.168.1.100"
On the MASTER server, you should see the floating IP address assigned. On the BACKUP server, the IP should not be present initially. To test failover, stop Keepalived on the MASTER server:
sudo systemctl stop keepalived
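Within a few seconds, the floating IP should appear on the BACKUP server. For the application to be reachable there, it must already be running on the BACKUP server and listening on an address that covers the floating IP: either bind to `0.0.0.0`, or bind to the floating IP directly with `net.ipv4.ip_nonlocal_bind=1` set, so that the bind succeeds while the address is held by the other node. Run the application under a process supervisor such as `systemd` or `supervisord` so that it starts automatically.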
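To put a number on "a few seconds", the failover test can be timed from the BACKUP server. The following is a rough sketch: `holds_vip` and `wait_for_vip` are hypothetical helpers, and the address and interface are the example values used in the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: detect the floating IP on this host and time how long failover takes.
# holds_vip and wait_for_vip are hypothetical helpers used for illustration only.

# Reads `ip -o -4 addr show` output on stdin; succeeds if address $1 is assigned.
holds_vip() {
    local vip="$1"
    # Match "inet <vip>/" as a fixed string so 192.168.1.10 does not match 192.168.1.100.
    grep -qF -- "inet ${vip}/"
}

# Polls once per second until the address appears or the timeout expires.
# $1 = floating IP, $2 = interface, $3 = timeout in seconds
wait_for_vip() {
    local vip="$1" dev="$2" timeout="$3" waited=0
    until ip -o -4 addr show dev "$dev" | holds_vip "$vip"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    echo "VIP $vip present after ${waited}s"
}

# Usage on the BACKUP server, right after stopping Keepalived on the MASTER:
#   wait_for_vip 192.168.1.100 eth0 10 || echo "failover did not complete in 10s"
```

Running the same loop after stopping only the application on the MASTER, rather than Keepalived itself, confirms that the health check and weight settings trigger a failover as well.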
Integrating with Application Deployment
Your CI/CD pipeline should deploy your C/C++ application to both servers, and the application should accept connections on the floating IP address. If the application needs to access the MySQL cluster, it should connect through a single endpoint in front of the PXC nodes, such as a floating IP managed by Keepalived or a load balancer like HAProxy, rather than to one node's address. With both layers in place, neither the application instances nor their data store remain a single point of failure.