Automating Multi-Region Redundancy for PHP Architectures on OVH
Establishing Multi-Region Redundancy with OVHcloud: A Deep Dive for PHP Architectures
Achieving robust disaster recovery for critical PHP applications necessitates a multi-region strategy. This post outlines a practical, production-grade approach leveraging OVHcloud’s infrastructure, focusing on automated failover and data synchronization for a resilient architecture.
Core Components: Load Balancers, Databases, and Application Servers
Our strategy hinges on three primary pillars:
- Global Load Balancer (GLB): Directs traffic to the active region, performing health checks and facilitating failover. OVHcloud’s Load Balancer service is ideal for this.
- Database Replication: Ensures data consistency across regions. We’ll focus on PostgreSQL replication, a robust and widely supported option.
- Application Server Deployment: Identical, stateless application instances deployed in each region, managed via an orchestration tool like Ansible.
Database Replication Strategy: PostgreSQL Streaming Replication
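For multi-region redundancy we use PostgreSQL streaming replication in a primary-replica setup: the primary database resides in Region A and a replica in Region B. Asynchronous replication is the usual choice across regions, because synchronous replication makes every commit wait for a round trip to the remote region; the trade-off is that the most recent transactions can be lost if Region A fails before they are shipped. For disaster recovery, the replica in Region B is promoted to primary when Region A fails.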
Configuring PostgreSQL Primary (Region A)
On the primary PostgreSQL server in Region A, modify postgresql.conf and pg_hba.conf.
postgresql.conf (Primary)
Ensure the following parameters are set:
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB   # PostgreSQL 13+; on 12 and older use wal_keep_segments = 64
archive_mode = on
archive_command = 'cp %p /path/to/wal_archive/%f'
wal_level must be set to replica or higher. max_wal_senders defines the maximum number of concurrent replication connections. wal_keep_size (wal_keep_segments before PostgreSQL 13) keeps recent WAL files on the primary so that a lagging replica can still catch up. archive_command is crucial for point-in-time recovery and serves as a fallback when a replica has fallen too far behind to stream, though replaying from the archive adds latency.
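The relationship between a segment count and a retention size is simple arithmetic. Here is a minimal sketch, assuming the default WAL segment size of 16 MB (it can be changed at initdb time); the function name wal_keep_size_mb is made up for illustration.

```shell
#!/usr/bin/env bash
# Convert a wal_keep_segments count into the equivalent wal_keep_size in MB.
#   $1: number of WAL segments to retain
#   $2: segment size in MB (assumed default: 16)
wal_keep_size_mb() {
  local segments=$1 segment_mb=${2:-16}
  echo $(( segments * segment_mb ))
}

# 64 segments of 16 MB each
wal_keep_size_mb 64   # prints 1024
```

So 64 segments correspond to 1024 MB of retained WAL on a default installation.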
pg_hba.conf (Primary)
Allow replication connections from the replica server in Region B. Replace <replica_ip_region_b> with the actual IP address of your PostgreSQL replica in Region B.
host replication replicator <replica_ip_region_b>/32 md5
Create a replication user:
CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'your_replication_password';
Restart PostgreSQL on the primary server.
Configuring PostgreSQL Replica (Region B)
On the replica PostgreSQL server in Region B, stop the PostgreSQL service, ensure the data directory is empty, and then initialize it from the primary.
sudo systemctl stop postgresql
# Ensure data directory is empty (e.g., /var/lib/postgresql/13/main)
sudo -u postgres sh -c 'rm -rf /var/lib/postgresql/13/main/*'
# Take a base backup from the primary; -W prompts for the replicator password
sudo -u postgres pg_basebackup -h <primary_ip_region_a> -D /var/lib/postgresql/13/main -U replicator -P -v -W
# PostgreSQL 12+: recovery.conf is no longer supported. An empty standby.signal file puts the server in standby mode.
sudo -u postgres touch /var/lib/postgresql/13/main/standby.signal
# PostgreSQL 12+: edit postgresql.conf on the replica and add:
# primary_conninfo = 'host=<primary_ip_region_a> port=5432 user=replicator password=your_replication_password'
# restore_command = 'cp /path/to/wal_archive/%f %p'
# PostgreSQL 11 and older only: skip standby.signal and put the two settings above, plus standby_mode = 'on', in recovery.conf inside the data directory
sudo chown -R postgres:postgres /var/lib/postgresql/13/main
sudo systemctl start postgresql
pg_basebackup performs a base backup from the primary. The standby.signal file puts the server into standby mode, primary_conninfo tells the replica how to connect to the primary, and restore_command tells it where to find archived WAL files for recovery. Ensure the restore_command points to a location accessible by the replica server, ideally an object storage bucket or a shared network mount replicated across regions.
Monitoring Replication Status
On the replica, check the logs for successful connection and streaming. You can also query:
SELECT pg_is_in_recovery(); -- Should return 't' (true)
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
-- Compare these LSNs to the primary's pg_current_wal_lsn()
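Comparing LSNs by eye is error-prone, because an LSN is two hexadecimal numbers separated by a slash. PostgreSQL can do the subtraction server-side with pg_wal_lsn_diff(). When the two values come from different servers, a small helper like the sketch below can compute the gap in bytes. The function names are illustrative, and the sketch assumes LSNs small enough to fit bash's signed 64-bit arithmetic.

```shell
#!/usr/bin/env bash
# Convert a PostgreSQL LSN such as "0/3000060" into a byte position.
# The part before the slash is the high 32 bits, the part after is the low 32 bits.
lsn_to_bytes() {
  local lsn=$1
  local hi=${lsn%%/*}
  local lo=${lsn##*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

# Print how many bytes the replica LSN trails the primary LSN.
#   $1: primary LSN (from pg_current_wal_lsn() on the primary)
#   $2: replica LSN (from pg_last_wal_replay_lsn() on the replica)
lsn_lag_bytes() {
  local primary_lsn=$1 replica_lsn=$2
  echo $(( $(lsn_to_bytes "$primary_lsn") - $(lsn_to_bytes "$replica_lsn") ))
}

lsn_lag_bytes 0/3000148 0/3000060   # prints 232
```

A result of 0 means the replica has replayed everything the primary has written. A steadily growing number suggests the replica is falling behind.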
Application Server Deployment and Orchestration
We’ll use Ansible for idempotent deployment of our PHP application across multiple OVHcloud regions. Each region will host a set of identical application servers.
Ansible Playbook Structure
Your Ansible inventory should define hosts per region:
[region_a_app_servers]
appserver1.region_a.example.com ansible_host=192.0.2.1
appserver2.region_a.example.com ansible_host=192.0.2.2
[region_b_app_servers]
appserver1.region_b.example.com ansible_host=192.0.2.10
appserver2.region_b.example.com ansible_host=192.0.2.11
[all_app_servers:children]
region_a_app_servers
region_b_app_servers
A simplified playbook for deploying the application:
---
- name: Deploy PHP Application
  hosts: all_app_servers
  become: yes
  vars:
    app_repo: "git@<your_git_host>:your_org/your_php_app.git"
    app_deploy_path: "/var/www/html/your_app"
    php_version: "8.1"
    db_host: "{{ hostvars[groups['region_a_app_servers'][0]]['ansible_host'] }}" # Default to Region A DB; assumes it is reachable at the first Region A host's address
    db_user: "appuser"
    db_password: "your_db_password"
    db_name: "appdb"
  tasks:
    - name: Ensure PHP and web server are installed
      apt:
        name:
          - php{{ php_version }}
          - libapache2-mod-php{{ php_version }}
          - php{{ php_version }}-mysql
          - php{{ php_version }}-pgsql
          - apache2
        state: present
        update_cache: yes
    - name: Configure Apache VirtualHost
      template:
        src: templates/vhost.conf.j2
        dest: /etc/apache2/sites-available/your_app.conf
      notify: restart apache
    - name: Enable rewrite and PHP modules
      apache2_module:
        name: "{{ item }}"
        state: present
      loop:
        - rewrite
        - php{{ php_version }}
      notify: restart apache
    - name: Enable virtual host
      command: a2ensite your_app.conf
      args:
        creates: /etc/apache2/sites-enabled/your_app.conf
      notify: restart apache
    - name: Deploy application code from Git
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_deploy_path }}"
        version: main # Or a specific tag/branch
        force: yes
    - name: Install Composer dependencies
      composer:
        working_dir: "{{ app_deploy_path }}"
      environment:
        COMPOSER_HOME: "/root/.composer" # Or appropriate user
    - name: Create .env file from template
      template:
        src: templates/env.j2
        dest: "{{ app_deploy_path }}/.env"
        owner: www-data
        group: www-data
        mode: '0644'
      notify: restart apache
  handlers:
    - name: restart apache
      service:
        name: apache2
        state: restarted
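The env.j2 template should set the database connection details from these variables. Crucially, for Region B servers, the db_host variable must be overridden to point to the PostgreSQL replica in Region B, either at deployment time or via a separate Ansible task. Because the replica is read-only until it is promoted, Region B can serve reads but not writes before a failover.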
Dynamic Database Host Configuration for Region B
You can achieve this by passing extra variables or using Ansible facts. A common approach is to use a conditional in your playbook or a separate role.
# In your playbook, within the tasks for region_b_app_servers
- name: Set Region B DB host
  set_fact:
    db_host: "{{ hostvars[groups['region_b_app_servers'][0]]['ansible_host'] }}" # Assuming first host in group has DB IP fact
  when: inventory_hostname in groups['region_b_app_servers']
# Then use {{ db_host }} in the env.j2 template
Alternatively, you can define a separate inventory group for Region B's database and reference it.
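The extra-variables route can be sketched as a small wrapper that builds the ansible-playbook command for one region. The file names inventory.ini and deploy.yml, the function name deploy_command, and the address 198.51.100.20 are placeholders, not values from this setup. The sketch only prints the command so it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Print the ansible-playbook command for one region, overriding db_host.
#   $1: inventory group to limit the run to
#   $2: database host the application servers in that group should use
# inventory.ini and deploy.yml are hypothetical file names.
deploy_command() {
  local region_group=$1 db_host=$2
  printf 'ansible-playbook -i inventory.ini deploy.yml --limit %s -e db_host=%s\n' \
    "$region_group" "$db_host"
}

# Dry run: show the command for Region B (198.51.100.20 is a placeholder DB address).
deploy_command region_b_app_servers 198.51.100.20
```

Variables passed with -e take precedence over play-level vars, so the playbook itself does not need a region-specific branch when this approach is used.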
OVHcloud Global Load Balancer (GLB) Configuration
The OVHcloud GLB is the entry point for your application. It will distribute traffic between your active region and your passive (DR) region. We'll configure it with health checks to automatically detect failures.
Setting up the GLB
Navigate to the OVHcloud Control Panel -> Network -> Load Balancer. Create a new Load Balancer instance.
Frontend Configuration
Define your frontend listener (e.g., TCP port 80 or 443). For HTTPS, you'll need to manage SSL certificates, either on the GLB itself or passed through to the backend servers.
Backend Pool Configuration
Create two backend pools:
- Pool A (Primary Region): Add the IP addresses of your application servers in Region A.
- Pool B (DR Region): Add the IP addresses of your application servers in Region B.
Health Checks
This is critical for failover. Configure health checks for each pool:
- Protocol: HTTP
- Port: 80 (or your application's port)
- URI: A dedicated health check endpoint in your PHP application (e.g., /healthz). This endpoint should return a 200 OK status code if the application is healthy and connected to its database.
- Interval: 10 seconds
- Timeout: 5 seconds
- Unhealthy Threshold: 3 consecutive failures
- Healthy Threshold: 2 consecutive successes
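To see how the thresholds interact, the sketch below replays a sequence of probe results through generic threshold logic. It illustrates the idea only and is not OVHcloud's implementation; simulate_health is a hypothetical name. With a 10-second interval and an unhealthy threshold of 3, a server that stops responding is marked unhealthy after about 30 seconds.

```shell
#!/usr/bin/env bash
# Replay probe results ("ok" or "fail") and print the server state after each probe.
# Thresholds mirror the settings above: 3 consecutive failures mark a server
# unhealthy, 2 consecutive successes mark it healthy again.
simulate_health() {
  local unhealthy_threshold=3 healthy_threshold=2
  local state=healthy fails=0 oks=0 result
  for result in "$@"; do
    if [ "$result" = "ok" ]; then
      oks=$((oks + 1)); fails=0
      if [ "$state" = "unhealthy" ] && [ "$oks" -ge "$healthy_threshold" ]; then
        state=healthy
      fi
    else
      fails=$((fails + 1)); oks=0
      if [ "$state" = "healthy" ] && [ "$fails" -ge "$unhealthy_threshold" ]; then
        state=unhealthy
      fi
    fi
    echo "$state"
  done
}

# Three failures take the server out; two successes bring it back.
simulate_health ok fail fail fail ok ok
```

A single failed probe between successes resets the failure counter, so brief network blips do not trigger a failover.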
The health check endpoint (/healthz) should perform a basic database query to ensure connectivity. If the database is unreachable, it should return a non-2xx status code.
<?php
// public/healthz.php
header('Content-Type: application/json');
$db_host = getenv('DB_HOST');
$db_user = getenv('DB_USER');
$db_pass = getenv('DB_PASS');
$db_name = getenv('DB_NAME');
try {
    $dsn = "pgsql:host={$db_host};port=5432;dbname={$db_name}";
    $options = [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
        PDO::ATTR_EMULATE_PREPARES => false,
        PDO::ATTR_TIMEOUT => 2, // 2-second connection timeout
    ];
    $pdo = new PDO($dsn, $db_user, $db_pass, $options);
    // Simple query to check connectivity; query() already executes the statement
    $pdo->query('SELECT 1');
    // Set the status code before sending any output
    http_response_code(200);
    echo json_encode(['status' => 'ok', 'message' => 'Database connected']);
} catch (\PDOException $e) {
    // Log the details server-side instead of exposing them to callers
    error_log('Health check failed: ' . $e->getMessage());
    http_response_code(503); // Service Unavailable
    echo json_encode(['status' => 'error', 'message' => 'Database connection failed']);
}
?>
Load Balancing Algorithm and Failover Logic
Configure the GLB to use a load balancing algorithm (e.g., Round Robin). The key is the failover behavior. OVHcloud's GLB typically prioritizes the primary pool. If all servers in the primary pool become unhealthy, it will automatically switch traffic to the secondary pool.
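The active/passive rule described here can be sketched in a few lines. This is a generic illustration of the selection logic, not the load balancer's actual code, and select_pool is a hypothetical name.

```shell
#!/usr/bin/env bash
# Pick the pool that should receive traffic in an active/passive setup.
#   $1: number of healthy servers in Pool A (primary region)
#   $2: number of healthy servers in Pool B (DR region)
select_pool() {
  local healthy_a=$1 healthy_b=$2
  if [ "$healthy_a" -gt 0 ]; then
    echo "pool_a"      # primary pool wins as long as one server is healthy
  elif [ "$healthy_b" -gt 0 ]; then
    echo "pool_b"      # fail over only when the whole primary pool is down
  else
    echo "none"        # nothing healthy anywhere
  fi
}

select_pool 2 2   # prints pool_a
select_pool 0 2   # prints pool_b
```

Within the selected pool, the configured algorithm (Round Robin in this example) then spreads requests across the healthy servers.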
Automating Failover and Failback
Manual failover is prone to error and delay. Automation is essential.
Database Failover (Manual Promotion)
When Region A's primary database fails, you need to promote the replica in Region B. This is typically a manual step, but can be scripted.
# On the PostgreSQL replica in Region B:
sudo su - postgres -c "pg_ctl promote -D /var/lib/postgresql/13/main"
# On Debian/Ubuntu, pg_ctl is usually not on PATH; use: sudo pg_ctlcluster 13 main promote
# Verify promotion; should return 'f' (false)
sudo su - postgres -c "psql -tAc 'SELECT pg_is_in_recovery();'"
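If you script the promotion, it helps to separate the decision from the action so the decision can be tested without a live cluster. The sketch below is one possible shape. The function name promotion_decision and the default of three failed probes are assumptions. The commented wiring uses pg_isready and pg_promote() (PostgreSQL 12+) and assumes a PRIMARY_HOST variable.

```shell
#!/usr/bin/env bash
# Decide whether the Region B replica should be promoted.
#   $1: number of consecutive failed probes of the Region A primary
#   $2: result of "SELECT pg_is_in_recovery()" on the replica ("t" or "f")
#   $3: failed probes required before promoting (assumed default: 3)
promotion_decision() {
  local failed_probes=$1 in_recovery=$2 required=${3:-3}
  if [ "$in_recovery" != "t" ]; then
    echo "skip"       # replica is already a primary, nothing to do
  elif [ "$failed_probes" -ge "$required" ]; then
    echo "promote"
  else
    echo "wait"
  fi
}

# Example wiring (commented out because it needs a live cluster):
# pg_isready -h "$PRIMARY_HOST" -t 3 || failed=$((failed + 1))
# state=$(sudo -u postgres psql -tAc "SELECT pg_is_in_recovery()")
# if [ "$(promotion_decision "$failed" "$state")" = "promote" ]; then
#   sudo -u postgres psql -c "SELECT pg_promote()"
# fi
```

Requiring several failed probes reduces the risk of promoting during a short network partition. Such a promotion would leave two primaries accepting writes.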
After promotion, you must update the application servers in Region B to point to the new primary database in Region B. This can be done via Ansible.
# Run this playbook targeting only region_b_app_servers
---
- name: Update DB config for Region B after failover
  hosts: region_b_app_servers
  become: yes
  vars:
    db_host: "{{ hostvars[groups['region_b_app_servers'][0]]['ansible_host'] }}" # IP of the newly promoted primary in Region B
    db_user: "appuser"
    db_password: "your_db_password"
    db_name: "appdb"
  tasks:
    - name: Create/Update .env file for new primary DB
      template:
        src: templates/env.j2
        dest: "/var/www/html/your_app/.env"
        owner: www-data
        group: www-data
        mode: '0644'
      notify: restart apache
  handlers:
    - name: restart apache
      service:
        name: apache2
        state: restarted
GLB Failover
The OVHcloud GLB handles automatic failover between pools based on health checks. When the primary pool (Region A) becomes unhealthy, traffic will automatically be routed to the secondary pool (Region B). This is the "automatic" part of the failover.
Failback Strategy
Failback is often more complex than failover. Once the issue in Region A is resolved:
- Database: Region B now holds the authoritative data, so Region A must be rebuilt as a replica of Region B (for example with a fresh pg_basebackup) and allowed to catch up. Once it is in sync, you can either keep Region B as primary or perform a planned switchover that reverses the roles again.
- Application Servers: Update application servers in Region A to point to whichever database is primary after the switchover (the Region A database if roles were reversed).
- GLB: Once Region A's application servers are healthy and pointing to the correct database, the GLB will automatically start sending traffic back to Pool A as its health checks pass.
A robust failback procedure often involves a maintenance window and careful orchestration to avoid data loss or inconsistencies.
Monitoring and Alerting
Comprehensive monitoring is non-negotiable. Implement:
- OVHcloud Monitoring: Utilize OVHcloud's built-in monitoring for GLB health, server status, and resource utilization.
- Application-Level Monitoring: Tools like Prometheus/Grafana, Datadog, or New Relic to monitor application performance, error rates, and database connection status.
- Database Replication Lag: Monitor PostgreSQL replication lag using the pg_stat_replication view on the primary and by checking LSNs on the replica.
- Alerting: Configure alerts for critical events: GLB pool failure, database replication errors, high replication lag, and health check failures. Integrate with PagerDuty, Opsgenie, or Slack.
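As a starting point for lag alerts, the sketch below maps a lag value in bytes to a severity level. The thresholds (16 MiB warning, 128 MiB critical) and the name classify_lag are assumptions to tune for your workload. The lag value could come from a query such as SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication on the primary.

```shell
#!/usr/bin/env bash
# Classify replication lag (in bytes) into a severity level.
#   $1: lag in bytes
#   $2: warning threshold in bytes  (assumed default: 16 MiB)
#   $3: critical threshold in bytes (assumed default: 128 MiB)
classify_lag() {
  local lag_bytes=$1 warn=${2:-16777216} crit=${3:-134217728}
  if [ "$lag_bytes" -ge "$crit" ]; then
    echo "CRITICAL"
  elif [ "$lag_bytes" -ge "$warn" ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

classify_lag 1024        # prints OK
classify_lag 200000000   # prints CRITICAL
```

The printed level can then be forwarded to whichever alerting integration you use.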
Security Considerations
Ensure secure communication between components:
- Database Connections: Use SSL/TLS for PostgreSQL replication and application connections.
- Ansible: Use SSH keys and encrypt sensitive variables (e.g., database passwords) using Ansible Vault.
- Network Security: Configure OVHcloud Security Groups and Firewall rules to restrict access to only necessary ports and IP ranges.
- GLB: If using HTTPS, ensure proper SSL certificate management.
Conclusion
Implementing multi-region redundancy requires careful planning and execution. By combining OVHcloud's GLB, PostgreSQL streaming replication, and an orchestration tool like Ansible, you can build a highly available and resilient PHP architecture capable of withstanding regional outages. Continuous monitoring and well-defined failover/failback procedures are key to maintaining operational integrity.