Automating Multi-Region Redundancy for PHP Architectures on Google Cloud

Establishing Multi-Region Redundancy for PHP Applications on Google Cloud

Robust disaster recovery for PHP architectures on Google Cloud Platform (GCP) requires a multi-region strategy. This goes beyond load balancing within a single region: critical components and data are replicated across geographically separate regions so that the service keeps running through a regional outage. This document outlines a practical, code-driven approach to implementing such redundancy, focusing on key services like Compute Engine, Cloud SQL, Cloud Storage, and Load Balancing.

Automated Infrastructure Provisioning with Terraform

Manual infrastructure setup is error-prone and not conducive to rapid recovery. Terraform provides an Infrastructure as Code (IaC) solution that allows us to define and provision our multi-region GCP resources declaratively. We’ll define separate VPC networks, Compute Engine instances, and Cloud SQL instances for each target region.

Consider a simplified Terraform configuration for two regions, us-central1 and europe-west1. This example focuses on the core compute and database resources.

Terraform Configuration for `main.tf`

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = var.primary_region
}

variable "gcp_project_id" {
  description = "The GCP project ID."
  type        = string
}

variable "primary_region" {
  description = "The primary GCP region for deployment."
  type        = string
  default     = "us-central1"
}

variable "secondary_region" {
  description = "The secondary GCP region for deployment."
  type        = string
  default     = "europe-west1"
}

# --- Primary Region Resources ---

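resource "google_compute_network" "primary_vpc" {
  name                    = "${var.gcp_project_id}-vpc-${var.primary_region}"
  auto_create_subnetworks = false
}

# A custom-mode VPC starts with no subnets, so the instance needs one defined explicitly.
resource "google_compute_subnetwork" "primary_subnet" {
  name          = "php-subnet-${var.primary_region}"
  region        = var.primary_region
  network       = google_compute_network.primary_vpc.id
  ip_cidr_range = "10.10.0.0/24" # Example range (assumption); adjust to your addressing plan
}

resource "google_compute_instance" "primary_app_server" {
  name         = "php-app-primary-${random_id.suffix.hex}"
  machine_type = "e2-medium"
  zone         = "${var.primary_region}-a" # Example zone

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.primary_subnet.id
    access_config {
      // Ephemeral IP for initial setup, will be managed by Load Balancer
    }
  }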

  metadata = {
    ssh-keys = "your-username:${file("~/.ssh/id_rsa.pub")}"
  }

  tags = ["php-app", "primary"]

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_sql_database_instance" "primary_db" {
  name             = "php-db-primary-${random_id.suffix.hex}"
  region           = var.primary_region
  database_version = "POSTGRES_14" # Or MYSQL_8_0
  settings {
    tier = "db-f1-micro" # Adjust for production
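    ip_configuration {
      ipv4_enabled    = true
      # private_network requires a private services access connection
      # (google_service_networking_connection) on this VPC, not shown here.
      private_network = google_compute_network.primary_vpc.id
    }
    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true # PostgreSQL; binary_log_enabled applies to MySQL only
    }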
  }
  deletion_protection = false # Set to true for production
}

# --- Secondary Region Resources ---

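resource "google_compute_network" "secondary_vpc" {
  name                    = "${var.gcp_project_id}-vpc-${var.secondary_region}"
  auto_create_subnetworks = false
}

# A custom-mode VPC starts with no subnets, so the instance needs one defined explicitly.
resource "google_compute_subnetwork" "secondary_subnet" {
  name          = "php-subnet-${var.secondary_region}"
  region        = var.secondary_region
  network       = google_compute_network.secondary_vpc.id
  ip_cidr_range = "10.20.0.0/24" # Example range (assumption); adjust to your addressing plan
}

resource "google_compute_instance" "secondary_app_server" {
  name         = "php-app-secondary-${random_id.suffix.hex}"
  machine_type = "e2-medium"
  zone         = "${var.secondary_region}-b" # Example zone; europe-west1 has zones b, c, and d (no -a)

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.secondary_subnet.id
    access_config {
      // Ephemeral IP for initial setup, will be managed by Load Balancer
    }
  }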

  metadata = {
    ssh-keys = "your-username:${file("~/.ssh/id_rsa.pub")}"
  }

  tags = ["php-app", "secondary"]

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_sql_database_instance" "secondary_db" {
  name             = "php-db-secondary-${random_id.suffix.hex}"
  region           = var.secondary_region
  database_version = "POSTGRES_14" # Or MYSQL_8_0
  settings {
    tier = "db-f1-micro" # Adjust for production
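    ip_configuration {
      ipv4_enabled    = true
      # private_network requires a private services access connection
      # (google_service_networking_connection) on this VPC, not shown here.
      private_network = google_compute_network.secondary_vpc.id
    }
    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true # PostgreSQL; binary_log_enabled applies to MySQL only
    }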
  }
  deletion_protection = false # Set to true for production
}

# Helper for unique resource names
resource "random_id" "suffix" {
  byte_length = 4
}

output "primary_app_server_internal_ip" {
  value = google_compute_instance.primary_app_server.network_interface[0].network_ip
}

output "secondary_app_server_internal_ip" {
  value = google_compute_instance.secondary_app_server.network_interface[0].network_ip
}

output "primary_db_private_ip" {
  value = google_sql_database_instance.primary_db.private_ip_address
}

output "secondary_db_private_ip" {
  value = google_sql_database_instance.secondary_db.private_ip_address
}

To deploy this, supply your project ID either in a terraform.tfvars file or, as shown here, on the command line with -var, and then run:

terraform init
terraform plan -var="gcp_project_id=your-gcp-project-id"
terraform apply -var="gcp_project_id=your-gcp-project-id"
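
As a small sketch of the terraform.tfvars route, the shell function below writes the variable file so the -var flags are no longer needed. The function name write_tfvars and the placeholder values are illustrative assumptions; the variable names and region defaults are taken from main.tf above.

```shell
# write_tfvars DIR PROJECT_ID [PRIMARY_REGION] [SECONDARY_REGION]
# Hypothetical helper: writes terraform.tfvars into DIR using the variable
# names declared in main.tf. Region defaults mirror the defaults in main.tf.
write_tfvars() {
  local dir="$1" project="$2"
  local primary="${3:-us-central1}" secondary="${4:-europe-west1}"
  cat > "${dir}/terraform.tfvars" <<EOF
gcp_project_id   = "${project}"
primary_region   = "${primary}"
secondary_region = "${secondary}"
EOF
}
```

For example, after write_tfvars . your-gcp-project-id, a plain terraform plan picks up the values, since Terraform loads terraform.tfvars from the working directory automatically.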

Database Replication Strategy

For disaster recovery, database replication is paramount. Cloud SQL offers built-in cross-region read replicas for both MySQL and PostgreSQL. For PostgreSQL, we can also leverage logical replication. For MySQL, binary log replication is the standard.

PostgreSQL Cross-Region Replication (Logical Replication)

The primary database instance in us-central1 will be configured as the source for replication. The secondary instance in europe-west1 will be set up as a subscriber.

First, ensure logical decoding is enabled on the primary instance. On Cloud SQL this is controlled by the cloudsql.logical_decoding database flag, set through the GCP console or the `gcloud` CLI. Cloud SQL manages wal_level itself once the flag is on, so wal_level is not set directly.

# Example using gcloud to update instance flags (requires instance restart).
# Note: --database-flags replaces the full set of flags on the instance,
# so include any flags that are already set.
gcloud sql instances patch [PRIMARY_DB_INSTANCE_NAME] \
  --database-flags="cloudsql.logical_decoding=on"

Next, create a replication user on the primary instance:

-- Connect to your primary PostgreSQL instance via psql or Cloud Shell
CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'your_replication_password';
-- Connect to your_app_db, then grant read access on the tables to be replicated
-- (assumes they live in the public schema; adjust as needed)
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;

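On the secondary instance, configure it to connect to the primary by setting up a subscription. Logical replication pairs a publication on the source with a subscription on the target: when the subscription is created, it first copies the existing table data and then streams subsequent changes. The table schema must already exist on the secondary, because logical replication does not replicate DDL.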

A simpler and usually more robust approach on Cloud SQL is the built-in cross-region read replica, which is also available for PostgreSQL and can be promoted to a standalone instance during a disaster. Manual logical replication is mainly worth the extra setup when you need finer control, such as replicating only selected tables.
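
For the manual logical replication route, the publication and subscription statements can be sketched as follows. This is a minimal illustration rather than a complete procedure: the helper names, the publication and subscription names, and the example tables (orders, customers) are hypothetical, and the password is assumed not to contain single quotes. The functions only print standard PostgreSQL statements, which you would then pipe into psql against the respective instance.

```shell
# render_publication_sql PUBLICATION TABLE [TABLE...]
# Prints the statement to run on the primary (publisher).
render_publication_sql() {
  local pub="$1"
  shift
  local tables
  # Join the remaining arguments with commas inside a subshell,
  # so the IFS change does not leak into the caller.
  tables=$(IFS=,; printf '%s' "$*")
  printf 'CREATE PUBLICATION %s FOR TABLE %s;\n' "$pub" "$tables"
}

# render_subscription_sql SUBSCRIPTION PUBLICATION PRIMARY_HOST DATABASE USER
# Prints the statement to run on the secondary (subscriber). The password is
# read from REPL_PASSWORD so it does not appear in the argument list.
render_subscription_sql() {
  local sub="$1" pub="$2" host="$3" db="$4" user="$5"
  printf "CREATE SUBSCRIPTION %s CONNECTION 'host=%s port=5432 dbname=%s user=%s password=%s' PUBLICATION %s;\n" \
    "$sub" "$host" "$db" "$user" "${REPL_PASSWORD:?REPL_PASSWORD must be set}" "$pub"
}
```

A call such as render_publication_sql app_pub orders customers prints the publisher statement, and render_subscription_sql app_sub app_pub [PRIMARY_DB_PRIVATE_IP] your_app_db replicator prints the subscriber statement.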

MySQL Cross-Region Replication

For MySQL, Cloud SQL supports cross-region read replicas. This is the simplest and most recommended method for DR.

# Create a read replica in the secondary region
gcloud sql instances create [REPLICA_INSTANCE_NAME] \
  --master-instance-name=[PRIMARY_DB_INSTANCE_NAME] \
  --region=[SECONDARY_REGION] \
  --project=[YOUR_GCP_PROJECT_ID]

Once the replica is created and synchronized, it can be promoted to a standalone instance (for example with gcloud sql instances promote-replica) in the event of a primary region failure. Promotion is a manual, irreversible step during a disaster, but the replication itself is automated.

Global Load Balancing and Health Checks

To direct traffic to the active region, we’ll use Google Cloud’s Global External HTTP(S) Load Balancer. This load balancer can span multiple regions and perform health checks on backend services in each region.

Setting up the Load Balancer

This involves creating backend services for each region, configuring health checks, and then creating a global forwarding rule.

# 1. Create Instance Groups for each region (if not using GKE/GCE managed instance groups)
# For simplicity, we'll assume individual instances for now.
# In a production setup, use Managed Instance Groups (MIGs).

# 2. Create Health Checks
gcloud compute health-checks create http php-health-check \
  --request-path="/healthz" \
  --port=80 \
  --check-interval=10s \
  --timeout=5s \
  --unhealthy-threshold=3 \
  --healthy-threshold=2

# 3. Create Backend Services for each region
gcloud compute backend-services create php-backend-us \
  --protocol=HTTP \
  --port-name=http \
  --health-checks=php-health-check \
  --global

gcloud compute backend-services create php-backend-eu \
  --protocol=HTTP \
  --port-name=http \
  --health-checks=php-health-check \
  --global

# 4. Add backends to the Backend Services
# A global load balancer does not target bare instances directly; backends are
# instance groups (ideally MIGs) or network endpoint groups (NEGs).
# Firewall rules must also allow Google's health check ranges
# (130.211.0.0/22 and 35.191.0.0/16) to reach the instances on port 80.

# If using MIGs:
# gcloud compute instance-groups managed create php-mig-us --template=... --zone=...
# gcloud compute backend-services add-backend php-backend-us --instance-group=php-mig-us --instance-group-zone=... --global

# If attaching individual instances, use zonal NEGs:
# gcloud compute network-endpoint-groups create php-neg-us --zone=us-central1-a --network-endpoint-type=GCE_VM_IP_PORT --default-port=80
# gcloud compute network-endpoint-groups update php-neg-us --zone=us-central1-a --add-endpoint="instance=php-app-primary-...,port=80"

# 5. Create URL Map
gcloud compute url-maps create php-url-map \
  --default-service php-backend-us # Initially point to primary

# 6. Create Target HTTP(S) Proxy
gcloud compute target-http-proxies create php-http-proxy \
  --url-map=php-url-map

# 7. Create Global Forwarding Rule
gcloud compute forwarding-rules create php-forwarding-rule \
  --address=YOUR_STATIC_IP_ADDRESS \
  --target-http-proxy=php-http-proxy \
  --ports=80 \
  --global

The /healthz endpoint on your PHP application should return a 200 OK status when the application is healthy. This is crucial for the load balancer to accurately route traffic.
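
To make the threshold settings in the health check above concrete, here is a simplified model of how consecutive probe results flip a backend between healthy and unhealthy. It is only a sketch under stated assumptions: the function name replay_probes is hypothetical, the backend is assumed to start out healthy, and the real load balancer runs multiple probers whose results it aggregates.

```shell
# replay_probes STATUS [STATUS...]
# Replays a sequence of HTTP status codes returned by /healthz and prints the
# resulting state. Thresholds mirror the health check defined above:
# 3 consecutive failures mark a backend unhealthy, 2 consecutive successes
# mark it healthy again. Only HTTP 200 counts as a success.
replay_probes() {
  local unhealthy_threshold=3 healthy_threshold=2
  local state="HEALTHY" ok=0 bad=0 code
  for code in "$@"; do
    if [ "$code" = "200" ]; then
      ok=$((ok + 1))
      bad=0
      if [ "$ok" -ge "$healthy_threshold" ]; then
        state="HEALTHY"
      fi
    else
      bad=$((bad + 1))
      ok=0
      if [ "$bad" -ge "$unhealthy_threshold" ]; then
        state="UNHEALTHY"
      fi
    fi
  done
  echo "$state"
}
```

With a 10-second check interval, this means a failing region is taken out of rotation after roughly 30 seconds of failed probes, and a single successful probe is not enough to bring it back.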

Application Deployment and Configuration

Your PHP application code needs to be deployed to instances in both regions. This can be achieved through CI/CD pipelines (e.g., Cloud Build, Jenkins, GitLab CI) that push artifacts to Compute Engine instances or container registries.

Database Connection Management

The PHP application must be configured to connect to the appropriate database instance. Environment variables or configuration files are common methods. During a failover, these configurations need to be updated.

<?php

// config/database.php

$dbConfig = [
    'driver' => 'pgsql', // or 'mysql'
    'host' => getenv('DB_HOST'),
    'port' => getenv('DB_PORT') ?: '5432', // getenv() takes no default value; fall back with ?:
    'database' => getenv('DB_DATABASE'),
    'username' => getenv('DB_USERNAME'),
    'password' => getenv('DB_PASSWORD'),
];

// Example using PDO
try {
    $dsn = "{$dbConfig['driver']}:host={$dbConfig['host']};port={$dbConfig['port']};dbname={$dbConfig['database']}";
    $pdo = new PDO($dsn, $dbConfig['username'], $dbConfig['password']);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
} catch (PDOException $e) {
    // Log error and potentially trigger a fallback mechanism
    error_log("Database connection failed: " . $e->getMessage());
    // In a DR scenario, you might attempt to connect to a secondary endpoint here
    // or signal an outage.
    die("Database connection error.");
}

// Use $pdo for your database operations...
?>

The DB_HOST environment variable would point to the private IP of the primary database instance during normal operation and to the secondary (promoted) instance during a failover.
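
One way to keep that switch in a single place is to derive DB_HOST from a region role before the PHP process starts. The sketch below assumes two hypothetical variables, PRIMARY_DB_HOST and SECONDARY_DB_HOST, that hold the private IP of each database instance, and a hypothetical helper named resolve_db_host; none of these names come from Cloud SQL or PHP itself.

```shell
# resolve_db_host ROLE
# Prints the database host for the given role ("primary" or "secondary").
# PRIMARY_DB_HOST and SECONDARY_DB_HOST are hypothetical variables that hold
# the private IP of each Cloud SQL instance.
resolve_db_host() {
  case "$1" in
    primary)
      printf '%s\n' "${PRIMARY_DB_HOST:?PRIMARY_DB_HOST must be set}"
      ;;
    secondary)
      printf '%s\n' "${SECONDARY_DB_HOST:?SECONDARY_DB_HOST must be set}"
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}
```

A startup script could then run export DB_HOST="$(resolve_db_host "${ACTIVE_ROLE:-primary}")" before launching PHP-FPM, so a failover only needs to change ACTIVE_ROLE (also an assumed name) and restart the service.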

Failover and Failback Procedures

Automating failover is complex and often involves custom scripting or managed services. A common approach is a semi-automated process:

Detection: Monitoring systems (e.g., Cloud Monitoring, Prometheus) detect widespread failures in the primary region (e.g., multiple health check failures for the global load balancer).
Alerting: Alerts are triggered to the on-call DevOps team.
Manual/Semi-Automated Failover:
- Database: If using MySQL read replicas, promote the secondary replica to a standalone instance. For PostgreSQL logical replication, this might involve stopping replication, ensuring data consistency, and making the secondary instance writable.
- Load Balancer: Update the URL map of the global load balancer to point the default service to the backend in the secondary region. This can be scripted using gcloud.
- Application Deployment: Ensure the application in the secondary region is running and configured correctly. If using blue/green deployments, switch traffic to the green environment in the secondary region.
Verification: Thoroughly test application functionality and data integrity in the secondary region.

Scripting Load Balancer Failover

#!/bin/bash

set -euo pipefail

SECONDARY_REGION="europe-west1"
SECONDARY_BACKEND="php-backend-eu" # Backend service created for the secondary region
GCP_PROJECT_ID=$(gcloud config get-value project)

# Assume backend services are named 'php-backend-us' and 'php-backend-eu'
# Assume URL map is 'php-url-map'

echo "Initiating failover to secondary region: ${SECONDARY_REGION}"

# Update the URL map's default service to the secondary backend service
gcloud compute url-maps set-default-service php-url-map \
  --default-service="${SECONDARY_BACKEND}" \
  --global \
  --project="${GCP_PROJECT_ID}"

echo "Load balancer updated. Traffic should now be directed to ${SECONDARY_REGION}."

# Further steps would include:
# 1. Promoting the secondary database instance (if not already automated).
# 2. Verifying application health in the secondary region.
# 3. Potentially scaling up resources in the secondary region.
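
The first two follow-up steps can be sketched with a dry-run wrapper, which makes it easier to rehearse a failover without touching real resources. The helper names and the replica name php-db-replica-eu are hypothetical; the sketch assumes the secondary database is a Cloud SQL read replica and that the backend service is named php-backend-eu as above.

```shell
# run CMD [ARGS...]
# Executes the command, or only prints it when DRY_RUN=1. Dry run is the
# default here so that rehearsing the procedure never changes anything.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

# promote_replica REPLICA_NAME
# Turns the cross-region replica into a standalone, writable instance.
# Promotion cannot be undone, hence the dry-run default above.
promote_replica() {
  run gcloud sql instances promote-replica "$1" --quiet
}

# check_backend_health BACKEND_SERVICE
# Shows per-endpoint health as seen by the global load balancer.
check_backend_health() {
  run gcloud compute backend-services get-health "$1" --global
}
```

Running promote_replica php-db-replica-eu prints the command that would be executed; setting DRY_RUN=0 performs the promotion for real, after which check_backend_health php-backend-eu helps confirm that the secondary region is serving traffic.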

Failback Procedure

Failback is the process of returning operations to the primary region once it’s restored. This typically involves:

Resynchronizing the primary region's database from the secondary, which is now the authoritative instance. This might involve reversing the direction of replication or performing a data dump/restore.
Updating the load balancer URL map back to the primary region’s backend.
Ensuring the primary region’s infrastructure is fully functional and synchronized.

Careful planning and testing of both failover and failback procedures are critical for a successful multi-region DR strategy.