Vengala Vinay

Having 12+ Years of Experience in Software Development

Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and Perl Deployments on AWS

Automated PostgreSQL Failover with Patroni on AWS EC2

Achieving true high availability for PostgreSQL, especially in a cloud-native environment like AWS, necessitates an automated failover strategy. Manual intervention during an outage is a non-starter for production systems. We’ll architect a solution leveraging Patroni, a template for highly available PostgreSQL, in conjunction with AWS services for robust failover orchestration.

While AWS RDS offers managed failover, direct control and customization are often required for complex architectures or specific compliance needs. Patroni provides this control by managing PostgreSQL clusters and orchestrating leader election during failures. For this setup, we’ll assume a multi-AZ deployment of AWS EC2 instances, each running a PostgreSQL instance managed by Patroni. Patroni cannot operate without a distributed configuration store (DCS) such as etcd, Consul, or ZooKeeper, which holds the leader lock and the cluster state. For simplicity and integration with AWS, this walkthrough uses AWS Systems Manager Parameter Store in that role. Be aware that Parameter Store is not among the DCS backends listed in Patroni’s documentation (etcd, Consul, ZooKeeper, Exhibitor, Kubernetes), so confirm that your Patroni build supports it before committing to this design. A dedicated etcd cluster spread across availability zones is the well-trodden alternative and is generally preferred for performance and resilience.

Patroni Configuration for AWS Systems Manager Parameter Store

The core of Patroni’s configuration lies in its `patroni.yml` file. We’ll adapt this to point to AWS Systems Manager Parameter Store for storing cluster state and configuration. This requires the `boto3` Python library (pulled in by the `aws` extra in the install command below) on the Patroni nodes and appropriate IAM permissions.

First, ensure Patroni is installed on your EC2 instances. A common method is using pip:

sudo pip install 'patroni[aws]'

Next, create the `patroni.yml` configuration file. The parts that matter most here are the DCS timing settings (`ttl`, `loop_wait`, `retry_timeout`) and the `aws` block.

# patroni.yml
scope: my_pg_cluster
# Placeholder for this node's unique name. Quoted because a bare & starts a YAML anchor.
name: "&{HOSTNAME}"

# PostgreSQL configuration
postgresql:
  listen: 0.0.0.0:5432
  connect_address: "&{IP_ADDRESS}:5432" # Address other nodes use to reach this instance
  data_dir: /var/lib/postgresql/14/main
  config_dir: /etc/postgresql/14/main
  bin_dir: /usr/lib/postgresql/14/bin
  pg_hba:
    # Restrict this range to your VPC CIDR in production
    - host    all             all             0.0.0.0/0               md5
  parameters:
    max_connections: 100
    shared_buffers: 128MB
    wal_level: replica
    hot_standby: "on"
    max_wal_senders: 5
    max_replication_slots: 5

# Cluster-wide dynamic settings. Patroni reads these from bootstrap.dcs and
# writes them to the DCS when the cluster is first initialized.
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576 # 1 MiB
    synchronous_mode: false # Set to true for synchronous replication if needed
    postgresql:
      use_pg_rewind: true
      use_slots: true

# Distributed Configuration Store (DCS) - AWS Systems Manager Parameter Store
# NOTE: Parameter Store is not a DCS backend listed in Patroni's documentation.
# Verify support in your Patroni version, or replace this block with an etcd section.
aws:
  region: us-east-1
  ssm_parameter_path: /patroni/my_pg_cluster # Path in SSM Parameter Store
  # Optional: specify profile or credentials if not using instance roles
  # profile: default
  # aws_access_key_id: YOUR_ACCESS_KEY
  # aws_secret_access_key: YOUR_SECRET_KEY

# Node tags (optional). Patroni publishes these with the member's state;
# they are not AWS resource tags.
tags:
  Project: MyProject
  Environment: Production

# REST API configuration
restapi:
  listen: 0.0.0.0:8008
  connect_address: "&{IP_ADDRESS}:8008" # Placeholder for this node's private IP

# Logging configuration
log:
  level: INFO
  dir: /var/log/patroni # Patroni writes patroni.log inside this directory

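`&{HOSTNAME}` and `&{IP_ADDRESS}` are placeholders for the node’s hostname and private IP address. Patroni does not template its configuration file, so substitute them at deploy time (for example with your provisioning tool), or override them with the `PATRONI_NAME` and `PATRONI_RESTAPI_CONNECT_ADDRESS` environment variables. Ensure your EC2 instances have an IAM role attached with permissions to read and write to AWS Systems Manager Parameter Store. Specifically, `ssm:GetParameter`, `ssm:GetParametersByPath`, `ssm:PutParameter`, and `ssm:DeleteParameter` are typically required, scoped to the specified path.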
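If your deployment tooling does not already fill in these placeholders, a small pre-start step can do it. The following is a minimal sketch in Python (the language Patroni itself runs on), not part of Patroni: the function name `render_config` and the sample values `pg-node-1` and `10.0.1.10` are assumptions made for illustration.

```python
import socket


def render_config(template: str, hostname: str, ip_address: str) -> str:
    """Replace the &{HOSTNAME} and &{IP_ADDRESS} placeholders with concrete values."""
    return (
        template
        .replace("&{HOSTNAME}", hostname)
        .replace("&{IP_ADDRESS}", ip_address)
    )


# A fragment of the configuration above, used here as the template.
template = (
    "name: &{HOSTNAME}\n"
    "restapi:\n"
    "  connect_address: &{IP_ADDRESS}:8008\n"
)

# Hypothetical values; on a real node you might pass socket.gethostname()
# and the instance's private IP instead.
rendered = render_config(template, "pg-node-1", "10.0.1.10")
print(rendered)

# The local hostname is available from the standard library.
local_name = socket.gethostname()
```

Writing the rendered text to `patroni.yml` before the service starts keeps the file valid YAML, since a bare `&` would otherwise be parsed as an anchor.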

Setting up the PostgreSQL Cluster

Once Patroni is configured, you can start it on your nodes. The first node to start will initialize the cluster. Patroni will create the necessary parameters in SSM Parameter Store under the specified `ssm_parameter_path`.

# On each EC2 instance (assumes a patroni systemd unit exists; pip does not install one)
sudo systemctl enable patroni
sudo systemctl start patroni

You can verify the cluster state by checking the SSM Parameter Store console or using the AWS CLI:

aws ssm get-parameters-by-path --path "/patroni/my_pg_cluster" --region us-east-1

Patroni will automatically handle PostgreSQL initialization, replication setup (WAL streaming), and leader election. The primary holds a leader key in the DCS (SSM Parameter Store) and must renew it within the `ttl` window (30 seconds in this configuration). If the primary fails and the key expires, a healthy replica whose lag is within `maximum_lag_on_failover` acquires the key and is promoted, and the remaining replicas are reconfigured to follow the new primary.
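The leader election described here rests on a key with a time-to-live. The sketch below is a simplified in-memory model of that idea, not Patroni's implementation: the class `LeaseStore` and the node names are hypothetical, and a real DCS performs the check-and-set atomically on the server side.

```python
class LeaseStore:
    """In-memory stand-in for a DCS leader key that expires after a TTL."""

    def __init__(self) -> None:
        self.holder = None
        self.expires_at = 0.0

    def acquire_or_renew(self, node: str, now: float, ttl: float) -> bool:
        """Take the key if it is free, expired, or already held by this node."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == node:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


TTL = 30  # seconds, matching the ttl value used in this article
store = LeaseStore()

events = [
    ("node-a", 0, store.acquire_or_renew("node-a", now=0, ttl=TTL)),    # first node wins
    ("node-b", 10, store.acquire_or_renew("node-b", now=10, ttl=TTL)),  # key is held
    ("node-a", 10, store.acquire_or_renew("node-a", now=10, ttl=TTL)),  # renewal
    # node-a crashes here and stops renewing
    ("node-b", 35, store.acquire_or_renew("node-b", now=35, ttl=TTL)),  # not expired yet
    ("node-b", 40, store.acquire_or_renew("node-b", now=40, ttl=TTL)),  # expired: takeover
]

for node, at, ok in events:
    print(f"t={at:>2}s {node}: {'holds leader key' if ok else 'rejected'}")
```

The gap between the crash and the takeover is bounded by the TTL, which is why a shorter `ttl` speeds up failover at the cost of more sensitivity to brief network or DCS hiccups.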

Integrating with Perl Applications

For Perl applications, the primary concern is connecting to the *current* primary PostgreSQL instance. Hardcoding IP addresses is brittle. A common pattern is to use a load balancer or a DNS-based approach. In AWS, a Network Load Balancer (NLB), one of the Elastic Load Balancing types, is a good fit.

Using AWS Network Load Balancer (NLB)

An NLB is a Layer 4 load balancer that can forward TCP traffic. It’s suitable for database connections because it doesn’t terminate the connection and is highly performant. We’ll configure the NLB to listen on port 5432 and forward traffic to the PostgreSQL instances on port 5432.

  • Create a Network Load Balancer in your VPC.
  • Configure a listener on port 5432 (TCP).
  • Create a Target Group. The targets will be your EC2 instances running PostgreSQL. Do not health-check the PostgreSQL port (5432) on its own: hot-standby replicas also accept connections there, so the NLB would send write traffic to read-only nodes. Instead, point the health check at Patroni’s REST API, which exposes role-aware endpoints (e.g., `http://<ip>:8008/primary`, `http://<ip>:8008/replica`). `/primary` returns 200 OK only while the node is running as the primary and 503 otherwise. NLB target groups support HTTP health checks on a port other than the traffic port, so set the health check protocol to HTTP, the path to `/primary`, and the health check port to 8008. Only the current primary then passes the check and receives traffic.
  • Register your EC2 instances as targets in the Target Group.
  • Associate the listener with the Target Group.
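When a target shows as unhealthy, it helps to probe Patroni's role endpoints by hand. The following is a minimal sketch using only the Python standard library. It assumes the REST API listens on port 8008 as configured earlier and that a 200 response means the node matches the requested role; the helper names `http_status`, `classify`, and `node_role` are made up for this example.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx answers are raised as HTTPError
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure


def classify(primary_status: int, replica_status: int) -> str:
    """Map the status codes of /primary and /replica to a role label."""
    if primary_status == 200:
        return "primary"
    if replica_status == 200:
        return "replica"
    return "unavailable"


def node_role(base_url: str) -> str:
    """Probe one Patroni node, e.g. node_role('http://10.0.1.10:8008')."""
    return classify(
        http_status(f"{base_url}/primary"),
        http_status(f"{base_url}/replica"),
    )
```

Running `node_role` against each instance shows which node the target group should currently consider healthy, which makes a mismatch between Patroni's view and the NLB's view easy to spot.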

Your Perl applications will then connect to the DNS name of the NLB. When a failover occurs, Patroni promotes a new primary, that node’s `/primary` health check starts passing, and the NLB routes new connections to it once the healthy threshold is met. This depends on the target group health check querying Patroni’s REST API: a plain TCP check on 5432 also passes on replicas. Connections that were open to the old primary are broken and must be re-established by the application.

Perl DBI Connection String Example

Your Perl application’s database connection string would look something like this:

use DBI;

my $db_host = 'your-nlb-dns-name.amazonaws.com'; # The DNS name of your NLB
my $db_port = 5432;
my $db_name = 'mydatabase';
my $db_user = 'app_user';
my $db_pass = 'secure_password';

my $dsn = "DBI:Pg:database=$db_name;host=$db_host;port=$db_port";

my $dbh = DBI->connect($dsn, $db_user, $db_pass, { RaiseError => 1, AutoCommit => 1 });

# ... application logic ...

$dbh->disconnect();

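The benefit of this setup is that the application needs no knowledge of the cluster topology: the NLB’s DNS name remains constant, and new connections reach whichever node is currently primary. Failover is not invisible, though. Transactions in flight on the old primary are lost and open connections fail, so the application should catch connection errors and reconnect.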
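During a failover there is a short window in which no primary accepts connections, so a reconnect loop with backoff is a common companion to this setup. The sketch below shows the pattern in Python with a stand-in `connect` callable; in the Perl application the same loop would wrap `DBI->connect`. The function name `connect_with_retry` and its default values are assumptions, not recommendations from any library.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            # Narrow or widen this to your driver's connection error type.
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Simulated failover: the first two attempts fail, the third succeeds.
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionRefusedError("no primary available yet")
    return "connection"


waits = []  # record the delays instead of actually sleeping
conn = connect_with_retry(flaky_connect, sleep=waits.append)
print(conn, waits)
```

Capping the delay keeps recovery prompt once the new primary is up, while the bounded attempt count ensures a prolonged outage still surfaces as an error rather than a silent hang.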

Monitoring and Alerting

Robust monitoring is paramount. You need to be alerted not just when a failover *happens*, but also if the cluster enters a degraded state or if failover attempts are failing.

  • Patroni API: Regularly poll the Patroni REST API endpoints (`/primary`, `/replica`, `/cluster`) on each node. Use tools like Prometheus with a custom exporter or a simple script to check the cluster status.
  • PostgreSQL Metrics: Monitor PostgreSQL-specific metrics like replication lag, connection counts, query performance, and disk I/O.
  • AWS CloudWatch: Utilize CloudWatch for EC2 instance health (CPU and status checks by default; memory and disk usage require the CloudWatch agent), RDS metrics (if using RDS), and NLB metrics (healthy and unhealthy host counts, active flows).
  • SSM Parameter Store: Watch the cluster’s entries under the configured path, in particular the leader entry. If it is not refreshed within the `ttl` window, the Patroni node has stalled or lost its ability to communicate with SSM.
  • Alerting: Integrate with alerting systems like PagerDuty, Opsgenie, or Slack. Key alerts include:
    • Cluster is in a read-only state (no primary).
    • Replication lag exceeds a critical threshold.
    • Patroni node is unresponsive.
    • Failover attempts are failing repeatedly.
    • Health check failures on the NLB Target Group.
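Replication lag in bytes, one of the metrics listed above, can be derived from two WAL positions: `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on a replica. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, where the first is the high 32 bits. A minimal sketch of the arithmetic follows; the function names and sample LSNs are illustrative, and the 10 MiB threshold is only an example value.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn))


LAG_THRESHOLD = 10 * 1024 * 1024  # 10 MiB

# Sample values; in practice these come from the two SQL functions named above.
lag = replication_lag_bytes("0/5000000", "0/4F00000")
print(f"lag: {lag} bytes, alert: {lag > LAG_THRESHOLD}")
```

Inside PostgreSQL, `pg_wal_lsn_diff()` performs the same subtraction, so this helper is mainly useful when the two positions are collected by separate probes.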

For automated alerting based on Patroni’s state, you can write a Perl script that periodically queries the `/cluster` endpoint of any reachable Patroni node and triggers alerts via AWS SNS, or directly to your alerting system, if the cluster has no primary or a replica is lagging too far behind.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON;

# Any Patroni node can serve /cluster; replace the placeholder with a reachable node address
my $cluster_api_url = 'http://<patroni_api_ip>:8008/cluster';
my $alert_sns_topic = 'arn:aws:sns:us-east-1:123456789012:MyDatabaseAlerts'; # Replace with your SNS topic ARN
my $replication_lag_threshold = 10 * 1024 * 1024; # 10MB

# Publish an alert through the AWS CLI.
# Assumes `aws` is installed and the instance role allows sns:Publish.
sub send_alert {
    my ($message) = @_;
    print "ALERT: $message\n";
    system('aws', 'sns', 'publish', '--topic-arn', $alert_sns_topic, '--message', $message) == 0
        or warn "Failed to publish alert to SNS\n";
}

my $ua = LWP::UserAgent->new;
$ua->timeout(10);

my $response = $ua->get($cluster_api_url);

if ($response->is_success) {
    # decode_json expects raw UTF-8 bytes, so skip charset decoding
    my $cluster_data = decode_json($response->decoded_content(charset => 'none'));
    my @members = @{ $cluster_data->{members} || [] };

    # The primary is reported with role 'leader' ('master' in older Patroni versions)
    my ($primary) = grep { ($_->{role} // '') =~ /^(?:leader|master)$/ } @members;

    if ($primary) {
        print "Current Primary: " . $primary->{host} . "\n";

        # Check replication lag for replicas
        foreach my $replica (grep { $_->{name} ne $primary->{name} } @members) {
            my $lag_bytes = $replica->{lag};

            # Lag can be missing or non-numeric when a replica is not streaming
            next unless defined $lag_bytes && $lag_bytes =~ /^\d+$/;

            if ($lag_bytes > $replication_lag_threshold) {
                send_alert("Replica $replica->{name} is lagging by $lag_bytes bytes");
            }
        }
    } else {
        send_alert("No primary found in cluster");
    }
} else {
    send_alert("Patroni API request failed: " . $response->status_line);
}


A little about the Author

Having 12+ Years of Experience in Software Development, Vinay is a principal software architect, senior systems engineer, and elite technical consultant. He specializes in bespoke PHP/WordPress development, high-performance Magento 2 & Shopify architectures, custom plugin/theme development from scratch, and legacy code modernization (including VB6, VB.NET, PyQt, and Crystal Reports). Known for solving complex database bottlenecks, speed optimization (Core Web Vitals), and advanced security code auditing, Vinay engineers production-ready systems designed to scale under heavy concurrent load conditions.



Copyright © 2026 · Vinay Vengala