The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on AWS for Shopify

Nginx as a High-Performance Frontend Proxy

When deploying a Shopify-like application on AWS, Nginx serves as the critical entry point, handling SSL termination, static asset serving, request routing, and load balancing. Optimizing Nginx is paramount for low latency and high throughput. We’ll focus on key directives and configurations for production environments.

Nginx Configuration Tuning

The primary configuration file is typically /etc/nginx/nginx.conf, with site-specific configurations in /etc/nginx/sites-available/ and symlinked to /etc/nginx/sites-enabled/. For performance, we’ll adjust worker processes, connection limits, and caching.

Worker Processes and Connections

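The worker_processes directive should ideally be set to the number of CPU cores available on your EC2 instance (the value auto does this for you). worker_connections defines the maximum number of simultaneous connections a worker process can handle. The theoretical maximum is worker_processes * worker_connections, but that count includes connections to upstream servers, so a proxied request consumes two of them, and it is further capped by the per-process open file limit (see worker_rlimit_nofile).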
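
As a rough capacity check, the arithmetic can be sketched in Python. The function name and the `proxied` flag are illustrative assumptions, not part of Nginx; the halving reflects that a proxied request holds one connection to the client and one to the backend.

```python
def max_clients(worker_processes: int, worker_connections: int, proxied: bool = True) -> int:
    """Theoretical upper bound on simultaneous clients for one Nginx instance."""
    total = worker_processes * worker_connections
    # A proxied request occupies a client-side and an upstream-side connection
    return total // 2 if proxied else total


# 4 workers with 4096 connections each, as in the snippet that follows
print(max_clients(4, 4096))         # 8192 when every request is proxied
print(max_clients(4, 4096, False))  # 16384 when only static files are served
```

Treat the result as a ceiling only: file descriptor limits and backend capacity usually bite well before it is reached.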

Example `nginx.conf` Snippet

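# "daemon off" keeps Nginx in the foreground; use it only under a process
# supervisor or in a container, and remove it for a standard systemd service
daemon off;
master_process on;
# "auto" starts one worker per detected CPU core
worker_processes auto; # Or set explicitly, e.g., 4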

events {
    worker_connections 4096; # Adjust based on expected load and system limits
    multi_accept on;
    use epoll; # For Linux systems
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    # Gzip compression for dynamic content
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Buffering and timeouts for upstream connections
    proxy_connect_timeout 60s;
    proxy_send_timeout    60s;
    proxy_read_timeout    60s;
    proxy_buffer_size     16k;
    proxy_buffers         4 32k;
    proxy_busy_buffers_size 64k;

    # Client request body limits
    client_max_body_size 100M; # Adjust as needed for file uploads

    # Include server blocks
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Static Asset Caching

Leverage browser caching and Nginx’s file caching for static assets (CSS, JS, images) to reduce load on your application servers and improve perceived performance for users. Configure long expires headers and potentially use proxy_cache for assets served through your application.

Example `sites-enabled/your_app.conf` Snippet

server {
    listen 80;
    server_name your-domain.com;

    # Redirect HTTP to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_session_tickets off;

    # Serve static assets directly
    location ~* \.(css|js|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$ {
        root /var/www/your_app/public; # Adjust path to your static assets
        expires 365d;
        add_header Cache-Control "public";
        access_log off;
    }

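    # Proxy requests to your application server.
    # Gunicorn speaks HTTP, so proxy_pass applies. PHP-FPM speaks FastCGI
    # and needs fastcgi_pass instead (see the commented alternative below).
    location / {
        proxy_pass http://your_app_backend; # Defined in upstream block
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # For PHP-FPM, replace the location above with something like:
    # location ~ \.php$ {
    #     root /var/www/your_app/public;
    #     include fastcgi_params;
    #     fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    #     fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Adjust PHP version and path
    # }
}

# Define your upstream application servers (Gunicorn, HTTP)
upstream your_app_backend {
    server 127.0.0.1:8000;
    # server 127.0.0.1:8001; # Add further Gunicorn instances as needed
}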
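
Before deploying, it can help to confirm which request paths the static-asset location would capture. The following is a minimal Python sketch that reuses the same extension list; `is_static` and the sample paths are hypothetical, and Python's `re` module stands in for the PCRE engine Nginx uses, which behaves the same for a pattern this simple.

```python
import re

# Same alternation as the location block; re.IGNORECASE mirrors Nginx's "~*" modifier
STATIC_RE = re.compile(
    r"\.(css|js|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$", re.IGNORECASE
)

# "expires 365d" expressed in seconds, i.e. the max-age value sent to browsers
MAX_AGE_SECONDS = 365 * 24 * 60 * 60


def is_static(uri_path: str) -> bool:
    """True if the path (query string already stripped) would hit the static location."""
    return STATIC_RE.search(uri_path) is not None


for path in ["/assets/app.CSS", "/fonts/inter.woff2", "/cart", "/products.json"]:
    print(path, is_static(path))
```

The first two paths match and the last two fall through to the application, which is the behavior you want for dynamic routes such as a cart or a JSON endpoint.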

Gunicorn Tuning for Python Applications

Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes, worker class, and thread settings. For I/O-bound applications, the gevent worker class is often preferred due to its non-blocking nature.

Worker Processes and Threads

The number of worker processes is typically set to (2 * number_of_cores) + 1 as a starting point. For gevent workers, threads are not directly applicable in the same way as with synchronous workers, as gevent uses green threads. If using synchronous workers, you can add threads per worker with --threads; setting it above 1 makes Gunicorn use the gthread worker class.

Gunicorn Command Line Arguments

# Example for a Python application using gevent workers
# Assuming your application's WSGI entry point is 'your_app.wsgi:application'
# And you have 4 CPU cores on your EC2 instance

gunicorn --workers 9 \
         --worker-class gevent \
         --bind 0.0.0.0:8000 \
         --timeout 120 \
         --log-level info \
         --access-logfile /var/log/gunicorn/access.log \
         --error-logfile /var/log/gunicorn/error.log \
         your_app.wsgi:application

Explanation:

--workers 9: (2 * 4 cores) + 1. Adjust based on your instance type and application’s I/O characteristics.
--worker-class gevent: Utilizes green threads for efficient handling of concurrent I/O operations.
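--bind 0.0.0.0:8000: Listens on all network interfaces on port 8000. Nginx will proxy to this. If Nginx runs on the same host, bind to 127.0.0.1:8000 instead so the port is not reachable from outside.
--timeout 120: Workers that stay silent for more than 120 seconds are killed and restarted. With gevent workers this acts as a liveness check rather than a hard per-request limit. Note that Nginx’s proxy_read_timeout (60s in the snippet above) cuts off a slow response first unless it is raised to match.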
--log-level info: Configures logging verbosity.
--access-logfile, --error-logfile: Specifies log file locations. Ensure these directories exist and have correct permissions.
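
The same settings can live in a Gunicorn configuration file, which is itself plain Python. This is a sketch that mirrors the command above; the file name gunicorn.conf.py is an assumption, and the worker count is derived from the detected core count instead of being hard-coded.

```python
# gunicorn.conf.py (assumed file name; pass it to Gunicorn with -c)
import multiprocessing

# (2 * cores) + 1 starting point described in the text
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "gevent"
# Cap on simultaneous clients per async worker (1000 is Gunicorn's default)
worker_connections = 1000
bind = "0.0.0.0:8000"
timeout = 120
loglevel = "info"
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
```

It would then be started with `gunicorn -c gunicorn.conf.py your_app.wsgi:application`. Keeping the settings in a file makes them easier to version and review than a long command line.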

Gevent Worker Tuning

For gevent workers, green threads (greenlets) are spawned per connection by the gevent library, so there is no thread count to configure. The relevant limit is --worker-connections (default 1000), which caps the number of simultaneous clients per worker. The key is to ensure your application code is written to be gevent-compatible (e.g., using gevent-patched libraries for network I/O), because a single blocking call stalls every greenlet in that worker.

PHP-FPM Tuning for PHP Applications

PHP-FPM (FastCGI Process Manager) is the standard way to run PHP applications with Nginx. Its performance is governed by the number of child processes, how they are managed (static, dynamic, ondemand), and their resource limits.

Process Management and Pool Configuration

PHP-FPM pools are configured in /etc/php/[version]/fpm/pool.d/www.conf (or a custom pool name). The pm (process manager) setting is critical.

Example `www.conf` Snippet

; /etc/php/7.4/fpm/pool.d/www.conf (adjust PHP version as needed)

[www]
user = www-data
group = www-data
listen = /var/run/php/php7.4-fpm.sock ; Match this in Nginx upstream

; Process Manager settings
pm = dynamic
pm.max_children = 50       ; Max number of children at any one time
pm.start_servers = 5       ; Number of children when FPM starts
pm.min_spare_servers = 2   ; Min number of idle/spare children
pm.max_spare_servers = 10  ; Max number of idle/spare children
pm.max_requests = 500      ; Max requests a child process should execute before respawning

; Other important settings
request_terminate_timeout = 120s ; Corresponds to Gunicorn's timeout
; pm.process_idle_timeout = 10s  ; For pm = ondemand

; Memory limits
; php_admin_value[memory_limit] = 256M
; php_admin_value[max_execution_time] = 120

Explanation:

pm = dynamic: The process manager dynamically adjusts the number of child processes based on load. Other options are static (fixed number) and ondemand (spawns processes only when requests arrive). dynamic is often a good balance.
pm.max_children: This is a crucial setting. It should be calculated based on available memory and the memory footprint of your PHP processes. Too high, and you risk OOM errors; too low, and you’ll have request queues. A common starting point is (Total RAM - RAM for OS/Nginx/DB) / Average PHP Process Memory.
pm.max_requests: Setting this to a reasonable number helps prevent memory leaks from accumulating over time by respawning child processes after a certain number of requests.
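request_terminate_timeout: Should align with the Nginx upstream timeout (fastcgi_read_timeout when proxying to FPM) and with PHP’s max_execution_time, so that one layer does not terminate a request while another is still waiting on it.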

Tuning `pm.max_children`

This is often the most challenging parameter to tune. Monitor your server’s memory usage closely. Use tools like htop or CloudWatch memory metrics; EC2 does not report memory by default, so install the CloudWatch agent, which publishes mem_used_percent. If you see high swap usage or OOM killer activity, pm.max_children is too high. If requests are consistently slow or timing out while CPU is not maxed out, it might be too low, leading to a backlog of requests waiting for available PHP-FPM workers.
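
The starting-point formula for pm.max_children is easy to turn into a small calculation. The sketch below is illustrative only: the function name and the sample figures (an 8 GiB instance, 2 GiB reserved, roughly 60 MB per PHP-FPM child) are assumptions, and the real per-process footprint should come from measuring your own workload.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int, avg_process_mb: int) -> int:
    """(Total RAM - RAM for OS/Nginx/DB) / average PHP process memory, rounded down."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    # Never report a negative budget if the reservation exceeds total RAM
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // avg_process_mb


# Assumed figures: 8192 MB total, 2048 MB reserved, 60 MB per child
print(estimate_max_children(8192, 2048, 60))  # 102
```

Rounding down and leaving some headroom below the result is the safer choice, since the average hides requests that use far more memory than usual.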

PostgreSQL Performance Tuning on AWS RDS

For a robust Shopify-like platform, PostgreSQL is an excellent choice. AWS RDS simplifies database management, but tuning is still essential. We’ll focus on key PostgreSQL parameters and AWS-specific considerations.

RDS Parameter Groups

PostgreSQL configuration parameters are managed via Parameter Groups in AWS RDS. You’ll need to create a custom parameter group based on the default one for your instance class and PostgreSQL version.

Key Parameters to Tune

# Example parameters in a custom RDS PostgreSQL Parameter Group

# Memory Management
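shared_buffers = 25% of instance RAM  # e.g., for a db.r5.large (16 GiB RAM), set to ~4GB
effective_cache_size = 75% of instance RAM # Planner hint only; tells the planner how much cache to assume, allocates nothing
# Note: RDS parameter groups take both values in 8 kB pages rather than as percentages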

# WAL (Write-Ahead Logging)
wal_buffers = 16MB
wal_writer_delay = 200ms
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9

# Autovacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50

# Connections (pair with an external pooler like PgBouncer if clients open many short-lived connections)
max_connections = 100 # Adjust based on application needs and instance RAM

# Query Planning and Execution
random_page_cost = 1.1 # Lower for SSDs (default is 4.0)
seq_page_cost = 1.0
work_mem = 16MB # Per sort/hash operation, not per connection; a complex query can use several
maintenance_work_mem = 256MB # For VACUUM, CREATE INDEX, etc.
max_worker_processes = 8 # Pool for background workers; should be >= max_parallel_workers
max_parallel_workers = 4 # Total workers available to parallel queries; size to the CPU count
max_parallel_workers_per_gather = 2 # Max workers per Gather node

# Logging (for debugging/monitoring)
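log_min_duration_statement = 1000 # In milliseconds; logs queries longer than 1 second
log_statement = 'ddl' # Log DDL statements
log_destination = 'stderr'
# The collector and directory settings below apply to self-managed PostgreSQL;
# RDS manages log collection and storage itself
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H-%M-%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB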
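
To make the memory guidelines in the parameter list concrete, here is a small Python sketch. The function names are hypothetical and the 16 GiB figure is an assumed instance size; the second function gives only a rough lower bound, because a single complex query can allocate work_mem several times.

```python
def memory_settings(instance_ram_gib: float) -> dict:
    """Apply the 25% / 75% guidelines for shared_buffers and effective_cache_size."""
    return {
        "shared_buffers_gib": instance_ram_gib * 0.25,
        "effective_cache_size_gib": instance_ram_gib * 0.75,
    }


def work_mem_budget_gib(max_connections: int, work_mem_mb: int) -> float:
    """Memory used if every connection runs one work_mem-sized operation at once."""
    return max_connections * work_mem_mb / 1024


print(memory_settings(16))           # {'shared_buffers_gib': 4.0, 'effective_cache_size_gib': 12.0}
print(work_mem_budget_gib(100, 16))  # 1.5625
```

If shared_buffers plus the work_mem budget approaches total instance RAM, lower work_mem or max_connections before raising anything else.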

Important Notes:

Instance RAM: The percentages for shared_buffers and effective_cache_size are general guidelines. Always monitor your instance’s memory usage. For smaller instances, shared_buffers might be closer to 25% of RAM, while larger instances can afford to use a bit less to leave more for the OS cache.
SSDs: If your RDS instance uses SSDs (which most do), lowering random_page_cost to 1.1 is beneficial as SSDs have much faster random I/O than HDDs.
Autovacuum: Ensure autovacuum is enabled and tuned. It’s crucial for reclaiming space from dead tuples and preventing table bloat, especially in high-transaction environments. Adjust thresholds and naptime based on your workload.
max_connections: This directly consumes memory. Set it conservatively and consider using a connection pooler like PgBouncer if your application opens many short-lived connections.
Parallel Queries: max_worker_processes and max_parallel_workers are important for leveraging multi-core CPUs for complex analytical queries. Tune them based on your instance’s CPU count.
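Parameter Group Changes: Dynamic parameters (e.g., work_mem, random_page_cost, the autovacuum thresholds) take effect shortly after you save the parameter group. Static parameters (e.g., shared_buffers, max_connections, max_worker_processes) only take effect after you reboot the RDS instance.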
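
For the autovacuum note, it helps to see when a table actually becomes eligible for vacuuming. PostgreSQL triggers autovacuum once dead tuples exceed autovacuum_vacuum_threshold plus autovacuum_vacuum_scale_factor times the table's row count. The scale factor is not set in the parameter list above, so this sketch assumes the PostgreSQL default of 0.2; the function names are hypothetical.

```python
def autovacuum_trigger(live_tuples: int, threshold: int = 50, scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum considers the table."""
    return threshold + scale_factor * live_tuples


def needs_vacuum(dead_tuples: int, live_tuples: int, **settings) -> bool:
    """True once the dead-tuple count exceeds the trigger point."""
    return dead_tuples > autovacuum_trigger(live_tuples, **settings)


# A 1,000,000-row table with 100,000 dead tuples
print(needs_vacuum(100_000, 1_000_000))                     # False with the default scale factor
print(needs_vacuum(100_000, 1_000_000, scale_factor=0.02))  # True with a tighter scale factor
```

On large, busy tables the scale factor dominates the fixed threshold, which is why lowering it per table is a common adjustment for high-transaction workloads.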

Monitoring and Analysis

AWS RDS provides CloudWatch metrics for CPU utilization, memory, disk I/O, network traffic, and database connections. Additionally, leverage PostgreSQL’s built-in tools and extensions:

Essential SQL Queries and Tools

-- Check for long-running queries
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start;

-- Estimate table bloat from dead-tuple ratios
-- GREATEST avoids division by zero on tables with no live tuples
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    (n_dead_tup::float / GREATEST(n_live_tup, 1)) AS dead_tup_ratio,
    pg_size_pretty(pg_table_size(relid)) AS table_size,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000 AND (n_dead_tup::float / GREATEST(n_live_tup, 1)) > 0.2
ORDER BY dead_tup_ratio DESC;

-- Check autovacuum status
SELECT
    relname,
    last_vacuum,
    last_autovacuum,
    vacuum_count,
    autovacuum_count
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC;

-- Monitor cache hit ratio (NULLIF guards against a database with no block activity yet)
SELECT
    sum(blks_hit) AS hits,
    sum(blks_read) AS reads,
    round(sum(blks_hit) / NULLIF(sum(blks_hit) + sum(blks_read), 0), 4) AS ratio
FROM pg_stat_database
WHERE datname = current_database();

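A cache hit ratio above 95% (0.95 in the query output) is generally considered good. If it’s lower, consider increasing shared_buffers, moving to an instance class with more memory, or optimizing queries and indexes so they touch fewer pages. Raising effective_cache_size will not help here: it only informs the planner and allocates no memory.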
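
If you collect blks_hit and blks_read from a monitoring script, the same ratio can be computed client-side. This is a minimal sketch with a hypothetical function name and made-up sample counters; it returns None when there has been no block activity instead of dividing by zero.

```python
from typing import Optional


def cache_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Share of block requests served from shared_buffers, or None with no activity."""
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


ratio = cache_hit_ratio(980_000, 20_000)
print(f"{ratio:.1%}")  # 98.0%
```

Because these counters are cumulative, comparing the difference between two samples gives a more useful picture of current behavior than the lifetime ratio.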

Putting It All Together: AWS Infrastructure

On AWS, this setup typically involves:

EC2 Instance(s) for Nginx: A small to medium EC2 instance (e.g., t3.medium, m5.large) configured with the Nginx settings described. Consider using Auto Scaling Groups for high availability and scalability.
EC2 Instance(s) for Application Server (Gunicorn/PHP-FPM): These instances run your application code. They can be the same instances as Nginx if co-located, or separate instances behind Nginx. Use Auto Scaling Groups here as well.
AWS RDS for PostgreSQL: A managed PostgreSQL instance. Choose an instance class that balances compute, memory, and I/O needs. For I/O-intensive workloads, consider instances with local NVMe SSDs or provisioned IOPS.
Elastic Load Balancer (ELB): If using multiple Nginx instances, an ELB (ALB or NLB) can distribute traffic.
Security Groups: Properly configure security groups to allow traffic only on necessary ports (e.g., 80, 443 for Nginx, 5432 for RDS from application servers).
IAM Roles: Grant necessary permissions for EC2 instances to interact with other AWS services (e.g., CloudWatch for logging/monitoring).

This comprehensive tuning approach, from the edge Nginx proxy through your application runtime (Gunicorn/FPM) to the database layer (PostgreSQL), is essential for building a performant and scalable platform capable of handling significant traffic, akin to a large e-commerce site like Shopify.