The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on AWS for PHP

Nginx Tuning for High-Traffic PHP Applications

Optimizing Nginx is crucial for serving static assets efficiently and acting as a robust reverse proxy for your PHP application server. On AWS, the EC2 instance type, EBS volume type, and network configuration set the baseline you tune against. This section focuses on Nginx worker processes, connection limits, compression, and buffering.

Worker Processes and Connections

The `worker_processes` directive controls how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available on your server. Setting it to `auto` makes Nginx detect the core count at startup, which is convenient when the same configuration is deployed to EC2 instances of different sizes.

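The `worker_connections` directive sets the maximum number of simultaneous connections that each worker process can handle. The theoretical maximum is `worker_processes * worker_connections`, but this count includes connections to upstream servers as well as clients, so a proxied request uses two connections and the number of clients you can serve is roughly half that. Ensure the value is high enough to accommodate your expected peak load, but not so high that it exceeds the per-process open file limit (which can be raised with `worker_rlimit_nofile`).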
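The connection arithmetic can be sketched in a few lines of Python. This is an illustrative helper, not part of Nginx; the function name and the assumption that every client request is proxied upstream (and so holds two connections) are ours.

```python
def max_clients(worker_processes: int, worker_connections: int,
                reverse_proxy: bool = True) -> int:
    """Rough upper bound on simultaneous clients Nginx can serve."""
    total = worker_processes * worker_connections
    # Assumption: a proxied request holds one client-side connection and
    # one upstream-side connection, so the client capacity is halved.
    return total // 2 if reverse_proxy else total


# 4 workers (e.g., a 4-vCPU instance) with worker_connections 4096
print(max_clients(4, 4096))                       # 8192
print(max_clients(4, 4096, reverse_proxy=False))  # 16384
```

Treat the result as a ceiling to validate with load testing, not as a capacity guarantee.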

Nginx Configuration Snippet

Here’s a typical `nginx.conf` snippet for a high-traffic PHP setup. Adjust `worker_processes` and `worker_connections` based on your EC2 instance type and load testing results.

user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 4096; # Adjust based on load testing and system limits
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    server_tokens off; # Hide Nginx version for security

    # Gzip compression
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

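    # Buffering and timeouts for upstreams reached via proxy_pass (e.g., Gunicorn).
    # PHP-FPM is reached via fastcgi_pass and ignores proxy_* settings; use the
    # matching fastcgi_* directives (fastcgi_read_timeout, fastcgi_buffers, ...) for it.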
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;

    # Include other configurations
    include /etc/nginx/mime.types;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Tuning PHP-FPM (or Gunicorn for Python)

For PHP applications, PHP-FPM (FastCGI Process Manager) is the de facto standard. For Python WSGI applications, Gunicorn is a popular choice. The core principle is to configure the process manager to handle concurrent requests efficiently without overwhelming the server or causing excessive context switching.

PHP-FPM Configuration (`php-fpm.conf` or pool configuration)

The `pm` (process manager) setting is critical. `dynamic` is often a good starting point, `ondemand` can save resources when traffic is spiky, and `static` offers the most predictable performance under constant high load. Size `pm.max_children` from your server’s RAM and CPU; `pm.start_servers`, `pm.min_spare_servers`, and `pm.max_spare_servers` are used only when `pm = dynamic`. A common mistake is setting `pm.max_children` too high, which lets PHP workers exhaust memory and triggers the kernel’s OOM killer.

PHP-FPM Pool Configuration Example

; Example for /etc/php/7.4/fpm/pool.d/www.conf

[www]
user = www-data
group = www-data
listen = /run/php/php7.4-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

; Process Manager settings
pm = dynamic
pm.max_children = 100       ; Adjust based on RAM. (Total RAM - OS/Nginx - PHP overhead) / Average PHP process size
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.process_idle_timeout = 10s ; Only used when pm = ondemand
pm.max_requests = 500       ; Restart child processes after this many requests

; Request handling
request_terminate_timeout = 60s ; Max execution time for a script
request_slowlog_timeout = 10s   ; Log slow requests
slowlog = /var/log/php/php7.4-fpm.slow.log

; Other settings
catch_workers_output = yes
; php_admin_value[memory_limit] = 256M
; php_admin_flag[display_errors] = off

Tuning `pm.max_children`: This is the most critical parameter. A rough estimate for `max_children` can be calculated as: `(Total RAM - OS/Nginx RAM - Buffer/Cache RAM) / Average PHP Process Size`. You can measure the average PHP process size (its resident memory) using tools like htop or ps aux. Start conservatively and increase based on load testing and monitoring.
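As a worked example of the formula above, here is a minimal Python sketch. The function name and the sample figures (8 GB instance, 2 GB reserved, 60 MB per worker) are illustrative assumptions, not measurements.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int,
                          avg_process_mb: int) -> int:
    """Estimate pm.max_children from available memory.

    reserved_mb covers the OS, Nginx, and buffer/cache headroom.
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Round down so the workers never need more than the available RAM.
    return max(available_mb // avg_process_mb, 0)


# Hypothetical 8 GB instance, 2 GB reserved, ~60 MB per PHP-FPM worker
print(estimate_max_children(8192, 2048, 60))  # 102
```

The estimate is only as good as the measured process size, so re-measure under realistic traffic before raising the limit.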

`pm.max_requests`: Setting this to a reasonable number (e.g., 500-1000) helps prevent memory leaks from accumulating over time by periodically restarting child processes.

Gunicorn Configuration (Python)

Gunicorn’s worker types (sync, gevent, eventlet) and worker count significantly impact performance. For most synchronous Python applications, the `sync` worker is standard. `gevent` or `eventlet` are better for I/O-bound applications that can benefit from asynchronous handling.

Gunicorn Command Line Example

# Example command to run Gunicorn
# For a Django app:
# gunicorn myproject.wsgi:application --bind 0.0.0.0:8000 --workers 3 --threads 2 --worker-class gthread --timeout 120 --log-level info --access-logfile - --error-logfile -

# Explanation:
# --workers: Number of worker processes. A common starting point is (2 * number of CPU cores) + 1.
# --threads: Number of threads per worker. Only the 'gthread' worker class uses threads; if 'sync' is requested with more than one thread, Gunicorn uses 'gthread' instead.
# --worker-class: 'sync' (default), 'gthread', 'gevent', 'eventlet'.
# --timeout: Worker timeout in seconds.
# --log-level: Logging verbosity.
# --access-logfile, --error-logfile: Where to send logs. '-' means stdout/stderr.

# For a Flask app:
# gunicorn -w 4 -k gevent -b 127.0.0.1:5000 app:app --worker-connections 1000

Tuning Workers and Threads: The optimal number of workers and threads depends heavily on whether your application is CPU-bound or I/O-bound, and the available CPU cores. For CPU-bound tasks, more workers might not help beyond the number of cores. For I/O-bound tasks, using `gevent` or `eventlet` with a higher number of worker connections per worker can be beneficial.
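Gunicorn can also read its settings from a Python configuration file, which makes the worker formula explicit. The sketch below mirrors the Django command-line example; the file name `gunicorn.conf.py` and the choice of the `gthread` worker class (so that the `threads` setting takes effect) are assumptions for illustration.

```python
# gunicorn.conf.py -- load with: gunicorn -c gunicorn.conf.py myproject.wsgi:application
import multiprocessing

bind = "0.0.0.0:8000"

# Common starting point: (2 * number of CPU cores) + 1
workers = multiprocessing.cpu_count() * 2 + 1

# 'gthread' is the worker class that honors the threads setting.
worker_class = "gthread"
threads = 2

timeout = 120        # Worker timeout in seconds
loglevel = "info"
accesslog = "-"      # '-' sends the access log to stdout
errorlog = "-"       # '-' sends the error log to stderr
```

Because the worker count is derived from the core count at startup, the same file can be reused across instance sizes; tune it down for CPU-bound workloads if load testing shows contention.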

PostgreSQL Tuning on AWS RDS/EC2

Database performance is often the bottleneck. Tuning PostgreSQL involves adjusting memory parameters, connection pooling, and query optimization. On AWS, RDS offers managed parameters, while EC2 requires manual configuration.

Key PostgreSQL Parameters

These parameters are typically set in `postgresql.conf`. For RDS, you’ll use Parameter Groups.

# Shared Memory and Buffering
shared_buffers = 8GB            # ~25% of total RAM, here for a 32GB instance
effective_cache_size = 24GB     # 50-75% of total RAM; a planner hint, not an allocation

# WAL (Write-Ahead Logging)
wal_buffers = 16MB
wal_writer_delay = 200ms
min_wal_size = 1GB
max_wal_size = 4GB              # Adjust based on write load and disk space

# Checkpointing
checkpoint_timeout = 5min

# Replication
max_wal_senders = 10            # If using replication
wal_keep_size = 0               # PostgreSQL 13+; replaces wal_keep_segments from earlier versions

# Autovacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50

# Connection and Resource Management
max_connections = 100           # Adjust based on application needs and RAM
shared_preload_libraries = 'pg_stat_statements'  # Essential for query analysis; requires a restart

# Query Planning
random_page_cost = 1.1          # Default is 4.0, lower for SSDs
seq_page_cost = 1.0

`shared_buffers`: This is the most important memory parameter. It’s the memory PostgreSQL uses for caching data. Setting it to 25% of system RAM is a common starting point. Avoid setting it too high, as the OS also needs RAM for its file system cache.

`effective_cache_size`: This tells the query planner how much memory is available for disk caching by both PostgreSQL (`shared_buffers`) and the operating system. Setting it to 50-75% of total RAM is a good heuristic.

`max_connections`: Each connection consumes memory. Set this based on your application’s concurrency requirements and available RAM. Consider using a connection pooler like PgBouncer if `max_connections` needs to be very high.
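The memory heuristics above reduce to simple arithmetic. The following Python sketch derives starting values from total RAM; the function name is hypothetical, and the 25% and 50-75% figures are the rules of thumb from this section rather than guarantees.

```python
def postgres_memory_settings(total_ram_mb: int, cache_percent: int = 75) -> dict:
    """Starting values for shared_buffers and effective_cache_size."""
    if not 50 <= cache_percent <= 75:
        raise ValueError("cache_percent should stay within the 50-75% heuristic")
    return {
        # 25% of RAM, leaving the rest for the OS file system cache
        "shared_buffers": f"{total_ram_mb // 4}MB",
        # Planner estimate of PostgreSQL plus OS cache; nothing is allocated
        "effective_cache_size": f"{total_ram_mb * cache_percent // 100}MB",
    }


# 32GB instance
print(postgres_memory_settings(32 * 1024))
# {'shared_buffers': '8192MB', 'effective_cache_size': '24576MB'}
```

Use the output as a first guess, then adjust after observing cache hit ratios and memory pressure.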

Connection Pooling with PgBouncer

For applications with high connection churn (e.g., many short-lived connections), a connection pooler like PgBouncer can drastically reduce the overhead on PostgreSQL. It maintains a pool of persistent connections to the database and allows clients to connect to PgBouncer, which then hands out connections from its pool.

PgBouncer Configuration (`pgbouncer.ini`)

[databases]
mydb = host=your_rds_endpoint.rds.amazonaws.com port=5432 dbname=your_db_name

[pgbouncer]
; Listen address and port
listen_addr = 0.0.0.0
listen_port = 6432

; Authentication method (e.g., md5, trust, cert)
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool mode: session, transaction, statement
; 'transaction' is often a good balance for many apps
pool_mode = transaction

; Maximum number of client connections PgBouncer accepts in total
max_client_conn = 1000

; Server connections allowed per user/database pair
default_pool_size = 20

; Close server connections that have been idle for this many seconds
server_idle_timeout = 60

; Logging
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid

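`userlist.txt`: This file lists the users PgBouncer authenticates, one per line, in the format `"username" "password"`. The password field can hold a plain-text password, an MD5 hash, or a SCRAM secret; there is no database column.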
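If you store MD5 hashes in `userlist.txt`, the value uses PostgreSQL's MD5 password format: the literal prefix `md5` followed by the MD5 hex digest of the password concatenated with the username. A minimal Python sketch, assuming `auth_type = md5` as in the configuration above; the function name and the sample credentials are placeholders.

```python
import hashlib


def userlist_entry(username: str, password: str) -> str:
    """Build one userlist.txt line using PostgreSQL's MD5 password format."""
    # PostgreSQL hashes the password followed by the username.
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return f'"{username}" "md5{digest}"'


# Placeholder credentials for illustration only
print(userlist_entry("app_user", "change-me"))
```

MD5 is a legacy scheme; if your PostgreSQL server uses SCRAM authentication, copy the SCRAM secret from the server instead of generating an MD5 hash.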

Your application should then connect to `localhost:6432` (or wherever PgBouncer is listening) instead of directly to the PostgreSQL server.
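Before rolling this out, it is worth checking that the pools PgBouncer may open still fit under PostgreSQL's `max_connections`. The Python sketch below is a simplified check under stated assumptions: a single PgBouncer instance, every user/database pair able to fill its pool, and three connections held back for administrators (PostgreSQL's default `superuser_reserved_connections`). The function name is hypothetical.

```python
def pools_fit(max_connections: int, default_pool_size: int,
              db_user_pairs: int, reserved: int = 3) -> bool:
    """Return True if all pools can be full without exceeding max_connections."""
    # Each user/database pair can open up to default_pool_size server connections.
    needed = default_pool_size * db_user_pairs
    return needed + reserved <= max_connections


# Values from this guide: max_connections = 100, default_pool_size = 20
print(pools_fit(100, 20, db_user_pairs=1))  # True  (20 + 3 <= 100)
print(pools_fit(100, 20, db_user_pairs=5))  # False (100 + 3 > 100)
```

If the check fails, lower `default_pool_size` or raise `max_connections` with the memory cost in mind.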

Query Optimization and `pg_stat_statements`

Even with perfect configuration, inefficient queries will cripple performance. The `pg_stat_statements` extension is invaluable for identifying slow or frequently executed queries. Ensure it’s loaded in `postgresql.conf` (`shared_preload_libraries`) and then enabled in your database:

-- Connect to your database and run:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

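You can then query it. The `*_exec_time` columns exist in PostgreSQL 13 and later; earlier versions name them `total_time`, `mean_time`, and `stddev_time`: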

SELECT
    query,
    calls,
    total_exec_time,
    rows,
    mean_exec_time,
    stddev_exec_time
FROM
    pg_stat_statements
ORDER BY
    total_exec_time DESC
LIMIT 20;

Analyze the output to find queries with high `total_exec_time` or `calls` and optimize them using `EXPLAIN ANALYZE`, adding appropriate indexes, or rewriting the query logic.
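Once the rows are exported (for example through your database driver), a small script can show how much of the total execution time each query accounts for, which helps decide where to start. This is a sketch that assumes rows arrive as dictionaries keyed by the column names above; the function name and the sample rows are made up for illustration.

```python
def rank_by_total_time(rows: list, top: int = 20) -> list:
    """Sort rows by total_exec_time and attach each query's share of the total."""
    total = sum(row["total_exec_time"] for row in rows)
    ranked = sorted(rows, key=lambda row: row["total_exec_time"], reverse=True)
    result = []
    for row in ranked[:top]:
        share = round(100 * row["total_exec_time"] / total, 1) if total else 0.0
        result.append({**row, "share_pct": share})
    return result


# Made-up sample rows, not real measurements
sample = [
    {"query": "SELECT ... FROM orders", "calls": 1200, "total_exec_time": 7500.0},
    {"query": "UPDATE sessions ...", "calls": 90000, "total_exec_time": 2500.0},
]
for row in rank_by_total_time(sample):
    print(row["share_pct"], row["calls"], row["query"])
```

A query with a modest mean time but a very high call count can dominate the total, so look at `share_pct` and `calls` together.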

AWS Specific Considerations

Instance Sizing: Choose EC2 instance types that balance CPU, RAM, and Network I/O. For database workloads, memory-optimized instances (like `r` series) are often suitable. For web servers, compute-optimized (`c` series) or general-purpose (`m` series) might be better.

EBS Volumes: For PostgreSQL, use Provisioned IOPS SSD (io1/io2) volumes for predictable performance, especially for write-heavy workloads. For Nginx serving static assets, General Purpose SSD (gp3) offers a good balance of cost and performance, with configurable IOPS and throughput.

Network Bandwidth: Ensure your instance type provides sufficient network bandwidth. For high-traffic applications, consider instances with Enhanced Networking or placement groups for low-latency communication between instances.

Monitoring: Leverage AWS CloudWatch for instance metrics (CPU Utilization, Network In/Out, Disk Read/Write Ops), RDS metrics (Database Connections, CPU Utilization, Read/Write Latency), and custom metrics. Set up Alarms for critical thresholds.
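Alarms can be created programmatically as well as in the console. The sketch below builds the parameters for a CPU alarm and passes them to the CloudWatch `put_metric_alarm` call in boto3 (a third-party package that must be installed separately). The helper names, the 80% threshold, the two five-minute evaluation periods, and the instance ID are illustrative assumptions.

```python
from typing import Optional


def cpu_alarm_params(instance_id: str, threshold_pct: float = 80.0,
                     sns_topic_arn: Optional[str] = None) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    params = {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per datapoint
        "EvaluationPeriods": 2,     # alarm after two consecutive breaches
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        # Optional notification target, e.g., an SNS topic
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params: dict) -> None:
    """Create the alarm; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helper above has no dependencies
    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical instance ID
params = cpu_alarm_params("i-0123456789abcdef0")
print(params["AlarmName"])  # high-cpu-i-0123456789abcdef0
```

Separating parameter construction from the API call keeps the thresholds reviewable and testable without touching AWS.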