The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on AWS for PHP
Nginx Tuning for High-Traffic PHP Applications
Optimizing Nginx is crucial for serving static assets efficiently and acting as a robust reverse proxy for your PHP application server. On AWS, the foundation is an EC2 instance type, EBS volume, and network configuration matched to your workload. This section focuses on Nginx worker processes, connection limits, compression, and buffering.
Worker Processes and Connections
The `worker_processes` directive controls how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available on your server. For dynamic scaling on AWS, you might set this to `auto` to let Nginx decide based on available cores.
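The `worker_connections` directive sets the maximum number of simultaneous connections that each worker process can hold open. The theoretical ceiling is `worker_processes * worker_connections`, but this count includes connections to upstream servers as well as connections from clients, so a reverse proxy serves fewer clients than the raw product suggests. Set the value high enough for your expected peak load, but not so high that it exceeds the per-process file descriptor limit (`ulimit -n`, or `worker_rlimit_nofile` in Nginx).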
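The arithmetic above can be sketched in a few lines of Python. This is a rough planning aid, not part of Nginx: the function names are hypothetical, and halving the total for a reverse proxy is a rule-of-thumb assumption (one client-side plus one upstream-side connection per request).

```python
def nginx_connection_budget(worker_processes: int, worker_connections: int,
                            reverse_proxy: bool = True) -> dict:
    """Estimate connection capacity from the two Nginx directives."""
    total = worker_processes * worker_connections
    # Assumption: a proxied request holds one client-side and one
    # upstream-side connection, so a reverse proxy serves about half.
    clients = total // 2 if reverse_proxy else total
    return {"total_connections": total, "approx_max_clients": clients}


def exceeds_fd_limit(worker_connections: int, nofile_limit: int) -> bool:
    """True if a single worker could need more descriptors than allowed."""
    return worker_connections > nofile_limit


if __name__ == "__main__":
    # Example values: 4 cores, 4096 connections per worker, default ulimit 1024.
    print(nginx_connection_budget(4, 4096))
    print(exceeds_fd_limit(4096, 1024))
```

If `exceeds_fd_limit` returns true for your values, raise the descriptor limit before raising `worker_connections`.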
Nginx Configuration Snippet
Here’s a typical `nginx.conf` snippet for a high-traffic PHP setup. Adjust `worker_processes` and `worker_connections` based on your EC2 instance type and load testing results.
user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 4096; # Adjust based on load testing and system limits
multi_accept on;
}
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
server_tokens off; # Hide Nginx version for security
# Gzip compression
gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
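# Buffering and timeouts for proxying to an HTTP upstream such as Gunicorn (proxy_pass).
# PHP-FPM is reached with fastcgi_pass and is tuned with the matching fastcgi_* directives.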
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
proxy_buffer_size 128k;
proxy_buffers 4 256k;
proxy_busy_buffers_size 256k;
# Include other configurations
include /etc/nginx/mime.types;
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
Tuning PHP-FPM (or Gunicorn for Python)
For PHP applications, PHP-FPM (FastCGI Process Manager) is the de facto standard. For Python WSGI applications, Gunicorn is a popular choice. The core principle is the same for both: configure the process manager to handle concurrent requests efficiently without overwhelming the server or causing excessive context switching.
PHP-FPM Configuration (`php-fpm.conf` or pool configuration)
The `pm` (process manager) setting is critical. `dynamic` is often a good starting point, but `ondemand` can save resources if traffic is spiky, while `static` offers the most predictable performance under constant high load. Adjust `pm.max_children`, `pm.start_servers`, `pm.min_spare_servers`, and `pm.max_spare_servers` based on your server’s RAM and CPU. A common mistake is setting `pm.max_children` too high, leading to OOM killer invocation.
PHP-FPM Pool Configuration Example
; Example for /etc/php/7.4/fpm/pool.d/www.conf
[www]
user = www-data
group = www-data
listen = /run/php/php7.4-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
; Process Manager settings
pm = dynamic
pm.max_children = 100 ; Adjust based on RAM. (Total RAM - OS/Nginx - PHP overhead) / Average PHP process size
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.process_idle_timeout = 10s ; Only takes effect when pm = ondemand
pm.max_requests = 500 ; Restart child processes after this many requests
; Request handling
request_terminate_timeout = 60s ; Max execution time for a script
request_slowlog_timeout = 10s ; Log slow requests
slowlog = /var/log/php/php7.4-fpm.slow.log
; Other settings
catch_workers_output = yes
; php_admin_value[memory_limit] = 256M
; php_admin_flag[display_errors] = off
Tuning `pm.max_children`: This is the most critical parameter. A rough estimate for `max_children` can be calculated as: `(Total RAM – OS/Nginx RAM – Buffer/Cache RAM) / Average PHP Process Size`. You can monitor average PHP process size using tools like htop or ps aux. Start conservatively and increase based on load testing and monitoring.
`pm.max_requests`: Setting this to a reasonable number (e.g., 500-1000) helps prevent memory leaks from accumulating over time by periodically restarting child processes.
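The `pm.max_children` estimate described above can be worked through as a small Python calculation. This is a sketch under stated assumptions: the function name and the `headroom` factor are illustrative additions, not PHP-FPM settings, and the example figures are placeholders for measurements from your own server.

```python
def estimate_max_children(total_ram_mb: float, os_nginx_mb: float,
                          cache_mb: float, avg_process_mb: float,
                          headroom: float = 0.9) -> int:
    """(Total RAM - OS/Nginx RAM - Buffer/Cache RAM) / Average PHP process size.

    `headroom` (an assumption, not from PHP-FPM) scales the available RAM
    down so the estimate starts on the conservative side.
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available_mb = total_ram_mb - os_nginx_mb - cache_mb
    if available_mb <= 0:
        return 0
    return int((available_mb * headroom) // avg_process_mb)


if __name__ == "__main__":
    # Placeholder figures: 8 GB instance, 1 GB for OS/Nginx, 1 GB for cache,
    # 60 MB average PHP process size as measured with ps or htop.
    print(estimate_max_children(8192, 1024, 1024, 60))
```

Treat the result as an upper bound to validate under load testing rather than a value to set blindly.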
Gunicorn Configuration (Python)
Gunicorn’s worker types (sync, gevent, eventlet) and worker count significantly impact performance. For most synchronous Python applications, the `sync` worker is standard. `gevent` or `eventlet` are better for I/O-bound applications that can benefit from asynchronous handling.
Gunicorn Command Line Example
# Example command to run Gunicorn
# For a Django app:
gunicorn myproject.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 3 \
  --threads 2 \
  --worker-class sync \
  --timeout 120 \
  --log-level info \
  --access-logfile - \
  --error-logfile -
# Explanation:
# --workers: Number of worker processes. A common starting point is (2 * number of CPU cores) + 1.
# --threads: Number of threads per worker. A value above 1 makes Gunicorn use the 'gthread' worker even when 'sync' is requested.
# --worker-class: 'sync' (default), 'gthread', 'gevent', 'eventlet'.
# --timeout: Worker timeout in seconds.
# --log-level: Logging verbosity.
# --access-logfile, --error-logfile: Where to send logs. '-' means stdout/stderr.
# For a Flask app:
gunicorn -w 4 -k gevent -b 127.0.0.1:5000 --worker-connections 1000 app:app
Tuning Workers and Threads: The optimal number of workers and threads depends heavily on whether your application is CPU-bound or I/O-bound, and the available CPU cores. For CPU-bound tasks, more workers might not help beyond the number of cores. For I/O-bound tasks, using `gevent` or `eventlet` with a higher number of worker connections per worker can be beneficial.
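Gunicorn can also read its settings from a Python config file, which makes the worker formula easy to compute at startup. The sketch below assumes a file named `gunicorn.conf.py` loaded with `gunicorn -c gunicorn.conf.py myproject.wsgi:application`; `suggested_workers` is a hypothetical helper, while the module-level names are standard Gunicorn settings.

```python
import os


def suggested_workers(cpu_cores: int) -> int:
    """Common starting point: (2 * number of CPU cores) + 1."""
    return 2 * cpu_cores + 1


# Gunicorn reads the setting names below from the config module.
bind = "0.0.0.0:8000"
workers = suggested_workers(os.cpu_count() or 1)  # fall back to 1 core if unknown
worker_class = "sync"   # switch to "gevent" or "eventlet" for I/O-bound apps
timeout = 120           # seconds before an unresponsive worker is restarted
loglevel = "info"
accesslog = "-"         # "-" sends the access log to stdout
errorlog = "-"          # "-" sends the error log to stderr
```

For CPU-bound applications, consider capping `workers` near the core count instead of using the formula as-is, and confirm either choice with load testing.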
PostgreSQL Tuning on AWS RDS/EC2
Database performance is often the bottleneck. Tuning PostgreSQL involves adjusting memory parameters, connection pooling, and query optimization. On AWS, RDS offers managed parameters, while EC2 requires manual configuration.
Key PostgreSQL Parameters
These parameters are typically set in `postgresql.conf`. For RDS, you’ll use Parameter Groups.
# Shared Memory and Buffering
shared_buffers = 8GB # About 25% of total RAM, e.g., 8GB for a 32GB instance
effective_cache_size = 24GB # 50-75% of total RAM; helps the planner estimate OS cache usage
# WAL (Write-Ahead Logging)
wal_buffers = 16MB
wal_writer_delay = 200ms
min_wal_size = 1GB
max_wal_size = 4GB # Adjust based on write load and disk space
# Checkpointing
checkpoint_timeout = 5min
# Replication
max_wal_senders = 10 # If using replication
wal_keep_size = 0 # PostgreSQL 13+; replaces the removed wal_keep_segments
# Autovacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
# Connection and Resource Management
max_connections = 100 # Adjust based on application needs and RAM
shared_preload_libraries = 'pg_stat_statements' # Essential for query analysis
# Query Planning
random_page_cost = 1.1 # Default is 4.0, lower for SSDs
seq_page_cost = 1.0
`shared_buffers`: This is the most important memory parameter. It’s the memory PostgreSQL uses for caching data. Setting it to 25% of system RAM is a common starting point. Avoid setting it too high, as the OS also needs RAM for its file system cache.
`effective_cache_size`: This tells the query planner how much memory is available for disk caching by both PostgreSQL (`shared_buffers`) and the operating system. Setting it to 50-75% of total RAM is a good heuristic.
`max_connections`: Each connection consumes memory. Set this based on your application’s concurrency requirements and available RAM. Consider using a connection pooler like PgBouncer if `max_connections` needs to be very high.
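The memory heuristics above translate directly into values you can paste into `postgresql.conf` or an RDS Parameter Group. The following is a minimal sketch: the function name and default fractions are illustrative choices based on the 25% and 50-75% starting points, not values PostgreSQL computes itself.

```python
def postgres_memory_settings(total_ram_gb: float,
                             shared_fraction: float = 0.25,
                             cache_fraction: float = 0.75) -> dict:
    """Derive starting values for shared_buffers and effective_cache_size."""
    total_mb = total_ram_gb * 1024
    return {
        # Memory PostgreSQL itself uses for caching data pages.
        "shared_buffers": f"{int(total_mb * shared_fraction)}MB",
        # Planner hint only: shared_buffers plus the expected OS file cache.
        "effective_cache_size": f"{int(total_mb * cache_fraction)}MB",
    }


if __name__ == "__main__":
    # A 32 GB instance, matching the example in the configuration above.
    for name, value in postgres_memory_settings(32).items():
        print(f"{name} = {value}")
```

Note that `effective_cache_size` allocates nothing; it only influences the planner's cost estimates, so erring slightly high is less risky than oversizing `shared_buffers`.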
Connection Pooling with PgBouncer
For applications with high connection churn (e.g., many short-lived connections), a connection pooler like PgBouncer can drastically reduce the overhead on PostgreSQL. It maintains a pool of persistent connections to the database and allows clients to connect to PgBouncer, which then hands out connections from its pool.
PgBouncer Configuration (`pgbouncer.ini`)
[databases]
mydb = host=your_rds_endpoint.rds.amazonaws.com port=5432 dbname=your_db_name
[pgbouncer]
; Listen address and port
listen_addr = 0.0.0.0
listen_port = 6432
; Authentication method (e.g., md5, trust, cert)
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool mode: session, transaction, statement
; 'transaction' is often a good balance for many apps
pool_mode = transaction
; Maximum number of client connections PgBouncer accepts in total
max_client_conn = 1000
; Server connections allowed per user/database pair
default_pool_size = 20
; Close server connections that stay idle longer than this many seconds
server_idle_timeout = 60
; Logging
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
`userlist.txt`: This file lists the users PgBouncer authenticates. Each line has the format `"username" "password"`, where the password is stored as plain text, an MD5 hash, or a SCRAM secret.
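For `auth_type = md5`, an entry in `userlist.txt` can hold the same MD5 form PostgreSQL uses: the literal prefix `md5` followed by the MD5 hex digest of the password concatenated with the username. The sketch below builds such a line; the function name and the example credentials are hypothetical.

```python
import hashlib


def userlist_md5_entry(username: str, password: str) -> str:
    """Build one userlist.txt line using PostgreSQL-style MD5 hashing."""
    # PostgreSQL's MD5 format: "md5" + md5(password + username) as hex.
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return f'"{username}" "md5{digest}"'


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(userlist_md5_entry("app_user", "change-me"))
```

If your PostgreSQL server uses SCRAM authentication instead, this MD5 form does not apply and the entry needs the SCRAM secret or plain-text password.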
Your application should then connect to `localhost:6432` (or wherever PgBouncer is listening) instead of directly to the PostgreSQL server.
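Before rolling this out, it is worth checking that the pools PgBouncer may open fit inside PostgreSQL's `max_connections`. The sketch below assumes `default_pool_size` applies per user/database pair and reserves a few connections for administrative access; the function names and the `reserved` default of 3 are illustrative assumptions.

```python
def pgbouncer_server_connections(default_pool_size: int, db_user_pairs: int) -> int:
    """Upper bound on server connections PgBouncer opens to PostgreSQL."""
    return default_pool_size * db_user_pairs


def pool_fits(default_pool_size: int, db_user_pairs: int,
              max_connections: int, reserved: int = 3) -> bool:
    """True if all pools fit while leaving `reserved` slots for admin use."""
    needed = pgbouncer_server_connections(default_pool_size, db_user_pairs)
    return needed <= max_connections - reserved


if __name__ == "__main__":
    # Values from the examples above: pool size 20, max_connections 100.
    print(pool_fits(20, 1, 100))   # one database, one application user
    print(pool_fits(20, 5, 100))   # five user/database pairs
```

With one pair, up to 1000 client connections are multiplexed over at most 20 server connections, which is the main benefit of transaction pooling.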
Query Optimization and `pg_stat_statements`
Even with perfect configuration, inefficient queries will cripple performance. The `pg_stat_statements` extension is invaluable for identifying slow or frequently executed queries. Ensure it’s loaded in `postgresql.conf` (`shared_preload_libraries`) and then enabled in your database:
-- Connect to your database and run:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
You can then query it:
SELECT
query,
calls,
total_exec_time,
rows,
mean_exec_time,
stddev_exec_time
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 20;
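Look for queries with high `total_exec_time` or `calls`, then optimize them by inspecting the plan with `EXPLAIN ANALYZE`, adding appropriate indexes, or rewriting the query logic. The `total_exec_time`, `mean_exec_time`, and `stddev_exec_time` columns exist from PostgreSQL 13 onward; earlier versions name them `total_time`, `mean_time`, and `stddev_time`.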
AWS Specific Considerations
Instance Sizing: Choose EC2 instance types that balance CPU, RAM, and Network I/O. For database workloads, memory-optimized instances (like `r` series) are often suitable. For web servers, compute-optimized (`c` series) or general-purpose (`m` series) might be better.
EBS Volumes: For PostgreSQL, use Provisioned IOPS SSD (io1/io2) volumes for predictable performance, especially for write-heavy workloads. For Nginx serving static assets, General Purpose SSD (gp3) offers a good balance of cost and performance, with configurable IOPS and throughput.
Network Bandwidth: Ensure your instance type provides sufficient network bandwidth. For high-traffic applications, consider instances with Enhanced Networking or placement groups for low-latency communication between instances.
Monitoring: Leverage AWS CloudWatch for instance metrics (CPU Utilization, Network In/Out, Disk Read/Write Ops), RDS metrics (Database Connections, CPU Utilization, Read/Write Latency), and custom metrics. Set up Alarms for critical thresholds.