The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and MongoDB on AWS for PHP
Nginx Tuning for High-Traffic PHP Applications
Nginx sits in front of every request, so its configuration sets the ceiling for a high-traffic PHP application. This section covers the directives with the largest effect on performance and resource use when Nginx passes dynamic requests to PHP-FPM over FastCGI, or proxies them to Gunicorn for Python services running alongside PHP.
Worker Processes and Connections
The worker_processes directive sets how many worker processes Nginx spawns. Setting it to auto is generally recommended; Nginx then starts one worker per detected CPU core. The worker_connections directive, which belongs inside the events block, caps the number of simultaneous connections a single worker can hold open. That count includes connections to upstream servers, so a request passed to PHP-FPM or Gunicorn uses two. The theoretical maximum is worker_processes × worker_connections, further bounded by the per-process open file limit (see worker_rlimit_nofile). A common starting point is 1024 connections per worker (the built-in default is 512); tune it against your application’s concurrency needs and the server’s RAM.
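The capacity arithmetic can be sketched in a few lines of Python. This is a rough estimate only: the function name is made up for illustration, and halving the total assumes every client request also holds one upstream connection.

```python
def max_client_connections(worker_processes: int,
                           worker_connections: int,
                           proxied: bool = True) -> int:
    """Estimate how many clients Nginx can serve at once.

    Assumption: when requests are passed to an upstream (PHP-FPM, Gunicorn),
    each client connection is paired with one upstream connection, so only
    half of the connection slots are available to clients.
    """
    if worker_processes < 1 or worker_connections < 1:
        raise ValueError("both values must be positive")
    total = worker_processes * worker_connections
    return total // 2 if proxied else total


# 4 cores with worker_connections 4096, all traffic going to an upstream
print(max_client_connections(4, 4096))                 # 8192
# Same host serving only static files
print(max_client_connections(4, 4096, proxied=False))  # 16384
```

The real ceiling is usually lower, because file descriptor limits and upstream capacity tend to run out before Nginx does.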
Keepalive Connections
keepalive_timeout and keepalive_requests reduce the cost of opening a new TCP (and TLS) connection for every HTTP request. keepalive_timeout sets how long an idle keep-alive connection stays open; the default is 75 seconds, and a value between 60 and 120 seconds is often a good balance between connection reuse and idle sockets. keepalive_requests caps the number of requests served over a single keep-alive connection. The default is 1000 since Nginx 1.19.10 and 100 before that, so older installations benefit from raising it.
Buffering and Timeouts
Nginx uses buffers to hold request and response data, and sizing them sensibly avoids both wasted memory and unnecessary spills to temporary files. client_body_buffer_size, client_header_buffer_size, and large_client_header_buffers control the buffers for client requests. The defaults are usually sufficient; applications that accept large uploads or receive large cookies may need bigger values, and client_max_body_size must be raised for uploads above its 1 MB default.
proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout matter when Nginx acts as a reverse proxy, for example in front of Gunicorn; fastcgi_connect_timeout, fastcgi_send_timeout, and fastcgi_read_timeout play the same role for PHP-FPM. They set how long Nginx waits to establish a connection to the upstream, how long it waits between two successive writes of the request, and how long it waits between two successive reads of the response, respectively. Setting them too low cuts off legitimately slow responses, while setting them too high keeps connections and upstream workers occupied by requests that will never finish. Values of 30 to 60 seconds are common starting points.
Gzip Compression
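Enabling Gzip compression significantly reduces the bandwidth needed to transfer text-based content, leading to faster load times. The essential directives are gzip, gzip_vary, gzip_proxied, gzip_types, and gzip_min_length. gzip on enables compression, and gzip_vary adds the Vary: Accept-Encoding header so caches store compressed and uncompressed variants separately. gzip_proxied controls whether responses are compressed for requests that arrive through another proxy or CDN (detected by the Via request header); it is not about responses coming from your own upstream. gzip_types should list the text-based MIME types you serve (text/html is always compressed), and gzip_min_length skips very small responses, where the gzip framing can outweigh the savings.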
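The reasoning behind gzip_min_length is easy to check with Python’s standard gzip module. The payloads below are invented for illustration, and Nginx’s own compressor may produce slightly different sizes, but the fixed header and trailer overhead is the same.

```python
import gzip


def gzip_saving(payload: bytes, level: int = 6) -> int:
    """Return the bytes saved by gzip; negative means the output grew."""
    return len(payload) - len(gzip.compress(payload, compresslevel=level))


# Hypothetical responses: a tiny JSON body and a larger, repetitive one
tiny = b'{"ok":true}'
large = b'{"id": 1, "status": "active", "tags": ["a", "b"]}' * 100

print("tiny saving:", gzip_saving(tiny))    # negative: gzip made it bigger
print("large saving:", gzip_saving(large))  # positive: most bytes are saved
```

Responses of a few dozen bytes come out larger after compression, which is why a threshold of a few hundred bytes is a common choice.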
Example Nginx Configuration Snippet
Here’s a sample Nginx configuration snippet incorporating these tuning parameters for a PHP application proxied to PHP-FPM:
worker_processes auto;
events {
    worker_connections 4096; # Per worker; adjust based on server RAM and expected load
    multi_accept on;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;
    # Buffering
    client_max_body_size 100m; # Adjust for large file uploads
    client_body_buffer_size 128k;
    # Well above the 1k default; lower these if memory is tight
    client_header_buffer_size 128k;
    large_client_header_buffers 4 128k;
    # Gzip Compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6; # Compression level (1-9)
    gzip_min_length 256; # Minimum response length to compress
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/rss+xml text/javascript image/svg+xml;
    server {
        listen 80;
        server_name example.com; # Placeholder; replace with your domain
        root /var/www/html; # Placeholder document root
        # Pass PHP requests to PHP-FPM over FastCGI
        location ~ \.php$ {
            include snippets/fastcgi-php.conf; # Debian/Ubuntu path
            fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Adjust to your PHP-FPM socket/address
            fastcgi_read_timeout 300; # Longer timeout for potentially slow PHP scripts
            fastcgi_connect_timeout 30;
            fastcgi_send_timeout 30;
        }
        # Other configurations (e.g., static file serving)
        # ...
    }
}
PHP-FPM Tuning for Performance
PHP-FPM (FastCGI Process Manager) is the de facto standard for running PHP applications. Its configuration directly affects how PHP requests are handled and how much memory and CPU they consume. Global settings usually live in php-fpm.conf, with per-pool settings in files under php-fpm.d/ (pool.d/ on Debian and Ubuntu).
Process Manager Settings
The pm (Process Manager) setting is critical. It can be set to static, dynamic, or ondemand.
- static: A fixed number of child processes are always kept running. This offers consistent performance but can be wasteful of resources if traffic is variable.
- dynamic: The number of child processes varies with traffic, up to pm.max_children, while FPM keeps the number of idle children between pm.min_spare_servers and pm.max_spare_servers. This is a good balance for most applications.
- ondemand: Processes are spawned only when a request comes in and are killed after a period of inactivity. This saves resources but can introduce latency for the first request after an idle period.
For dynamic, tuning pm.max_children, pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers is essential. pm.max_children is the most important; setting it too high can lead to out-of-memory errors. A common rule of thumb is to divide the RAM available to PHP by the memory footprint of a single PHP process. For example, if each PHP process consumes 50MB and the server has 4GB of RAM, the absolute ceiling is about 81 children (4096MB / 50MB ≈ 81), but that leaves nothing for the OS and other services. Reserving 1GB for them gives a safer limit of about 61 children ((4096MB - 1024MB) / 50MB ≈ 61).
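The rule of thumb translates into a small calculation. The sketch below is illustrative: the function name and the 1 GB reservation are assumptions, and the per-process figure should come from measuring your own workers (for example, the resident set size of the php-fpm children under load).

```python
def fpm_max_children(total_ram_mb: int,
                     per_process_mb: int,
                     reserved_mb: int = 0) -> int:
    """Estimate pm.max_children from available memory.

    reserved_mb is memory set aside for the OS, Nginx, and anything else
    on the host; how much to reserve is a judgment call, not a PHP-FPM rule.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_process_mb


# 4 GB host, 50 MB per PHP process
print(fpm_max_children(4096, 50))                    # 81: hard ceiling, no headroom
print(fpm_max_children(4096, 50, reserved_mb=1024))  # 61: with 1 GB held back
```

If MongoDB or another memory-hungry service shares the host, increase reserved_mb accordingly.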
Request and Execution Timeouts
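request_terminate_timeout sets the maximum wall-clock time a single request may run before FPM kills the worker serving it. This prevents runaway scripts from hogging resources. For long-running tasks, consider offloading them to background workers. max_execution_time in php.ini also applies, but on Linux it counts only the script’s own execution time, not time spent waiting on database queries, network calls, or sleep(), so request_terminate_timeout is the more reliable backstop for process management.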
Example PHP-FPM Pool Configuration (dynamic)
This example is for a typical dynamic process manager configuration. Adjust values based on your server’s resources and application needs.
[www]
user = www-data
group = www-data
; Or a TCP/IP address like 127.0.0.1:9000
listen = /var/run/php/php7.4-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
pm = dynamic
; Adjust based on RAM and process size
pm.max_children = 100
; Initial number of children
pm.start_servers = 20
; Minimum number of idle children
pm.min_spare_servers = 10
; Maximum number of idle children
pm.max_spare_servers = 50
; Only used when pm = ondemand: idle time before a child is killed
pm.process_idle_timeout = 10s
; Maximum wall-clock time per request
request_terminate_timeout = 120s
; Log a backtrace for requests slower than this; requires slowlog
request_slowlog_timeout = 30s
; Example path; adjust to your logging layout
slowlog = /var/log/php7.4-fpm.slow.log
; Restart a child after this many requests to contain memory leaks
pm.max_requests = 500
Gunicorn Tuning for Python WSGI Applications
When serving Python WSGI applications, Gunicorn is a popular choice. Its configuration focuses on worker processes, threads, and timeouts.
Worker Types and Counts
Gunicorn supports several worker types:
- sync: The default and simplest worker type. Each worker handles one request at a time.
- gevent/eventlet: Asynchronous workers that use green threads to handle multiple requests concurrently within a single process. These are generally more performant for I/O-bound applications.
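- gthread: Each worker process runs a pool of native threads, sized with --threads. The Global Interpreter Lock (GIL) prevents threads from running Python bytecode in parallel, so this helps I/O-bound requests far more than CPU-bound ones.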
The number of workers is set with the -w or --workers flag. A common recommendation is (2 * number_of_cpu_cores) + 1. Asynchronous workers (gevent/eventlet) rarely need more processes than that: each worker multiplexes many connections, up to the --worker-connections limit, so concurrency scales inside the worker rather than with the worker count.
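As a quick illustration of the worker-count formula, the helper below derives it from the detected core count. The function name is hypothetical, and the result is a starting point to validate under load, not a guarantee.

```python
import multiprocessing
from typing import Optional


def suggested_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply the (2 * cores) + 1 rule of thumb for Gunicorn workers."""
    if cpu_cores is None:
        # Counts logical CPUs; containers with CPU quotas may report more
        # cores than the process is actually allowed to use.
        cpu_cores = multiprocessing.cpu_count()
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cpu_cores + 1


print(suggested_workers(4))  # 9
print(suggested_workers())   # depends on the machine running this
```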
Worker Timeout and Threads
--timeout (or -t) sets how many seconds a worker may stay silent before Gunicorn kills and restarts it; the default is 30. With sync workers this effectively bounds request time, so set it higher than your application’s longest expected request. With asynchronous workers it only checks that the worker process is still responsive and is not tied to the duration of an individual request. If using the gthread worker type, --threads controls the number of threads per worker process.
Example Gunicorn Command Line Configuration
This example uses the gevent worker type, suitable for I/O-bound applications. Adjust worker count based on your server’s CPU cores.
gunicorn --workers 4 \
--worker-class gevent \
--bind 0.0.0.0:8000 \
--timeout 120 \
--graceful-timeout 120 \
--log-level info \
your_project.wsgi:application
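The same settings can also live in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the command above; the file name gunicorn.conf.py is the conventional one, and the values are the example’s, not recommendations. The gevent worker class requires the gevent package to be installed.

```python
# gunicorn.conf.py: file-based equivalent of the command-line example

# Number of worker processes (the example uses a fixed 4)
workers = 4

# Asynchronous worker type; needs the gevent package
worker_class = "gevent"

# Address and port to listen on
bind = "0.0.0.0:8000"

# Seconds a worker may stay silent before it is killed and restarted
timeout = 120

# Seconds workers get to finish in-flight requests on restart or shutdown
graceful_timeout = 120

# Log verbosity
loglevel = "info"
```

It would be started with gunicorn -c gunicorn.conf.py your_project.wsgi:application. Keeping the settings in a file makes them easier to review and version than a long command line.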
MongoDB Performance Tuning on AWS
Optimizing MongoDB on AWS involves both instance selection and database-level configuration. With managed services such as MongoDB Atlas, or the MongoDB-compatible Amazon DocumentDB, most OS-level and storage tuning is handled for you, but query patterns and schema design remain your responsibility.
Instance Sizing and EBS Volumes
Choosing the right EC2 instance type (or DocumentDB/Atlas equivalent) is crucial. Database workloads generally benefit from instances with more memory, more vCPUs, and higher network and EBS bandwidth. For EBS volumes, use gp3 or io1/io2. gp3 includes a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size and lets you scale IOPS and throughput independently of capacity, which makes it cost-effective. io1/io2 offer higher provisioned IOPS with tighter consistency but cost more. Provision enough IOPS and throughput to meet your application’s demands: MongoDB reads from disk whenever the working set does not fit in the WiredTiger cache and the OS page cache, so I/O performance matters most when the dataset is larger than RAM.
MongoDB Configuration File (`mongod.conf`)
Key parameters in mongod.conf (or equivalent for managed services) include:
- storage.wiredTiger.engineConfig.cacheSizeGB: This is arguably the most important setting. It controls the size of the WiredTiger cache. By default WiredTiger uses the larger of 50% of (RAM minus 1 GB) or 256 MB. A common recommendation is to allocate about 50% of the system’s RAM, leaving enough for the OS page cache and other processes. For example, on a 32GB RAM instance, you might set this to 16GB.
- storage.journal.enabled: Should almost always be true for durability. MongoDB 6.1 and later always journal and have removed this option.
- net.bindIp: Configure this to bind to the appropriate network interfaces for your AWS VPC.
- operationProfiling.mode: Set to slowOp or all to enable profiling for identifying slow queries.
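The cache sizing options can be compared with a short calculation. Both function names are made up for illustration; the first reproduces the documented WiredTiger default, the second the flat 50% rule of thumb described above.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """WiredTiger's default: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (total_ram_gb - 1.0), 0.25)


def half_of_ram_cache_gb(total_ram_gb: float) -> float:
    """The simpler rule of thumb: half of total RAM."""
    return 0.5 * total_ram_gb


for ram_gb in (1, 8, 32):
    print(ram_gb,
          default_wiredtiger_cache_gb(ram_gb),
          half_of_ram_cache_gb(ram_gb))
# 1 0.25 0.5
# 8 3.5 4.0
# 32 15.5 16.0
```

The two rules differ by only half a gigabyte on larger instances, so setting cacheSizeGB explicitly matters mostly when mongod shares the host with other services or runs in a memory-limited container.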
Indexing Strategy
Proper indexing is fundamental to MongoDB performance. Analyze slow queries using the profiler or explain() output to identify missing or inefficient indexes. Avoid over-indexing, as indexes consume memory and slow down write operations.
Query Optimization
Use the explain() method on your queries to understand how MongoDB executes them. Look for plans that fall back to a collection scan (COLLSCAN) instead of an index scan (IXSCAN), and, with explain("executionStats"), for queries where totalDocsExamined is much larger than nReturned. Rewrite such queries, or add an index, so that existing indexes are used effectively. Consider projection (returning only necessary fields) to reduce network I/O and memory usage.
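Checking explain() output for collection scans can be automated. The sketch below works on the explain document as a plain dictionary, so it needs no database connection; the helper names and the two sample plans are illustrative, hand-written in the shape explain() typically returns, and real output contains many more fields.

```python
from typing import Any, Dict, Iterator


def iter_stages(plan: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Walk a plan tree, yielding every stage document."""
    yield plan
    if "inputStage" in plan:
        yield from iter_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from iter_stages(child)


def has_collection_scan(explain_output: Dict[str, Any]) -> bool:
    """Return True if the winning plan contains a COLLSCAN stage."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    # Newer servers may nest the stage tree under "queryPlan"
    winning = winning.get("queryPlan", winning)
    return any(s.get("stage") == "COLLSCAN" for s in iter_stages(winning))


# Hand-written sample plans (hypothetical field values)
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "email_1"}}}}
unindexed = {"queryPlanner": {"winningPlan": {
    "stage": "SORT",
    "inputStage": {"stage": "COLLSCAN", "direction": "forward"}}}}

print(has_collection_scan(indexed))    # False
print(has_collection_scan(unindexed))  # True
```

With a driver such as PyMongo, the dictionary returned by a cursor’s explain() call could be passed to has_collection_scan directly, for instance in a test suite that fails when a critical query loses its index.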
Example MongoDB Configuration Snippet
This snippet shows relevant settings in mongod.conf. Note that managed services might abstract some of these.
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true # Omit on MongoDB 6.1+, where journaling is always on
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16 # Adjust based on instance RAM (e.g., 50% of total RAM)
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,10.0.1.10 # Replace with your VPC private IP or 0.0.0.0 for all interfaces
# security:
#   authorization: enabled
# operation profiling
operationProfiling:
  mode: slowOp # or 'all' for more detailed profiling
  slowOpThresholdMs: 100 # Log operations slower than 100ms
# Sharding (if applicable)
# sharding:
#   clusterRole: configsvr
#   configDB: mycluster.local:27019
Monitoring and Iteration
Performance tuning is an iterative process. Continuously monitor your Nginx, PHP-FPM/Gunicorn, and MongoDB metrics using tools like CloudWatch, Prometheus/Grafana, or built-in monitoring dashboards. Pay attention to CPU utilization, memory usage, network I/O, disk I/O, request latency, and error rates. Use this data to identify bottlenecks and further refine your configurations.
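Two of those signals, tail latency and error rate, are simple to compute from raw samples when a dashboard is not yet in place. The sketch below uses the nearest-rank percentile method and treats HTTP 5xx responses as errors; both choices, and the function names, are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def error_rate(statuses: List[int]) -> float:
    """Fraction of responses with a 5xx status code."""
    if not statuses:
        raise ValueError("statuses must not be empty")
    return sum(1 for s in statuses if s >= 500) / len(statuses)


# Hypothetical samples: latencies in milliseconds and HTTP status codes
latencies_ms = list(range(1, 101))
statuses = [200, 200, 502, 404]

print(percentile(latencies_ms, 95))  # 95
print(error_rate(statuses))          # 0.25
```

Tracking the 95th or 99th percentile rather than the average makes regressions from a configuration change visible sooner, since slow outliers barely move the mean.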