The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and MongoDB on AWS for PHP

Nginx Tuning for High-Traffic PHP Applications

Optimizing Nginx is paramount for any high-traffic PHP application. We’ll focus on key directives that directly affect performance and resource utilization, particularly when Nginx passes dynamic requests to PHP-FPM over FastCGI or proxies them to Gunicorn (for Python WSGI services that may run alongside the PHP application).

Worker Processes and Connections

The worker_processes directive sets how many worker processes Nginx spawns. Setting it to auto is generally recommended: Nginx then starts one worker per available CPU core. The worker_connections directive, which belongs in the events block, limits the number of simultaneous connections a single worker can open, and this count includes connections to upstream servers, not only clients. Multiplied by worker_processes, it gives the upper bound on concurrent connections Nginx can handle; each worker is also capped by its open-file limit (worker_rlimit_nofile or the system ulimit). The default is 512 and 1024 is a common starting point, but tune it to your application’s concurrency needs, the server’s file-descriptor limits, and available RAM.
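As a rough back-of-the-envelope sketch (the helper name is hypothetical, not an Nginx API), the connection ceiling is worker_processes × worker_connections, roughly halved when every request is passed upstream, because each such request holds one client connection and one upstream connection. It ignores the open-file limit and any static files Nginx serves directly.

```python
def max_client_connections(worker_processes: int, worker_connections: int,
                           proxied: bool = True) -> int:
    """Rough ceiling on concurrent clients (hypothetical helper, not an Nginx API)."""
    total = worker_processes * worker_connections
    # A proxied or FastCGI request uses two connections: client side and upstream side.
    return total // 2 if proxied else total


# Example: 4 CPU cores (worker_processes auto) and worker_connections 4096
print(max_client_connections(4, 4096))                 # 8192
print(max_client_connections(4, 4096, proxied=False))  # 16384
```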

Keepalive Connections

keepalive_timeout and keepalive_requests reduce the overhead of establishing a new TCP (and, with HTTPS, TLS) connection for every HTTP request. keepalive_timeout specifies how long an idle keep-alive client connection stays open; the default is 75 seconds, and values between 60 and 120 seconds are often a reasonable balance, although shorter values free connections sooner on busy servers. keepalive_requests limits how many requests can be served over a single keep-alive connection. Since Nginx 1.19.10 it defaults to 1000; older versions defaulted to 100, so raising it to around 1000 there further reduces connection overhead.

Buffering and Timeouts

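Nginx uses buffers to hold request and response data, and sizing them well avoids both excessive memory use and unnecessary temporary-file writes. client_body_buffer_size, client_header_buffer_size, and large_client_header_buffers control the buffers used for client requests; a request body larger than client_body_buffer_size is written to a temporary file, and client_max_body_size caps the accepted body size (larger requests get a 413 error). The defaults are sufficient for most web applications, but applications that accept large file uploads usually need a larger client_max_body_size.

proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout matter when Nginx acts as a reverse proxy; for PHP-FPM, the equivalents are fastcgi_connect_timeout, fastcgi_send_timeout, and fastcgi_read_timeout. They define how long Nginx waits to establish a connection to the upstream server, how long it waits between two successive writes while sending the request, and how long it waits between two successive reads of the response. The send and read timeouts do not limit the total duration: a response that keeps trickling in never triggers proxy_read_timeout. Setting them too low causes premature 504 errors for slow upstream responses, while setting them too high keeps stalled connections and upstream workers occupied. The default is 60 seconds, and values between 30 and 60 seconds are common starting points.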
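To make the "between two successive reads" behaviour concrete, here is a small model of an idle-style read timeout such as proxy_read_timeout. The function is hypothetical, not Nginx code, and assumes arrival times are measured in seconds from the moment the request was fully sent.

```python
def read_timeout_fires(chunk_times: list[float], timeout: float) -> bool:
    """Model of an idle (between-reads) timeout; hypothetical helper.

    chunk_times are the seconds after the request was sent at which each
    piece of the upstream response arrives. The timer restarts after every
    successful read, so only the gaps between reads count.
    """
    last = 0.0  # the timer starts once the request has been sent
    for t in sorted(chunk_times):
        if t - last > timeout:
            return True
        last = t
    return False


# A 150-second response arriving in 50-second steps never hits a 60 s timeout...
print(read_timeout_fires([50, 100, 150], timeout=60))  # False
# ...but a single 70-second silence does.
print(read_timeout_fires([70], timeout=60))            # True
```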

Gzip Compression

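Enabling gzip compression significantly reduces the bandwidth needed to transfer text-based content, which shortens load times. The relevant directives are gzip, gzip_vary, gzip_proxied, gzip_types, and gzip_min_length. gzip on enables compression; gzip_vary on adds the Vary: Accept-Encoding header so caches store compressed and uncompressed variants separately; and gzip_proxied controls whether responses are compressed when the request arrived through another proxy or CDN (Nginx detects this from the Via request header), which is off by default. gzip_types lists the MIME types to compress in addition to text/html, which is always compressed. gzip_min_length skips very small responses, where gzip’s header and trailer can make the output larger than the original.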
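A quick way to see why gzip_min_length exists is to compress a tiny and a larger payload with Python’s standard gzip module at level 6, the gzip_comp_level used in the example configuration. The payloads and the helper name are made up for illustration; Nginx’s exact output sizes will differ slightly, but the fixed framing overhead is the same idea.

```python
import gzip
import json


def gzip_size(body: bytes, level: int = 6) -> int:
    """Length of body after gzip compression at the given level (hypothetical helper)."""
    return len(gzip.compress(body, compresslevel=level))


# Hypothetical payloads
tiny = b'{"ok":true}'
listing = json.dumps([{"id": i, "status": "active"} for i in range(500)]).encode()

print(len(tiny), "->", gzip_size(tiny))        # grows: gzip adds ~18 bytes of framing
print(len(listing), "->", gzip_size(listing))  # shrinks substantially
```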

Example Nginx Configuration Snippet

Here’s a sample Nginx configuration incorporating these tuning parameters for a PHP application served through PHP-FPM. The server_name and root values are placeholders, and snippets/fastcgi-php.conf is the helper file shipped with the Debian and Ubuntu Nginx packages; other distributions use a different include.

worker_processes auto;
worker_rlimit_nofile 8192; # Per-worker open-file limit; must cover worker_connections

events {
    worker_connections 4096; # Belongs in the events block; adjust to server RAM and expected load
    multi_accept on;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;

    keepalive_timeout 65;
    keepalive_requests 1000;

    # Buffering
    client_max_body_size 100m; # Adjust for large file uploads
    client_body_buffer_size 128k;
    client_header_buffer_size 1k; # Nginx default (1k); this buffer is used for every request
    large_client_header_buffers 4 128k; # Allocated only on demand, for oversized headers

    # Gzip Compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6; # Compression level (1-9)
    gzip_min_length 256; # Minimum response length to compress
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    server {
        listen 80;
        server_name example.com;      # Placeholder
        root /var/www/example/public; # Placeholder document root
        index index.php index.html;

        # Pass PHP requests to PHP-FPM over FastCGI (location blocks must sit inside a server block)
        location ~ \.php$ {
            include snippets/fastcgi-php.conf;
            fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Adjust to your PHP-FPM socket/address
            fastcgi_read_timeout 300; # Longer timeout for potentially slow PHP scripts
            fastcgi_connect_timeout 30;
            fastcgi_send_timeout 30;
        }

        # Other configuration (e.g., static file serving)
        # ...
    }
}

PHP-FPM Tuning for Performance

PHP-FPM (FastCGI Process Manager) is the de facto standard way to run PHP behind Nginx. Its configuration directly determines how PHP requests are handled and how many resources they consume. Global settings live in php-fpm.conf, while per-pool settings such as those below are usually kept in files under a php-fpm.d/ or pool.d/ directory, depending on the distribution.

Process Manager Settings

The pm (Process Manager) setting is critical. It can be set to static, dynamic, or ondemand.

static: A fixed number of child processes (pm.max_children) is always kept running. This gives consistent performance but wastes memory when traffic is variable.
dynamic: The number of idle child processes is kept between pm.min_spare_servers and pm.max_spare_servers, and the total never exceeds pm.max_children. This is a good balance for most applications.
ondemand: Processes are spawned only when requests arrive and are killed once they have been idle for pm.process_idle_timeout. This saves memory but adds latency to the first requests after an idle period.
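The interplay between the spare-server bounds and the pm.max_children cap in dynamic mode can be sketched as a single decision function. This is a simplified, hypothetical model, not PHP-FPM’s source: the real manager spreads spawning and killing over successive one-second maintenance checks, while this sketch only shows how the three limits constrain each decision.

```python
def dynamic_pm_step(total: int, idle: int, max_children: int,
                    min_spare: int, max_spare: int) -> tuple[str, int]:
    """Simplified model of one pm = dynamic decision (hypothetical helper).

    Returns ("spawn", n), ("kill", n) or ("none", 0).
    """
    if idle < min_spare and total < max_children:
        # Top up idle workers, but never exceed pm.max_children in total.
        return ("spawn", min(min_spare - idle, max_children - total))
    if idle > max_spare:
        # Too many idle workers: trim back to pm.max_spare_servers.
        return ("kill", idle - max_spare)
    return ("none", 0)


# Pool values from the example configuration: max_children 100, spares 10..50
print(dynamic_pm_step(total=95, idle=2, max_children=100, min_spare=10, max_spare=50))
# ('spawn', 5): only 5 more children fit under pm.max_children
```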

For dynamic, tuning pm.max_children, pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers is essential. pm.max_children is the most important: set it too high and a traffic spike can exhaust memory and trigger the out-of-memory killer. A common rule of thumb is to reserve memory for the OS and other services first, then divide what remains by the memory footprint of a single PHP-FPM process. For example, if each PHP process uses about 50MB and the server has 4GB of RAM, reserving 1GB for everything else leaves (4096MB - 1024MB) / 50MB ≈ 61 children. Dividing the full 4096MB by 50MB (≈ 81) would leave no headroom at all.
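The rule of thumb translates directly into a small helper (hypothetical name). The inputs are assumptions you supply: per_process_mb is best taken from the average resident set size of your PHP-FPM workers under realistic load, which tends to overstate the true cost because RSS also counts shared memory such as OPcache, and reserved_mb is whatever you set aside for the OS and other services.

```python
def max_children(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Rule-of-thumb pm.max_children: RAM left for PHP / size of one worker."""
    available = total_ram_mb - reserved_mb
    if available <= 0 or per_process_mb <= 0:
        raise ValueError("not enough RAM left for PHP-FPM workers")
    # Round down: a partial worker would push the server past its budget.
    return available // per_process_mb


print(max_children(4096, 1024, 50))  # 61
```

At 50MB per worker, the pm.max_children = 100 in the example pool would need about 5,000MB for PHP alone, so scale that value to the RAM you actually have.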

Request and Execution Timeouts

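request_terminate_timeout sets the maximum wall-clock time a single request may run before PHP-FPM kills the worker handling it (the default, 0, disables the limit). This prevents runaway scripts from tying up workers; long-running tasks are better offloaded to background workers. max_execution_time in php.ini also limits scripts, but on Linux and other non-Windows systems it does not count time spent outside the script itself, such as system calls, stream operations, and database queries. That makes request_terminate_timeout the more dependable safeguard for the process pool.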
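One consistency check worth scripting compares PHP-FPM’s limit with Nginx’s fastcgi_read_timeout. This sketch (hypothetical helper) assumes the PHP script sends no output until it finishes, which is common with output buffering, so fastcgi_read_timeout effectively caps the total time. If Nginx gives up first, the client gets a 504 while the PHP worker keeps running.

```python
def timeout_warnings(request_terminate_timeout: int, fastcgi_read_timeout: int) -> list[str]:
    """Flag risky PHP-FPM / Nginx timeout combinations (values in seconds)."""
    warnings = []
    if request_terminate_timeout == 0:
        warnings.append("request_terminate_timeout is 0: PHP-FPM never kills slow requests")
    elif request_terminate_timeout > fastcgi_read_timeout:
        # Nginx returns 504 first, but the PHP worker stays busy until it finishes.
        warnings.append("Nginx may return 504 while the PHP worker is still busy")
    return warnings


# Values from the example configurations: 120 s in PHP-FPM, 300 s in Nginx
print(timeout_warnings(120, 300))  # []
```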

Example PHP-FPM Pool Configuration (dynamic)

This example is for a typical dynamic process manager configuration. Adjust values based on your server’s resources and application needs.

[www]
user = www-data
group = www-data
listen = /var/run/php/php7.4-fpm.sock ; Or a TCP address such as 127.0.0.1:9000
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

pm = dynamic
pm.max_children = 100       ; Adjust based on RAM and per-process memory
pm.start_servers = 20       ; Children created at startup (between the two spare limits)
pm.min_spare_servers = 10   ; Minimum number of idle children
pm.max_spare_servers = 50   ; Maximum number of idle children
pm.process_idle_timeout = 10s ; Only used when pm = ondemand

request_terminate_timeout = 120s ; Maximum wall-clock time per request
request_slowlog_timeout = 30s    ; Write a stack trace to the slowlog file for slower requests

pm.max_requests = 500 ; Recycle a child after this many requests to contain memory leaks
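PHP-FPM refuses to start a dynamic pool whose limits are out of order, so a pre-flight check can save a failed reload. This sketch (hypothetical helper and sample) parses a pool snippet with Python’s configparser, treating ";" as the inline comment marker as PHP-FPM does, and applies the ordering rules: min_spare ≤ max_spare ≤ max_children, with start_servers between the two spare limits. configparser is not PHP’s INI parser, so treat this as a sanity check, not a full validator.

```python
import configparser

POOL = """
[www]
pm = dynamic
pm.max_children = 100       ; Adjust based on RAM and per-process memory
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 50
"""


def validate_dynamic_pool(text: str, section: str = "www") -> list[str]:
    """Check the ordering rules PHP-FPM enforces for pm = dynamic."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    pool = parser[section]
    n = {key: pool.getint(f"pm.{key}") for key in
         ("max_children", "start_servers", "min_spare_servers", "max_spare_servers")}
    errors = []
    if n["min_spare_servers"] > n["max_spare_servers"]:
        errors.append("pm.min_spare_servers must not exceed pm.max_spare_servers")
    if n["max_spare_servers"] > n["max_children"]:
        errors.append("pm.max_spare_servers must not exceed pm.max_children")
    if not n["min_spare_servers"] <= n["start_servers"] <= n["max_spare_servers"]:
        errors.append("pm.start_servers must lie between the spare-server limits")
    return errors


print(validate_dynamic_pool(POOL))  # []
```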

Gunicorn Tuning for Python WSGI Applications

When serving Python WSGI applications, Gunicorn is a popular choice. Its configuration focuses on worker processes, threads, and timeouts.

Worker Types and Counts

Gunicorn supports several worker types:

sync: The default and simplest worker type. Each worker handles one request at a time.
gevent/eventlet: Asynchronous workers that use green threads to handle many requests concurrently within a single process. They generally perform better for I/O-bound applications and require the gevent or eventlet package to be installed.
gthread: Uses native Python threads, with --threads threads per worker. Because of the Global Interpreter Lock (GIL), the threads help mainly with I/O-bound work rather than CPU-bound work.

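The number of workers is set with the -w or --workers flag. Gunicorn’s documentation suggests (2 * number_of_cpu_cores) + 1 as a starting point for sync workers. Asynchronous workers (gevent/eventlet) get their concurrency from --worker-connections (default 1000 simultaneous clients per worker) rather than from extra processes, so they typically need fewer worker processes, not more.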
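A starting-point calculation might look like this. The helper is hypothetical, and the one-process-per-core figure for async workers is an assumption of this sketch rather than a Gunicorn rule; in both cases, confirm the number with load tests.

```python
import os
from typing import Optional


def suggested_workers(worker_class: str = "sync", cores: Optional[int] = None) -> int:
    """Starting-point Gunicorn worker count (hypothetical helper)."""
    cores = cores or os.cpu_count() or 1
    if worker_class in ("gevent", "eventlet"):
        # Assumption: async workers multiplex many clients each
        # (--worker-connections), so start with one process per core.
        return cores
    # Gunicorn's documented starting point for sync workers: (2 * cores) + 1
    return 2 * cores + 1


print(suggested_workers("sync", cores=4))    # 9
print(suggested_workers("gevent", cores=4))  # 4
```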

Worker Timeout and Threads

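--timeout (or -t) sets how many seconds a worker may stay silent before the Gunicorn master kills and restarts it; the default is 30. For sync workers this effectively caps request duration, so set it above your application’s longest expected request. For async workers it only means the worker process is still responding to the master, not how long a single request may take. --threads sets the number of threads per worker for the gthread worker type; if you set --threads above 1 with the default sync worker class, Gunicorn switches to gthread automatically.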

Example Gunicorn Command Line Configuration

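This example uses the gevent worker type, which suits I/O-bound applications and requires the gevent package. It binds to the loopback interface, assuming Nginx runs on the same host and proxies to port 8000; bind to a private address instead if Nginx runs elsewhere. Adjust the worker count to your server’s CPU cores.

gunicorn --workers 4 \
         --worker-class gevent \
         --bind 127.0.0.1:8000 \
         --timeout 120 \
         --graceful-timeout 120 \
         --log-level info \
         your_project.wsgi:application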
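Gunicorn can also read its settings from a gunicorn.conf.py file, which is ordinary Python that assigns module-level variables named after the settings; Gunicorn loads ./gunicorn.conf.py by default, or you can pass a path with -c. This sketch expresses the gevent setup as a config file. It binds to loopback on the assumption that Nginx runs on the same host, and your_project.wsgi:application remains a placeholder.

```python
# gunicorn.conf.py: start with `gunicorn -c gunicorn.conf.py`
wsgi_app = "your_project.wsgi:application"  # placeholder module:variable
bind = ["127.0.0.1:8000"]   # loopback: assumes Nginx runs on the same host
workers = 4                 # adjust to CPU cores
worker_class = "gevent"     # requires the gevent package
worker_connections = 1000   # max simultaneous clients per async worker (the default)
timeout = 120               # seconds a worker may stay silent before it is restarted
graceful_timeout = 120      # seconds to finish in-flight requests on restart
loglevel = "info"
```

Because the file is plain Python, values such as workers can also be computed at startup, for example from os.cpu_count().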

MongoDB Performance Tuning on AWS

Optimizing MongoDB on AWS involves both instance selection and database-level configuration. With managed services such as MongoDB Atlas, or Amazon DocumentDB (a MongoDB-compatible service rather than MongoDB itself), many OS-level tunings are handled for you, but query patterns, indexing, and schema design remain your responsibility.

Instance Sizing and EBS Volumes

Choosing the right EC2 instance type (or DocumentDB/Atlas tier) is crucial. MongoDB performs best when its working set (the frequently accessed data and indexes) fits in RAM, so memory-optimized instance families are a common choice, with vCPU count and network bandwidth as secondary factors. For EBS volumes, use gp3 or io1/io2. gp3 includes a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size and lets you provision additional IOPS and throughput independently, which makes it cost-effective. io1/io2 provide provisioned IOPS for latency-sensitive workloads but cost more. Provision enough IOPS and throughput for your workload: cache misses, checkpoints that flush data to disk, and journal writes all hit the volume, and disk performance becomes the bottleneck once the working set outgrows RAM.

MongoDB Configuration File (`mongod.conf`)

Key parameters in mongod.conf (or equivalent for managed services) include:

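storage.wiredTiger.engineConfig.cacheSizeGB: Arguably the most important setting; it controls the size of the WiredTiger internal cache. By default, WiredTiger uses the larger of 50% of (RAM - 1 GB) or 256 MB, so a 32GB instance gets about 15.5GB. Setting it explicitly is mainly useful when other processes share the host; leave the remaining RAM to the OS, whose filesystem cache MongoDB also relies on.
storage.journal.enabled: On MongoDB versions before 6.1, keep this true for durability. Starting in MongoDB 6.1, journaling is always on and the option has been removed, so omit it there.
net.bindIp: Bind only to the interfaces that need it, such as localhost and the instance’s private VPC address, and restrict access further with security groups.
operationProfiling.mode: Set to slowOp to record operations slower than slowOpThresholdMs (100 ms by default) for finding slow queries. all records every operation and adds noticeable overhead, so use it only briefly.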
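The default cache-size rule fits in a one-line function (hypothetical helper). Sizes are in GB, with 256 MB written as 0.25 GB. On a 32GB instance it returns 15.5GB, so the cacheSizeGB: 16 in the example configuration is only slightly above what MongoDB would choose on its own.

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """MongoDB's documented default: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (ram_gb - 1), 0.25)


print(default_wiredtiger_cache_gb(32))  # 15.5
print(default_wiredtiger_cache_gb(1))   # 0.25
```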

Indexing Strategy

Proper indexing is fundamental to MongoDB performance. Analyze slow queries using the profiler or explain() output to identify missing or inefficient indexes. Avoid over-indexing, as indexes consume memory and slow down write operations.

Query Optimization

Use the explain() method to see how MongoDB executes a query. Look for collection scans (a COLLSCAN stage), which mean no index was used, and, in executionStats mode, for a large gap between totalDocsExamined and nReturned. Rewrite queries or add indexes so they use indexes effectively, and use projection (returning only the fields you need) to reduce network I/O and memory usage.
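The COLLSCAN check can be automated. This sketch (function names are hypothetical) walks an explain() result, such as the dict PyMongo’s Cursor.explain() returns, and collects every stage name. It searches recursively because plans nest stages under inputStage or inputStages, and some newer server versions wrap the winning plan in an extra queryPlan key. The sample document is a trimmed illustration, not real server output.

```python
from typing import Any, Iterator


def plan_stages(node: Any) -> Iterator[str]:
    """Yield every "stage" name found anywhere in an explain() document."""
    if isinstance(node, dict):
        if isinstance(node.get("stage"), str):
            yield node["stage"]
        for value in node.values():
            yield from plan_stages(value)
    elif isinstance(node, list):
        for item in node:
            yield from plan_stages(item)


def uses_collection_scan(explain_doc: dict) -> bool:
    """True if the winning plan contains a COLLSCAN stage."""
    winning = explain_doc.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning)


# Trimmed, illustrative explain() output (not from a real server)
sample = {"queryPlanner": {"winningPlan": {
    "stage": "PROJECTION_SIMPLE",
    "inputStage": {"stage": "COLLSCAN", "direction": "forward"}}}}
print(list(plan_stages(sample)))     # ['PROJECTION_SIMPLE', 'COLLSCAN']
print(uses_collection_scan(sample))  # True
```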

Example MongoDB Configuration Snippet

This snippet shows relevant settings in mongod.conf. Note that managed services might abstract some of these.

systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true

storage:
  dbPath: /var/lib/mongodb
  # journal:
  #   enabled: true # Only for MongoDB < 6.1; removed in 6.1, where journaling is always on
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16 # Optional; the default is max(50% of (RAM - 1 GB), 256 MB)

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,10.0.1.10 # Replace 10.0.1.10 with your instance's private VPC IP; avoid 0.0.0.0

security:
  authorization: enabled # After enabling, create the first admin user from localhost (localhost exception)

# operation profiling
operationProfiling:
  mode: slowOp # Use 'all' only briefly; it records every operation
  slowOpThresholdMs: 100 # Log operations slower than 100ms

# Sharding (if applicable)
# sharding:
#   clusterRole: shardsvr # or configsvr on config server replica set members
# Note: sharding.configDB is a mongos setting, not a mongod one

Monitoring and Iteration

Performance tuning is an iterative process. Continuously monitor your Nginx, PHP-FPM/Gunicorn, and MongoDB metrics using tools like CloudWatch, Prometheus/Grafana, or built-in monitoring dashboards. Pay attention to CPU utilization, memory usage, network I/O, disk I/O, request latency, and error rates. Use this data to identify bottlenecks and further refine your configurations.