The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on AWS for PHP

Nginx Tuning for High-Traffic PHP Applications

Optimizing Nginx as a reverse proxy and static file server is paramount for any high-traffic PHP application. We’ll focus on key directives that directly impact performance and resource utilization on AWS.

Worker Processes and Connections

The worker_processes directive sets how many worker processes Nginx spawns. A common recommendation is one per CPU core, which is what the auto value does. worker_connections sets the maximum number of simultaneous connections each worker process can hold open. The theoretical maximum is worker_processes * worker_connections, but that count includes connections to upstream servers, so a proxied request uses two of them (one to the client, one to the backend).

On an AWS EC2 instance, determine the number of vCPUs:

grep -c ^processor /proc/cpuinfo

Then, configure nginx.conf (typically located at /etc/nginx/nginx.conf or /usr/local/nginx/conf/nginx.conf):

user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024; # Adjust based on expected load and system limits
    multi_accept on;
}

With multi_accept on, a worker accepts all pending connections at once instead of one at a time, which can be beneficial under heavy load.
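The capacity arithmetic for worker_processes and worker_connections can be sketched in a few lines of Python. The function name and the proxied flag are illustrative assumptions, not part of Nginx. The halving reflects that a proxied request holds one connection to the client and one to the backend.

```python
def max_clients(worker_processes: int, worker_connections: int, proxied: bool = True) -> int:
    """Estimate how many clients Nginx can serve at the same time."""
    total = worker_processes * worker_connections
    # A proxied request occupies two connections: client side and upstream side.
    return total // 2 if proxied else total


if __name__ == "__main__":
    # Example: 4 vCPUs with the worker_connections value from the config above.
    print("proxied:", max_clients(4, 1024))
    print("static only:", max_clients(4, 1024, proxied=False))
```

Treat the result as an upper bound: the open file limit of the Nginx user (worker_rlimit_nofile or ulimit -n) has to be at least as high as worker_connections for it to be reachable.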

Keepalive Connections

HTTP keep-alive connections reduce the overhead of establishing new TCP connections for each request. Tune keepalive_timeout and keepalive_requests.

http {
    # ... other http directives ...

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;

    # ... rest of http block ...
}

A keepalive_timeout of 65 seconds is a common starting point. keepalive_requests limits the number of requests served over a single keep-alive connection.

Buffering and Gzip Compression

Nginx buffering can significantly improve performance by reducing the number of read/write operations. Gzip compression reduces bandwidth usage and speeds up asset delivery.

http {
    # ...

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    client_body_buffer_size 128k;
    client_header_buffer_size 1k;
    large_client_header_buffers 2 128k;
    output_buffers 1 32k;
    # post_max_size is a PHP setting (php.ini), not an Nginx directive; set it there
    client_max_body_size 20m; # Keep in line with PHP's post_max_size and upload_max_filesize

    # ...
}

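gzip_comp_level 6 offers a good balance between compression ratio and CPU usage. Set client_max_body_size to fit your application’s file upload needs and keep it consistent with PHP’s post_max_size and upload_max_filesize, which are set in php.ini rather than in nginx.conf. If the Nginx limit is lower, oversized uploads are rejected with a 413 error before PHP ever sees them.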
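To get a feel for the compression-level trade-off, the sketch below compresses a sample payload at several levels with Python’s zlib module, which uses the same 1 to 9 level scale as gzip_comp_level. The sample payload and the function name are made up for illustration; real ratios depend on your own assets.

```python
import time
import zlib


def compressed_sizes(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Return the compressed size in bytes for each compression level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


if __name__ == "__main__":
    # Hypothetical JSON-like payload; replace with a real response body.
    sample = b'{"id": 1, "name": "example", "tags": ["a", "b"]}\n' * 2000
    for level in (1, 6, 9):
        start = time.perf_counter()
        size = len(zlib.compress(sample, level))
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"level {level}: {size} bytes, {elapsed_ms:.2f} ms")
```

Running this against representative responses shows whether a level above 6 buys enough extra compression to justify the added CPU time.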

Tuning for PHP-FPM or Gunicorn

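Nginx passes PHP requests to PHP-FPM over FastCGI with fastcgi_pass. Gunicorn speaks HTTP rather than FastCGI, so a Python application behind it is proxied with proxy_pass instead. The location block below covers the PHP-FPM case.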

location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    # For PHP-FPM
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Adjust socket path as needed
    # Gunicorn (Python) speaks HTTP, not FastCGI: proxy to it from a separate
    # location block with proxy_pass http://unix:/path/to/your/app.sock; instead

    # FastCGI parameters for performance
    fastcgi_read_timeout 300; # Increase for long-running PHP scripts
    fastcgi_send_timeout 300;
    fastcgi_connect_timeout 60;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 8 128k;
    fastcgi_busy_buffers_size 256k;
    fastcgi_temp_file_write_size 256k;
}

The fastcgi_read_timeout, fastcgi_send_timeout, and fastcgi_connect_timeout values should be generous enough to accommodate your application’s longest-running operations. Adjust buffer sizes based on typical response sizes.
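A rough way to size the buffers is to work out how much of a response Nginx can hold in memory per request: fastcgi_buffer_size for the first part of the response plus the pool defined by fastcgi_buffers. The Python sketch below does that arithmetic; the helper names are hypothetical. Responses larger than this total are spilled to a temporary file on disk.

```python
def parse_size(value: str) -> int:
    """Convert an Nginx-style size such as '128k' or '20m' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.strip().lower()
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def fastcgi_memory_per_request(buffer_size: str, buffers_count: int, buffers_size: str) -> int:
    """Bytes of response data Nginx can buffer in memory for one request."""
    return parse_size(buffer_size) + buffers_count * parse_size(buffers_size)


if __name__ == "__main__":
    # Values from the location block above.
    per_request = fastcgi_memory_per_request("128k", 8, "128k")
    print(f"{per_request // 1024} KiB per request")
    # Worst case if 500 requests are buffered at the same time (assumed load).
    print(f"{per_request * 500 // 1024 ** 2} MiB for 500 concurrent requests")
```

If typical responses are far smaller than the per-request total, smaller buffers save memory under high concurrency.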

Gunicorn/PHP-FPM Optimization on AWS

The application server (Gunicorn for Python, PHP-FPM for PHP) is the next critical layer. Proper configuration here directly impacts request handling capacity and latency.

PHP-FPM Configuration

PHP-FPM’s process manager configuration is key. The pm setting determines the process management strategy: static, dynamic, or ondemand. For most production environments, dynamic offers a good balance.

Edit your PHP-FPM pool configuration file (e.g., /etc/php/7.4/fpm/pool.d/www.conf):

; Choose one of: static, dynamic, ondemand
pm = dynamic

; If pm = dynamic:
pm.max_children = 100       ; Max number of children at any one time
pm.start_servers = 5        ; Number of children at startup
pm.min_spare_servers = 2    ; Min number of idle children kept ready
pm.max_spare_servers = 10   ; Max number of idle children kept ready
pm.max_requests = 500       ; Max requests per child process before respawning

; If pm = static:
; pm.max_children = 150     ; Fixed number of children

; If pm = ondemand:
; pm.max_children = 100
; pm.process_idle_timeout = 10s ; Kill a child after it has been idle this long
; pm.max_requests = 0

; Other important settings
request_terminate_timeout = 300 ; Timeout for script execution (seconds)
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen.backlog = 512 ; Adjust based on Nginx's worker_connections and system limits

pm.max_children should be tuned based on your server’s RAM. A common rule of thumb is to measure the memory footprint of a single PHP-FPM worker (e.g., the RSS column of ps aux | grep php-fpm after a few requests), subtract what the OS and other services need from total RAM, and divide the remainder by that per-worker figure. pm.max_requests limits the impact of memory leaks by respawning each process after it has served that many requests.
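The rule of thumb translates into a short calculation. The sketch below is only an illustration: the function name is hypothetical, and the figures in the example (8 GB instance, 2 GB reserved, 60 MB per worker) are assumptions you should replace with your own measurements.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int, worker_rss_mb: int) -> int:
    """Estimate pm.max_children from RAM and the measured per-worker footprint."""
    if worker_rss_mb <= 0:
        raise ValueError("worker_rss_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Always allow at least one child, even on very small instances.
    return max(available_mb // worker_rss_mb, 1)


if __name__ == "__main__":
    # Assumed: 8 GB instance, 2 GB kept for the OS and Nginx, 60 MB per worker.
    print("pm.max_children =", estimate_max_children(8192, 2048, 60))
```

Measure the per-worker footprint under realistic traffic, since workers usually grow after handling heavier requests.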

Gunicorn Configuration

Gunicorn’s worker class and number of workers are crucial. The default sync worker class is simple but can block under heavy load. gevent or eventlet (asynchronous) workers are often preferred for I/O-bound applications.

A common Gunicorn command-line configuration:

gunicorn --workers 3 \
         --worker-class gevent \
         --bind unix:/path/to/your/app.sock \
         --timeout 300 \
         --log-level info \
         your_app.wsgi:application

The number of workers is typically calculated as (2 * number_of_cores) + 1. For I/O-bound applications, you might increase this. For CPU-bound applications, stick closer to the core count. --timeout should match the Nginx timeout for that upstream, which is proxy_read_timeout because Gunicorn is proxied over HTTP rather than FastCGI. Ensure the --bind socket path matches what Nginx is configured to proxy to.
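The same settings can live in a Gunicorn configuration file, which is plain Python, so the worker formula can be computed at startup instead of hard-coded. This is a minimal sketch that mirrors the command above; the file name gunicorn.conf.py and the socket path are placeholders.

```python
# gunicorn.conf.py
import multiprocessing

# (2 * number_of_cores) + 1, evaluated on the machine Gunicorn starts on.
workers = multiprocessing.cpu_count() * 2 + 1

# Asynchronous worker class; requires the gevent package to be installed.
worker_class = "gevent"

# Placeholder path; must match the socket Nginx proxies to.
bind = "unix:/path/to/your/app.sock"

timeout = 300
loglevel = "info"
```

Start Gunicorn with gunicorn -c gunicorn.conf.py your_app.wsgi:application to pick the file up.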

Elasticsearch Performance Tuning on AWS

Elasticsearch performance is heavily influenced by JVM heap size, file system cache, and shard configuration. On AWS, consider instance types optimized for memory and I/O.

JVM Heap Size

The JVM heap size is arguably the most critical Elasticsearch setting. It should be set to no more than 50% of the system’s RAM, leaving the rest for the file system cache, and kept below roughly 30-32GB, the threshold above which the JVM can no longer use compressed ordinary object pointers (compressed oops).

Edit jvm.options (typically found in /etc/elasticsearch/jvm.options or /usr/share/elasticsearch/config/jvm.options):

-Xms4g
-Xmx4g

This example sets both the initial and maximum heap size to 4GB. Adjust this value based on your instance’s RAM. For an 8GB RAM instance, 4GB is appropriate. For a 64GB instance, you might set it to 30GB. Always ensure -Xms and -Xmx are set to the same value to prevent heap resizing pauses.
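The sizing rule can be written as a small helper: half of RAM, capped below the compressed oops threshold. The function names and the 30GB default cap are assumptions taken from the guidance above, not values read from Elasticsearch itself.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: float = 30) -> int:
    """Half of system RAM, capped to stay under the compressed oops threshold."""
    return int(min(total_ram_gb / 2, cap_gb))


def jvm_options(heap_gb: int) -> str:
    """Render matching -Xms/-Xmx lines so the heap never resizes."""
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g"


if __name__ == "__main__":
    for ram_gb in (8, 16, 64):
        print(f"{ram_gb} GB RAM:")
        print(jvm_options(recommended_heap_gb(ram_gb)))
```

For the two instance sizes mentioned above, this yields a 4GB heap on 8GB of RAM and a 30GB heap on 64GB.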

File System Cache

Elasticsearch relies heavily on the operating system’s file system cache. Ensure your instance has sufficient RAM to allow the OS to cache indices effectively. Avoid running other memory-intensive applications on the same instance.

On Linux, you can monitor file system cache usage:

free -h

The buff/cache column in the output of free -h indicates memory used by the file system cache. Aim for a significant portion of your RAM to be available for this cache.
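For monitoring scripts, a similar figure can be derived from /proc/meminfo. The sketch below sums Buffers, Cached, and SReclaimable, which approximates what free reports as buff/cache (the exact fields free uses can vary by version), and expresses it as a share of total RAM. The function name is hypothetical.

```python
def cache_share(meminfo_text: str) -> float:
    """Return the fraction of total RAM currently used for buffers and cache."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # first token is the numeric value
    cached_kb = (
        fields.get("Buffers", 0)
        + fields.get("Cached", 0)
        + fields.get("SReclaimable", 0)
    )
    return cached_kb / fields["MemTotal"]


if __name__ == "__main__":
    # Linux only: /proc/meminfo does not exist on other platforms.
    with open("/proc/meminfo") as handle:
        print(f"{cache_share(handle.read()):.0%} of RAM is buffers/cache")
```

On a healthy Elasticsearch node that has been serving queries for a while, this share is typically large, because the OS keeps hot index segments cached.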

Shard Allocation and Sizing

The number and size of shards significantly impact performance. Too many small shards can overwhelm the cluster, while too few large shards can lead to slow recovery and uneven load distribution.

General Guidelines:

Aim for shard sizes between 10GB and 50GB.
Keep the number of shards per GB of heap low (e.g., no more than 20 shards per GB of heap).
Avoid oversharding. Start with a reasonable number of primary shards (e.g., 1-3) and scale horizontally by adding nodes rather than increasing shard count unnecessarily.
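These guidelines can be turned into a quick planning calculation. The sketch below is illustrative only: the function names are hypothetical, the 30GB target is an assumed midpoint of the 10GB to 50GB range, and the 20 shards per GB of heap figure is the rule of thumb stated above.

```python
import math


def primary_shards(expected_data_gb: float, target_shard_gb: float = 30) -> int:
    """Number of primary shards needed to keep each shard near the target size."""
    return max(math.ceil(expected_data_gb / target_shard_gb), 1)


def max_shards_for_heap(heap_gb: int, shards_per_gb: int = 20) -> int:
    """Upper bound on shards a node should hold for a given heap size."""
    return heap_gb * shards_per_gb


if __name__ == "__main__":
    # Assumed: an index expected to grow to 90 GB on nodes with a 4 GB heap.
    print("primary shards:", primary_shards(90))
    print("shard budget per node:", max_shards_for_heap(4))
```

Remember that replicas count against the per-node shard budget too, so the total is primaries * (1 + replicas) spread across the nodes.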

You can check existing shard sizes and counts via the Elasticsearch API:

GET _cat/shards?v&h=index,shard,prirep,state,docs.count,docs.deleted,store.size&s=state
GET _cat/indices?v&h=index,health,status,uuid,pri,rep,docs.count,store.size&s=store.size:desc

When creating indices, explicitly define the number of primary shards and replicas. For example, using the Elasticsearch Index Templates API:

PUT _index_template/my_app_template
{
  "index_patterns": ["my-app-*"],
  "template": {
    "settings": {
      "index": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }
  }
}

Adjust number_of_shards based on your expected data volume and number_of_replicas for fault tolerance and search performance (replicas can serve search requests).

Swapping and Elasticsearch

Elasticsearch is extremely sensitive to swapping. Ensure that swapping is disabled or heavily restricted on your Elasticsearch nodes.

# Check swap status
sudo swapon --show

# Disable swap (temporary)
sudo swapoff -a

# To disable permanently, edit /etc/fstab and comment out swap lines

Additionally, enable memory locking so that Elasticsearch keeps its heap in RAM and the OS cannot swap it out:

# In elasticsearch.yml
bootstrap.memory_lock: true

And ensure the Elasticsearch user has the necessary ulimit settings in place (e.g., in /etc/security/limits.conf or a systemd service file) to allow memlock and increase nofile limits.
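One way to verify the limits is to read them from a process running as the Elasticsearch user. The sketch below uses Python’s standard resource module (Unix only). The function name is hypothetical, and the 65535 threshold for open files is an assumption based on the minimum Elasticsearch commonly checks for at startup; confirm it against the documentation for your version.

```python
import resource


def check_limits(min_nofile: int = 65535) -> dict:
    """Report whether the current user's limits allow memory locking and enough open files."""
    memlock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        # bootstrap.memory_lock needs an unlimited memlock limit to lock the whole heap.
        "memlock_unlimited": memlock_soft == resource.RLIM_INFINITY,
        "nofile_ok": nofile_soft == resource.RLIM_INFINITY or nofile_soft >= min_nofile,
        "nofile_soft": nofile_soft,
    }


if __name__ == "__main__":
    # Run as the elasticsearch user, e.g. via sudo -u elasticsearch.
    for name, value in check_limits().items():
        print(f"{name}: {value}")
```

Limits set in a systemd unit apply only to the service itself, so a shell session may report different values than the running Elasticsearch process.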