The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and DynamoDB on AWS for C

Nginx Tuning for High-Traffic PHP Applications

Optimizing Nginx is crucial for serving high-traffic PHP applications. The primary goals are to reduce latency, maximize throughput, and efficiently manage resources. We’ll focus on key directives that directly impact performance.

Worker Processes and Connections

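The worker_processes directive determines how many worker processes Nginx spawns. A common recommendation is one worker per CPU core, which is what worker_processes auto selects. The worker_connections directive sets the maximum number of simultaneous connections each worker process can hold open. The theoretical ceiling is worker_processes * worker_connections, but that figure counts every connection a worker holds, including connections to upstream or FastCGI backends, so the number of clients served at once is lower when Nginx proxies requests.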

To determine the number of CPU cores, you can use the nproc command on Linux:

nproc

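Then, configure Nginx accordingly. It’s generally safe to set worker_connections to a high value, such as 1024 or 4096, provided the file descriptor limit for the Nginx processes is at least as high (check it with ulimit -n, or raise it for Nginx with the worker_rlimit_nofile directive).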

# /etc/nginx/nginx.conf

user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 4096; # Adjust based on system limits and expected load
    multi_accept on;
}
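The connection arithmetic above can be sketched in a few lines of Python. This is a rough sizing aid rather than anything Nginx provides: the function name estimate_max_clients is made up for illustration, and halving the total for proxied traffic is a rule-of-thumb assumption (one connection to the client plus one to the backend per request).

```python
import os


def estimate_max_clients(worker_processes: int, worker_connections: int,
                         proxied: bool = False) -> int:
    """Rough upper bound on simultaneous clients for an Nginx configuration."""
    total = worker_processes * worker_connections
    # Assumption: when Nginx proxies to PHP-FPM or Gunicorn, each client
    # request occupies two connections, so the usable client count halves.
    return total // 2 if proxied else total


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # usually matches what nproc reports
    print("serving static files:", estimate_max_clients(cores, 4096))
    print("proxying to a backend:", estimate_max_clients(cores, 4096, proxied=True))
```

Treat the result as a ceiling to compare against file descriptor limits, not as a throughput prediction.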

Keepalive Connections

HTTP keep-alive connections reduce the overhead of establishing new TCP connections for each request. The keepalive_timeout directive specifies how long an idle keep-alive connection will remain open. A value between 60 and 120 seconds is often a good starting point. keepalive_requests limits the number of requests that can be served over a single keep-alive connection.

# /etc/nginx/nginx.conf

http {
    # ... other http directives ...

    keepalive_timeout 75;
    keepalive_requests 100;

    # ... rest of http block ...
}

Buffering and Gzip Compression

Nginx buffering can improve performance by letting it read requests and send responses more efficiently. client_body_buffer_size, client_header_buffer_size, and large_client_header_buffers size the buffers used to read request bodies and headers, while output_buffers sizes the buffers used when reading a response from disk. Enabling Gzip compression can significantly reduce bandwidth usage and improve load times for text-based assets.

# /etc/nginx/nginx.conf

http {
    # ...

    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    large_client_header_buffers 2 8k;
    output_buffers 1 32k;

    # Gzip Compression
    gzip on;
    gzip_disable "msie6"; # Disable for older IE versions
    gzip_vary on;
    gzip_proxied any; # Compress proxied responses
    gzip_comp_level 6; # Compression level (1-9)
    gzip_buffers 16 8k; # Number and size of buffers
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # ...
}
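To get a feel for what gzip_comp_level trades off, the following sketch compresses a sample payload at several levels with Python's gzip module. It is only an approximation of what Nginx does: the helper name compression_ratio and the sample payload are assumptions for illustration, and ratios on real assets will differ.

```python
import gzip


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = gzip.compress(payload, compresslevel=level)
    return len(compressed) / len(payload)


if __name__ == "__main__":
    # Hypothetical JSON-like payload; substitute a real response body.
    sample = b'{"id": 1, "name": "example", "tags": ["a", "b"]}\n' * 200
    for level in (1, 6, 9):
        print(f"level {level}: ratio {compression_ratio(sample, level):.3f}")
```

Running it against a few of your own responses shows whether the extra CPU cost of a higher level buys a meaningfully smaller payload.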

Gunicorn Tuning for Python WSGI Applications

Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes and their type.

Worker Types and Count

Gunicorn offers several worker types:

Sync Workers (default): Each worker handles one request at a time. This is simple but can be a bottleneck under high concurrency.
Async Workers (e.g., Gevent, Eventlet): These workers can handle multiple requests concurrently using non-blocking I/O. They are generally preferred for I/O-bound applications.
Gevent Workers: Uses gevent coroutines for concurrency.
Eventlet Workers: Uses eventlet coroutines for concurrency.

The number of workers is typically set based on the number of CPU cores. A common starting point is (2 * number_of_cores) + 1. For async workers, you might start with fewer workers but a higher concurrency per worker, which Gunicorn caps with the --worker-connections setting.

To start Gunicorn with 4 gevent workers:

gunicorn --workers 4 --worker-class gevent --bind 0.0.0.0:8000 myapp.wsgi:application

For CPU-bound applications using sync workers, a common pattern is:

# Assuming 8 CPU cores
gunicorn --workers 17 --bind 0.0.0.0:8000 myapp.wsgi:application
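Rather than hard-coding the worker count on the command line, the same formula can live in a Gunicorn configuration file, which is plain Python. The sketch below assumes the file is called gunicorn.conf.py and is passed with -c; the bind address and worker class simply mirror the commands above.

```python
# gunicorn.conf.py (the file name is an assumption; pass it with -c)
import multiprocessing

# (2 * number_of_cores) + 1, the starting point described above.
workers = multiprocessing.cpu_count() * 2 + 1

# Default synchronous worker; switch to "gevent" for I/O-bound applications.
worker_class = "sync"

bind = "0.0.0.0:8000"
```

Usage would then look like: gunicorn -c gunicorn.conf.py myapp.wsgi:application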

Worker Timeout and Graceful Reloads

The --timeout setting defines how long Gunicorn will wait for a worker to respond before killing it. This is crucial to prevent hung requests from blocking workers indefinitely. A value between 30 and 120 seconds is common, depending on the expected request duration.

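Graceful reloads, triggered by sending SIGHUP to the Gunicorn master process (kill -HUP followed by the master's PID), reload the configuration and start new workers while existing workers finish their current requests, minimizing downtime during configuration changes or code updates.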

# Example with a 60-second timeout
gunicorn --workers 4 --worker-class gevent --timeout 60 --bind 0.0.0.0:8000 myapp.wsgi:application
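A graceful reload can also be scripted from a deployment tool. The sketch below assumes Gunicorn was started with --pid so that the master's PID is written to a file; the helper names and the path /run/gunicorn.pid are hypothetical.

```python
import os
import signal
from pathlib import Path


def read_master_pid(pidfile: str) -> int:
    """Read the Gunicorn master PID from a pid file."""
    return int(Path(pidfile).read_text().strip())


def reload_gunicorn(pidfile: str) -> int:
    """Send SIGHUP to the master process to trigger a graceful reload."""
    pid = read_master_pid(pidfile)
    os.kill(pid, signal.SIGHUP)
    return pid


if __name__ == "__main__":
    # Hypothetical path; it must match the --pid value Gunicorn was started with.
    pidfile = "/run/gunicorn.pid"
    if Path(pidfile).exists():
        print(f"Sent SIGHUP to PID {reload_gunicorn(pidfile)}")
    else:
        print(f"No pid file at {pidfile}; is Gunicorn running with --pid?")
```

The script needs permission to signal the master process, so it normally runs as the same user as Gunicorn or as root.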

PHP-FPM Tuning for PHP Applications

PHP-FPM (FastCGI Process Manager) is the de facto standard for running PHP applications with web servers like Nginx. Its performance hinges on the process manager settings.

Process Manager Settings

The pm (process manager) setting determines how PHP-FPM manages worker processes. The most common options are:

static: A fixed number of child processes are spawned. Good for predictable loads.
dynamic: Processes are spawned dynamically based on load, with minimum and maximum limits.
ondemand: Processes are spawned only when a request arrives and are killed after a period of inactivity.

For high-traffic sites, dynamic is often preferred. Key directives include:

pm.max_children: The maximum number of child processes that will be spawned.
pm.start_servers: The number of child processes to start when PHP-FPM starts.
pm.min_spare_servers: The desired minimum number of idle server processes.
pm.max_spare_servers: The desired maximum number of idle server processes.
pm.max_requests: The number of requests each child process should execute before respawning. Recycling workers this way limits the impact of memory leaks in PHP code or extensions.

A common tuning strategy for dynamic PM:

; /etc/php/8.1/fpm/pool.d/www.conf (example path)

[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

pm = dynamic
pm.max_children = 100       ; Adjust based on available RAM and expected concurrency
pm.start_servers = 10       ; Initial number of processes
pm.min_spare_servers = 5    ; Minimum idle processes
pm.max_spare_servers = 20   ; Maximum idle processes
pm.max_requests = 500       ; Respawn after this many requests to prevent leaks
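The comment on pm.max_children says to size it from available RAM. One common rule of thumb, sketched below, divides the memory left over for PHP by the average size of one PHP-FPM worker. The function name and the sample figures are assumptions; measure the real per-process memory on your own server before relying on the result.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int,
                          avg_process_mb: int) -> int:
    """Estimate pm.max_children from memory figures given in megabytes.

    reserved_mb is memory set aside for the OS, Nginx, and other services.
    avg_process_mb is the measured average resident size of one PHP-FPM worker.
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // avg_process_mb


if __name__ == "__main__":
    # Hypothetical server: 8 GB RAM, 2 GB reserved, 60 MB per PHP worker.
    print(estimate_max_children(8192, 2048, 60))
```

Setting pm.max_children above this estimate risks swapping under peak load, which usually hurts latency far more than queuing requests would.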

If you have ample RAM and a predictable, high load, static can offer slightly better performance by avoiding the overhead of dynamic process management.

; Example for static PM
pm = static
pm.max_children = 150       ; Set to a fixed, high number based on RAM
pm.max_requests = 500

Request Termination and Slowlog

request_terminate_timeout sets the maximum time a single request may run before its worker is killed, which prevents runaway scripts from tying up workers. request_slowlog_timeout sets the threshold above which a request counts as slow, and slowlog names the file where PHP-FPM writes a backtrace for each such request, which helps pinpoint performance bottlenecks.

; /etc/php/8.1/fpm/pool.d/www.conf

request_terminate_timeout = 60s ; Terminate script after 60 seconds
request_slowlog_timeout = 10s   ; Log scripts exceeding 10 seconds
slowlog = /var/log/php/php8.1-fpm.slow.log

DynamoDB Tuning and Best Practices on AWS

DynamoDB is a fully managed NoSQL database service. Performance tuning primarily involves understanding provisioned throughput, indexing strategies, and query optimization.

Provisioned Throughput (RCUs & WCUs)

DynamoDB operates on a throughput model based on Read Capacity Units (RCUs) and Write Capacity Units (WCUs). Each RCU allows one strongly consistent read per second for an item up to 4KB, or two eventually consistent reads per second. Each WCU allows one write per second for an item up to 1KB.
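The capacity arithmetic is easier to check with a small calculator. The sketch below follows the rules stated above and rounds item sizes up to the next 4 KB (reads) or 1 KB (writes) boundary, which is how DynamoDB meters them. The function names are illustrative and not part of any AWS SDK.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one RCU covers up to 4 KB per read
WRITE_UNIT_BYTES = 1024      # one WCU covers up to 1 KB per write


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed to sustain the given read rate for items of this size."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs needed to sustain the given write rate for items of this size."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second


if __name__ == "__main__":
    # Hypothetical workload: 6 KB items, 10 reads/s and 10 writes/s.
    print(read_capacity_units(6 * 1024, 10))
    print(read_capacity_units(6 * 1024, 10, strongly_consistent=False))
    print(write_capacity_units(6 * 1024, 10))
```

Because of the rounding, trimming an item from just over 4 KB to just under it halves the read cost of that item.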

Key Strategies:

Auto Scaling: Configure DynamoDB Auto Scaling to adjust provisioned throughput automatically as traffic changes. For workloads whose load varies gradually, this is often the most cost-effective approach. Set a target utilization percentage for each dimension (e.g., 70% for reads, 50% for writes).
On-Demand Capacity: For unpredictable workloads, On-Demand mode is simpler, paying per request. However, it can be more expensive for consistent, high-throughput workloads.
Monitoring: Continuously monitor ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, and ThrottledRequests in CloudWatch.

Example AWS CLI command to configure Auto Scaling for a table:

aws application-autoscaling register-scalable-target \
    --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:ReadCapacityUnits \
    --min-capacity 5 \
    --max-capacity 50

aws application-autoscaling register-scalable-target \
    --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:WriteCapacityUnits \
    --min-capacity 5 \
    --max-capacity 50

aws application-autoscaling put-scaling-policy \
    --policy-name YourReadScalingPolicy \
    --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:ReadCapacityUnits \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        }
    }'

aws application-autoscaling put-scaling-policy \
    --policy-name YourWriteScalingPolicy \
    --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:WriteCapacityUnits \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        }
    }'
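To reason about what a target-tracking policy aims for, it can help to work through the arithmetic by hand. The sketch below is a simplification under stated assumptions: the real scaling decisions are made by Application Auto Scaling from CloudWatch metrics over time, and the function name is invented for illustration. It only shows the capacity at which a given consumption level would sit exactly at the target utilization, clamped to the registered minimum and maximum.

```python
import math


def target_provisioned_capacity(consumed_units: float,
                                target_utilization_pct: float,
                                min_capacity: int,
                                max_capacity: int) -> int:
    """Capacity at which consumed_units equals the target utilization."""
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be in (0, 100]")
    desired = math.ceil(consumed_units * 100 / target_utilization_pct)
    # Scaling never leaves the range set with register-scalable-target.
    return max(min_capacity, min(desired, max_capacity))


if __name__ == "__main__":
    # Hypothetical traffic: 35 consumed RCUs against a 70% target, range 5-50.
    print(target_provisioned_capacity(35, 70, 5, 50))
```

If the result keeps hitting max_capacity, the table will throttle before scaling can help, which is a sign that the registered maximum is too low.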

Indexing Strategies

The choice of primary key (partition key and optional sort key) and the use of Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs) are critical for efficient querying. A good partition key distributes data evenly across partitions to avoid hot spots.

Best Practices:

Avoid Hot Partitions: Design partition keys that distribute requests evenly. High cardinality attributes are generally good candidates.
Use GSIs for Query Flexibility: GSIs allow you to query data on attributes other than the primary key. Project only the attributes needed for your queries to save on storage and throughput costs.
LSIs vs. GSIs: LSIs share the base table's partition key but use a different sort key. They can only be defined when the table is created, and they limit each item collection (all items sharing one partition key value) to 10 GB. GSIs can be added or removed at any time, can use a different partition key, and are not subject to that 10 GB limit.
Index Projection: Carefully choose the projection type (ALL, KEYS_ONLY, or INCLUDE with a list of specific attributes) for each index to balance read cost, storage, and performance.
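The effect of key cardinality on hot partitions can be illustrated with a toy simulation. DynamoDB's actual hash function and partition count are internal, so the sketch below substitutes SHA-256 and a fixed number of buckets; the function name and the sample keys are assumptions chosen only to show the pattern.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable


def partition_shares(keys: Iterable[str], partitions: int = 8) -> Dict[int, float]:
    """Fraction of items that land in each simulated partition."""
    counts = Counter(
        int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % partitions
        for key in keys
    )
    total = sum(counts.values())
    return {p: counts.get(p, 0) / total for p in range(partitions)}


if __name__ == "__main__":
    # High cardinality: one distinct key per user.
    user_keys = [f"user-{i}" for i in range(10_000)]
    # Low cardinality: a status flag with a skewed distribution.
    status_keys = ["active"] * 9_000 + ["inactive"] * 1_000

    print("busiest share, user id key:", max(partition_shares(user_keys).values()))
    print("busiest share, status key: ", max(partition_shares(status_keys).values()))
```

With the status flag as the key, at least 90% of the items pile into a single bucket regardless of how many partitions exist, which is the hot-spot pattern to avoid.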

Query Optimization

Understanding how DynamoDB executes queries is key to optimizing them.

Key Techniques:

Query vs. Scan: Always use Query when possible. Query requires the partition key and can optionally filter by sort key. Scan reads every item in the table and then filters, which is inefficient and costly for large tables.
Filter Expressions: Use FilterExpression to reduce the amount of data returned from a Scan or Query. However, filtering happens *after* data is read, so it doesn’t reduce the consumed capacity, only the data transferred.
Projection Expressions: Use ProjectionExpression to retrieve only the attributes you need. This shrinks the response sent over the network and can improve latency, but it does not lower consumed RCUs, because DynamoDB charges for the size of the items it reads rather than the size of the data it returns.
Pagination: For large result sets, use the LastEvaluatedKey returned by Query and Scan operations to paginate through the results.
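Batch Operations: Use BatchGetItem and BatchWriteItem to retrieve or write multiple items in a single API call, reducing network overhead. Under throttling, either call can return part of the batch unprocessed (UnprocessedKeys or UnprocessedItems), and the caller must retry those entries.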
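Batch calls have per-request limits: BatchWriteItem accepts up to 25 put or delete requests and BatchGetItem up to 100 items. A minimal sketch of splitting a larger workload into compliant batches follows; the helper name chunked is illustrative. With Boto3's resource API, Table.batch_writer() already performs this buffering and resends unprocessed items, so a helper like this is mainly useful with the low-level client.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_WRITE_LIMIT = 25   # max put/delete requests per BatchWriteItem call
BATCH_GET_LIMIT = 100    # max items per BatchGetItem call


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each at most size long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    # Hypothetical items to write; each batch would go into one API call.
    pending = [{"pk": f"item-{i}"} for i in range(60)]
    for batch in chunked(pending, BATCH_WRITE_LIMIT):
        print(f"would send {len(batch)} items in one BatchWriteItem call")
```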

Example Python code using Boto3 for efficient querying:

import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('YourTableName')

# Efficient Query using Partition Key and Projection Expression
response = table.query(
    KeyConditionExpression=Key('partitionKeyName').eq('some_value'),
    ProjectionExpression='attribute1, attribute2' # Only retrieve these attributes
)

items = response['Items']
while 'LastEvaluatedKey' in response:
    response = table.query(
        KeyConditionExpression=Key('partitionKeyName').eq('some_value'),
        ProjectionExpression='attribute1, attribute2',
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    items.extend(response['Items'])

print(f"Retrieved {len(items)} items.")

# Example of a Scan with FilterExpression (use with caution)
# response = table.scan(
#     FilterExpression=Attr('someAttribute').gt(100),
#     ProjectionExpression='attribute1'
# )
# print(response['Items'])
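The pagination loop above repeats the query arguments in two places. One way to avoid that, sketched here as a suggestion rather than an official Boto3 utility, is a small generator that follows LastEvaluatedKey for you. The name query_all is hypothetical; it accepts any object with a query method that behaves like a Boto3 Table.

```python
from typing import Any, Dict, Iterator


def query_all(table: Any, **query_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated Query, one page at a time."""
    kwargs = dict(query_kwargs)  # copy so the caller's arguments stay untouched
    while True:
        response = table.query(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With the table object from the example above, usage would look like items = list(query_all(table, KeyConditionExpression=Key('partitionKeyName').eq('some_value'))). Because it is a generator, callers can also stop early without fetching the remaining pages.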