The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and DynamoDB on AWS for C
Nginx Tuning for High-Traffic PHP Applications
Optimizing Nginx is crucial for serving high-traffic PHP applications. The primary goals are to reduce latency, maximize throughput, and efficiently manage resources. We’ll focus on key directives that directly impact performance.
Worker Processes and Connections
The worker_processes directive determines how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available. The worker_connections directive sets the maximum number of simultaneous connections that each worker process can handle. The total maximum connections will be worker_processes * worker_connections.
To determine the number of CPU cores, you can use the nproc command on Linux:
nproc
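Then, configure Nginx accordingly. It’s generally safe to set worker_connections to a high value, such as 1024 or 4096, provided the file descriptor limit for the worker processes (the system limit, or the value set with Nginx’s worker_rlimit_nofile directive) is at least as high.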
# /etc/nginx/nginx.conf
user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 4096; # Adjust based on system limits and expected load
multi_accept on;
}
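The connection arithmetic above can be checked with a short script. This is a minimal sketch, not part of the original playbook: the function names are hypothetical, and the factor of two file descriptors per connection is an assumption that fits a proxied request (one descriptor for the client, one for the upstream). Note that it reads the limit of the current shell, which may differ from the limit applied to the Nginx service.

```python
import os
import resource  # Unix-only standard library module


def max_client_connections(worker_processes: int, worker_connections: int) -> int:
    """Upper bound on simultaneous connections: workers * connections per worker."""
    return worker_processes * worker_connections


def fits_fd_limit(worker_connections: int, fd_soft_limit: int,
                  fds_per_connection: int = 2) -> bool:
    """Check that one worker's connections fit in the per-process descriptor limit.

    fds_per_connection=2 is an assumption for proxied traffic.
    """
    return worker_connections * fds_per_connection <= fd_soft_limit


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"cores={cores}, max connections={max_client_connections(cores, 4096)}")
    if soft == resource.RLIM_INFINITY:
        print("file descriptor limit is unlimited")
    else:
        print(f"fd soft limit={soft}, 4096 connections fit: {fits_fd_limit(4096, soft)}")
```

If the check fails, either lower worker_connections or raise the descriptor limit for the Nginx worker processes.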
Keepalive Connections
HTTP keep-alive connections reduce the overhead of establishing new TCP connections for each request. The keepalive_timeout directive specifies how long an idle keep-alive connection will remain open. A value between 60 and 120 seconds is often a good starting point. keepalive_requests limits the number of requests that can be served over a single keep-alive connection.
# /etc/nginx/nginx.conf
http {
# ... other http directives ...
keepalive_timeout 75;
keepalive_requests 100;
# ... rest of http block ...
}
Buffering and Gzip Compression
Buffer directives control how much memory Nginx sets aside for request and response data. client_body_buffer_size, client_header_buffer_size, and large_client_header_buffers size the buffers for incoming requests, while output_buffers sizes the buffers used when reading a response from disk. Enabling Gzip compression can significantly reduce bandwidth usage and improve load times for text-based assets.
# /etc/nginx/nginx.conf
http {
# ...
client_body_buffer_size 10K;
client_header_buffer_size 1k;
large_client_header_buffers 2 8k;
output_buffers 1 32k;
client_max_body_size 10m; # Maximum accepted request body size; raise it if you expect large POSTs
# Gzip Compression
gzip on;
gzip_disable "msie6"; # Disable for older IE versions
gzip_vary on;
gzip_proxied any; # Compress proxied responses
gzip_comp_level 6; # Compression level (1-9)
gzip_buffers 16 8k; # Number and size of buffers
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
# ...
}
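To get a feel for what compression level 6 does to a text-based response, the following sketch compresses a sample JSON payload with Python's standard gzip module. The payload is made up for illustration, and the ratio you see on real responses will depend on how repetitive the content is.

```python
import gzip
import json

# Hypothetical JSON response body: repetitive text compresses well.
payload = json.dumps(
    [{"id": i, "status": "active", "role": "member"} for i in range(500)]
).encode("utf-8")

# Level 6 mirrors the gzip_comp_level used in the Nginx example.
compressed = gzip.compress(payload, compresslevel=6)

ratio = len(compressed) / len(payload)
print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")
print(f"compressed size is {ratio:.1%} of the original")
```

Higher levels trade more CPU time for smaller output, which is why a middle value is a common choice on busy servers.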
Gunicorn Tuning for Python WSGI Applications
Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes and their type.
Worker Types and Count
Gunicorn offers several worker types:
- Sync Workers (default): Each worker handles one request at a time. This is simple but can be a bottleneck under high concurrency.
- Async Workers (e.g., Gevent, Eventlet): These workers can handle multiple requests concurrently using non-blocking I/O. They are generally preferred for I/O-bound applications.
  - Gevent workers: use gevent coroutines for concurrency.
  - Eventlet workers: use eventlet coroutines for concurrency.
The number of workers is typically set based on the number of CPU cores. A common starting point is (2 * number_of_cores) + 1. For async workers, you might start with fewer workers but a higher concurrency per worker.
To start Gunicorn with 4 gevent workers:
gunicorn --workers 4 --worker-class gevent --bind 0.0.0.0:8000 myapp.wsgi:application
For CPU-bound applications using sync workers, a common pattern is:
# Assuming 8 CPU cores
gunicorn --workers 17 --bind 0.0.0.0:8000 myapp.wsgi:application
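The (2 * number_of_cores) + 1 starting point can be computed rather than hard-coded. A minimal sketch, with a hypothetical helper name:

```python
import multiprocessing


def suggested_sync_workers(cores: int) -> int:
    """Starting point from the text: (2 * number_of_cores) + 1."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores + 1


if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    print(f"{cores} cores -> start with {suggested_sync_workers(cores)} sync workers")
```

Treat the result as a first guess to refine under load testing, not a fixed rule.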
Worker Timeout and Graceful Reloads
The --timeout setting defines how long Gunicorn will wait for a worker to respond before killing it. This is crucial to prevent hung requests from blocking workers indefinitely. A value between 30 and 120 seconds is common, depending on the expected request duration.
Graceful reloads (sending SIGHUP to the Gunicorn master process with kill -HUP) allow workers to finish their current requests before being restarted, minimizing downtime during configuration changes or code updates.
# Example with a 60-second timeout
gunicorn --workers 4 --worker-class gevent --timeout 60 --bind 0.0.0.0:8000 myapp.wsgi:application
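The same settings can live in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the command-line flags above. The file name gunicorn.conf.py and the graceful_timeout value are assumptions added for illustration and do not come from the original command.

```python
# gunicorn.conf.py (hypothetical file name)

# Address and port to listen on, same as --bind.
bind = "0.0.0.0:8000"

# Worker count and type, same as --workers and --worker-class.
workers = 4
worker_class = "gevent"

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 60

# Assumed value: seconds a worker gets to finish in-flight requests
# after it receives a restart signal.
graceful_timeout = 30
```

It could then be loaded with gunicorn -c gunicorn.conf.py myapp.wsgi:application.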
PHP-FPM Tuning for PHP Applications
PHP-FPM (FastCGI Process Manager) is the de facto standard for running PHP applications with web servers like Nginx. Its performance hinges on the process manager settings.
Process Manager Settings
The pm (process manager) setting determines how PHP-FPM manages worker processes. The most common options are:
- static: A fixed number of child processes are spawned. Good for predictable loads.
- dynamic: Processes are spawned dynamically based on load, with minimum and maximum limits.
- ondemand: Processes are spawned only when a request arrives and are killed after a period of inactivity.
For high-traffic sites, dynamic is often preferred. Key directives include:
- pm.max_children: The maximum number of child processes that will be spawned.
- pm.start_servers: The number of child processes to start when PHP-FPM starts.
- pm.min_spare_servers: The desired minimum number of idle server processes.
- pm.max_spare_servers: The desired maximum number of idle server processes.
- pm.max_requests: The number of requests each child process should execute before respawning. This limits the impact of memory leaks.
A common tuning strategy for dynamic PM:
; /etc/php/8.1/fpm/pool.d/www.conf (example path)
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
pm = dynamic
pm.max_children = 100 ; Adjust based on available RAM and expected concurrency
pm.start_servers = 10 ; Initial number of processes
pm.min_spare_servers = 5 ; Minimum idle processes
pm.max_spare_servers = 20 ; Maximum idle processes
pm.max_requests = 500 ; Respawn after this many requests to limit the impact of leaks
If you have ample RAM and a predictable, high load, static can offer slightly better performance by avoiding the overhead of dynamic process management.
; Example for static PM
pm = static
pm.max_children = 150 ; Set to a fixed, high number based on RAM
pm.max_requests = 500
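Since pm.max_children should be sized from available RAM, a rough estimate divides the memory left for PHP-FPM by the average memory of one worker. This is a sketch under stated assumptions: the function name is hypothetical, and the figures in the usage note are placeholders you would replace with measurements from your own server.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int,
                          avg_process_mb: int) -> int:
    """Estimate pm.max_children from memory figures, all in megabytes.

    reserved_mb is the memory kept back for the OS, Nginx, and other services.
    avg_process_mb is the measured average size of one PHP-FPM worker.
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Always allow at least one worker, even on very small hosts.
    return max(usable_mb // avg_process_mb, 1)


if __name__ == "__main__":
    # Hypothetical host: 8 GB RAM, 2 GB reserved, 60 MB per worker.
    print(estimate_max_children(8192, 2048, 60))
```

With those placeholder figures the estimate is 102, so a setting around 100 leaves a small margin.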
Request Termination and Slowlog
request_terminate_timeout sets the maximum time a single request may run before its worker process is killed, which prevents runaway scripts from consuming resources indefinitely. request_slowlog_timeout sets the threshold above which a request counts as slow, and slowlog names the file where a backtrace of each slow request is written, which helps identify performance bottlenecks.
; /etc/php/8.1/fpm/pool.d/www.conf
request_terminate_timeout = 60s ; Terminate script after 60 seconds
request_slowlog_timeout = 10s ; Log scripts exceeding 10 seconds
slowlog = /var/log/php/php8.1-fpm.slow.log
DynamoDB Tuning and Best Practices on AWS
DynamoDB is a fully managed NoSQL database service. Performance tuning primarily involves understanding provisioned throughput, indexing strategies, and query optimization.
Provisioned Throughput (RCUs & WCUs)
DynamoDB operates on a throughput model based on Read Capacity Units (RCUs) and Write Capacity Units (WCUs). Each RCU allows one strongly consistent read per second for an item up to 4KB, or two eventually consistent reads per second. Each WCU allows one write per second for an item up to 1KB.
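The capacity rules above translate into simple arithmetic. The following sketch uses hypothetical function names and rounds item sizes up to the next 4 KB block for reads and the next 1 KB block for writes, as the unit definitions imply.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: one unit per 4 KB block per strongly consistent read.

    Eventually consistent reads cost half as much.
    """
    blocks_per_read = math.ceil(item_size_kb / 4)
    total = blocks_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """WCUs needed: one unit per 1 KB block per write."""
    return math.ceil(item_size_kb) * writes_per_second


if __name__ == "__main__":
    print(read_capacity_units(6, 10))         # strongly consistent reads
    print(read_capacity_units(6, 10, False))  # eventually consistent reads
    print(write_capacity_units(2.5, 10))
```

For example, a 6 KB item read 10 times per second needs 20 RCUs with strong consistency or 10 with eventual consistency, and writing a 2.5 KB item 10 times per second needs 30 WCUs.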
Key Strategies:
- Auto Scaling: Configure DynamoDB Auto Scaling to automatically adjust provisioned throughput based on actual traffic. This is the most cost-effective and resilient approach. Set target utilization percentages (e.g., 70% for reads, 50% for writes).
- On-Demand Capacity: For unpredictable workloads, On-Demand mode is simpler, paying per request. However, it can be more expensive for consistent, high-throughput workloads.
- Monitoring: Continuously monitor ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, and ThrottledRequests in CloudWatch.
Example AWS CLI command to configure Auto Scaling for a table:
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id table/YourTableName \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--min-capacity 5 \
--max-capacity 50
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id table/YourTableName \
--scalable-dimension dynamodb:table:WriteCapacityUnits \
--min-capacity 5 \
--max-capacity 50
aws application-autoscaling put-scaling-policy \
--policy-name YourReadScalingPolicy \
--service-namespace dynamodb \
--resource-id table/YourTableName \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBReadCapacityUtilization"
}
}'
aws application-autoscaling put-scaling-policy \
--policy-name YourWriteScalingPolicy \
--service-namespace dynamodb \
--resource-id table/YourTableName \
--scalable-dimension dynamodb:table:WriteCapacityUnits \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 50.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
}
}'
Indexing Strategies
The choice of primary key (partition key and optional sort key) and the use of Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs) are critical for efficient querying. A good partition key distributes data evenly across partitions to avoid hot spots.
Best Practices:
- Avoid Hot Partitions: Design partition keys that distribute requests evenly. High cardinality attributes are generally good candidates.
- Use GSIs for Query Flexibility: GSIs allow you to query data on attributes other than the primary key. Project only the attributes needed for your queries to save on storage and throughput costs.
- LSIs vs. GSIs: LSIs share the same partition key as the base table but have a different sort key. They limit each item collection (all items sharing one partition key value) to 10GB and can only be defined when the table is created. GSIs are more flexible: they can be added to an existing table and do not impose the item collection size limit.
- Index Projection: Carefully choose the projection type (ALL, KEYS_ONLY, or INCLUDE with a list of specific attributes) for GSIs to optimize read costs and performance.
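To make the projection choices concrete, here is a minimal sketch that builds one entry for the GlobalSecondaryIndexes parameter accepted by Boto3's create_table. The helper name, index name, and attribute names are hypothetical, and the capacity values are placeholders for a provisioned-mode table.

```python
from typing import Any, Dict, List, Optional


def build_gsi(index_name: str, partition_key: str,
              sort_key: Optional[str] = None,
              include_attributes: Optional[List[str]] = None,
              read_units: int = 5, write_units: int = 5) -> Dict[str, Any]:
    """Build a GSI definition that projects only what queries need."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})

    if include_attributes:
        # INCLUDE projects the keys plus the listed non-key attributes.
        projection: Dict[str, Any] = {
            "ProjectionType": "INCLUDE",
            "NonKeyAttributes": list(include_attributes),
        }
    else:
        # KEYS_ONLY is the smallest (and cheapest) projection.
        projection = {"ProjectionType": "KEYS_ONLY"}

    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": projection,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


if __name__ == "__main__":
    print(build_gsi("status-index", "status", "createdAt", ["attribute1"]))
```

The key attributes named here would also need matching entries in the table's AttributeDefinitions.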
Query Optimization
Understanding how DynamoDB executes queries is key to optimizing them.
Key Techniques:
- Query vs. Scan: Always use Query when possible. Query requires the partition key and can optionally narrow results with a sort key condition. Scan reads every item in the table and then filters, which is inefficient and costly for large tables.
- Filter Expressions: Use FilterExpression to reduce the amount of data returned from a Scan or Query. However, filtering happens *after* data is read, so it doesn’t reduce the consumed capacity, only the data transferred.
- Projection Expressions: Use ProjectionExpression to retrieve only the attributes you need. This reduces the data transferred over the network and can improve latency, but consumed RCUs are still based on the full size of each item read.
- Pagination: For large result sets, use the LastEvaluatedKey returned by Query and Scan operations to paginate through the results, passing it as ExclusiveStartKey in the next request.
- Batch Operations: Use BatchGetItem and BatchWriteItem to retrieve or write multiple items in a single API call, reducing network overhead.
Example Python code using Boto3 for efficient querying:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('YourTableName')

# Efficient Query using Partition Key and Projection Expression
response = table.query(
    KeyConditionExpression=Key('partitionKeyName').eq('some_value'),
    ProjectionExpression='attribute1, attribute2'  # Only retrieve these attributes
)
items = response['Items']

# Keep querying until no LastEvaluatedKey is returned (all pages fetched)
while 'LastEvaluatedKey' in response:
    response = table.query(
        KeyConditionExpression=Key('partitionKeyName').eq('some_value'),
        ProjectionExpression='attribute1, attribute2',
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    items.extend(response['Items'])

print(f"Retrieved {len(items)} items.")

# Example of a Scan with FilterExpression (use with caution)
# response = table.scan(
#     FilterExpression=Attr('someAttribute').gt(100),
#     ProjectionExpression='attribute1'
# )
# print(response['Items'])
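The batch operations mentioned earlier have per-call limits: BatchGetItem accepts up to 100 keys and BatchWriteItem up to 25 put or delete requests. A minimal sketch of splitting work into compliant chunks follows. The helper and constant names are hypothetical, and the sketch makes no AWS calls.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_GET_LIMIT = 100   # maximum keys per BatchGetItem call
BATCH_WRITE_LIMIT = 25  # maximum requests per BatchWriteItem call


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    # Hypothetical items to write; each chunk would become one BatchWriteItem call.
    pending = [{"partitionKeyName": f"user-{i}"} for i in range(60)]
    for batch in chunked(pending, BATCH_WRITE_LIMIT):
        print(f"would write {len(batch)} items")
```

When writing through the Boto3 resource API, table.batch_writer() handles this chunking for you. With the lower-level calls, any UnprocessedItems or UnprocessedKeys in the response need to be retried.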