The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on AWS for Shopify
Nginx as a High-Performance Frontend Proxy
When deploying a Shopify-like application on AWS, Nginx serves as the critical entry point, handling SSL termination, static asset serving, request routing, and load balancing. Optimizing Nginx is paramount for low latency and high throughput. We’ll focus on key directives and configurations for production environments.
Nginx Configuration Tuning
The primary configuration file is typically /etc/nginx/nginx.conf, with site-specific configurations in /etc/nginx/sites-available/ and symlinked to /etc/nginx/sites-enabled/. For performance, we’ll adjust worker processes, connection limits, and caching.
Worker Processes and Connections
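The worker_processes directive should ideally be set to the number of CPU cores available on your EC2 instance; the value auto detects this for you. worker_connections defines the maximum number of simultaneous connections a single worker process can open, and that count includes connections to upstream servers as well as clients. The theoretical maximum is worker_processes * worker_connections, but when Nginx acts as a reverse proxy each client request also holds an upstream connection, so the practical client ceiling is roughly half of that. The effective limit is also capped by the per-process open file limit (see worker_rlimit_nofile).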
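The connection arithmetic above can be sketched in a few lines of Python. This is only a rough capacity estimate: the function name max_client_connections is hypothetical, and halving for the reverse-proxy case is a rule of thumb rather than an exact figure.

```python
import os


def max_client_connections(worker_processes: int, worker_connections: int,
                           reverse_proxy: bool = True) -> int:
    """Estimate how many concurrent clients Nginx can serve.

    As a reverse proxy, each client request also holds one upstream
    connection, so the usable client count is roughly half the total.
    """
    total = worker_processes * worker_connections
    return total // 2 if reverse_proxy else total


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # what "worker_processes auto" would pick up
    print(max_client_connections(cores, 4096))
```

With 4 workers and worker_connections 4096, this gives 16384 connections in total and about 8192 proxied clients.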
Example nginx.conf Snippet
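# "daemon off" keeps Nginx in the foreground for containers or process supervisors;
# remove it when systemd or an init script starts Nginx as a background service
daemon off;
master_process on;
# "auto" detects the number of CPU cores; alternatively set it explicitly, e.g., 4
worker_processes auto;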
events {
worker_connections 4096; # Adjust based on expected load and system limits
multi_accept on;
use epoll; # For Linux systems
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
# Gzip compression for dynamic content
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
# Buffering and timeouts for upstream connections
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
proxy_buffer_size 16k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
# Client request body limits
client_max_body_size 100M; # Adjust as needed for file uploads
# Include server blocks
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
Static Asset Caching
Leverage browser caching and Nginx’s file caching for static assets (CSS, JS, images) to reduce load on your application servers and improve perceived performance for users. Configure long expires headers and potentially use proxy_cache for assets served through your application.
Example sites-enabled/your_app.conf Snippet
server {
listen 80;
server_name your-domain.com;
# Redirect HTTP to HTTPS
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl http2;
server_name your-domain.com;
ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_session_tickets off;
# Serve static assets directly
location ~* \.(css|js|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$ {
root /var/www/your_app/public; # Adjust path to your static assets
expires 365d;
add_header Cache-Control "public";
access_log off;
}
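# Proxy requests to your application server (Gunicorn speaks HTTP)
location / {
proxy_pass http://your_app_backend; # Defined in upstream block
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# PHP-FPM speaks FastCGI, not HTTP, so use fastcgi_pass instead of proxy_pass:
# location ~ \.php$ {
# include fastcgi_params;
# fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
# fastcgi_pass your_app_backend;
# }
}
# Define your upstream application servers
upstream your_app_backend {
# For Gunicorn (Python), used with proxy_pass
server 127.0.0.1:8000;
# server 127.0.0.1:8001;
# For PHP-FPM, used with fastcgi_pass (comment out the Gunicorn servers above)
# server unix:/var/run/php/php7.4-fpm.sock; # Adjust PHP version and path
# server unix:/var/run/php/php8.0-fpm.sock;
}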
Gunicorn Tuning for Python Applications
Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes, worker class, and thread settings. For I/O-bound applications, the gevent worker class is often preferred due to its non-blocking nature.
Worker Processes and Threads
The number of worker processes is typically set to (2 * number_of_cores) + 1 as a starting point. For gevent workers, threads are not directly applicable in the same way as with synchronous workers, as gevent uses green threads. If using synchronous workers, you might configure threads per worker.
Gunicorn Command Line Arguments
# Example for a Python application using gevent workers
# Assuming your application's WSGI entry point is 'your_app.wsgi:application'
# And you have 4 CPU cores on your EC2 instance
gunicorn --workers 9 \
--worker-class gevent \
--bind 0.0.0.0:8000 \
--timeout 120 \
--log-level info \
--access-logfile /var/log/gunicorn/access.log \
--error-logfile /var/log/gunicorn/error.log \
your_app.wsgi:application
Explanation:
- --workers 9: (2 * 4 cores) + 1. Adjust based on your instance type and application’s I/O characteristics.
- --worker-class gevent: Utilizes green threads for efficient handling of concurrent I/O operations.
- --bind 0.0.0.0:8000: Listens on all network interfaces on port 8000. Nginx will proxy to this.
- --timeout 120: Sets the worker timeout to 120 seconds. Crucial for long-running requests.
- --log-level info: Configures logging verbosity.
- --access-logfile, --error-logfile: Specifies log file locations. Ensure these directories exist and have correct permissions.
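The same settings can also live in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the command-line flags above and computes the worker count from the (2 * cores) + 1 heuristic. The helper name recommended_workers is hypothetical, and the log paths are the ones assumed earlier.

```python
# gunicorn.conf.py: a sketch mirroring the command-line flags above
import multiprocessing


def recommended_workers(cores: int) -> int:
    """Starting-point heuristic from the text: (2 * cores) + 1."""
    return 2 * cores + 1


# Gunicorn reads these module-level names as settings.
workers = recommended_workers(multiprocessing.cpu_count())
worker_class = "gevent"  # requires the gevent package to be installed
bind = "0.0.0.0:8000"
timeout = 120
loglevel = "info"
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
```

It would then be started with gunicorn -c gunicorn.conf.py your_app.wsgi:application. Keeping the heuristic in code means the worker count follows the instance size when you change instance types.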
Gevent Worker Tuning
For gevent workers, the number of green threads per worker is implicitly managed by the gevent library itself. You don’t typically configure a specific thread count for gevent workers in Gunicorn. The key is to ensure your application code is written to be gevent-compatible (e.g., using gevent-patched libraries for network I/O).
PHP-FPM Tuning for PHP Applications
PHP-FPM (FastCGI Process Manager) is the standard way to run PHP applications with Nginx. Its performance is governed by the number of child processes, how they are managed (static, dynamic, ondemand), and their resource limits.
Process Management and Pool Configuration
PHP-FPM pools are configured in /etc/php/[version]/fpm/pool.d/www.conf (or a custom pool name). The pm (process manager) setting is critical.
Example www.conf Snippet
; /etc/php/7.4/fpm/pool.d/www.conf (adjust PHP version as needed)
[www]
user = www-data
group = www-data
; Match this in Nginx upstream
listen = /var/run/php/php7.4-fpm.sock
; Process Manager settings
pm = dynamic
; Max number of children at any one time
pm.max_children = 50
; Number of children when FPM starts
pm.start_servers = 5
; Min number of idle/spare children
pm.min_spare_servers = 2
; Max number of idle/spare children
pm.max_spare_servers = 10
; Max requests a child process should execute before respawning
pm.max_requests = 500
; Other important settings
; Corresponds to Gunicorn's timeout
request_terminate_timeout = 120s
; For pm = ondemand
; pm.process_idle_timeout = 10s
; Memory limits
; php_admin_value[memory_limit] = 256M
; php_admin_value[max_execution_time] = 120
Explanation:
- pm = dynamic: The process manager dynamically adjusts the number of child processes based on load. Other options are static (fixed number) and ondemand (spawns processes only when requests arrive). dynamic is often a good balance.
- pm.max_children: This is a crucial setting. It should be calculated based on available memory and the memory footprint of your PHP processes. Too high, and you risk OOM errors; too low, and you’ll have request queues. A common starting point is (Total RAM - RAM for OS/Nginx/DB) / Average PHP Process Memory.
- pm.max_requests: Setting this to a reasonable number helps prevent memory leaks from accumulating over time by respawning child processes after a certain number of requests.
- request_terminate_timeout: Should align with Nginx and application server timeouts to prevent premature termination.
Tuning pm.max_children
This is often the most challenging parameter to tune. Monitor your server’s memory usage closely with tools like htop or CloudWatch. Note that EC2 does not publish memory metrics by default; install the CloudWatch agent to collect metrics such as mem_used_percent. If you see high swap usage or OOM killer activity, pm.max_children is too high. If requests are consistently slow or timing out while CPU is not maxed out, it may be too low, leaving requests queued until a PHP-FPM worker becomes available.
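The starting-point formula for pm.max_children, (Total RAM - RAM for OS/Nginx/DB) / Average PHP Process Memory, can be sketched as a small helper. The function name and the sample figures (8 GiB instance, 2 GiB reserved, 60 MB per PHP process) are illustrative assumptions; measure your own process footprint before relying on the result.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int,
                          avg_process_mb: int) -> int:
    """Starting value for pm.max_children.

    total_ram_mb:   RAM on the instance
    reserved_mb:    RAM set aside for the OS, Nginx, and anything else
    avg_process_mb: measured average resident memory of one PHP-FPM child
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available = total_ram_mb - reserved_mb
    # Round down: a fractional child does not fit in memory.
    return max(available // avg_process_mb, 0)


if __name__ == "__main__":
    # Hypothetical instance: 8 GiB RAM, 2 GiB reserved, 60 MB per child
    print(estimate_max_children(8192, 2048, 60))
```

For the hypothetical figures above this yields 102. Treat that as an upper bound and leave some headroom, since individual requests can use far more than the average.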
PostgreSQL Performance Tuning on AWS RDS
For a robust Shopify-like platform, PostgreSQL is an excellent choice. AWS RDS simplifies database management, but tuning is still essential. We’ll focus on key PostgreSQL parameters and AWS-specific considerations.
RDS Parameter Groups
PostgreSQL configuration parameters are managed via Parameter Groups in AWS RDS. You’ll need to create a custom parameter group based on the default one for your instance class and PostgreSQL version.
Key Parameters to Tune
# Example parameters in a custom RDS PostgreSQL Parameter Group
# Values are shown in human-readable form; RDS expects each parameter in its
# native unit (for example, shared_buffers is entered in 8kB pages)
# Memory Management
shared_buffers = 25% of instance RAM # e.g., for a db.r5.large (16 GiB RAM), set to ~4GB
effective_cache_size = 75% of instance RAM # Planner hint only; it allocates no memory
# WAL (Write-Ahead Logging) and checkpoints
wal_buffers = 16MB
wal_writer_delay = 200ms
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
# Autovacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
# Connection limits (consider an external pooler such as PgBouncer)
max_connections = 100 # Adjust based on application needs and instance RAM
# Query Planning and Execution
random_page_cost = 1.1 # Lower for SSDs (default is 4.0)
seq_page_cost = 1.0
work_mem = 16MB # Per sort/hash operation, not per connection. Tune based on complex queries.
maintenance_work_mem = 256MB # For VACUUM, CREATE INDEX, etc.
max_worker_processes = 8 # Should be >= max_parallel_workers
max_parallel_workers = 4 # Number of workers for parallel queries
max_parallel_workers_per_gather = 2 # Max workers per Gather node
# Logging (for debugging/monitoring)
# Several of the file-related settings below are managed by RDS and may not be modifiable
log_min_duration_statement = 1000 # Log queries longer than 1 second
log_statement = 'ddl' # Log DDL statements
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H-%M-%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
Important Notes:
- Instance RAM: The percentages for shared_buffers and effective_cache_size are general guidelines. Always monitor your instance’s memory usage. For smaller instances, shared_buffers might be closer to 25% of RAM, while larger instances can afford to use a bit less to leave more for the OS cache.
- SSDs: If your RDS instance uses SSDs (which most do), lowering random_page_cost to 1.1 is beneficial as SSDs have much faster random I/O than HDDs.
- Autovacuum: Ensure autovacuum is enabled and tuned. It’s crucial for reclaiming space from dead tuples and preventing table bloat, especially in high-transaction environments. Adjust thresholds and naptime based on your workload.
- max_connections: Each connection consumes memory. Set it conservatively and consider using a connection pooler like PgBouncer if your application opens many short-lived connections.
- Parallel Queries: max_worker_processes and max_parallel_workers are important for leveraging multi-core CPUs for complex analytical queries. Tune them based on your instance’s CPU count.
- Parameter Group Changes: Dynamic parameters (e.g., work_mem, random_page_cost) take effect without a restart. Static parameters (e.g., shared_buffers, max_connections, max_worker_processes) only take effect after you reboot the RDS instance.
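To make the memory guidelines concrete, the following sketch applies the 25% and 75% rules of thumb to a given instance size and shows how much memory work_mem could claim if every connection ran a single sort at once. The function name and dictionary keys are hypothetical, and the percentages are the general guidelines from the notes above, not fixed rules.

```python
def pg_memory_plan(ram_gib: float, max_connections: int = 100,
                   work_mem_mb: int = 16) -> dict:
    """Rule-of-thumb PostgreSQL memory settings for a given instance size."""
    return {
        # 25% of RAM for PostgreSQL's own buffer cache
        "shared_buffers_gib": ram_gib * 0.25,
        # 75% of RAM as the planner's estimate of total cache available
        "effective_cache_size_gib": ram_gib * 0.75,
        # Memory used if every connection runs one work_mem-sized operation.
        # A single complex query can run several, so real usage may be higher.
        "one_sort_per_connection_gib": max_connections * work_mem_mb / 1024,
    }


if __name__ == "__main__":
    # Example: an instance with 16 GiB of RAM
    for name, value in pg_memory_plan(16).items():
        print(f"{name}: {value:.2f}")
```

For a 16 GiB instance this suggests about 4 GiB of shared_buffers and a 12 GiB effective_cache_size. The third figure is a reminder that raising work_mem and max_connections together multiplies potential memory use.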
Monitoring and Analysis
AWS RDS provides CloudWatch metrics for CPU utilization, memory, disk I/O, network traffic, and database connections. Additionally, leverage PostgreSQL’s built-in tools and extensions:
Essential SQL Queries and Tools
-- Check for long-running queries
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start;
-- Analyze table bloat
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
(n_dead_tup::float / GREATEST(n_live_tup, 1)) AS dead_tup_ratio, -- GREATEST avoids division by zero
pg_size_pretty(pg_table_size(relid)) AS table_size,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000 AND (n_dead_tup::float / GREATEST(n_live_tup, 1)) > 0.2
ORDER BY dead_tup_ratio DESC;
-- Check autovacuum status
SELECT
relname,
last_vacuum,
last_autovacuum,
vacuum_count,
autovacuum_count
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC;
-- Monitor cache hit ratio
SELECT
sum(blks_hit) AS hits,
sum(blks_read) AS reads,
sum(blks_hit) / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS ratio
FROM pg_stat_database
WHERE datname = current_database();
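A cache hit ratio above 95% is generally considered good. If it’s lower, consider increasing shared_buffers, moving to an instance class with more memory, or optimizing queries to be more cache-friendly. Raising effective_cache_size will not help here: it is only a planner estimate and does not allocate any cache.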
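If you collect the hits and reads counters from the query above in a monitoring script, the 95% guideline can be checked with a helper along these lines. This is a minimal sketch: the function names and the default threshold are assumptions for illustration, and it expects the two counters as plain integers.

```python
from typing import Optional


def cache_hit_ratio(hits: int, reads: int) -> Optional[float]:
    """Return blks_hit / (blks_hit + blks_read), or None with no activity."""
    total = hits + reads
    if total == 0:
        return None  # fresh database or reset statistics
    return hits / total


def cache_is_healthy(hits: int, reads: int, threshold: float = 0.95) -> bool:
    """True when the ratio meets the threshold; no data counts as healthy."""
    ratio = cache_hit_ratio(hits, reads)
    return ratio is None or ratio >= threshold


if __name__ == "__main__":
    print(cache_hit_ratio(9_800_000, 200_000))
    print(cache_is_healthy(9_000_000, 1_000_000))
```

The counters are cumulative since the last statistics reset, so comparing two samples taken some time apart gives a better view of current behavior than a single reading.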
Putting It All Together: AWS Infrastructure
On AWS, this setup typically involves:
- EC2 Instance(s) for Nginx: A small to medium EC2 instance (e.g., t3.medium, m5.large) configured with the Nginx settings described. Consider using Auto Scaling Groups for high availability and scalability.
- EC2 Instance(s) for Application Server (Gunicorn/PHP-FPM): These instances run your application code. They can be the same instances as Nginx if co-located, or separate instances behind Nginx. Use Auto Scaling Groups here as well.
- AWS RDS for PostgreSQL: A managed PostgreSQL instance. Choose an instance class that balances compute, memory, and I/O needs. For I/O-intensive workloads, consider instances with local NVMe SSDs or provisioned IOPS.
- Elastic Load Balancer (ELB): If using multiple Nginx instances, an ELB (ALB or NLB) can distribute traffic.
- Security Groups: Properly configure security groups to allow traffic only on necessary ports (e.g., 80, 443 for Nginx, 5432 for RDS from application servers).
- IAM Roles: Grant necessary permissions for EC2 instances to interact with other AWS services (e.g., CloudWatch for logging/monitoring).
This comprehensive tuning approach, from the edge Nginx proxy through your application runtime (Gunicorn/FPM) to the database layer (PostgreSQL), is essential for building a performant and scalable platform capable of handling significant traffic, akin to a large e-commerce site like Shopify.