The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and MongoDB on AWS for Shopify

Nginx as a High-Performance Reverse Proxy and Load Balancer

For a Shopify backend running on AWS, Nginx serves as the critical entry point, handling SSL termination, static file serving, and reverse proxying requests to your application servers (Gunicorn for Python/Django/Flask, or PHP-FPM for PHP applications). Optimizing Nginx is paramount for low latency and high throughput.

Nginx Configuration Tuning

We’ll focus on key directives within nginx.conf or included configuration files. Ensure these are tuned for your specific AWS instance types and expected load.

Worker Processes and Connections

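Set worker_processes to the number of CPU cores available on your EC2 instance, or to auto to let Nginx detect it. worker_connections sets the maximum number of simultaneous connections each worker process can open. The theoretical ceiling is worker_processes * worker_connections, but that count includes connections to upstream servers as well as to clients, so a reverse proxy serves roughly half that many clients at once. The value is also capped by the per-process open-file limit (see worker_rlimit_nofile).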
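As a rough capacity check, the connection arithmetic can be sketched in Python. The function name is hypothetical, and the halving for proxied traffic is a simplifying assumption (one client-side plus one upstream-side connection per request); real limits also depend on open-file limits and keepalive behavior.

```python
def max_clients(worker_processes: int, worker_connections: int, proxied: bool = True) -> int:
    """Estimate an upper bound on simultaneous clients for an Nginx instance."""
    total = worker_processes * worker_connections
    # Assumption: each proxied client occupies two connections
    # (client <-> Nginx and Nginx <-> upstream).
    return total // 2 if proxied else total


if __name__ == "__main__":
    # Values taken from the example configuration: 4 workers, 4096 connections each.
    print("reverse proxy:", max_clients(4, 4096))
    print("static only:  ", max_clients(4, 4096, proxied=False))
```

Treat the result as a ceiling for planning, not as a measured capacity; load testing is the only reliable way to find the real limit.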

Example Nginx Configuration Snippet

For a c5.xlarge instance (4 vCPUs):

# /etc/nginx/nginx.conf

user www-data;
worker_processes 4; # Match the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 4096; # Adjust based on expected concurrent connections per worker
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    server_tokens off; # Hide Nginx version for security

    # Gzip compression for text-based assets
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # SSL Configuration (essential for Shopify)
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_session_cache shared:SSL:10m; # Adjust size as needed
    ssl_session_timeout 10m;
    ssl_session_tickets off; # Consider security implications

    # Include virtual host configurations
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Optimizing Static File Serving

Nginx excels at serving static assets. Leverage browser caching and efficient file serving directives.

# Inside your server block for static assets

location ~ ^/(images|javascript|js|css|flash|media|static)/ {
    expires 365d; # Cache for a year
    add_header Cache-Control "public";
    access_log off; # Don't log access for static files to reduce I/O
    try_files $uri $uri/ =404;
}

# For specific Shopify static assets (e.g., theme files)
location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2)$ {
    expires 365d;
    add_header Cache-Control "public";
    access_log off;
}

Reverse Proxy Configuration

This section defines how Nginx forwards requests to your application servers. Key directives include proxy_pass, proxy_set_header, and timeouts.

# Inside your server block for dynamic requests

location / {
    proxy_pass http://your_app_backend; # e.g., http://127.0.0.1:8000 for Gunicorn
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;

    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;
}

# For WebSocket support (if your app uses it)
location /ws/ {
    proxy_pass http://your_app_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
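On the application side, the headers set with proxy_set_header arrive in the WSGI environ as HTTP_* keys. The following is a minimal sketch, assuming a plain WSGI app behind the Nginx configuration above; the function names are illustrative, and frameworks such as Django or Flask have their own settings for trusting proxy headers.

```python
def client_info(environ: dict) -> dict:
    """Recover the original client details from proxy-set headers in a WSGI environ."""
    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR", "")
    # X-Forwarded-For is a comma-separated chain; the left-most entry is the original client.
    first_hop = forwarded_for.split(",")[0].strip()
    return {
        "ip": environ.get("HTTP_X_REAL_IP") or first_hop or environ.get("REMOTE_ADDR", ""),
        "scheme": environ.get("HTTP_X_FORWARDED_PROTO", environ.get("wsgi.url_scheme", "http")),
        "host": environ.get("HTTP_HOST", ""),
    }


def application(environ, start_response):
    """Tiny WSGI app that echoes what it learned from the proxy headers."""
    info = client_info(environ)
    body = f"{info['scheme']}://{info['host']} from {info['ip']}\n".encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

These headers are only trustworthy when the application port is reachable exclusively through Nginx (as with the 127.0.0.1 bind used in this playbook); a client that can reach the app directly can forge them.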

Gunicorn Tuning for Python Applications

Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes and the worker class used.

Worker Processes and Type

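For I/O-bound applications (typical for web apps that spend most of their time waiting on databases and external APIs), the gevent or eventlet worker classes are a common choice because they use cooperative asynchronous I/O. They only help if your code and drivers are compatible with that model. For CPU-bound work, async workers bring little benefit, and the default sync workers are simpler, though each one handles a single request at a time. A common starting point for the number of workers is (2 * number_of_cores) + 1.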

Example Gunicorn Command Line

# Assuming your Django/Flask app's WSGI entry point is in 'myproject.wsgi'
# For a 4-core instance:
gunicorn --workers 9 \
         --worker-class gevent \
         --bind 127.0.0.1:8000 \
         --timeout 120 \
         --graceful-timeout 120 \
         --log-level info \
         --access-logfile /var/log/gunicorn/access.log \
         --error-logfile /var/log/gunicorn/error.log \
         myproject.wsgi:application

Explanation:

--workers 9: For 4 cores, (2*4)+1 = 9 workers. Adjust based on your application’s I/O vs. CPU bound nature.
--worker-class gevent: Utilizes greenlets for concurrency. Requires installing gevent (`pip install gevent`).
--bind 127.0.0.1:8000: Listens on localhost, port 8000, so Nginx can proxy to it.
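--timeout 120: Workers that stay silent for longer than this many seconds are killed and restarted. With sync workers this effectively caps request duration; with async workers such as gevent, the worker keeps communicating with the master while a request is in flight, so it is not a strict per-request limit.
--graceful-timeout 120: Time workers are given to finish in-flight requests after a restart signal. Workers still alive after this period are force-killed.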
--log-level info: Sets the logging verbosity.
--access-logfile, --error-logfile: Essential for monitoring and debugging. Ensure the log directory exists and has correct permissions.

Gunicorn Configuration File

For more complex configurations or easier management, use a Gunicorn configuration file (e.g., gunicorn_config.py).

# gunicorn_config.py

import multiprocessing

bind = "127.0.0.1:8000"
workers = (multiprocessing.cpu_count() * 2) + 1
worker_class = "gevent" # or "eventlet"
timeout = 120
graceful_timeout = 120
loglevel = "info"
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
# access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"' # Custom log format

Then run Gunicorn with:

gunicorn -c gunicorn_config.py myproject.wsgi:application

PHP-FPM Tuning for PHP Applications

PHP-FPM (FastCGI Process Manager) is the standard for running PHP applications with Nginx. Its performance hinges on the process manager settings.

Process Manager Settings

The primary configuration file is typically /etc/php/X.Y/fpm/php-fpm.conf and pool configurations in /etc/php/X.Y/fpm/pool.d/www.conf (where X.Y is your PHP version).

Tuning `www.conf`

; /etc/php/8.1/fpm/pool.d/www.conf

[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock ; Or a TCP socket like 127.0.0.1:9000

; Process Manager Settings
; pm = dynamic ; or static or ondemand
pm = dynamic

; For dynamic PM:
; pm.max_children = 50 ; Hard cap on the number of child processes.
; pm.start_servers = 5  ; Number of children created at startup.
; pm.min_spare_servers = 2 ; Minimum number of idle children to keep available.
; pm.max_spare_servers = 10 ; Maximum number of idle children; extras are killed.

; For static PM (recommended for predictable load and dedicated servers):
; pm = static
; pm.max_children = 50 ; Fixed number of children.

; For ondemand PM (saves resources when idle):
; pm = ondemand
; pm.max_children = 50
; pm.process_idle_timeout = 10s ; Idle children are killed after this timeout.

; Adjust based on your instance size and expected load.
; A common starting point for dynamic PM on a 4-core instance:
pm.max_children = 100
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
; pm.process_idle_timeout only takes effect with pm = ondemand, so it is not set here.

; Request handling
request_terminate_timeout = 120s ; Max time to serve a single request before the worker is killed. Keep in line with Nginx fastcgi_read_timeout.
; request_slowlog_timeout = 10s ; Log scripts that take too long to execute.

; Environment variables
; env[MY_ENV_VAR] = 'value'

; Other useful settings
; catch_workers_output = yes ; Capture stdout/stderr of workers
; listen.owner = www-data
; listen.group = www-data
; listen.mode = 0660

Tuning Strategy:

pm: dynamic is a good balance. static is best for consistent high load if you have enough memory. ondemand is good for minimizing idle resource usage.
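pm.max_children: This is the most critical setting, and it is bounded by your server's RAM. As a rough estimate, each PHP-FPM worker consumes 20-50MB of RAM. On a 16GB instance, dividing all memory by the worst case gives (16GB * 1024MB/GB) / 50MB/child ≈ 327 children, but that leaves nothing for the OS, Nginx, or caches, so treat it as an upper bound. Measure the actual per-worker memory, start lower, and monitor.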
pm.start_servers, pm.min_spare_servers, pm.max_spare_servers: These control how quickly PHP-FPM scales up and down. Tune them to avoid constant process churning.
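request_terminate_timeout: Should align with the Nginx timeout for the PHP upstream. PHP-FPM is normally reached through fastcgi_pass, so the matching directive is fastcgi_read_timeout rather than proxy_read_timeout. If the Nginx timeout is shorter, clients receive a 504 while the script keeps running.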
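The memory-based estimate for pm.max_children can be sketched as a small Python helper. The function name and the reserved-memory figure are assumptions for illustration; the per-worker size should come from measuring your own PHP-FPM processes.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int, per_child_mb: int) -> int:
    """Estimate pm.max_children from RAM left over after other processes."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    # Memory remaining for PHP-FPM once the OS, Nginx, etc. have their share.
    available_mb = max(total_ram_mb - reserved_mb, 0)
    return available_mb // per_child_mb


if __name__ == "__main__":
    total = 16 * 1024  # 16GB instance, as in the text
    # Upper bound from the text: all RAM divided by 50MB per child.
    print("no reserve: ", estimate_max_children(total, 0, 50))
    # Assumption: keep 4GB back for the OS and other services.
    print("4GB reserve:", estimate_max_children(total, 4096, 50))
```

Reserving headroom lowers the figure noticeably, which is why the unreserved number is best read as a ceiling rather than a target.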

MongoDB Performance Tuning on AWS

MongoDB performance is sensitive to hardware, configuration, and query patterns. On AWS, consider instance types with local NVMe SSDs (e.g., i3, i4i, r6id), or memory-optimized types such as r6i backed by appropriately provisioned EBS volumes.

MongoDB Configuration (`mongod.conf`)

Key parameters in /etc/mongod.conf (or similar path) include storage engine settings, journaling, and cache size.

# /etc/mongod.conf

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true # Remove on MongoDB 6.1+, where journaling is always on and this option no longer exists
  engine: wiredTiger # Default and recommended

# WiredTiger specific options
  wiredTiger:
    engineConfig:
      cacheSizeGB: 24 # Absolute size in GB, not a fraction; e.g., 24 on a 32GB instance
    collectionConfig:
      blockCompressor: snappy # Default compressor for collection data
    indexConfig:
      prefixCompression: true # Default

# Network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1 # Add the instance's private IP for remote clients; avoid 0.0.0.0 unless security groups and authorization restrict access

# Logging
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
  verbosity: 0 # 0 is the default; 1 to 5 add increasingly detailed debug output

# Sharding (if applicable)
# sharding:
#   clusterRole: configsvr # or shardsvr

# Security (essential for production)
# security:
#   authorization: enabled

# Replication (essential for production)
# replication:
#   replSetName: rs0

Tuning Strategy:

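storage.wiredTiger.cacheSizeGB: This is the most crucial parameter. It is an absolute size in gigabytes; when unset, WiredTiger uses the larger of 50% of (RAM - 1GB) or 256MB. On a dedicated 32GB instance you might raise it toward 24GB (leaving 8GB for the OS, filesystem cache, and other processes), but go above the default only if monitoring shows cache pressure. Monitor cache hit rates.
storage.journal.enabled: true: Essential for durability. Do not disable unless you understand the risks. From MongoDB 6.1 onward journaling is always enabled and the option has been removed.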
net.bindIp: Restrict access to only necessary IPs. Use security groups on AWS.
systemLog.verbosity: Keep at 0 for production unless debugging. High verbosity impacts I/O.
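For comparison with any hand-picked cacheSizeGB value, WiredTiger's documented default (the larger of 50% of (RAM - 1GB) or 256MB) is easy to compute. This is a small sketch; the function name is hypothetical and the RAM figures are examples only.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Return the default WiredTiger cache size in GB for a given amount of RAM."""
    # Documented default: max(50% of (RAM - 1GB), 256MB).
    return max(0.5 * (total_ram_gb - 1.0), 0.25)


if __name__ == "__main__":
    for ram in (1, 8, 16, 32):
        print(f"{ram}GB RAM -> default cache {default_wiredtiger_cache_gb(ram)}GB")
```

On a 32GB instance the default works out to 15.5GB, so a 24GB setting is a deliberate increase that trades away filesystem cache and headroom for other processes.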

AWS Specific Considerations for MongoDB

Instance Type: Use instances with local NVMe SSDs (i3, i4i) for best I/O performance. Instance store data does not survive a stop or termination, so rely on replica sets and backups for durability. If using EBS, choose gp3 or io2 volumes and provision IOPS/throughput appropriately.

EBS Volumes: For instances without local SSDs, gp3 is a sensible default. It includes a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size, and lets you scale IOPS and throughput independently of capacity. For extremely demanding workloads, consider io2 Block Express.

Network: Keep application and database instances in the same VPC, and in the same Availability Zone where latency matters most. For availability, spread replica set members across AZs and accept the small cross-AZ latency. Use Security Groups to control access.

Monitoring MongoDB Performance

Regularly monitor key metrics using mongostat, mongotop, and the MongoDB Cloud Manager/Ops Manager or Prometheus/Grafana stack.

# Real-time connection and operation stats (5 rows, one per second)
mongostat --host your_mongo_host:27017 --username your_user --password your_password --authenticationDatabase admin -n 5

# Example Output Interpretation:
# insert, query, update, delete, getmore, command: operations per second by type
# dirty, used: percentage of the WiredTiger cache that is dirty / in use
# res: resident memory
# qrw: clients queued waiting to read | write
# arw: clients actively reading | writing
# net_in, net_out, conn: network traffic and open connections

# Time spent reading and writing per collection, reported every 10 seconds
mongotop --host your_mongo_host:27017 --username your_user --password your_password --authenticationDatabase admin 10

# Key metrics to watch:
# - Cache Hit Rate: Aim for > 95% for WiredTiger.
# - Disk I/O: Monitor read/write operations per second and throughput.
# - Query Latency: Identify slow queries.
# - Network Traffic: Ensure sufficient bandwidth.
# - CPU Utilization: High CPU might indicate inefficient queries or insufficient cores.
# - Memory Usage: Monitor RAM usage and ensure the cache is effectively utilized.
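
The cache hit rate listed above is not printed directly by mongostat, but it can be derived from WiredTiger counters. The sketch below assumes the two counter names as they appear under wiredTiger.cache in serverStatus output; verify them against your server version. The function name is hypothetical.

```python
def cache_hit_ratio(wt_cache: dict) -> float:
    """Compute the WiredTiger cache hit ratio from serverStatus cache counters."""
    # Assumed counter names from serverStatus().wiredTiger.cache.
    requested = wt_cache.get("pages requested from the cache", 0)
    read_in = wt_cache.get("pages read into cache", 0)
    if requested <= 0:
        return 0.0  # No traffic yet, so no meaningful ratio.
    # Every page read into the cache from disk counts as a miss.
    return 1.0 - (read_in / requested)


if __name__ == "__main__":
    # Hand-made sample counters, not real server output.
    sample = {"pages requested from the cache": 1000, "pages read into cache": 25}
    print(f"hit ratio: {cache_hit_ratio(sample):.3f}")
```

With pymongo, the input dictionary would typically come from `client.admin.command("serverStatus")["wiredTiger"]["cache"]`. The counters are cumulative since startup, so comparing two samples taken some minutes apart gives a more current picture.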

Use explain() on slow queries to identify missing indexes or inefficient query plans.

db.collection.find({ field: "value" }).explain("executionStats")
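
When explain output is collected programmatically (for example as a dictionary returned by a driver), a short helper can flag collection scans and poor selectivity. This is a sketch that assumes the classic explain layout with stage, inputStage, and inputStages keys; the exact shape varies across server versions, and the function names are illustrative.

```python
def find_stages(plan: dict) -> list:
    """Collect stage names from a winningPlan tree, outermost first."""
    stages = [plan.get("stage", "UNKNOWN")]
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        stages.extend(find_stages(child))
    return stages


def summarize_explain(explain: dict) -> dict:
    """Summarize an explain("executionStats") document."""
    stats = explain.get("executionStats", {})
    plan = explain.get("queryPlanner", {}).get("winningPlan", {})
    stages = find_stages(plan)
    returned = stats.get("nReturned", 0)
    examined = stats.get("totalDocsExamined", 0)
    return {
        # COLLSCAN means the query read the whole collection, usually a missing index.
        "collection_scan": "COLLSCAN" in stages,
        "stages": stages,
        # Values far above 1 mean many documents are read per document returned.
        "docs_examined_per_returned": examined / returned if returned else float(examined),
    }


if __name__ == "__main__":
    # Hand-made sample in the assumed shape, not real server output.
    sample = {
        "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
        "executionStats": {"nReturned": 10, "totalDocsExamined": 5000},
    }
    print(summarize_explain(sample))
```

A query that reports COLLSCAN or examines hundreds of documents per result is a natural first candidate for a new or better index.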

Putting It All Together: AWS Deployment Strategy

A typical setup on AWS for a Shopify backend might involve:

Load Balancer: AWS Application Load Balancer (ALB) or Network Load Balancer (NLB) in front of Nginx instances.
Web Servers: EC2 instances running Nginx, configured as described above. Use Auto Scaling Groups for high availability and scalability.
Application Servers: EC2 instances running Gunicorn/PHP-FPM, also managed by Auto Scaling Groups. These instances are the upstreams for Nginx's proxy_pass (Gunicorn) or fastcgi_pass (PHP-FPM).
Database: AWS RDS for PostgreSQL/MySQL (if used for other services) or self-managed MongoDB on EC2 instances (preferably with local NVMe SSDs or provisioned IOPS EBS), deployed as a replica set on dedicated instances rather than in an Auto Scaling Group, since database nodes are stateful. Consider Amazon DocumentDB if a fully managed MongoDB-compatible service is preferred; it does not support every MongoDB feature, and direct EC2 offers more tuning control.
Caching: ElastiCache for Redis/Memcached for session storage and object caching.
CDN: CloudFront for serving static assets globally.

This layered approach allows each component to be scaled and optimized independently, providing a robust and performant infrastructure for your Shopify application on AWS.