The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on AWS for Python

Nginx as a High-Performance Frontend Proxy

When deploying Python web applications, Nginx serves as an indispensable frontend proxy, handling static file serving, SSL termination, request buffering, and load balancing. Optimizing Nginx is crucial for maximizing throughput and minimizing latency. We’ll focus on key directives for a production environment, assuming a typical AWS EC2 instance setup.

Worker Processes and Connections

The worker_processes directive controls the number of worker processes Nginx spawns. A common practice is to set it to the number of CPU cores on the server, or to auto so that Nginx detects the core count. The worker_connections directive sets the maximum number of simultaneous connections each worker process can open, so the theoretical ceiling is worker_processes * worker_connections. That count includes connections to upstream servers, so a proxied request typically uses two of them. Each worker's connections must also fit under the per-process open file descriptor limit (ulimit -n, or Nginx's worker_rlimit_nofile directive).
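The capacity arithmetic can be sketched in a few lines of Python. This is only a rough sanity check: the function names are illustrative, and the "two connections per proxied client" figure is an assumption that holds when every request is forwarded to an upstream.

```python
import resource


def max_connections(worker_processes: int, worker_connections: int) -> int:
    """Theoretical upper bound on simultaneous connections across all workers."""
    return worker_processes * worker_connections


def max_proxied_clients(worker_processes: int, worker_connections: int) -> int:
    """Rough client capacity when every request is proxied.

    Assumes each client costs two connections: one from the client
    and one to the upstream application server.
    """
    return max_connections(worker_processes, worker_connections) // 2


def fits_fd_limit(worker_connections: int, soft_limit: int) -> bool:
    """True if one worker's connections fit under the per-process fd limit."""
    return worker_connections < soft_limit


if __name__ == "__main__":
    # Reads the limit of the current shell, which may differ from the Nginx service.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("total connections:", max_connections(4, 1024))
    print("proxied clients:", max_proxied_clients(4, 1024))
    print("fits under RLIMIT_NOFILE:", fits_fd_limit(1024, soft))
```

With 4 workers and 1024 connections each, the ceiling is 4096 connections, or roughly 2048 proxied clients under the assumption above.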

Tuning `nginx.conf`

# /etc/nginx/nginx.conf

user www-data;
worker_processes auto; # Or set to the number of CPU cores, e.g., worker_processes 4;

events {
    worker_connections 1024; # Adjust based on expected load and system limits
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off; # Important for security

    # Gzip compression for text-based assets
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Buffering and timeouts for upstream connections
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    proxy_buffer_size 16k;
    proxy_buffers 4 32k;
    proxy_busy_buffers_size 64k;

    # Client request body limits
    client_max_body_size 50M; # Adjust as needed for file uploads

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Load balancing configuration (if multiple app servers)
    # upstream myapp {
    #     server 10.0.1.10:8000;
    #     server 10.0.1.11:8000;
    # }

    server {
        listen 80;
        server_name your_domain.com www.your_domain.com;

        # Serve static files directly
        location /static/ {
            alias /path/to/your/app/static/;
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }

        location /media/ {
            alias /path/to/your/app/media/;
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }

        location / {
            # If using Gunicorn/uWSGI
            # proxy_pass http://unix:/run/gunicorn.sock; # For Unix socket
            # proxy_pass http://127.0.0.1:8000; # For TCP socket

            # If using PHP-FPM
            # try_files $uri $uri/ /index.php?$query_string;

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # For Gunicorn/uWSGI
            # proxy_redirect off;
        }

        # For PHP-FPM
        # location ~ \.php$ {
        #     include snippets/fastcgi-php.conf;
        #     fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Adjust version
        #     fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        #     include fastcgi_params;
        # }

        # Deny access to hidden files
        location ~ /\. {
            deny all;
        }
    }

    # Include other server blocks or configurations
    # include /etc/nginx/sites-enabled/*;
}

Key directives to note:

sendfile on;: Enables efficient transfer of files from disk to socket without user-space buffering.
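tcp_nopush on; and tcp_nodelay on;: tcp_nopush (effective only together with sendfile) sends the response headers and the start of a file in full packets. tcp_nodelay disables Nagle's algorithm on keep-alive connections so that small writes are not delayed.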
keepalive_timeout 65;: Sets the timeout for persistent connections.
gzip_* directives: Enable and configure Gzip compression for text-based responses, significantly reducing bandwidth.
proxy_* directives: Configure how Nginx communicates with a backend reached through proxy_pass (Gunicorn, or uWSGI in HTTP mode). PHP-FPM is reached through fastcgi_pass and has equivalent fastcgi_* directives. Adjust timeouts and buffer sizes to prevent issues with slow backend responses or large response headers.
client_max_body_size: Limits the size of client request bodies, essential for preventing denial-of-service attacks via large uploads.
server_tokens off;: Hides the Nginx version from HTTP headers, a minor security hardening step.

SSL/TLS Optimization

For HTTPS, Nginx handles SSL termination. Optimizing SSL/TLS involves choosing strong cipher suites, enabling HTTP/2, and configuring session caching.

SSL Configuration Snippet

# In your server block for HTTPS (listen 443 ssl;)

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_session_cache shared:SSL:10m; # 10MB shared cache
ssl_session_timeout 10m;
ssl_session_tickets off; # Consider security implications if enabling

# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s; # Use your preferred DNS resolvers
resolver_timeout 5s;

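# HTTP/2 support
# Enable it with "listen 443 ssl http2;" (or "http2 on;" on Nginx 1.25.1 and later)
http2_max_concurrent_streams 100;
# http2_push_preload on; # Server push was removed in Nginx 1.25.1; omit on newer versions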

Explanation of key SSL directives:

ssl_protocols: Restricts to modern, secure TLS versions.
ssl_prefer_server_ciphers on;: Ensures the server’s cipher preference is used, not the client’s.
ssl_ciphers: A strong, modern cipher suite list. Use tools like Mozilla SSL Configuration Generator for up-to-date recommendations.
ssl_session_cache and ssl_session_timeout: Enable session resumption, reducing handshake overhead for returning clients.
ssl_session_tickets off;: Disables session tickets. Tickets are encrypted with a server-side key, so a long-lived key that is never rotated weakens perfect forward secrecy (PFS). If you enable tickets, rotate the key regularly.
ssl_stapling and ssl_stapling_verify: Nginx fetches the OCSP response from the Certificate Authority, caches it, and staples it to the TLS handshake, so clients do not need to contact the CA themselves.
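http2_... directives: Tune HTTP/2 behavior. HTTP/2 itself (multiplexing and header compression) is enabled with the http2 parameter of the listen directive, or with http2 on; in Nginx 1.25.1 and later.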
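One way to confirm what a server actually negotiates is to connect with Python's standard ssl module. The sketch below is a minimal probe, not a full audit: the helper names are illustrative, and your_domain.com is the placeholder host from the config above.

```python
import socket
import ssl


def build_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.2 and offers HTTP/2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def probe(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect once and report what the server negotiated."""
    ctx = build_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return {
                "tls_version": tls.version(),
                "cipher": tls.cipher()[0],
                # "h2" here indicates that HTTP/2 was negotiated via ALPN.
                "alpn": tls.selected_alpn_protocol(),
            }


if __name__ == "__main__":
    # Placeholder host; replace with the server you want to check.
    print(probe("your_domain.com"))
```

If the handshake fails with this context, the server is offering only protocols older than TLS 1.2 or a certificate that does not validate.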

Gunicorn/uWSGI Tuning for Python WSGI Applications

Gunicorn and uWSGI are popular WSGI servers for Python applications. Proper configuration is vital for handling concurrent requests efficiently and managing worker processes.

Gunicorn Worker Configuration

Gunicorn's concurrency model is based on worker processes. The most common worker types are sync (the default; each worker handles one request at a time) and gevent/eventlet (asynchronous; each worker handles many requests concurrently using greenlets).

Choosing the Right Worker Type and Count

For CPU-bound tasks, a synchronous worker model with roughly (2 * number_of_cores) + 1 workers is a common starting point. For I/O-bound tasks (e.g., applications making many external API calls or database queries), asynchronous workers like gevent or eventlet can significantly increase concurrency. Each async worker handles many connections at once, so total concurrency can be much higher, limited by available memory and system resources.
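The worker-count rule can be written down as a small helper. The sketch below adds a memory cap on top of the core-based rule. The cap and the figures used in the example are assumptions for illustration, so measure your own per-worker memory before relying on it.

```python
import multiprocessing


def sync_worker_count(cores: int) -> int:
    """The (2 * cores) + 1 starting point for sync workers."""
    return 2 * cores + 1


def capped_worker_count(cores: int, free_ram_mb: int, per_worker_mb: int) -> int:
    """Same rule, but never more workers than memory allows (and at least 1)."""
    by_memory = free_ram_mb // per_worker_mb
    return max(1, min(sync_worker_count(cores), by_memory))


if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    # 2048 MB free and 150 MB per worker are made-up figures.
    print(capped_worker_count(cores, free_ram_mb=2048, per_worker_mb=150))
```

On a 4-core host the core-based rule gives 9 workers. With only 600 MB free and 150 MB per worker, the cap reduces that to 4.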

Gunicorn Command-Line Arguments / Configuration File

# Example using command-line arguments:
# gunicorn --workers 4 --worker-class sync --bind unix:/run/gunicorn.sock --timeout 120 myapp.wsgi:application

# Example using a Python configuration file, loaded with: gunicorn -c /etc/gunicorn.d/myapp.py myapp.wsgi:application
# /etc/gunicorn.d/myapp.py

import multiprocessing

bind = "unix:/run/gunicorn.sock" # Or "0.0.0.0:8000" for TCP
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "sync" # or "gevent", "eventlet"
timeout = 120 # seconds
keepalive = 2 # seconds

# For gevent worker class:
# worker_connections = 1000

# Logging configuration
loglevel = "info"
accesslog = "/var/log/gunicorn/myapp-access.log"
errorlog = "/var/log/gunicorn/myapp-error.log"

# Other useful settings:
# preload_app = True # Loads application code before workers fork, can speed up startup
# daemon = True # Runs Gunicorn as a daemon (often managed by systemd instead)
# pidfile = "/run/gunicorn.pid"

Explanation:

--workers / workers: Number of worker processes.
--worker-class / worker_class: The type of worker to use.
--bind / bind: The address and port (or Unix socket) to listen on. Using a Unix socket is generally preferred when Nginx is on the same host for performance.
--timeout / timeout: Maximum time (in seconds) a worker can spend processing a request. Crucial for preventing hung requests from blocking workers.
--keepalive / keepalive: Number of seconds to wait for requests on a keep-alive connection.
worker_connections (for async workers): Maximum number of greenlets/tasks a worker can handle concurrently.
Logging directives: Essential for monitoring and debugging. Ensure log rotation is configured for these files.

uWSGI Configuration

uWSGI is another powerful WSGI server, known for its flexibility and performance. It uses a configuration file (often .ini format).

uWSGI Configuration File

# /etc/uwsgi/myapp.ini

[uwsgi]
module = myapp.wsgi:application

# Socket configuration
socket = /run/uwsgi/myapp.sock
# Or TCP socket:
# socket = 127.0.0.1:8000

chmod-socket = 660
# Remove socket on exit
vacuum = true

# Worker configuration
# Comments sit on their own lines because uWSGI's ini parser may treat inline text as part of the value
# Number of worker processes
processes = 4
# Number of threads per process (if using threaded workers)
threads = 2
# For gevent-based asynchronous workers, set the number of async cores instead:
# gevent = 100

# Performance tuning
buffer-size = 32768
# Kill requests taking longer than 30 seconds
harakiri = 30
# Restart worker after N requests to clear memory leaks
max-requests = 5000

# Logging
logto = /var/log/uwsgi/myapp.log
log-reopen = true

# Other settings
uid = www-data
gid = www-data
chdir = /path/to/your/app
virtualenv = /path/to/your/venv
# enable-threads = true # If using threads
# master = true # Enable master process (recommended)

Explanation:

module: Specifies the WSGI application entry point.
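socket: Defines the communication socket. Unix sockets are generally faster. The socket option speaks the uwsgi protocol, so pair it with uwsgi_pass and include uwsgi_params in Nginx. Use http-socket instead if Nginx connects with proxy_pass.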
processes: Number of worker processes. Similar tuning principles to Gunicorn apply.
threads: Number of threads per process. Use if your application is thread-safe and benefits from concurrency within a process.
harakiri: Similar to Gunicorn’s timeout, kills unresponsive requests.
max-requests: A crucial setting to mitigate memory leaks by periodically restarting workers.
chdir and virtualenv: Ensure uWSGI runs in the correct directory and with the correct Python environment.
master = true: Enables the master process, which manages workers and provides features like graceful reloads.
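For reference, the entry point that module (or Gunicorn's myapp.wsgi:application argument) points at is just a WSGI callable. The sketch below is a stand-in for a real framework application, assuming nothing beyond the standard library. The call_once helper is hypothetical and exists only to exercise the callable without a server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable; the server calls this once per request."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_once(app):
    """Invoke a WSGI app in-process with a synthetic GET / request."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_once(application))
```

Frameworks such as Django generate an equivalent callable in their wsgi.py, which is why the same module path works for both Gunicorn and uWSGI.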

PHP-FPM Tuning for PHP Applications

For PHP applications, PHP-FPM (FastCGI Process Manager) is the standard. Tuning FPM pools is essential for performance and stability.

PHP-FPM Pool Configuration

PHP-FPM pools manage worker processes that handle PHP requests. The key is to balance the number of processes with available system resources.

Tuning `www.conf`

; /etc/php/7.4/fpm/pool.d/www.conf (Adjust PHP version as needed)

[www]
user = www-data
group = www-data
listen = /run/php/php7.4-fpm.sock ; Or TCP: listen = 127.0.0.1:9000

; Process management
; pm = dynamic ; Options: static, dynamic, ondemand
pm = dynamic
pm.max_children = 50      ; Max number of children at any one time
pm.start_servers = 2      ; Number of servers started on boot
pm.min_spare_servers = 1  ; Min number of servers to keep idle
pm.max_spare_servers = 5  ; Max number of servers to keep idle
pm.process_idle_timeout = 10s ; Idle time before a child is killed (used only when pm = ondemand)

; Request handling
request_terminate_timeout = 60s ; Max time per request
request_slowlog_timeout = 10s   ; Log requests slower than this
slowlog = /var/log/php/php7.4-fpm-slow.log

; Other useful settings
chdir = /
catch_workers_output = yes
; php_admin_value[memory_limit] = 256M
; php_admin_flag[display_errors] = off

Explanation:

listen: The socket FPM listens on. Unix sockets are preferred for local communication.
pm: Process Manager control.
- static: Keeps a fixed number of children running. Good for predictable loads.
- dynamic: Starts/stops workers based on load. Generally a good default.
- ondemand: Starts workers only when needed. Can save resources but might have higher initial latency.
pm.max_children: The absolute maximum number of child processes that will be spawned. This is the most critical setting for preventing OOM errors. Calculate based on available RAM: (Total RAM - RAM for OS/Nginx/other services) / Average RAM per FPM worker.
pm.start_servers, pm.min_spare_servers, pm.max_spare_servers: These control the dynamic scaling behavior. Tune them to avoid frequent process spawning/killing.
request_terminate_timeout: Similar to WSGI timeouts, prevents runaway scripts.
request_slowlog_timeout and slowlog: Essential for identifying slow PHP scripts.
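The pm.max_children formula above is simple enough to compute directly. The helper below is a minimal sketch of that arithmetic. The function name and the example figures are illustrative, and the average worker size should come from measuring your own FPM processes.

```python
def fpm_max_children(total_ram_mb: int, reserved_mb: int, avg_worker_mb: int) -> int:
    """(Total RAM - RAM reserved for OS/Nginx/other services) / average RAM per worker."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Round down so the pool never exceeds the memory budget; keep at least 1.
    return max(1, usable_mb // avg_worker_mb)


if __name__ == "__main__":
    # Made-up figures: 8 GiB host, 2 GiB reserved, 60 MB per FPM worker.
    print(fpm_max_children(total_ram_mb=8192, reserved_mb=2048, avg_worker_mb=60))
```

With those figures the budget is 6144 MB, which allows 102 children. Leaving some extra headroom below the computed value is a reasonable precaution.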

Elasticsearch Performance Tuning on AWS

Elasticsearch performance is heavily influenced by JVM heap size, disk I/O, and network configuration. On AWS, choosing the right instance type and EBS volume is paramount.

Instance Type and EBS Volumes

For Elasticsearch nodes, especially data nodes, prioritize instances with good I/O performance and sufficient RAM. Instance families like i3, i4i (local NVMe SSDs), or m5d/r5d (local SSDs) are excellent choices. If using EBS, opt for gp3 or io1/io2 volumes. gp3 offers consistent baseline performance and allows independent scaling of IOPS and throughput, making it a cost-effective and performant choice.

JVM Heap Size Configuration

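The JVM heap size is arguably the most critical Elasticsearch tuning parameter. It should be set to no more than 50% of the system's total RAM, because Lucene relies on the OS file system cache for the remainder. It should also stay below the compressed ordinary object pointers (compressed oops) threshold, which is roughly 30-32 GB depending on the JVM.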

Setting JVM Heap in `jvm.options`

# /etc/elasticsearch/jvm.options (or similar path)

# Set the initial (-Xms) and maximum (-Xmx) heap size
# The two values should be the same in production
-Xms4g
-Xmx4g

Example Calculation: On an i3.large instance with 15.25 GiB RAM, setting -Xms7g -Xmx7g (approximately 7 GiB) would be appropriate. This leaves ~8 GiB for the OS, file system cache, and other processes.
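The same calculation can be scripted so that it is applied consistently across instance sizes. The sketch below assumes a conservative cap of 31 GiB to stay under the compressed oops threshold. The constant and the function names are illustrative choices, not values taken from Elasticsearch itself.

```python
import math

# Conservative cap chosen to stay below the ~32 GB compressed oops threshold.
COMPRESSED_OOPS_CAP_GIB = 31


def heap_gib(total_ram_gib: float) -> int:
    """Half of RAM, rounded down to whole GiB, capped, and at least 1."""
    half = math.floor(total_ram_gib / 2)
    return max(1, min(half, COMPRESSED_OOPS_CAP_GIB))


def jvm_heap_flags(total_ram_gib: float) -> list:
    """Matching -Xms/-Xmx flags for jvm.options."""
    size = heap_gib(total_ram_gib)
    return [f"-Xms{size}g", f"-Xmx{size}g"]


if __name__ == "__main__":
    # i3.large from the example above: 15.25 GiB RAM.
    print(" ".join(jvm_heap_flags(15.25)))
```

For the i3.large example this yields -Xms7g -Xmx7g, and a 128 GiB host is capped at 31 GiB rather than 64 GiB.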

Elasticsearch Configuration (`elasticsearch.yml`)

# /etc/elasticsearch/elasticsearch.yml

cluster.name: "my-es-cluster"
node.name: ${HOSTNAME}
network.host: 0.0.0.0 # Or specific private IP for security

# Discovery settings (for multi-node clusters)
discovery.seed_hosts: ["host1", "host2", "host3"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"] # Must match each node's node.name; only for initial cluster bootstrap

# Shard allocation settings
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

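# Index buffer settings
indices.memory.index_buffer_size: 50% # Default is 10% of heap; a value this high suits indexing-heavy nodes only

# Thread pool settings (adjust based on workload)
thread_pool.write.size: 16 # Default is the number of allocated processors; cannot exceed that number plus one
thread_pool.write.queue_size: 1000 # Default depends on the version (200 in older releases, 10000 in newer ones)

# Caching
indices.queries.cache.size: 256mb # Default is 10% of heap

# For AWS-specific settings (e.g., EC2 discovery plugin)
# discovery.seed_providers: ec2 # Requires the discovery-ec2 plugin
# xpack.security.enabled: true # If using X-Pack Security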

Explanation:

network.host: Bind to the appropriate network interface. Use private IPs for internal communication.
discovery.seed_hosts and cluster.initial_master_nodes: Essential for cluster formation.
Disk watermarks: Prevent nodes from running out of disk space by controlling shard allocation.
indices.memory.index_buffer_size: Controls the amount of memory used for indexing buffers. Increasing this can improve indexing performance, but consumes more heap.
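thread_pool.write.size and queue_size: Tune write thread pools for higher indexing throughput. The pool size cannot exceed the number of allocated processors plus one, so raising it only helps on nodes with enough cores. Monitor queue rejections.
indices.queries.cache.size: Sizes the node query cache for frequently executed queries. The cache is enabled by default at 10% of heap.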
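Queue rejections can be watched with the _cat/thread_pool API. The sketch below assumes an unauthenticated node at http://localhost:9200 and assumes that the JSON form of the _cat output returns counters as strings. Both are assumptions to adjust for your cluster, and the helper names are illustrative.

```python
import json
import urllib.request


def rejected_by_node(rows: list) -> dict:
    """Map node name to rejected-task count from _cat/thread_pool JSON rows."""
    result = {}
    for row in rows:
        # Counters are converted explicitly in case they arrive as strings.
        result[row["node_name"]] = int(row["rejected"])
    return result


def fetch_write_pool(base_url: str = "http://localhost:9200") -> list:
    """Fetch write thread pool stats; base_url is an assumed local endpoint."""
    url = (f"{base_url}/_cat/thread_pool/write"
           "?format=json&h=node_name,name,active,queue,rejected")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for node, rejected in rejected_by_node(fetch_write_pool()).items():
        print(f"{node}: {rejected} rejected write tasks")
```

A rejected count that keeps growing suggests the write queue is too small for the indexing rate or that the node is overloaded.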

Swapping Disabled

Elasticsearch is extremely sensitive to swapping. Ensure swapping is disabled on all Elasticsearch nodes.

Disabling Swap

# Check current swap status
sudo swapon --show

# Disable swap temporarily
sudo swapoff -a

# Disable swap permanently by editing /etc/fstab
# Comment out or remove lines related to swap partitions/files
# Example line to comment out:
# /swapfile none swap sw 0 0

# Verify after reboot
sudo swapon --show
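The same verification can be automated, for example in a provisioning check. The sketch below reads SwapTotal from /proc/meminfo on Linux. The function names are illustrative.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Extract the SwapTotal value (in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


def swap_disabled(meminfo_text: str) -> bool:
    """True when no swap space is configured."""
    return swap_total_kb(meminfo_text) == 0


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print("swap disabled:", swap_disabled(fh.read()))
```

Run it after a reboot to confirm that the /etc/fstab change took effect.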

File Descriptors and MMap Counts

Elasticsearch requires a high number of open file descriptors and memory map counts.

Configuring Limits

# Add to /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536

# Add to /etc/sysctl.conf
vm.max_map_count=262144

# Apply sysctl changes
sudo sysctl -p

# Verify limits (after logging out and back in, or rebooting)
ulimit -n
cat /proc/sys/vm/max_map_count

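These settings ensure that Elasticsearch can efficiently manage its indices and memory mappings. Note that /etc/security/limits.conf applies to login sessions. When Elasticsearch runs as a systemd service, the file descriptor limit comes from LimitNOFILE in the unit file instead.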
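A small script can compare the live values against the thresholds used above. This is a sketch under the assumption that it runs as the same user, and in the same kind of session, as Elasticsearch. It reports the limits of the current process, not of an already running Elasticsearch service.

```python
import resource

MIN_NOFILE = 65536
MIN_MAX_MAP_COUNT = 262144


def limit_problems(nofile_soft: int, max_map_count: int) -> list:
    """Return human-readable problems; an empty list means both limits pass."""
    problems = []
    unlimited = nofile_soft == resource.RLIM_INFINITY
    if not unlimited and nofile_soft < MIN_NOFILE:
        problems.append(f"nofile is {nofile_soft}, expected at least {MIN_NOFILE}")
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count is {max_map_count}, expected at least {MIN_MAX_MAP_COUNT}"
        )
    return problems


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/vm/max_map_count") as fh:
        current = int(fh.read().strip())
    for message in limit_problems(soft, current) or ["limits look fine"]:
        print(message)
```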
