The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on AWS for Python
Nginx as a High-Performance Frontend Proxy
When deploying Python web applications, Nginx serves as an indispensable frontend proxy, handling static file serving, SSL termination, request buffering, and load balancing. Optimizing Nginx is crucial for maximizing throughput and minimizing latency. We’ll focus on key directives for a production environment, assuming a typical AWS EC2 instance setup.
Worker Processes and Connections
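The worker_processes directive controls how many worker processes Nginx spawns. The usual practice is one per CPU core, which is what worker_processes auto does. The worker_connections directive sets the maximum number of simultaneous connections each worker can open, so the theoretical ceiling is worker_processes * worker_connections. That count includes connections to upstream servers, not just clients. When Nginx proxies to Gunicorn or uWSGI, each request holds two connections, so the number of concurrent clients is roughly half the ceiling. Every connection also consumes a file descriptor, so keep worker_connections below the per-process open file limit (raise it with worker_rlimit_nofile if needed).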
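To make the connection arithmetic concrete, here is a small Python sketch. The function name is hypothetical, and it assumes every client request is proxied, so each one also holds an upstream connection that counts against worker_connections.

```python
import os


def max_concurrent_clients(worker_processes: int, worker_connections: int,
                           proxied: bool = True) -> int:
    """Rough ceiling on simultaneous clients for a given Nginx worker setup.

    worker_connections counts every connection a worker opens, including
    connections to upstream servers, so a proxied request uses two.
    """
    ceiling = worker_processes * worker_connections
    return ceiling // 2 if proxied else ceiling


if __name__ == "__main__":
    # "worker_processes auto" starts one worker per CPU core.
    cores = os.cpu_count() or 1
    print(f"{cores} workers x 1024 connections -> "
          f"~{max_concurrent_clients(cores, 1024)} proxied clients")
```

On a 4-core instance with worker_connections 1024, this gives about 2,048 proxied clients, or 4,096 if Nginx only serves static files.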
Tuning nginx.conf
# /etc/nginx/nginx.conf
user www-data;
worker_processes auto; # Or set to the number of CPU cores, e.g., worker_processes 4;

events {
    worker_connections 1024; # Adjust based on expected load and system limits
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off; # Hide the Nginx version in headers and error pages

    # Gzip compression for text-based assets (text/html is always compressed)
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/rss+xml text/javascript;

    # Buffering and timeouts for upstream connections
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    proxy_buffer_size 16k;
    proxy_buffers 4 32k;
    proxy_busy_buffers_size 64k;

    # Client request body limit (the default is 1m)
    client_max_body_size 50M; # Adjust as needed for file uploads

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Load balancing configuration (if multiple app servers)
    # upstream myapp {
    #     server 10.0.1.10:8000;
    #     server 10.0.1.11:8000;
    # }

    server {
        listen 80;
        server_name your_domain.com www.your_domain.com;

        # Serve static files directly
        location /static/ {
            alias /path/to/your/app/static/;
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }

        location /media/ {
            alias /path/to/your/app/media/;
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }

        location / {
            # If using Gunicorn (HTTP over a Unix or TCP socket)
            # proxy_pass http://unix:/run/gunicorn.sock; # For Unix socket
            # proxy_pass http://127.0.0.1:8000;          # For TCP socket
            # proxy_redirect off;

            # If using uWSGI with "socket =" (binary uwsgi protocol), use uwsgi_pass instead
            # include uwsgi_params;
            # uwsgi_pass unix:/run/uwsgi/myapp.sock;

            # If using PHP-FPM
            # try_files $uri $uri/ /index.php?$query_string;

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # For PHP-FPM (Debian/Ubuntu layout)
        # location ~ \.php$ {
        #     include snippets/fastcgi-php.conf; # Already includes fastcgi.conf, which sets SCRIPT_FILENAME
        #     fastcgi_pass unix:/run/php/php7.4-fpm.sock; # Adjust version
        # }

        # Deny access to hidden files, but keep /.well-known/ (e.g., ACME challenges) reachable
        location ~ /\.(?!well-known) {
            deny all;
        }
    }

    # Include other server blocks or configurations
    # include /etc/nginx/sites-enabled/*;
}
Key directives to note:
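- sendfile on;: Lets the kernel copy files from disk to the socket directly, without passing the data through user-space buffers.
- tcp_nopush on; and tcp_nodelay on;: With sendfile, tcp_nopush sends the response headers and the start of a file in full packets; tcp_nodelay disables Nagle's algorithm on keep-alive connections so small final packets are not delayed.
- keepalive_timeout 65;: Keeps idle client keep-alive connections open for up to 65 seconds.
- gzip_* directives: Enable and configure Gzip compression for text-based responses, which substantially reduces bandwidth for HTML, CSS, JavaScript, and JSON.
- proxy_* directives: Control how Nginx talks to your backend application server (Gunicorn or uWSGI over HTTP; PHP-FPM uses the equivalent fastcgi_* directives). Adjust timeouts and buffer sizes to cope with slow backend responses and large responses. If proxy_read_timeout is shorter than the backend's own request timeout, Nginx returns 504 before the backend gives up.
- client_max_body_size: Caps the size of client request bodies; larger requests are rejected with 413. The default is 1m, so it must be raised for file uploads, but keeping it no larger than necessary limits abuse through oversized uploads.
- server_tokens off;: Hides the Nginx version in the Server header and error pages, a minor hardening step.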
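Nginx refuses to load a configuration whose proxy buffer sizes are inconsistent. The sketch below encodes the rules Nginx enforces for these directives: at least two proxy_buffers, and proxy_busy_buffers_size no smaller than the larger of proxy_buffer_size and one proxy_buffers buffer, and no larger than all proxy_buffers minus one buffer. The helper names are hypothetical and the size parser only understands plain bytes and the k/m suffixes.

```python
from __future__ import annotations

import re

_UNITS = {"": 1, "k": 1024, "m": 1024 * 1024}


def parse_size(value: str) -> int:
    """Parse an Nginx size such as '16k', '1m' or '4096' into bytes."""
    match = re.fullmatch(r"(\d+)([kKmM]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def check_proxy_buffers(buffer_size: str, buffers: str,
                        busy_buffers_size: str) -> list[str]:
    """Return the problems Nginx would report for these proxy_* values."""
    problems = []
    header = parse_size(buffer_size)
    count_str, each_str = buffers.split()  # e.g. "4 32k"
    count, each = int(count_str), parse_size(each_str)
    busy = parse_size(busy_buffers_size)
    if count < 2:
        problems.append("proxy_buffers needs at least 2 buffers")
    if busy < max(header, each):
        problems.append("proxy_busy_buffers_size is below "
                        "max(proxy_buffer_size, one proxy_buffers buffer)")
    if busy > (count - 1) * each:
        problems.append("proxy_busy_buffers_size exceeds "
                        "all proxy_buffers minus one buffer")
    return problems
```

With the values from the configuration above, check_proxy_buffers("16k", "4 32k", "64k") returns an empty list: 64k is at least 32k and no more than 3 * 32k. Running nginx -t remains the authoritative check.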
SSL/TLS Optimization
For HTTPS, Nginx handles SSL termination. Optimizing SSL/TLS involves choosing strong cipher suites, enabling HTTP/2, and configuring session caching.
SSL Configuration Snippet
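# In your server block for HTTPS
listen 443 ssl;
http2 on; # Nginx 1.25.1+; on older versions use "listen 443 ssl http2;" instead
ssl_certificate /path/to/fullchain.pem; # Placeholder paths
ssl_certificate_key /path/to/privkey.pem;

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_session_cache shared:SSL:10m; # 10MB shared cache
ssl_session_timeout 10m;
ssl_session_tickets off; # Consider security implications if enabling

# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /path/to/chain.pem; # Issuer, intermediate and root certificates; required by ssl_stapling_verify
resolver 8.8.8.8 8.8.4.4 valid=300s; # Use your preferred DNS resolvers (on AWS, the VPC resolver at 169.254.169.253 also works)
resolver_timeout 5s;

# HTTP/2 tuning
http2_max_concurrent_streams 100;
# http2_push_preload on; # Obsolete: HTTP/2 server push was removed in Nginx 1.25.1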
Explanation of key SSL directives:
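- ssl_protocols: Restricts connections to TLS 1.2 and TLS 1.3.
- ssl_prefer_server_ciphers on;: Makes Nginx pick from its own cipher order rather than the client's. Mozilla's current intermediate profile sets this to off, because every cipher in its list is strong.
- ssl_ciphers: A strong, modern cipher list. It applies to TLS 1.2 only; TLS 1.3 suites are configured separately and OpenSSL's defaults are fine. Use tools like the Mozilla SSL Configuration Generator for up-to-date recommendations.
- ssl_session_cache and ssl_session_timeout: Enable session resumption, which skips the full handshake for returning clients. One megabyte of shared cache holds about 4,000 sessions.
- ssl_session_tickets off;: Disables session tickets. Ticket encryption keys that are never rotated undermine perfect forward secrecy (PFS), so enable tickets only if you rotate the keys.
- ssl_stapling and ssl_stapling_verify: Nginx fetches the certificate's OCSP response from the Certificate Authority, caches it, and staples it to the TLS handshake, so clients do not have to contact the CA themselves. The resolver directive is needed to look up the OCSP responder.
- HTTP/2: Enabled with http2 on; (or listen 443 ssl http2; on older Nginx) for request multiplexing and header compression. http2_max_concurrent_streams caps concurrent streams per connection (the default is 128). HTTP/2 server push (http2_push_preload) was removed in Nginx 1.25.1, so that directive is obsolete.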
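After reloading Nginx, you can confirm what clients actually negotiate. This sketch uses only Python's standard ssl module: it offers HTTP/2 through ALPN and reports the protocol version, cipher, and selected ALPN protocol. The function names and the policy (TLS 1.2+ and HTTP/2) are assumptions for illustration, not a standard tool.

```python
import socket
import ssl

MODERN_PROTOCOLS = {"TLSv1.2", "TLSv1.3"}


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to host:port and report what the TLS handshake negotiated."""
    context = ssl.create_default_context()  # verifies certificate chain and hostname
    context.set_alpn_protocols(["h2", "http/1.1"])  # offer HTTP/2 via ALPN
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "alpn": tls.selected_alpn_protocol(),  # None if no ALPN agreed
            }


def policy_problems(result: dict) -> list:
    """Compare a probe result with the policy of the SSL snippet."""
    problems = []
    if result.get("protocol") not in MODERN_PROTOCOLS:
        problems.append(f"old protocol: {result.get('protocol')}")
    if result.get("alpn") != "h2":
        problems.append("HTTP/2 was not negotiated")
    return problems
```

For example, policy_problems(probe_tls("your_domain.com")) should return an empty list once HTTP/2 and the TLS settings are active.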
Gunicorn/uWSGI Tuning for Python WSGI Applications
Gunicorn and uWSGI are popular WSGI servers for Python applications. Proper configuration is vital for handling concurrent requests efficiently and managing worker processes.
Gunicorn Worker Configuration
Gunicorn’s concurrency model is based on worker processes. The most common worker classes are sync (the default; each worker handles one request at a time), gthread (a pool of threads per worker), and the asynchronous gevent and eventlet workers, which run many requests per worker on greenlets.
Choosing the Right Worker Type and Count
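Gunicorn's documentation suggests (2 * number_of_cores) + 1 sync workers as a starting point; for purely CPU-bound work, going far beyond the core count adds context switching without adding throughput. For I/O-bound applications (e.g., many external API calls or database queries), the asynchronous gevent or eventlet workers can increase concurrency significantly. With these, concurrency comes mainly from worker_connections (greenlets per worker) rather than from more processes, and your I/O libraries must cooperate with the event loop for it to help. Memory is usually the real ceiling, because each worker process loads its own copy of the application.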
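The sizing rule can be written down as a small helper. The memory cap is an added assumption (it needs a measured per-worker RSS and a RAM budget you choose), and the function names are hypothetical; only the (2 * cores) + 1 starting point and the default worker_connections of 1000 come from Gunicorn.

```python
def suggested_sync_workers(cores: int, ram_budget_mb: int,
                           worker_rss_mb: int) -> int:
    """(2 * cores) + 1, capped by how many workers fit in the RAM budget."""
    by_cpu = 2 * cores + 1
    by_memory = ram_budget_mb // worker_rss_mb
    return max(1, min(by_cpu, by_memory))


def async_capacity(workers: int, worker_connections: int = 1000) -> int:
    """Upper bound on in-flight requests for gevent/eventlet workers."""
    return workers * worker_connections
```

For example, on a 4-core host with 4096 MB set aside for Gunicorn and workers measured at about 300 MB each, suggested_sync_workers(4, 4096, 300) returns 9; with only 1024 MB it returns 3, because memory runs out before CPUs do.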
Gunicorn Command-Line Arguments / Configuration File
# Example using command-line arguments:
# gunicorn --workers 4 --worker-class sync --bind unix:/run/gunicorn.sock --timeout 120 myapp.wsgi:application

# Example using a Python configuration file, loaded with:
# gunicorn -c /etc/gunicorn.d/myapp.py myapp.wsgi:application
# /etc/gunicorn.d/myapp.py
import multiprocessing

bind = "unix:/run/gunicorn.sock"  # Or "127.0.0.1:8000" for TCP when Nginx is on the same host
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "sync"  # or "gthread", "gevent", "eventlet"
timeout = 120  # seconds; Nginx's proxy_read_timeout (60s in the config above) fires first unless raised
keepalive = 2  # seconds

# For gevent/eventlet worker classes:
# worker_connections = 1000

# Logging configuration
loglevel = "info"
accesslog = "/var/log/gunicorn/myapp-access.log"
errorlog = "/var/log/gunicorn/myapp-error.log"

# Other useful settings:
# preload_app = True  # Load app code before forking; faster worker boot, memory shared via copy-on-write
# daemon = True  # Run Gunicorn as a daemon (usually systemd manages it instead)
# pidfile = "/run/gunicorn.pid"
Explanation:
- --workers / workers: Number of worker processes.
- --worker-class / worker_class: The type of worker to use.
- --bind / bind: The address and port (or Unix socket) to listen on. When Nginx runs on the same host, a Unix socket avoids TCP overhead and is generally preferred.
- --timeout / timeout: Workers that stay silent for longer than this many seconds are killed and restarted. For sync workers this effectively caps request time, which keeps hung requests from blocking workers indefinitely.
- --keepalive / keepalive: Seconds to wait for the next request on a keep-alive connection. The sync worker does not support persistent connections and ignores this setting.
- worker_connections (ignored by the sync worker): Maximum number of simultaneous clients (greenlets or threads) per worker.
- Logging directives: Essential for monitoring and debugging. Configure log rotation (e.g., with logrotate) for these files.
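To check the whole chain end to end, a throwaway WSGI app helps. The module below is a hypothetical myapp/wsgi.py matching the myapp.wsgi:application entry point used above; it is a smoke test, not part of any real project. It echoes the headers set by the proxy_set_header lines in the Nginx config, which reach a WSGI app as HTTP_* keys in environ. The call_app helper invokes it in-process using the standard library's wsgiref.util.setup_testing_defaults.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app that reports what the reverse proxy passed through."""
    # Request headers such as X-Forwarded-Proto appear as HTTP_X_FORWARDED_PROTO.
    lines = [
        f"host={environ.get('HTTP_HOST', '')}",
        f"forwarded_proto={environ.get('HTTP_X_FORWARDED_PROTO', '')}",
        f"real_ip={environ.get('HTTP_X_REAL_IP', '')}",
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(extra_environ: dict) -> tuple:
    """Invoke the app in-process, the way a WSGI server would."""
    environ = dict(extra_environ)
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.* keys, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

Run it with gunicorn -c /etc/gunicorn.d/myapp.py myapp.wsgi:application and request the site through Nginx; forwarded_proto=https in the response confirms that TLS terminates at Nginx and the headers are forwarded.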
uWSGI Configuration
uWSGI is another powerful WSGI server, known for its flexibility and performance. It uses a configuration file (often .ini format).
uWSGI Configuration File
# /etc/uwsgi/myapp.ini
[uwsgi]
module = myapp.wsgi:application

# Socket configuration (binary uwsgi protocol; pair with uwsgi_pass in Nginx)
socket = /run/uwsgi/myapp.sock
# Or TCP socket:
# socket = 127.0.0.1:8000
chmod-socket = 660
# Remove the socket on exit
vacuum = true

# Worker configuration
# Number of worker processes
processes = 4
# Number of threads per process (this also enables Python thread support)
threads = 2
# For gevent async workers instead (requires uWSGI built with gevent support):
# gevent = 100

# Performance tuning
buffer-size = 32768
# Kill requests taking longer than 30 seconds
harakiri = 30
# Restart a worker after N requests to contain memory leaks
max-requests = 5000

# Logging
logto = /var/log/uwsgi/myapp.log
log-reopen = true

# Other settings
uid = www-data
gid = www-data
chdir = /path/to/your/app
virtualenv = /path/to/your/venv
# Only needed if the app starts its own threads and "threads" is not set
# enable-threads = true
# Enable the master process (recommended)
master = true
Explanation:
- module: Specifies the WSGI application entry point.
- socket: Defines the communication socket. It speaks uWSGI's binary uwsgi protocol, so Nginx must use uwsgi_pass with uwsgi_params rather than proxy_pass (use http-socket instead if you want to proxy over HTTP). Unix sockets are generally faster than TCP on the same host.
- processes: Number of worker processes. The same sizing principles as for Gunicorn apply.
- threads: Number of threads per process. Use it only if your application is thread-safe; it adds concurrency within a process mainly for I/O-bound work, since the GIL limits CPU-bound threads.
- harakiri: Similar to Gunicorn's timeout; kills workers stuck on a request longer than the limit.
- max-requests: A crucial setting that mitigates memory leaks by periodically restarting workers.
- chdir and virtualenv: Ensure uWSGI runs in the correct directory and with the correct Python environment.
- master = true: Enables the master process, which manages workers and provides features like graceful reloads.
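Because the uWSGI file is plain INI, Python's configparser can read it for a quick sanity check of concurrency and safety settings. This is a hypothetical helper, not a uWSGI tool; it assumes comments sit on their own lines and that repeated keys can be collapsed to their last value, which is not how uWSGI treats every option.

```python
import configparser


def uwsgi_summary(ini_text: str) -> dict:
    """Summarize concurrency-related options from a uWSGI .ini file."""
    # strict=False tolerates repeated keys (later values win here);
    # interpolation=None leaves uWSGI's %(...) placeholders alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    section = parser["uwsgi"]
    processes = section.getint("processes", fallback=1)
    threads = section.getint("threads", fallback=1)
    return {
        "max_concurrent_requests": processes * threads,
        "master": section.getboolean("master", fallback=False),
        "harakiri_s": section.getint("harakiri", fallback=0),
        "uses_uwsgi_protocol": "socket" in section,  # needs uwsgi_pass in Nginx
    }
```

For the example configuration, processes = 4 and threads = 2 allow up to 8 requests in flight, and "uses_uwsgi_protocol" being true is a reminder that Nginx must use uwsgi_pass.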
PHP-FPM Tuning for PHP Applications
For PHP applications, PHP-FPM (FastCGI Process Manager) is the standard. Tuning FPM pools is essential for performance and stability.
PHP-FPM Pool Configuration
PHP-FPM pools manage worker processes that handle PHP requests. The key is to balance the number of processes with available system resources.
Tuning www.conf
; /etc/php/7.4/fpm/pool.d/www.conf (adjust PHP version as needed)
[www]
user = www-data
group = www-data
listen = /run/php/php7.4-fpm.sock
; Or TCP: listen = 127.0.0.1:9000

; Process management. Options: static, dynamic, ondemand
pm = dynamic
; Max number of children at any one time
pm.max_children = 50
; Number of children started on boot
pm.start_servers = 2
; Min number of idle children to keep
pm.min_spare_servers = 1
; Max number of idle children to keep
pm.max_spare_servers = 5
; Idle time before a child is killed (used only when pm = ondemand)
pm.process_idle_timeout = 10s

; Request handling
; Max time per request
request_terminate_timeout = 60s
; Log requests slower than this
request_slowlog_timeout = 10s
slowlog = /var/log/php/php7.4-fpm-slow.log

; Other useful settings
chdir = /
catch_workers_output = yes
; php_admin_value[memory_limit] = 256M
; php_admin_flag[display_errors] = off
Explanation:
- listen: The socket FPM listens on. Unix sockets are preferred for local communication.
- pm: Process manager mode.
  - static: Keeps a fixed number of children (pm.max_children) running. Good for predictable loads.
  - dynamic: Starts and stops children between the spare-server bounds based on load. Generally a good default.
  - ondemand: Starts children only when requests arrive and kills them after pm.process_idle_timeout. Saves memory, but requests that must wait for a new child see extra latency.
- pm.max_children: The absolute maximum number of child processes. This is the most critical setting for preventing out-of-memory (OOM) errors. Calculate it from available RAM: (Total RAM - RAM for OS/Nginx/other services) / Average RAM per FPM worker.
- pm.start_servers, pm.min_spare_servers, pm.max_spare_servers: Control the dynamic scaling behavior. Tune them to avoid constantly spawning and killing processes.
- request_terminate_timeout: Similar to WSGI timeouts; kills runaway scripts.
- request_slowlog_timeout and slowlog: Write a PHP backtrace for every request that exceeds the threshold, which makes slow scripts easy to find.
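The pm.max_children formula is easy to get wrong by hand, so here it is as a sketch. The helper names are hypothetical. It assumes the per-worker figure comes from output such as ps -C php-fpm7.4 -o rss= (one RSS value in KiB per line; the process name varies by distribution and PHP version). RSS counts shared memory in every process, so the result errs on the safe side.

```python
def average_rss_mb(ps_output: str) -> float:
    """Average resident memory in MiB from `ps -o rss=` output (KiB per line)."""
    values = [int(token) for token in ps_output.split()]
    if not values:
        raise ValueError("no RSS values found")
    return sum(values) / len(values) / 1024


def max_children(total_ram_mb: int, reserved_mb: int,
                 avg_worker_mb: float) -> int:
    """(Total RAM - RAM for OS/Nginx/other services) / average RAM per FPM worker."""
    return max(1, int((total_ram_mb - reserved_mb) // avg_worker_mb))
```

For example, with 8192 MiB of RAM, 2048 MiB reserved for the OS and other services, and workers averaging 60 MiB, max_children(8192, 2048, 60) returns 102.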
Elasticsearch Performance Tuning on AWS
Elasticsearch performance is heavily influenced by JVM heap size, disk I/O, and network configuration. On AWS, choosing the right instance type and EBS volume is paramount.
Instance Type and EBS Volumes
For Elasticsearch nodes, especially data nodes, prioritize instances with strong I/O performance and sufficient RAM. Instance families with local NVMe SSDs, such as i3, i4i, m5d, and r5d, are excellent choices, but instance-store data is lost when the instance stops or terminates, so rely on replicas and snapshots. If using EBS, opt for gp3 or io1/io2 volumes. gp3 provides a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size and lets you provision IOPS and throughput independently of capacity, which makes it a cost-effective and performant choice.
JVM Heap Size Configuration
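The JVM heap size is arguably the most critical Elasticsearch tuning parameter. Set it to no more than 50% of the system's total RAM, leaving the rest for the operating system's file system cache, which Lucene relies on heavily. Also keep it below the threshold for compressed ordinary object pointers (compressed oops), which lies just under 32 GB: Elastic's documentation describes 26 GB as safe on most systems and up to about 30 GB on some. Recent Elasticsearch versions size the heap automatically based on node roles and total memory, so override it only when you have a reason to.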
Setting JVM Heap in jvm.options
# /etc/elasticsearch/jvm.options.d/heap.options (or similar path)
# On recent versions, put overrides in jvm.options.d/ instead of editing jvm.options itself.
# Initial heap size
-Xms4g
# Maximum heap size
-Xmx4g
# -Xms and -Xmx should be the same for production
Example calculation: an i3.large instance has 15.25 GiB of RAM, so -Xms7g -Xmx7g stays just under the 50% limit (about 7.6 GiB) and leaves roughly 8 GiB for the OS, the file system cache, and other processes.
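The same rule as a sketch: half of RAM rounded down to whole GiB, capped at the compressed-oops limit. The default cap of 26 GB follows Elastic's "safe on most systems" guidance; the function names are hypothetical.

```python
import math


def es_heap_gb(total_ram_gib: float, compressed_oops_cap_gb: int = 26) -> int:
    """Whole-GiB heap: at most 50% of RAM and under the compressed-oops threshold."""
    return max(1, min(math.floor(total_ram_gib / 2), compressed_oops_cap_gb))


def heap_options(heap_gb: int) -> str:
    """jvm.options lines with matching initial and maximum heap sizes."""
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g\n"
```

es_heap_gb(15.25) returns 7, matching the i3.large example, while a 128 GiB node is capped at 26 rather than 64.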
Elasticsearch Configuration (elasticsearch.yml)
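# /etc/elasticsearch/elasticsearch.yml
cluster.name: "my-es-cluster"
node.name: ${HOSTNAME}
network.host: 0.0.0.0 # Or a specific private IP for security

# Discovery settings (for multi-node clusters)
discovery.seed_hosts: ["host1", "host2", "host3"]
# For the initial bootstrap only. Entries must match the node.name of the
# master-eligible nodes (their hostnames here); remove after the cluster forms.
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# Shard allocation settings (these values are the defaults; lower them for earlier action)
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

# Index buffer settings. The default is 10% of heap, shared by all shards on the node;
# 50% leaves far less heap for searches and caches, so raise it gradually.
indices.memory.index_buffer_size: 50%

# Thread pool settings: on 7.x+ the defaults are usually best. write.size defaults to the
# number of allocated processors (maximum 1 + processors) and write.queue_size defaults
# to 10000. Monitor rejections before changing either.
# thread_pool.write.size: 9 # e.g., on an 8-vCPU node
# thread_pool.write.queue_size: 10000

# Caching (the node query cache is enabled by default at 10% of heap)
indices.queries.cache.size: 256mb

# AWS: with the discovery-ec2 plugin installed, discover peers through the EC2 API
# discovery.seed_providers: ec2
# Security (X-Pack); enabled by default on 8.x
# xpack.security.enabled: true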
Explanation:
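- network.host: Bind to the appropriate network interface. Use private IPs for internal communication rather than exposing the node publicly.
- discovery.seed_hosts and cluster.initial_master_nodes: Essential for cluster formation. cluster.initial_master_nodes is used only for the very first bootstrap.
- Disk watermarks: Prevent nodes from running out of disk space. Above the low watermark Elasticsearch stops allocating new shards to the node, above the high watermark it moves shards away, and at the flood stage it makes indices with a shard on that node read-only.
- indices.memory.index_buffer_size: Controls the memory used to buffer newly indexed documents. Increasing it can improve bulk indexing throughput, but the extra heap comes out of what searches and caches would otherwise use.
- thread_pool.write.size and queue_size: Govern write concurrency and how many requests may wait. Monitor rejections before tuning; they often point to an overloaded cluster rather than an undersized pool.
- indices.queries.cache.size: Sizes the node-level query cache for filter clauses (default 10% of heap).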
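Write-pool queues and rejections are visible through the nodes stats API (GET /_nodes/stats/thread_pool). This sketch fetches that endpoint with the standard library and summarizes the write pool per node; the helper names are hypothetical, and the fetch assumes an unauthenticated local node, so add credentials if security is enabled. Rejected counts are cumulative since node start, so compare snapshots over time.

```python
import json
import urllib.request


def fetch_thread_pool_stats(base_url: str = "http://localhost:9200") -> dict:
    """GET /_nodes/stats/thread_pool (no authentication; an assumption)."""
    url = f"{base_url}/_nodes/stats/thread_pool"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def write_pool_report(stats: dict) -> dict:
    """Map node name -> (queued, rejected) for the write thread pool."""
    report = {}
    for node_id, node in stats.get("nodes", {}).items():
        write = node.get("thread_pool", {}).get("write", {})
        name = node.get("name", node_id)
        report[name] = (write.get("queue", 0), write.get("rejected", 0))
    return report
```

A steadily growing rejected count for one node after a change to the write pool settings is a sign to revert rather than raise the queue further.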
Swapping Disabled
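Elasticsearch is extremely sensitive to swapping: with part of the heap on disk, garbage collections can take minutes instead of milliseconds. Ensure swapping is disabled on all Elasticsearch nodes. If that is not possible, set vm.swappiness=1 or enable bootstrap.memory_lock: true in elasticsearch.yml to lock the heap in RAM.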
Disabling Swap
# Check current swap status
sudo swapon --show
# Disable swap temporarily
sudo swapoff -a
# Disable swap permanently by editing /etc/fstab:
# comment out or remove lines related to swap partitions/files, for example:
# /swapfile none swap sw 0 0
# Verify after reboot (no output means no active swap)
sudo swapon --show
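For automated checks (e.g., in provisioning scripts), the same information is available in /proc/swaps, whose first line is a header followed by one line per active swap device or file. This is a Linux-only sketch with hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path


def active_swap(proc_swaps_text: str) -> list[str]:
    """Return the swap devices/files listed in /proc/swaps (header skipped)."""
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_is_disabled(path: str = "/proc/swaps") -> bool:
    """True when no swap device or file is active on this host."""
    return not active_swap(Path(path).read_text())
```

Calling swap_is_disabled() after the reboot gives the same answer as an empty sudo swapon --show.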
File Descriptors and MMap Counts
Elasticsearch requires a high open file descriptor limit and a high vm.max_map_count, because Lucene keeps many segment files open and memory-maps them.
Configuring Limits
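# Add to /etc/security/limits.conf (applies to login sessions; "*" does not cover root)
*    soft nofile 65536
*    hard nofile 65536
root soft nofile 65536
root hard nofile 65536

# With systemd (the default for DEB/RPM installs), limits.conf does not apply to the
# service; the packaged unit already sets LimitNOFILE=65535. To change it, run
#   sudo systemctl edit elasticsearch
# and add:
#   [Service]
#   LimitNOFILE=65536

# Add to /etc/sysctl.conf
vm.max_map_count=262144
# Apply sysctl changes
sudo sysctl -p

# Verify limits (after logging out and back in, or rebooting).
# ulimit -n shows the current shell's limit, not the running service's.
ulimit -n
cat /proc/sys/vm/max_map_count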
These settings ensure that Elasticsearch can efficiently manage its indices and memory mappings.
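Elasticsearch's bootstrap checks require at least 65535 file descriptors and a vm.max_map_count of at least 262144. The sketch below compares values against those minimums; current_values() reads them for the Python process itself (Linux only), which matches a login session but not necessarily the systemd service. For the running service, the max_file_descriptors field of GET /_nodes/stats/process is the authoritative number. Helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

MIN_NOFILE = 65535       # Elasticsearch bootstrap check minimum
MIN_MAP_COUNT = 262144   # Elasticsearch bootstrap check minimum


def limit_problems(nofile_soft: int | None, max_map_count: int) -> list[str]:
    """Compare limits with the minimums; nofile_soft=None means unlimited."""
    problems = []
    if nofile_soft is not None and nofile_soft < MIN_NOFILE:
        problems.append(f"nofile {nofile_soft} < {MIN_NOFILE}")
    if max_map_count < MIN_MAP_COUNT:
        problems.append(f"vm.max_map_count {max_map_count} < {MIN_MAP_COUNT}")
    return problems


def current_values() -> tuple[int | None, int]:
    """Read this process's soft nofile limit and the system max_map_count."""
    import resource  # Unix-only module, imported lazily

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    nofile = None if soft == resource.RLIM_INFINITY else soft
    map_count = int(Path("/proc/sys/vm/max_map_count").read_text())
    return nofile, map_count
```

limit_problems(*current_values()) returns an empty list once the limits.conf and sysctl changes are active for your session.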