The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on OVH for Python

Optimizing Nginx for High-Traffic Python Applications

When deploying Python web applications, particularly those using frameworks like Django or Flask, Nginx often serves as the reverse proxy and static file server. Fine-tuning Nginx is crucial for handling high traffic volumes efficiently. This section focuses on key Nginx directives and configurations relevant to Python deployments on OVH infrastructure.

Nginx Worker Processes and Connections

The number of worker processes and the maximum number of connections per worker are fundamental to Nginx’s concurrency. A common starting point is to set worker_processes to the number of CPU cores available on your server. Setting it to auto achieves the same result without hard-coding the count: Nginx detects the number of available cores at startup and starts one worker per core.

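The worker_connections directive defines the maximum number of simultaneous connections that each worker process can open. This count includes connections to upstream servers such as Gunicorn, not only client connections, and it cannot exceed the per-process open file limit (see worker_rlimit_nofile). Set it high enough to accommodate peak traffic, but not so high that it exhausts system resources. A typical value is 1024 or 2048, but benchmark it under realistic load. The theoretical maximum is worker_processes * worker_connections; when Nginx acts as a reverse proxy, each client request also holds an upstream connection, so the practical client capacity is roughly half of that.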
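
The capacity arithmetic above can be sketched in Python. This is a rough estimate under stated assumptions, not anything Nginx exposes: the function name is hypothetical, and the halving assumes each proxied request holds exactly one client connection and one upstream connection.

```python
def estimate_max_clients(worker_processes: int, worker_connections: int,
                         reverse_proxy: bool = True) -> int:
    """Rough upper bound on the clients Nginx can serve simultaneously."""
    total = worker_processes * worker_connections
    # Assumption: when proxying, every client connection is paired with
    # one upstream connection, so only half the slots serve clients.
    return total // 2 if reverse_proxy else total


if __name__ == "__main__":
    # 4 workers with 2048 connections each, proxying to Gunicorn
    print(estimate_max_clients(4, 2048))
```

Treat the result as a ceiling for planning purposes; the open file limit and upstream capacity usually bite first.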

Nginx Configuration Snippet

Here’s a sample Nginx configuration snippet demonstrating these settings. If you are not using auto, set worker_processes to your OVH instance’s CPU count.

worker_processes auto;
# Or, explicitly set based on CPU cores:
# worker_processes 4;

events {
    worker_connections 2048; # Max connections per worker
    multi_accept on;        # Accept multiple connections at once
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;

    keepalive_timeout 65;
    keepalive_requests 1000; # Close connection after N requests

    # Gzip compression for static assets and API responses
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Buffering and timeouts for upstream connections
    proxy_connect_timeout 60s;
    proxy_send_timeout    60s;
    proxy_read_timeout    60s;
    proxy_buffer_size     16k;
    proxy_buffers         4 32k;
    proxy_busy_buffers_size 64k;

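    server {
        # Enable HTTP/2 for improved performance (listen is only valid inside server).
        # On nginx 1.25.1+, prefer "listen 443 ssl;" plus "http2 on;".
        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        # ... other server configurations ...
    }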
}

Tuning Gunicorn for Python WSGI Applications

Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes and the worker type. For CPU-bound tasks, a synchronous worker class is often sufficient. For I/O-bound tasks, asynchronous workers like gevent or eventlet can significantly improve concurrency.

Gunicorn Worker Processes and Threads

The number of worker processes is typically set based on the number of CPU cores. A common recommendation is (2 * number_of_cores) + 1. This formula aims to keep all cores busy, accounting for potential I/O waits. For asynchronous workers, the concept of “threads” is less relevant as they manage concurrency through event loops.

When using synchronous workers (sync), each worker process handles one request at a time. If your application is I/O-bound (e.g., making many external API calls or database queries), consider gevent workers. With gevent, a single worker process can handle many requests concurrently by yielding control while waiting for I/O operations to complete. Because of this, a gevent deployment usually needs fewer processes than a sync deployment for the same load; concurrency per worker is capped by --worker-connections (1000 by default).

Gunicorn Command-Line Configuration

Here’s an example of how to start Gunicorn with optimized settings. This assumes you have a WSGI application object named application in a file named wsgi.py.

# Example for synchronous workers (adjust workers based on CPU cores)
# Assuming 4 CPU cores: (2 * 4) + 1 = 9 workers
gunicorn --workers 9 \
         --worker-class sync \
         --bind 0.0.0.0:8000 \
         --timeout 120 \
         --log-level info \
         wsgi:application

# Example for gevent workers (adjust workers based on expected concurrency)
# This can often handle more concurrent connections per worker
gunicorn --workers 4 \
         --worker-class gevent \
         --bind 0.0.0.0:8000 \
         --timeout 120 \
         --log-level info \
         wsgi:application

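Note: The examples above bind to 0.0.0.0, which exposes Gunicorn on every network interface. When Nginx runs on the same server, bind to 127.0.0.1:8000 instead (or to a Unix socket) so that only Nginx can reach it. If Gunicorn is on a different machine, bind to an address reachable from the Nginx host and restrict access with a firewall. Set --timeout long enough for your longest expected request, but no longer: a sync worker that stays silent beyond this limit is killed and restarted, which is what protects you from hung processes.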
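
The same settings can live in a Gunicorn configuration file, which is plain Python. The sketch below derives the worker count from the (2 * cores) + 1 rule instead of hard-coding it. The helper recommended_workers is a hypothetical name introduced here for illustration; bind, workers, worker_class, timeout, and loglevel are the Gunicorn settings that correspond to the flags above.

```python
# gunicorn.conf.py -- load with: gunicorn -c gunicorn.conf.py wsgi:application
import multiprocessing


def recommended_workers(cores: int) -> int:
    """Apply the (2 * cores) + 1 rule of thumb for sync workers."""
    return 2 * cores + 1


# Assumption: Nginx runs on the same host and proxies to this address.
bind = "127.0.0.1:8000"
workers = recommended_workers(multiprocessing.cpu_count())
worker_class = "sync"
timeout = 120
loglevel = "info"
```

Note that cpu_count() reports the cores the machine has, which may be more than a container or restricted VPS is allowed to use, so check the computed value against your actual allocation.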

Tuning PHP-FPM for PHP Applications

For PHP applications, PHP-FPM (FastCGI Process Manager) is the standard way to interface with web servers like Nginx. Optimizing PHP-FPM involves tuning its process management and resource allocation.

PHP-FPM Process Management

PHP-FPM offers several process management strategies: static, dynamic, and ondemand. For production environments, dynamic or static are generally preferred. dynamic is a good balance, starting with a few processes and spawning more as needed, up to a defined maximum. static pre-forks a fixed number of processes, which can be more predictable but less resource-efficient if traffic is highly variable.

Key directives within the PHP-FPM pool configuration (e.g., /etc/php/X.Y/fpm/pool.d/www.conf) include:

pm.max_children: The maximum number of child processes that will be spawned. This is a hard limit and directly impacts memory usage.
pm.start_servers: The number of child processes to start when the FPM master process is started.
pm.min_spare_servers: The desired minimum number of idle server processes (used only when pm is dynamic).
pm.max_spare_servers: The desired maximum number of idle server processes (used only when pm is dynamic).
pm.max_requests: The number of requests each child process should execute before respawning. Recycling workers this way limits the impact of memory leaks in application code or extensions.

PHP-FPM Configuration Example

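Here’s a sample configuration for the dynamic process manager. Adjust values based on your server’s RAM and expected load. A good starting point for pm.max_children is (RAM_available_to_PHP_FPM_in_MB / average_process_RAM_in_MB), where the available RAM is what remains after the OS, Nginx, and PostgreSQL have taken their share. This requires profiling your PHP processes to measure their average memory footprint.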

; /etc/php/X.Y/fpm/pool.d/www.conf

[www]
user = www-data
group = www-data
listen = /run/php/phpX.Y-fpm.sock ; Or a TCP socket like 127.0.0.1:9000

; Process Manager settings
pm = dynamic
pm.max_children = 100       ; Adjust based on server RAM and PHP process size
pm.start_servers = 10       ; Initial number of workers
pm.min_spare_servers = 5    ; Minimum idle workers
pm.max_spare_servers = 20   ; Maximum idle workers
pm.max_requests = 500       ; Restart worker after N requests

; Other useful settings
request_terminate_timeout = 120s ; Timeout for script execution
; rlimit_files = 1024
; rlimit_core = 0
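
The pm.max_children rule of thumb can be sketched as a small Python helper. The function name and the example figures (4096 MB set aside for PHP-FPM, 40 MB per process) are assumptions for illustration; measure your own processes before relying on the result.

```python
def estimate_max_children(ram_for_fpm_mb: int, avg_process_mb: int) -> int:
    """Estimate pm.max_children from the RAM budget and per-process size."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    # Integer division: a partial process does not fit in memory.
    return max(1, ram_for_fpm_mb // avg_process_mb)


if __name__ == "__main__":
    # Hypothetical budget: 4096 MB for PHP-FPM, 40 MB per worker
    print(estimate_max_children(4096, 40))
```

Rounding down keeps the pool inside the memory budget, which matters because pm.max_children is a hard limit.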

Tuning PostgreSQL for High-Performance Data Access

PostgreSQL’s performance is critically dependent on its configuration, especially memory allocation and query optimization. On OVH, where you might have dedicated or VPS instances, tuning these parameters is essential.

Key PostgreSQL Configuration Parameters

The primary configuration file for PostgreSQL is typically postgresql.conf. Key parameters to tune include:

shared_buffers: This is arguably the most important parameter. It defines the amount of memory PostgreSQL can use for caching data. A common recommendation is 25% of your total system RAM, but this can be increased cautiously if you have ample RAM and your OS cache is not starved.
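work_mem: Memory each sort or hash operation may use before spilling to temporary files on disk. The limit applies per operation, so a complex query or many concurrent sessions can consume several multiples of it. Insufficient work_mem leads to disk spills that slow queries drastically. Increase it if EXPLAIN ANALYZE reports a sort method of “external merge” with a Disk figure, or hash nodes that use more than one batch.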
maintenance_work_mem: Memory used for maintenance operations such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. Larger values can speed up these tasks.
effective_cache_size: This tells PostgreSQL how much memory is available for disk caching by the operating system and the shared buffer. It helps the query planner make better decisions. A good starting point is 50-75% of total RAM.
max_connections: The maximum number of concurrent connections. Ensure this is high enough for your application’s needs but not so high that it exhausts memory.
wal_buffers: Memory for WAL (Write-Ahead Logging) data. A value of -1 (auto) is often fine, but tuning can help with write-heavy workloads.
checkpoint_completion_target: Controls how spread out checkpoints are. A value of 0.9 is often recommended to spread I/O over time.
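
The percentage guidelines in this list can be turned into a quick sizing sketch. The function name and returned keys are hypothetical, and the work_mem figure assumes one work_mem-sized operation per connection, which real queries can exceed.

```python
def suggest_pg_memory(total_ram_gb: float, max_connections: int,
                      work_mem_mb: int) -> dict:
    """Starting-point memory figures derived from the rules of thumb above."""
    return {
        # 25% of RAM for PostgreSQL's own buffer cache
        "shared_buffers_gb": total_ram_gb * 0.25,
        # 75% of RAM as the planner's estimate of total cache (upper guideline)
        "effective_cache_size_gb": total_ram_gb * 0.75,
        # Assumption: every connection runs one sort or hash at full work_mem
        "worst_case_work_mem_gb": max_connections * work_mem_mb / 1024,
    }


if __name__ == "__main__":
    for name, value in suggest_pg_memory(16, 200, 64).items():
        print(f"{name}: {value}")
```

Comparing the worst-case work_mem figure with total RAM shows quickly whether a work_mem and max_connections combination is risky.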

PostgreSQL Configuration Snippet

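Here’s an example snippet from postgresql.conf. Some of these parameters (for example shared_buffers, max_connections, listen_addresses, wal_buffers, and logging_collector) take effect only after a PostgreSQL restart; most of the others can be applied with a configuration reload.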

# postgresql.conf

# Memory settings (assuming 16GB RAM)
shared_buffers = 4GB          # 25% of 16GB RAM
work_mem = 64MB               # Per sort/hash operation, not per connection; size with max_connections in mind
maintenance_work_mem = 512MB  # For vacuuming and index creation
effective_cache_size = 12GB   # 75% of 16GB RAM

# Connection settings
max_connections = 200         # Adjust based on application needs and RAM
listen_addresses = '*'        # All interfaces; prefer specific IPs and restrict access in pg_hba.conf

# WAL settings
wal_buffers = 16MB
wal_writer_delay = 200ms
checkpoint_completion_target = 0.9
max_wal_size = 4GB            # Adjust based on disk space and recovery needs

# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H-%M-%S.log'
log_statement = 'ddl'         # Log DDL statements, or 'all' for debugging
log_min_duration_statement = 1000 # Log statements longer than 1s

Query Optimization and Indexing

Beyond server configuration, efficient queries are paramount. Regularly analyze slow queries using EXPLAIN ANALYZE. Ensure appropriate indexes are in place for frequently queried columns, especially those used in WHERE clauses, JOIN conditions, and ORDER BY clauses.
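
When reviewing many plans, a small script can flag sorts that spilled to disk. The sketch below scans EXPLAIN ANALYZE text output for sort nodes that report an external method with a Disk figure. The function name and the sample plan excerpt are hypothetical, and the pattern covers only the text output format, not JSON or YAML plans.

```python
import re
from typing import List

# Matches lines such as "Sort Method: external merge  Disk: 102400kB"
_DISK_SORT = re.compile(r"Sort Method:\s+external\s+\w+\s+Disk:\s+(\d+)kB")


def find_disk_sorts(plan_text: str) -> List[int]:
    """Return the on-disk size in kB of every sort that spilled to disk."""
    return [int(kb) for kb in _DISK_SORT.findall(plan_text)]


# Hypothetical plan excerpt, for illustration only
SAMPLE_PLAN = """\
Sort  (actual time=812.4..910.2 rows=500000 loops=1)
  Sort Key: created_at
  Sort Method: external merge  Disk: 102400kB
"""

if __name__ == "__main__":
    print(find_disk_sorts(SAMPLE_PLAN))
```

An empty result means no sort in the plan went to disk; in-memory sorts report a Memory figure instead and are ignored.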

Use tools like pgtune to get initial recommendations based on your hardware, but always validate with real-world load testing.

Monitoring and Iterative Tuning

Performance tuning is not a one-time task. Implement robust monitoring for Nginx, Gunicorn/PHP-FPM, and PostgreSQL. Key metrics include:

Nginx: Request rates, error rates (4xx, 5xx), connection counts, latency.
Gunicorn/PHP-FPM: Worker status, request queue length, response times, CPU/memory usage per worker.
PostgreSQL: Active connections, query execution times, cache hit ratios, disk I/O, CPU/memory usage.

Use tools like Prometheus with Grafana, Datadog, or New Relic. Regularly review these metrics, identify bottlenecks, and iteratively adjust configurations. Load testing with tools like k6 or JMeter is crucial to validate changes before deploying to production.
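
As one example of a metric worth tracking, the PostgreSQL cache hit ratio can be derived from the blks_hit and blks_read counters in the pg_stat_database view. The helper below is a minimal sketch: the function name is hypothetical and the input numbers are made up, so feed it values read from your own database.

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared buffers."""
    total = blks_hit + blks_read
    # With no block activity yet there is nothing to measure.
    return blks_hit / total if total else 0.0


if __name__ == "__main__":
    # Made-up counters: 990 buffer hits, 10 reads from outside shared buffers
    print(f"{cache_hit_ratio(990, 10):.2%}")
```

The counters are cumulative since the last statistics reset, so compare deltas between samples when you want the ratio for a recent time window.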