The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on OVH for PHP

Nginx as a High-Performance Frontend for PHP Applications

When deploying PHP applications, especially those leveraging modern frameworks and APIs, Nginx serves as an exceptionally efficient frontend. Its asynchronous, event-driven architecture excels at handling a high volume of concurrent connections, offloading the heavy lifting from your application servers. This section details critical Nginx tuning parameters for optimal PHP performance on OVH infrastructure.

Nginx Worker Processes and Connections

The `worker_processes` directive dictates how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available on your server, which you can check with nproc. The `worker_connections` directive sets the maximum number of simultaneous connections that each worker process can handle. The theoretical maximum is `worker_processes * worker_connections`, but that count includes upstream connections to PHP-FPM or a proxied backend as well as client connections, and it is further capped by the per-process open file limit.

Edit your main Nginx configuration file, typically located at /etc/nginx/nginx.conf:

user www-data;
worker_processes auto; # Or set to the number of CPU cores, e.g., 4
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 4096; # Adjust based on expected load and server RAM
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # ... other http configurations
}

Tuning Tip: On OVH’s dedicated servers, you can determine the number of CPU cores using nproc or lscpu. For virtual private servers (VPS), `auto` is often a good starting point: Nginx detects the number of available cores at startup and spawns one worker per core. The `worker_connections` value should be balanced against available RAM and the open file limit (see `worker_rlimit_nofile`), since every connection consumes a file descriptor and buffer memory. A value of 4096 is a reasonable default for many scenarios.
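The connection arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the `proxied` flag are hypothetical, and the halving reflects the fact that a request passed to PHP-FPM or another backend occupies both a client connection and an upstream connection in the same worker.

```python
import os


def max_client_connections(worker_processes, worker_connections, proxied=True):
    """Rough upper bound on simultaneous clients for one Nginx instance."""
    total = worker_processes * worker_connections
    # A proxied or FastCGI request holds two connections: client and upstream.
    return total // 2 if proxied else total


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # roughly what "worker_processes auto" resolves to
    print(max_client_connections(cores, 4096))
```

Treat the result as a ceiling for capacity planning, not as a throughput prediction.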

Gzip Compression and Caching

Enabling Gzip compression significantly reduces the bandwidth required to transfer assets, leading to faster page loads. Browser caching via HTTP headers is also crucial for repeat visitors.

http {
    # ... other http configurations

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

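    # Browser Caching (location blocks are only valid inside a server block)
    server {
        # ... other server configurations

        location ~* \.(css|js|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$ {
            expires 30d;
            add_header Cache-Control "public, no-transform";
        }
    }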

    # ... other http configurations
}

Tuning Tip: gzip_comp_level ranges from 1 to 9; 6 offers a good balance between compression ratio and CPU usage. Ensure `gzip_types` includes all relevant text-based MIME types for your application’s static assets. text/html is always compressed and does not need to be listed.
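To get a feel for the level trade-off before changing the config, you can compress a representative payload at several levels. The sketch below uses Python's zlib module, which implements the same DEFLATE algorithm family; the sample payload and function name are made up for illustration, and real ratios depend on your own assets.

```python
import zlib


def compressed_sizes(payload, levels=(1, 6, 9)):
    """Return the compressed size in bytes for each compression level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


if __name__ == "__main__":
    # Hypothetical payload: repetitive markup, similar to a rendered HTML list.
    sample = b'<div class="item">Hello, world</div>\n' * 2000
    for level, size in compressed_sizes(sample).items():
        print(f"level {level}: {size} bytes ({size / len(sample):.1%} of original)")
```

Running this against a real CSS or JSON response from your application gives a more meaningful comparison than the synthetic sample.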

FastCGI/PHP-FPM Configuration for PHP Applications

For PHP applications, Nginx typically communicates with PHP via PHP-FPM (FastCGI Process Manager). Optimizing PHP-FPM is paramount for application responsiveness. The key is to configure the process manager’s pool settings appropriately.

PHP-FPM Process Manager Settings

PHP-FPM offers three process management strategies: static, dynamic, and ondemand. For production environments, dynamic or static are generally preferred. dynamic is often a good compromise: it scales the number of processes with load while keeping a configured minimum of idle processes ready to serve requests.

Locate your PHP-FPM pool configuration file. This is typically found in /etc/php/[version]/fpm/pool.d/www.conf (replace [version] with your PHP version, e.g., 8.1).

; Choose one of: "static", "dynamic", "ondemand"
pm = dynamic

; For pm = dynamic:
; the pm.max_children, pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers
; settings are used.
pm.max_children = 150       ; Maximum number of child processes alive at the same time.
pm.start_servers = 10       ; Number of children created on startup.
pm.min_spare_servers = 5    ; Minimum number of idle children kept waiting for requests.
pm.max_spare_servers = 20   ; Maximum number of idle children; surplus idle children are killed.

; For pm = static:
; pm.max_children = 50

; For pm = ondemand:
; pm.max_children = 50
; pm.process_idle_timeout = 10s

; The following option is available with all values of pm
; Default value: 0 (children are never respawned)
; pm.max_requests = 500     ; Number of requests each child process serves before respawning.

; Other important settings:
; request_terminate_timeout = 0 ; 0 disables the limit; a value like 30s kills any request that runs longer
; listen.owner = www-data
; listen.group = www-data
; listen.mode = 0660
; listen = /run/php/php8.1-fpm.sock

Tuning Tip: The values for pm.max_children, pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers are highly dependent on your server’s RAM and the typical memory footprint of your PHP application. A common starting point for pm.max_children is (Total RAM - RAM for OS/Nginx and other services) / Average PHP Process Memory Footprint, rounded down. Monitor your server’s memory usage (e.g., with htop or free -m) and adjust these values iteratively. pm.max_requests limits the impact of memory leaks by respawning each process after a set number of requests.
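The RAM-based formula can be written down as a small helper. This is a minimal sketch: the function name is hypothetical, and the figures in the example (16GB server, 4GB reserved, 80MB per PHP process) are illustrative assumptions rather than measurements.

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_process_mb):
    """Starting value for pm.max_children from the RAM budget formula."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Round down so the pool can never outgrow the memory budget.
    return max(1, available_mb // avg_process_mb)


if __name__ == "__main__":
    # Assumed figures: 16GB total, 4GB for OS/Nginx/other services, 80MB per worker.
    print(estimate_max_children(16384, 4096, 80))
```

Measure the real per-process footprint under load before trusting the result, since it varies widely between applications.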

Nginx Configuration for PHP-FPM Upstream

Ensure your Nginx site configuration correctly points to your PHP-FPM socket or IP address. This is typically within the location ~ \.php$ block.

server {
    # ... other server configurations

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        # With php-fpm (or other unix sockets):
        fastcgi_pass unix:/run/php/php8.1-fpm.sock;
        # Or with TCP/IP:
        # fastcgi_pass 127.0.0.1:9000;
    }

    # ... other server configurations
}

Gunicorn/Python WSGI Server Tuning

If your application is Python-based and uses Gunicorn as the WSGI HTTP Server, similar principles of worker management apply. Gunicorn’s worker types and counts are critical for performance.

Gunicorn Worker Processes and Threads

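Gunicorn supports several worker types: sync (synchronous, default), gthread (threaded), eventlet, gevent, and tornado. sync workers handle one request at a time and are a solid choice for CPU-bound applications. For I/O-bound applications, threads help: passing --threads with a value above 1 makes Gunicorn use the gthread worker, so each worker process serves several requests concurrently. Both the number of workers and the number of threads need careful tuning.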

A common recommendation for the number of workers is (2 * Number of CPU Cores) + 1. For threaded workers, the number of threads per worker is also important.

# Example Gunicorn command line
gunicorn --workers 3 --threads 2 --bind 0.0.0.0:8000 myapp.wsgi:application

# Or via a systemd service file:
# /etc/systemd/system/myapp.service
[Unit]
Description=Gunicorn instance to serve myapp
After=network.target

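[Service]
User=myappuser
Group=myappuser
WorkingDirectory=/path/to/your/app
# systemd creates /run/myapp owned by User, so the unprivileged process can create its socket there
RuntimeDirectory=myapp
ExecStart=/path/to/your/venv/bin/gunicorn --workers 3 --threads 2 --bind unix:/run/myapp/myapp.sock myapp.wsgi:application
# Or for TCP binding (loopback is enough when Nginx runs on the same host):
# ExecStart=/path/to/your/venv/bin/gunicorn --workers 3 --threads 2 --bind 127.0.0.1:8000 myapp.wsgi:application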

[Install]
WantedBy=multi-user.target

Tuning Tip: Start with workers = (2 * CPU cores) + 1. If your application is heavily I/O bound (e.g., database queries, external API calls), increasing the number of threads per worker (e.g., --threads 2 or --threads 4) can improve concurrency without significantly increasing memory usage per worker. Monitor CPU and memory usage closely. If CPU is consistently at 100%, adding workers will not help and you might already have too many. If memory is maxed out, reduce workers or threads.
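The worker formula and its effect on concurrency can be sketched as follows. The function names are hypothetical; the concurrency figure assumes threaded workers, where each worker process serves up to `threads` requests at once.

```python
import os


def suggested_workers(cpu_cores):
    """Starting point for --workers: (2 * CPU cores) + 1."""
    return 2 * cpu_cores + 1


def max_concurrent_requests(workers, threads=1):
    """Requests served at once when every thread of every worker is busy."""
    return workers * threads


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    workers = suggested_workers(cores)
    print(f"--workers {workers} --threads 2 -> "
          f"{max_concurrent_requests(workers, 2)} concurrent requests")
```

On a single-core machine this yields 3 workers, which matches the --workers 3 used in the examples above.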

Gunicorn Worker Type Considerations

While sync is the default, if your application has many long-polling or asynchronous operations, consider gevent or eventlet. These use green threads for concurrency. However, they require careful management of blocking I/O calls.

# Using gevent workers
gunicorn --worker-class gevent --workers 3 --bind 0.0.0.0:8000 myapp.wsgi:application

Elasticsearch Performance Tuning on OVH

Elasticsearch, while powerful, can be resource-intensive. Proper JVM heap sizing and OS-level tuning are critical for stability and performance, especially on OVH infrastructure where you might be managing dedicated resources.

JVM Heap Size Configuration

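The most crucial Elasticsearch setting is the JVM heap size. Set it to no more than 50% of the total system RAM, and keep it below the threshold at which the JVM stops using compressed ordinary object pointers (compressed oops), which lies a little under 32GB; staying at or below about 30-31GB is a safe choice. Set the minimum (-Xms) and maximum (-Xmx) heap to the same value.

Edit the Elasticsearch JVM options file, typically /etc/elasticsearch/jvm.options (on Elasticsearch 7.7 and later, prefer a custom file under /etc/elasticsearch/jvm.options.d/):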

-Xms4g
-Xmx4g

Tuning Tip: For an OVH server with 16GB RAM that also runs Nginx and PHP-FPM, setting -Xms4g -Xmx4g (4GB heap) is a reasonable, conservative starting point. If you have 64GB RAM, you might set it to -Xms16g -Xmx16g. Always leave ample RAM for the OS and file system cache, which Elasticsearch depends on for fast searches. After changing this, restart Elasticsearch: sudo systemctl restart elasticsearch.
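The two heap rules (at most half of RAM, and below the compressed oops threshold) give an upper bound that is easy to compute. The sketch below is illustrative: the function names are hypothetical, and the 31GB ceiling is an assumed conservative value within the 30-32GB range mentioned above. It returns a ceiling, not a recommendation; a host shared with other services usually warrants less.

```python
def max_heap_gb(total_ram_gb, oops_ceiling_gb=31):
    """Largest heap allowed by the 50%-of-RAM and compressed-oops rules."""
    return min(total_ram_gb // 2, oops_ceiling_gb)


def jvm_heap_flags(heap_gb):
    """Matching -Xms/-Xmx flags; both are set to the same value."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    for ram_gb in (16, 64, 128):
        ceiling = max_heap_gb(ram_gb)
        print(f"{ram_gb}GB RAM: heap <= {ceiling}GB, e.g. {jvm_heap_flags(ceiling)}")
```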

File Descriptors and MMap Counts

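Elasticsearch relies heavily on file system operations and memory mapping. Increasing the limits for open file descriptors and mmap counts is essential. For the mmap count, set the kernel parameter vm.max_map_count to at least 262144 (sudo sysctl -w vm.max_map_count=262144, persisted in /etc/sysctl.conf).

For file descriptors, edit the /etc/security/limits.conf file and create a file in /etc/security/limits.d/: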

# /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536

# /etc/security/limits.d/99-elasticsearch.conf
elasticsearch - nofile 65536
elasticsearch - memlock unlimited

Memory locking is enabled in Elasticsearch itself, not in the JVM options. jvm.options accepts only JVM flags, and only lines that start with # are treated as comments, so bootstrap.memory_lock does not belong there and trailing comments should be avoided. Elasticsearch also sizes direct memory automatically from the heap, so an override like the following is rarely needed:

# Example: 1GB; normally leave this unset
-XX:MaxDirectMemorySize=1073741824

And in /etc/elasticsearch/elasticsearch.yml:

bootstrap.memory_lock: true

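Tuning Tip: limits.conf is applied by PAM at login, so changes only take effect for new sessions of the Elasticsearch user. When Elasticsearch runs as a systemd service (the default for package installs), systemd ignores limits.conf; set LimitNOFILE=65536 and LimitMEMLOCK=infinity in a drop-in created with sudo systemctl edit elasticsearch, then restart the service. Verify the effective values from the running node with GET _nodes/stats/process (max_file_descriptors) and GET _nodes?filter_path=**.mlockall. Leave MaxDirectMemorySize unset unless you have a specific reason to override it; by default Elasticsearch sets it to half the heap size.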
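A quick sanity check of the two limits can be scripted. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the 65536 threshold comes from the limits.conf example above, and 262144 is assumed as the minimum vm.max_map_count (check the Elasticsearch documentation for your version). It inspects the limits of the process that runs the script, so run it as the elasticsearch user; limits of a systemd-managed service can differ.

```python
import resource
from pathlib import Path

MIN_NOFILE = 65536          # from the limits.conf example above
MIN_MAX_MAP_COUNT = 262144  # assumed minimum for vm.max_map_count


def limit_problems(nofile_soft, max_map_count):
    """Return human-readable problems for the given limit values."""
    problems = []
    if nofile_soft < MIN_NOFILE:
        problems.append(f"nofile is {nofile_soft}, expected >= {MIN_NOFILE}")
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count is {max_map_count}, expected >= {MIN_MAX_MAP_COUNT}"
        )
    return problems


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    proc_file = Path("/proc/sys/vm/max_map_count")
    # Fall back to 0 on systems without procfs so the check reports a problem.
    map_count = int(proc_file.read_text()) if proc_file.exists() else 0
    for problem in limit_problems(soft, map_count) or ["limits look fine"]:
        print(problem)
```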

Elasticsearch Indexing and Sharding Strategy

An appropriate sharding strategy is crucial for both performance and scalability. Too many shards can overwhelm the cluster, while too few can limit parallelism.

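Rule of Thumb: Aim for shard sizes between 10GB and 50GB. A long-standing guideline is to stay below roughly 20 shards per GB of heap on each node. For a node with 16GB heap, this means no more than about 320 shards. Independently of heap size, Elasticsearch by default refuses to create more than 1000 shards per data node (cluster.max_shards_per_node).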

{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}

Tuning Tip: When creating new indices, explicitly set number_of_shards and number_of_replicas. For time-series data, consider using Index Lifecycle Management (ILM) to manage indices and their shards automatically. On OVH, ensure your data disks are performant (e.g., SSDs) for optimal indexing and search throughput.
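The shard-size guideline translates directly into a primary shard count for a given index size. The helper below is a minimal sketch: the function name is hypothetical, and the 30GB default target is an assumed midpoint of the 10GB to 50GB range.

```python
import math


def primary_shards(index_size_gb, target_shard_gb=30):
    """Primary shard count that keeps each shard near the target size."""
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target_shard_gb should stay within 10-50")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (5, 90, 400):
        print(f"{size_gb}GB index -> {primary_shards(size_gb)} primary shard(s)")
```

An index expected to reach about 90GB would get 3 primary shards, the value used in the settings example above. Replicas multiply the total shard count but not the primary count.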

Monitoring and Iterative Tuning

The key to successful infrastructure tuning is continuous monitoring and iterative adjustments. Utilize tools like Prometheus, Grafana, Netdata, or even basic system utilities (top, htop, vmstat, iostat) to observe resource utilization. Pay close attention to CPU load, memory usage, disk I/O, and network traffic. For PHP-FPM, monitor the number of active and idle processes and the length of the listen queue (exposed by the status page when pm.status_path is enabled). For Gunicorn, observe worker utilization. For Elasticsearch, monitor JVM heap usage, CPU, disk I/O, and query latency.
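For PHP-FPM, the status page can return JSON that is easy to check automatically. The sketch below assumes pm.status_path is enabled in the pool and that the output was fetched in JSON format; the function name and the sample values are hypothetical, and how you fetch the page (through Nginx or directly over FastCGI) is left out.

```python
import json


def fpm_pool_warnings(status):
    """Flag signs of saturation in a parsed PHP-FPM status document."""
    warnings = []
    if status.get("listen queue", 0) > 0:
        warnings.append("requests are waiting in the listen queue")
    if status.get("max children reached", 0) > 0:
        warnings.append("pm.max_children has been reached")
    if status.get("total processes", 0) and status.get("idle processes", 0) == 0:
        warnings.append("no idle workers available")
    return warnings


if __name__ == "__main__":
    # Hypothetical sample of a saturated pool.
    sample = json.loads(
        '{"pool": "www", "listen queue": 3, "idle processes": 0, '
        '"active processes": 150, "total processes": 150, '
        '"max children reached": 1}'
    )
    for warning in fpm_pool_warnings(sample):
        print(warning)
```

A non-empty listen queue or a rising "max children reached" counter usually means pm.max_children is too low for the load or that requests are too slow.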

Make one significant change at a time, monitor its impact, and then decide on the next step. This systematic approach prevents introducing instability and helps pinpoint the exact cause of performance bottlenecks.