The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on OVH for Perl

Nginx Tuning for High-Traffic Perl Applications on OVH

Optimizing Nginx is paramount for any high-traffic web application. For Perl-based applications served via Gunicorn or PHP-FPM, specific Nginx directives can significantly improve throughput and reduce latency on OVH infrastructure. We’ll focus on connection handling, caching, and request buffering.

Connection Handling: Worker Processes and Connections

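The `worker_processes` directive sets how many worker processes Nginx spawns. A common recommendation is one per available CPU core, which `worker_processes auto` selects automatically. On OVH instances, identifying the core count is straightforward. The `worker_connections` directive sets the maximum number of simultaneous connections a single worker process can hold open. The theoretical ceiling is `worker_processes * worker_connections`, but that figure counts every connection, including those to upstream servers, so when Nginx proxies to an application server each client typically occupies two. The usable value is also capped by the per-process open file limit, which `worker_rlimit_nofile` can raise.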

To determine the number of CPU cores on a typical OVH Linux instance:

grep -c ^processor /proc/cpuinfo

A good starting point for `worker_processes` is the output of the above command. For `worker_connections`, consider the expected peak concurrent users and the nature of your application’s requests. A value of 1024 or 2048 is often a safe bet, but this should be monitored and adjusted under load.

Here’s an example Nginx configuration snippet:

# In nginx.conf, typically within the 'main' context
user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 2048; # Adjust based on expected load
    multi_accept on;
}

http {
    # ... other http directives ...
}
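As a rough capacity check, the arithmetic behind these two directives can be sketched in a few lines of Python. The function name `max_clients` and the sample numbers are illustrative assumptions, and the halving for proxied traffic reflects the fact that a proxied request usually holds one client connection and one upstream connection at once.

```python
def max_clients(worker_processes: int, worker_connections: int, proxying: bool = True) -> int:
    """Rough upper bound on simultaneous clients Nginx can serve."""
    total = worker_processes * worker_connections
    # When proxying, each client request also holds one upstream connection,
    # so only about half of the connection slots are available to clients.
    return total // 2 if proxying else total


print(max_clients(4, 2048))         # 4096
print(max_clients(4, 2048, False))  # 8192
```

Treat the result as an upper bound only; file descriptor limits and backend capacity usually bite earlier.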

Optimizing Request Buffering

Nginx buffers client requests. If a request body is larger than `client_body_buffer_size`, Nginx writes it to a temporary file. Large uploads or complex POST requests can benefit from tuning these directives. However, excessively large buffers can consume memory. `client_max_body_size` limits the maximum accepted body size of an HTTP request.

http {
    # ...
    client_body_buffer_size 128k; # Default is 8k or 16k depending on platform; increase if large POSTs are common
    client_max_body_size 50m;    # Adjust based on maximum expected upload size
    # ...
}

Gunicorn/uWSGI Tuning for Perl Applications

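When serving an application with Gunicorn (or uWSGI, which is also common), the number of worker processes and threads is critical. Note that Gunicorn is a Python WSGI server and does not run Perl directly; Perl (PSGI) applications are usually run under a PSGI server such as Starman or uWSGI's PSGI plugin, where the same sizing reasoning applies. Gunicorn’s worker classes (`sync`, `gthread`, `gevent`, `eventlet`) have different performance characteristics. For CPU-bound code, more worker processes are generally better than threads. For I/O-bound applications, the threaded `gthread` worker or the greenlet-based `gevent` and `eventlet` workers can be beneficial.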

A common Gunicorn command-line invocation might look like this:

gunicorn --workers 4 --threads 2 --bind 0.0.0.0:8000 myapp.wsgi:application

Here, `--workers` is set to 4 and `--threads` to 2. Setting `--threads` above 1 makes Gunicorn use the `gthread` worker class. The optimal ratio depends heavily on the application’s workload. A good starting point for `--workers` is `(2 * number_of_cpu_cores) + 1`. This formula aims to keep CPU cores busy while accounting for potential I/O waits.
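The same starting point can be expressed in a Gunicorn configuration file, which is plain Python. The sketch below assumes a file named `gunicorn.conf.py` loaded with `gunicorn -c gunicorn.conf.py myapp.wsgi:application`; the helper `suggested_workers` is a hypothetical name introduced here, while `bind`, `workers`, and `threads` are standard Gunicorn settings.

```python
# gunicorn.conf.py (assumed file name)
import multiprocessing


def suggested_workers(cores: int) -> int:
    """Starting point from the (2 * cores) + 1 rule of thumb."""
    return 2 * cores + 1


bind = "0.0.0.0:8000"
# Derive the worker count from the cores visible to this machine.
workers = suggested_workers(multiprocessing.cpu_count())
# More than one thread per worker switches Gunicorn to the gthread worker.
threads = 2
```

Computing the value at startup keeps the same file usable across OVH instance sizes, though the result is still only a baseline to validate under load.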

For applications that are heavily I/O bound (e.g., making many external API calls or database queries), consider using the `gevent` worker class:

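gunicorn --worker-class gevent --workers 4 --worker-connections 1000 --bind 0.0.0.0:8000 myapp.wsgi:application

The `--threads` option only affects the `gthread` worker, so per-worker concurrency for `gevent` is set with `--worker-connections` instead. Ensure your application and its libraries are compatible with gevent’s monkey patching, which replaces blocking calls in Python’s standard library, if you choose this route.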

PHP-FPM Tuning for Perl Applications (if applicable)

If your “Perl” application is actually a hybrid or uses PHP components, tuning PHP-FPM is essential. The `pm` (process manager) settings are key. `pm = dynamic` is often a good balance, allowing FPM to scale workers up and down based on demand. `pm.max_children` is the hard limit on the number of child processes. `pm.start_servers`, `pm.min_spare_servers`, and `pm.max_spare_servers` control the dynamic scaling behavior.

A typical `php-fpm.conf` or pool configuration (`www.conf`) snippet:

; In /etc/php/X.Y/fpm/pool.d/www.conf
[www]
user = www-data
group = www-data
listen = /run/php/phpX.Y-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

pm = dynamic
pm.max_children = 100       ; Adjust based on available RAM and CPU
pm.start_servers = 10       ; Initial number of children
pm.min_spare_servers = 5    ; Minimum idle children
pm.max_spare_servers = 20   ; Maximum idle children
pm.process_idle_timeout = 10s ; Idle time before a child is killed (only used when pm = ondemand)

request_terminate_timeout = 300s ; Max execution time for a script
request_slowlog_timeout = 60s    ; Log scripts exceeding this time
slowlog = /var/log/php/phpX.Y-fpm-slow.log

The `pm.max_children` value should be carefully chosen. A common formula is `(Total RAM - RAM used by OS and other services) / Average child process size`. Monitor memory usage closely. If `pm.max_children` is too high, the pool can exhaust memory and trigger the kernel OOM killer. If it is too low, requests will queue up once every child is busy.
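The formula above is easy to turn into a small calculation. This is a minimal sketch: the function name `fpm_max_children` and the sample figures (8 GB of RAM, 2 GB reserved, 60 MB per child) are assumptions for illustration, and the average child size should come from measuring your own pool.

```python
def fpm_max_children(total_ram_mb: int, reserved_mb: int, avg_child_mb: int) -> int:
    """Estimate pm.max_children from available memory and average child size."""
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    # Memory left for PHP-FPM after the OS and other services take their share.
    available_mb = max(total_ram_mb - reserved_mb, 0)
    # Round down so the pool never plans for more memory than is available.
    return available_mb // avg_child_mb


print(fpm_max_children(8192, 2048, 60))  # 102
```

Rounding down, and reserving generously for the OS, leaves headroom for children that grow beyond the measured average.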

Elasticsearch Tuning for Logging and Metrics

For logging and metrics aggregation, Elasticsearch performance is crucial. On OVH, especially with smaller instances, resource contention is common. Key areas for tuning include JVM heap size, indexing settings, and query caching.

JVM Heap Size

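Elasticsearch runs on the JVM. Allocating too much or too little heap can cripple performance. The general rule is to set the heap size to no more than 50% of the system’s total RAM, leaving the rest for the filesystem cache, and to stay below roughly 30-32GB, the threshold above which the JVM can no longer use compressed ordinary object pointers (compressed oops). Set the minimum (`-Xms`) and maximum (`-Xmx`) to the same value. This is configured in `jvm.options`.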

# In /etc/elasticsearch/jvm.options
-Xms4g
-Xmx4g
# Adjust 4g based on your OVH instance RAM (e.g., 8g, 16g)
# Ensure it's <= 50% of total RAM and < 32GB
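The sizing rule can be sketched as a small helper that emits the two JVM flags. The names `es_heap_mib` and `jvm_heap_flags` are hypothetical, and the 31 GiB cap is an assumed conservative value chosen to stay under the compressed oops threshold; the exact cutoff varies by JVM.

```python
GIB_IN_MIB = 1024


def es_heap_mib(total_ram_mib: int, cap_gib: int = 31) -> int:
    """Heap size in MiB: half of RAM, capped below the ~32 GB threshold."""
    half_of_ram = total_ram_mib // 2
    return min(half_of_ram, cap_gib * GIB_IN_MIB)


def jvm_heap_flags(total_ram_mib: int) -> str:
    """Render matching -Xms/-Xmx lines for jvm.options."""
    heap = es_heap_mib(total_ram_mib)
    return f"-Xms{heap}m\n-Xmx{heap}m"


print(jvm_heap_flags(8 * GIB_IN_MIB))
# -Xms4096m
# -Xmx4096m
```

Working in MiB avoids rounding a small instance up to a heap larger than half its RAM.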

Indexing Performance

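For high-volume indexing (e.g., from logs), optimize your index settings. Use the `_bulk` API rather than indexing documents one at a time. Consider disabling `_source` in the mapping only if you never need to retrieve the original document, bearing in mind that this also disables features that depend on it, such as the update and reindex APIs. Alternatively, trim what is stored with the `includes`/`excludes` options of the `_source` mapping field. Translog durability (`index.translog.durability`) can be set to `async` for higher throughput, at the cost of losing the most recent operations that had not yet been synced to disk if a node crashes.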

PUT /my-logs-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "5s",
      "translog.durability": "async"
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": {"type": "date"},
      "message": {"type": "text"},
      "level": {"type": "keyword"}
    }
  }
}

The `refresh_interval` setting controls how often newly indexed data becomes searchable. A longer interval (e.g., `30s` or `60s`) improves indexing throughput but lengthens the delay before new documents appear in search results. The default is `1s`; for near-real-time search, `1s` to `5s` is common.
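To illustrate the bulk indexing mentioned earlier, the sketch below builds a request body in the newline-delimited JSON format that the `_bulk` endpoint expects: one action line followed by one document line, with a trailing newline. It uses only the standard library and does not send anything; the function name `build_bulk_body` and the sample documents are assumptions, and the body would be posted to `/_bulk` with the `Content-Type: application/x-ndjson` header.

```python
import json
from typing import Any, Dict, Iterable


def build_bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Serialize documents into an NDJSON body for the _bulk API."""
    lines = []
    for doc in docs:
        # Action line: index this document into the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


sample_docs = [
    {"@timestamp": "2024-01-01T00:00:00Z", "message": "service started", "level": "INFO"},
    {"@timestamp": "2024-01-01T00:00:05Z", "message": "upstream timeout", "level": "ERROR"},
]
print(build_bulk_body("my-logs-index", sample_docs), end="")
```

Batching a few thousand documents per request is a common starting point, but the right batch size depends on document size and should be measured.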

Query Performance

For read-heavy workloads, ensure your queries are efficient. Use `keyword` fields for exact matches and aggregations instead of `text` fields. Elasticsearch's shard request cache can be beneficial for frequently executed, identical queries; by default it only caches results of requests with `size=0`, such as aggregations. It is enabled by default, so the main tuning knob is its size.

# In elasticsearch.yml
indices.requests.cache.size: 2% # Shard request cache; default is 1% of heap

Monitoring Elasticsearch with tools like Metricbeat and Kibana is essential to identify bottlenecks and validate tuning efforts. Pay close attention to JVM heap usage, garbage collection activity, indexing latency, and search latency.

OVH Specific Considerations

OVH instances, particularly the "Public Cloud" offerings, can have varying network performance and disk I/O capabilities. Always benchmark your configurations under realistic load. Use tools like `ab` (ApacheBench), `wrk`, or Locust for load testing. For Elasticsearch, ensure you're using instances with appropriate disk types (e.g., NVMe SSDs) if I/O is a bottleneck.

Regularly review system logs (`/var/log/nginx/error.log`, `/var/log/phpX.Y-fpm.log`, `/var/log/elasticsearch/`) for errors and warnings. Implement robust monitoring and alerting using Prometheus/Grafana or similar stacks to catch performance regressions before they impact users.