The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on Google Cloud for Shopify
Nginx Configuration for High-Traffic Shopify Deployments
Optimizing Nginx is paramount for handling the spiky traffic patterns characteristic of Shopify stores, especially during sales events. Our focus here is on tuning Nginx as a reverse proxy to Gunicorn (for Python/Django/Flask) or PHP-FPM (for PHP applications), and serving static assets efficiently. We’ll assume a Google Cloud Compute Engine instance or Google Kubernetes Engine (GKE) deployment.
Worker Processes and Connections
The `worker_processes` directive controls how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available. For `worker_connections`, this defines the maximum number of simultaneous connections that each worker process can handle. The total maximum connections will be `worker_processes * worker_connections`.
On a typical Google Cloud instance with 4 vCPUs, a good starting point is:
worker_processes 4; # Or "auto" to let Nginx detect the number of CPU cores at startup
events {
worker_connections 4096; # Adjust based on expected load and system limits
multi_accept on;
}
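The capacity formula above can be sketched in a few lines of Python. This is a rough estimate rather than a guarantee: the function name is hypothetical, and the halving for proxied traffic assumes each proxied request holds one client-side and one upstream connection, since `worker_connections` counts both. Operating-system file descriptor limits may cap the real figure lower.

```python
def max_client_capacity(worker_processes: int,
                        worker_connections: int,
                        proxied: bool = True) -> int:
    """Rough upper bound on simultaneous clients Nginx can serve.

    Assumption: a proxied request uses two connections (client + upstream),
    so the usable client count is about half the raw total.
    """
    total = worker_processes * worker_connections
    return total // 2 if proxied else total


if __name__ == "__main__":
    # Values from the 4 vCPU starting point in this section.
    print(max_client_capacity(4, 4096))         # proxied traffic
    print(max_client_capacity(4, 4096, False))  # static files only
```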
Keepalive Connections
Enabling keepalive connections reduces the overhead of establishing new TCP connections for subsequent requests from the same client. This is crucial for performance.
http {
# ... other http directives ...
keepalive_timeout 65; # Time to keep alive connections open
keepalive_requests 1000; # Max requests per keepalive connection
# ... rest of http config ...
}
Buffering and Timeouts
Tuning buffer sizes and timeouts prevents Nginx from being overwhelmed by slow clients or upstream servers, and also avoids premature connection closures.
http {
# ...
client_body_buffer_size 128k; # Buffer for client request body
client_header_buffer_size 1k; # Buffer for client request header
large_client_header_buffers 4 128k; # Buffers for large client request headers
send_timeout 60s; # Timeout for sending a response to the client
client_header_timeout 10s; # Timeout for reading client request headers
client_body_timeout 10s; # Timeout for reading client request body
lingering_close on; # Allows closing connection gracefully
lingering_time 30s; # Time to wait for lingering close
proxy_connect_timeout 60s; # Timeout for establishing connection with upstream
proxy_send_timeout 60s; # Timeout for sending request to upstream
proxy_read_timeout 60s; # Timeout for reading response from upstream
# ...
}
Gzip Compression
Enabling Gzip compression significantly reduces the amount of data transferred, improving page load times. Ensure your upstream application also handles compression appropriately or disable it there to avoid double compression.
http {
# ...
gzip on;
gzip_vary on; # Adds "Vary: Accept-Encoding" header
gzip_proxied any; # Compress responses for proxied requests
gzip_comp_level 6; # Compression level (1-9)
gzip_buffers 16 8k; # Number and size of buffers
gzip_http_version 1.1; # Minimum HTTP version
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml; # MIME types to compress
# ...
}
Static File Serving and Caching
Offload static asset serving to Nginx and configure aggressive browser caching. This is critical for performance as it reduces load on your application servers.
server {
# ...
location /static/ {
alias /path/to/your/static/files/; # Or root directive
expires 365d; # Cache for 1 year
add_header Cache-Control "public, immutable";
access_log off; # Optionally disable access logs for static files
}
location /media/ {
alias /path/to/your/media/files/;
expires 30d; # Cache for 30 days
add_header Cache-Control "public";
access_log off;
}
# ...
}
Gunicorn Tuning for Python Applications
Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Proper tuning of its worker processes and threads is essential for handling concurrent requests efficiently.
Worker Types and Counts
Gunicorn offers several worker types. The most common are:
- Sync Workers: The default. Each worker handles one request at a time. Good for CPU-bound applications with short requests.
- Async Workers (e.g., Gevent, Eventlet): Can handle multiple requests concurrently within a single worker process using green threads. Excellent for I/O bound applications with many concurrent connections.
- Gthread Workers: Uses threads to handle multiple requests concurrently.
For CPU-bound tasks, a sync worker might be sufficient. For I/O-bound tasks (common in web applications interacting with databases or external APIs), async workers are generally preferred. A common starting point for sync workers is (2 * number_of_cores) + 1. For async workers, you might use fewer worker processes but a higher number of green threads per worker.
# Example using sync workers
gunicorn --workers 3 --bind 0.0.0.0:8000 myapp.wsgi:application
# Example using gevent workers (requires gevent installed: pip install gevent)
gunicorn --worker-class gevent --workers 1 --worker-connections 1000 --bind 0.0.0.0:8000 myapp.wsgi:application
The --threads option only takes effect with the gthread worker class. gevent and eventlet workers are sized with --worker-connections instead, which caps the number of simultaneous connections per worker (default 1000). Because green threads are very lightweight, values in the 100-1000 range are common. The number of worker processes can be kept low (often 1 or 2) when each worker handles many connections.
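The (2 * number_of_cores) + 1 rule of thumb can also live in a Gunicorn configuration file, which is plain Python. The following is a minimal sketch, assuming a file named gunicorn.conf.py and the sync worker class; the helper function name is hypothetical, while `bind`, `workers`, and `worker_class` are standard Gunicorn settings.

```python
# gunicorn.conf.py (file name assumed for illustration)
import multiprocessing


def suggested_sync_workers(cores: int) -> int:
    """Starting point for sync workers: (2 * cores) + 1."""
    return 2 * cores + 1


# Gunicorn reads these module-level names as settings.
bind = "0.0.0.0:8000"
workers = suggested_sync_workers(multiprocessing.cpu_count())
worker_class = "sync"
```

It would be loaded with `gunicorn -c gunicorn.conf.py myapp.wsgi:application`. Treat the computed value as a starting point and adjust it after observing memory and CPU under load.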
Timeouts and Graceful Reloads
Setting appropriate timeouts prevents workers from being stuck indefinitely on slow requests. Graceful reloads allow for zero-downtime deployments.
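gunicorn --workers 4 \
--bind 0.0.0.0:8000 \
--timeout 120 \
--graceful-timeout 120 \
myapp.wsgi:application
- --timeout: The number of seconds a worker may stay silent before it is killed and a new one is spawned. For sync workers this is effectively the per-request time limit.
- --graceful-timeout: The number of seconds workers get to finish in-flight requests after receiving a restart or shutdown signal, before they are force-killed.
The --reload flag is deliberately left out: it restarts workers whenever source files change and is intended for development only. For a zero-downtime reload in production, send SIGHUP to the Gunicorn master process (kill -HUP <master_pid>), which starts new workers and gracefully shuts down the old ones.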
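One thing worth checking is whether the proxy timeout and the worker timeout agree. The Nginx example earlier uses `proxy_read_timeout 60s` while this Gunicorn example uses `--timeout 120`; with those values Nginx would likely return a 504 to the client while the worker keeps running for up to another minute. A minimal sketch of that check, with a hypothetical function name:

```python
def timeouts_aligned(proxy_read_timeout_s: int, gunicorn_timeout_s: int) -> bool:
    """True when Nginx waits at least as long as Gunicorn lets a worker run."""
    return proxy_read_timeout_s >= gunicorn_timeout_s


if __name__ == "__main__":
    # Values taken from the examples in this playbook.
    print(timeouts_aligned(60, 120))
    print(timeouts_aligned(120, 120))
```

Whether to raise the Nginx value or lower the Gunicorn one depends on how long your slowest legitimate request takes.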
PHP-FPM Tuning for PHP Applications
PHP-FPM (FastCGI Process Manager) is the standard way to run PHP applications with Nginx. Its configuration heavily influences PHP application performance and resource utilization.
Process Manager Settings
The core of PHP-FPM tuning lies in the pm (Process Manager) settings. These control how PHP worker processes are managed.
; /etc/php/X.Y/fpm/pool.d/www.conf (or similar path)
[www]
user = www-data
group = www-data
listen = /run/php/phpX.Y-fpm.sock ; Or a TCP socket like 127.0.0.1:9000
; Process Manager settings
pm = dynamic
pm.max_children = 50 ; Max number of children at any one time
pm.start_servers = 5 ; Number of children created on startup
pm.min_spare_servers = 2 ; Min number of idle processes
pm.max_spare_servers = 10 ; Max number of idle processes
pm.process_idle_timeout = 10s ; Idle time before a process is killed (used only when pm = ondemand)
pm.max_requests = 500 ; Max requests a child process will serve before respawning
Explanation of `pm` settings:
- `pm = dynamic`: The process manager scales the number of child processes with load. The other options are `static` (fixed number of children) and `ondemand` (spawns processes only when needed, which can increase latency).
- `pm.max_children`: This is the most critical setting. It defines the absolute maximum number of PHP processes that can run concurrently. Setting this too high can exhaust server memory. A common starting point is to calculate based on available RAM: (Total RAM - RAM for OS/Nginx/DB) / Average RAM per PHP process.
- `pm.start_servers`: The number of child processes to start when the pool is started.
- `pm.min_spare_servers`: The minimum number of idle processes that should be kept waiting.
- `pm.max_spare_servers`: The maximum number of idle processes. If there are more idle processes than this, they will be killed.
- `pm.process_idle_timeout`: The number of seconds after which an idle process will be killed. This setting is used only when `pm` is `ondemand`; with `dynamic`, idle processes are trimmed according to `pm.max_spare_servers`.
- `pm.max_requests`: The number of requests each child process should serve before being reaped. This helps prevent memory leaks from accumulating over time. A value between 250 and 1000 is typical.
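The RAM-based starting point for `pm.max_children` is simple arithmetic. Here is a small sketch of it; the function name and the sample figures (8 GB instance, 2 GB reserved, 60 MB per process) are assumptions chosen for illustration, not measurements.

```python
def estimate_max_children(total_ram_mb: int,
                          reserved_mb: int,
                          avg_process_mb: float) -> int:
    """(Total RAM - RAM for OS/Nginx/DB) / average RAM per PHP process."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    # Round down so the estimate never exceeds the usable memory.
    return int(usable_mb // avg_process_mb)


if __name__ == "__main__":
    # Hypothetical instance: 8192 MB total, 2048 MB reserved, 60 MB per process.
    print(estimate_max_children(8192, 2048, 60))
```

In practice you would likely set the value somewhat below this estimate to leave headroom for traffic spikes and per-request memory peaks.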
Tuning `pm.max_children`
This is often the most challenging parameter to tune. You need to monitor your server’s memory usage under load. A common strategy is to:
- Start with a conservative value for `pm.max_children` (e.g., 20-30).
- Monitor memory usage (e.g., using `htop`, `free -m`, or cloud provider metrics).
- Gradually increase `pm.max_children` while observing memory. If memory usage approaches critical levels (e.g., >80-90%), stop increasing.
- If your application is memory-intensive, you might need to reduce `pm.max_children` or increase server RAM.
- Consider the memory footprint of your PHP application. A simple WordPress install might use 10-20MB per process, while a complex Laravel app could use 50-100MB+.
You can check the average memory usage per PHP-FPM process by observing the output of `ps aux | grep php-fpm` and calculating the average RSS (Resident Set Size).
# Example of checking memory usage
ps aux --sort=-%mem | grep php-fpm
# Sum the RSS column for php-fpm processes and divide by the number of processes
# to get an average memory footprint per process.
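The averaging step can be automated. The sketch below parses text in the standard `ps aux` column layout (RSS is the sixth column, in KiB) and averages it over pool workers only. The function name is hypothetical and the sample output is made up for illustration; in real use you would feed it the output of `ps aux`.

```python
# Made-up sample in `ps aux` format; real output will differ.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1001  0.0  0.2 250000 20480 ?        Ss   10:00   0:00 php-fpm: master process (/etc/php/X.Y/fpm/php-fpm.conf)
www-data  1002  0.5  0.6 260000 51200 ?        S    10:00   0:03 php-fpm: pool www
www-data  1003  0.4  0.7 262000 61440 ?        S    10:00   0:02 php-fpm: pool www
"""


def average_rss_mb(ps_output: str, needle: str = "php-fpm: pool") -> float:
    """Average RSS in MB of processes whose command contains `needle`."""
    rss_kb = []
    for line in ps_output.splitlines():
        # 11 columns: the last one (COMMAND) may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or needle not in fields[10]:
            continue
        if fields[5].isdigit():
            rss_kb.append(int(fields[5]))
    if not rss_kb:
        return 0.0
    return sum(rss_kb) / len(rss_kb) / 1024


if __name__ == "__main__":
    print(average_rss_mb(SAMPLE))
```

Matching on "php-fpm: pool" skips the master process and the `grep` command itself, both of which would otherwise skew the average.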
Other PHP-FPM Directives
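; /etc/php/X.Y/fpm/pool.d/www.conf
; Terminate any request that runs longer than this
request_terminate_timeout = 60s
; The settings below are php.ini directives rather than pool directives.
; Set them in the php.ini used by FPM, or per pool as php_admin_value[memory_limit] = 256M.
; Set to 0 in production to prevent accidental execution of unintended files
cgi.fix_pathinfo = 0
; Increase memory limit if your application requires it
memory_limit = 256M
; Adjust execution time limit
max_execution_time = 60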
PostgreSQL Tuning on Google Cloud
PostgreSQL performance is critical for any data-driven application. On Google Cloud, whether using Cloud SQL for PostgreSQL or a self-managed instance on Compute Engine, tuning is essential.
Key Configuration Parameters (`postgresql.conf`)
These parameters are typically found in postgresql.conf. For Cloud SQL, you’ll manage these via the instance’s “Flags” section in the Google Cloud Console.
# Shared Memory
shared_buffers = 4GB # About 25% of total RAM (here: a 16GB instance)
effective_cache_size = 12GB # About 75% of total RAM; helps the query planner estimate OS cache
# WAL (Write-Ahead Logging)
wal_buffers = 16MB # The default (-1) auto-sizes to 1/32 of shared_buffers, capped at 16MB
wal_writer_delay = 200ms # How often the WAL writer flushes
min_wal_size = 1GB # WAL below this size is recycled rather than removed
max_wal_size = 4GB # Soft limit on WAL size between checkpoints (controls checkpoint frequency)
checkpoint_completion_target = 0.9 # Spreads checkpoint I/O over time
# Autovacuum
autovacuum = on
autovacuum_max_workers = 3 # Number of concurrent autovacuum processes
autovacuum_naptime = 15s # How often to check for jobs
autovacuum_vacuum_threshold = 50 # Min number of updated or deleted rows to trigger vacuum
autovacuum_analyze_threshold = 50 # Min number of changed rows to trigger analyze
# Connection and Resource Management
max_connections = 100 # Adjust based on application needs and server resources
shared_preload_libraries = 'pg_stat_statements' # Essential for query analysis
log_statement = 'ddl' # Log Data Definition Language statements
log_min_duration_statement = 1000 # Log queries longer than 1s (value in ms; adjust as needed)
log_lock_waits = on # Log lock waits longer than deadlock_timeout
log_temp_files = 0 # Log every temporary file (value is a size threshold in kB; -1 disables)
# Query Planner
random_page_cost = 1.1 # Lower if disk I/O is fast (e.g., SSDs)
seq_page_cost = 1.0 # Default, usually fine
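The 25% and 75% rules of thumb for the two memory settings can be turned into concrete values with a small helper. This is a sketch under those assumptions only; the function name is hypothetical, and workloads with large working sets or many connections may call for different ratios.

```python
def memory_settings(total_ram_gb: int) -> dict:
    """Derive shared_buffers (~25%) and effective_cache_size (~75%) in MB."""
    total_mb = total_ram_gb * 1024
    return {
        "shared_buffers": f"{total_mb // 4}MB",
        "effective_cache_size": f"{total_mb * 3 // 4}MB",
    }


if __name__ == "__main__":
    # A 16GB instance, matching the example values in this section.
    for name, value in memory_settings(16).items():
        print(f"{name} = {value}")
```

On Cloud SQL, these values would be entered as flags rather than written to postgresql.conf, and the accepted units and ranges may differ from a self-managed instance.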
`pg_stat_statements` for Query Analysis
The pg_stat_statements extension is invaluable for identifying slow queries. Ensure it’s enabled in shared_preload_libraries and then create the extension in your database.
-- Connect to your database
-- psql -h your_cloud_sql_instance_ip -U your_user -d your_database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Query for top N slowest queries by total time
-- (total_exec_time requires PostgreSQL 13+; older versions name the column total_time)
SELECT
(total_exec_time / 1000 / 60) AS total_minutes,
(total_exec_time / calls) AS avg_ms,
calls,
query
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 10;
-- Query for top N slowest queries by average time
SELECT
(total_exec_time / calls) AS avg_ms,
calls,
query
FROM
pg_stat_statements
WHERE calls > 0
ORDER BY
avg_ms DESC
LIMIT 10;
Connection Pooling
For applications with many short-lived connections (e.g., PHP applications using PDO or the pgsql extension), a connection pooler like PgBouncer can significantly reduce overhead. Deploying PgBouncer as a separate service (or on the same instance if resources permit) is recommended.
; /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=your_database user=your_user password=your_password
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; pool_mode can be session or transaction
pool_mode = session
; Server connections per database/user pair
default_pool_size = 20
; Maximum number of client connections PgBouncer accepts
max_client_conn = 1000
; ... other settings ...
Your application would then connect to PostgreSQL via PgBouncer’s port (e.g., 6432) instead of directly to PostgreSQL’s port (5432).
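When sizing the pool, it helps to check that PgBouncer's server-side connections fit under PostgreSQL's `max_connections`. PgBouncer keeps a separate pool per database/user pair, each of which can grow to `default_pool_size`. The sketch below uses a hypothetical function name and assumes PostgreSQL's default of 3 connections reserved for superusers.

```python
def pools_that_fit(max_connections: int,
                   default_pool_size: int,
                   reserved: int = 3) -> int:
    """How many full-size database/user pools fit under max_connections."""
    if default_pool_size <= 0:
        raise ValueError("default_pool_size must be positive")
    available = max(max_connections - reserved, 0)
    return available // default_pool_size


if __name__ == "__main__":
    # Values from this playbook: max_connections = 100, default_pool_size = 20.
    print(pools_that_fit(100, 20))
```

If your deployment has more database/user pairs than this allows, consider lowering `default_pool_size` or raising `max_connections`.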
Monitoring and Iteration
Tuning is an iterative process. Continuous monitoring is key to identifying bottlenecks and validating changes. Utilize Google Cloud’s built-in monitoring tools (Cloud Monitoring, Cloud Logging) for your Compute Engine instances and Cloud SQL instances. For GKE, Prometheus and Grafana are excellent choices.
- Nginx: Monitor active connections, request rates, error rates (4xx, 5xx), and response times.
- Gunicorn/PHP-FPM: Monitor worker process counts, request queues, worker utilization, and error logs.
- PostgreSQL: Monitor CPU utilization, memory usage, disk I/O, active connections, slow queries (via `pg_stat_statements`), and replication lag (if applicable).
Regularly review these metrics, especially after deploying changes or during peak traffic periods, to fine-tune your configurations further.