Scaling Magento 2 on AWS to Handle 50,000+ Concurrent Requests
Architectural Foundation: Microservices and Asynchronous Processing
Achieving a 50,000+ concurrent request baseline for a Magento 2 instance is not a matter of simply throwing more resources at a monolithic application. It necessitates a fundamental shift towards a microservices architecture and aggressive asynchronous processing. This involves decoupling core Magento functionalities into independent, scalable services and offloading non-critical or time-consuming operations to background workers.
For Magento 2, this typically means identifying and isolating components like product catalog management, order processing, inventory updates, and customer data management. Each of these can become a distinct service, communicating via lightweight protocols like gRPC or REST APIs. Crucially, any operation that doesn’t require an immediate synchronous response – such as sending order confirmation emails, updating search indexes, or generating reports – must be pushed into a robust message queue system.
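The offloading pattern described above can be sketched in a few lines. This is only an in-process illustration that uses Python's standard `queue` and `threading` modules as a stand-in for a managed queue; the task names (`send_order_email`, `reindex_product`) and the `handle_checkout` function are hypothetical and are not part of Magento.

```python
import queue
import threading

# In-process stand-in for a durable message queue (assumption: a real
# deployment would use a managed broker instead of queue.Queue).
task_queue = queue.Queue()
completed = []  # records finished tasks so the effect is observable


def worker():
    """Consume tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        name, payload = task
        # Slow work (email, reindex, report) would happen here.
        completed.append((name, payload["order_id"]))
        task_queue.task_done()


def handle_checkout(order_id):
    """Hypothetical request handler: enqueue slow work, respond at once."""
    task_queue.put(("send_order_email", {"order_id": order_id}))
    task_queue.put(("reindex_product", {"order_id": order_id}))
    return {"status": "accepted", "order_id": order_id}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_checkout(1001)
task_queue.put(None)  # tell the worker to stop
task_queue.join()     # wait until every queued item has been processed
thread.join()
```

The request handler returns as soon as the tasks are enqueued; the worker drains them independently, which is the property that keeps slow operations off the request path.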
AWS Infrastructure Blueprint for High Concurrency
A high-availability, horizontally scalable AWS infrastructure is paramount. We’ll leverage a multi-AZ deployment for all critical components to ensure resilience. The core components include:
- Load Balancing: AWS Application Load Balancer (ALB) for intelligent traffic distribution.
- Compute: Auto Scaling Groups (ASGs) of EC2 instances running Magento 2 application servers.
- Database: Amazon Aurora (MySQL-compatible) with read replicas.
- Caching: Amazon ElastiCache for Redis for session management, page caching, and object caching.
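- Message Queue: Amazon SQS (Simple Queue Service) for asynchronous task distribution. Magento's built-in message queue framework targets AMQP brokers such as RabbitMQ, so plan for an adapter module or external workers when using SQS.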
- Search: Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) for product search indexing and querying.
- CDN: Amazon CloudFront for static asset delivery.
- Storage: Amazon S3 for media assets.
EC2 Auto Scaling Group Configuration for Magento App Servers
The Magento application servers are the frontline. We need an ASG configured to scale based on relevant metrics. For high concurrency, CPU utilization is a primary driver, but we also need to consider network I/O and request latency.
Key ASG Settings:
- Launch Template: Define the EC2 instance type (e.g., `m5.xlarge` or `c5.xlarge` depending on CPU vs. Memory needs), AMI (a hardened, optimized OS image with PHP-FPM, Nginx, and necessary extensions pre-installed), security groups, IAM roles, and user data scripts for bootstrapping.
- Min/Max Instances: Start with a reasonable minimum (e.g., 4-8 instances) and a high maximum (e.g., 50-100 instances) to accommodate traffic spikes.
- Scaling Policies:
  - Target Tracking Scaling: Target average CPU utilization at 60-70%.
  - Step Scaling: Add reactive scaling for sudden bursts. For example, if CPU exceeds 80% for 5 minutes, add 2 instances. If it drops below 40% for 15 minutes, remove 1 instance.
  - Scheduled Scaling: Pre-scale for known traffic events (e.g., flash sales, holiday periods).
- Health Checks: Configure EC2 and ELB health checks to ensure unhealthy instances are terminated and replaced. A custom health check endpoint in Magento is recommended.
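The step-scaling thresholds above can be expressed as a small decision function. This is a sketch of the rule only: in a real Auto Scaling Group the evaluation is done by CloudWatch alarms, and the names `AsgState` and `step_scale` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AsgState:
    desired: int
    minimum: int
    maximum: int


def step_scale(state, cpu_samples):
    """Return the new desired capacity given per-minute CPU averages.

    cpu_samples is ordered oldest to newest, one value per minute.
    """
    desired = state.desired
    last_5 = cpu_samples[-5:]
    last_15 = cpu_samples[-15:]
    if len(last_5) == 5 and all(cpu > 80 for cpu in last_5):
        desired += 2  # sustained burst: scale out by two
    elif len(last_15) == 15 and all(cpu < 40 for cpu in last_15):
        desired -= 1  # sustained idle: scale in by one
    # Never leave the configured bounds.
    return max(state.minimum, min(state.maximum, desired))


state = AsgState(desired=8, minimum=4, maximum=100)
after_burst = step_scale(state, [85, 90, 88, 92, 95])
after_idle = step_scale(state, [30] * 15)
```

Scaling out faster than scaling in (two instances after 5 minutes versus one after 15) is deliberate: under-provisioning during a spike costs more than briefly running a spare instance.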
Nginx Configuration for High-Performance Magento
The Nginx configuration is critical for efficient request handling and serving static assets. We’ll use Nginx as a reverse proxy to PHP-FPM and for serving static files directly.
Example Nginx Configuration Snippet (within `nginx.conf` or a site-specific conf file):
    worker_processes auto;

    events {
        worker_connections 4096;  # Only valid inside events{}; adjust to instance type and OS limits
        multi_accept on;
        use epoll;
    }

    http {
        include mime.types;
        default_type application/octet-stream;
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        server_tokens off;  # Hide the nginx version in responses

        # Gzip compression
        gzip on;
        gzip_vary on;
        gzip_proxied any;
        gzip_comp_level 6;
        gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

        # Buffers and limits
        client_body_buffer_size 128k;
        client_max_body_size 100m;  # Adjust for file uploads; larger bodies are rejected with 413 automatically
        client_header_buffer_size 4k;
        large_client_header_buffers 4 128k;
        output_buffers 1 32k;

        # FastCGI (PHP-FPM) upstream
        upstream php-fpm {
            server unix:/var/run/php/php7.4-fpm.sock weight=1 fail_timeout=0;  # Adjust PHP version and socket path
            # If using TCP sockets for PHP-FPM:
            # server 127.0.0.1:9000 weight=1 fail_timeout=0;
        }

        # Magento-specific server block
        server {
            listen 80;
            server_name your-magento-domain.com;
            root /var/www/html/magento/pub;  # Serve from pub/ so application code and var/ stay outside the web root
            index index.php;

            # Send every request that is not a real file to Magento's front controller
            location / {
                try_files $uri $uri/ /index.php$is_args$args;
            }

            # Static and media files: long cache lifetime, no access logging.
            # This fallback is simplified; the nginx.conf.sample shipped with Magento
            # has the full rules for versioned static URLs and the /get.php media handler.
            location ~ ^/(media|static)/ {
                expires 30d;
                access_log off;
                add_header Cache-Control "public";
                try_files $uri $uri/ /static.php?$args;
            }

            # Hide sensitive files ("return" runs before access checks, so "deny all" would be redundant)
            location ~* /(composer\.json|composer\.lock|\.git|\.svn|var/|bin/|\.htaccess|LICENSE|README\.md) {
                return 404;
            }

            # PHP-FPM processing
            location ~ \.php$ {
                try_files $uri =404;
                fastcgi_split_path_info ^(.+\.php)(/.+)$;
                fastcgi_pass php-fpm;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                include fastcgi_params;

                # Magento-specific FastCGI parameters
                fastcgi_param MAGE_RUN_CODE "your_store_code";  # Replace with your store or website code
                fastcgi_param MAGE_RUN_TYPE "store";            # "store" or "website"
                fastcgi_read_timeout 300;  # Increase timeout for long-running PHP scripts
            }

            # Health check endpoint (optional but recommended).
            # This only proves nginx is up; point the load balancer at Magento's
            # pub/health_check.php if the check should also exercise PHP and its backends.
            location /healthz {
                access_log off;
                default_type text/plain;
                return 200 'OK';
            }
        }

        # Include other server blocks or configurations
    }
PHP-FPM Tuning for High Load
PHP-FPM (FastCGI Process Manager) is the engine that executes Magento’s PHP code. Its configuration directly impacts performance under load. We’ll use the `pm = dynamic` or `pm = ondemand` approach for better resource utilization.
Example `php-fpm.conf` or `www.conf` snippet:
    [global]
    pid = /run/php/php7.4-fpm.pid
    error_log = /var/log/php7.4-fpm.log
    log_level = notice

    [www]
    user = www-data
    group = www-data
    ; Unix socket; use a TCP address such as 127.0.0.1:9000 if preferred
    listen = /var/run/php/php7.4-fpm.sock
    listen.owner = www-data
    listen.group = www-data
    listen.mode = 0660

    ; Process manager settings
    ; "dynamic" is recommended for most cases; "ondemand" can be more efficient
    ; for variable loads but has higher initial latency.
    pm = ondemand

    ; If pm = dynamic:
    ; pm.max_children = 100      (adjust to available RAM and instance type)
    ; pm.start_servers = 10
    ; pm.min_spare_servers = 5
    ; pm.max_spare_servers = 20
    ; pm.max_requests = 500      (requests a child serves before it respawns)

    ; If pm = ondemand (a higher max_children is typical; verify that it fits in RAM):
    pm.max_children = 150
    pm.process_idle_timeout = 10s
    ; No directive controls what happens at the limit; FPM logs a warning
    ; when pm.max_children is reached.

    ; PHP settings. Inside a pool file these must use the php_admin_value[] form.
    ; memory_limit is crucial for Magento; adjust based on profiling.
    php_admin_value[memory_limit] = 512M
    ; Allow longer execution for complex tasks
    php_admin_value[max_execution_time] = 180
    ; Magento often requires more input vars
    php_admin_value[max_input_vars] = 3000
    php_admin_value[upload_max_filesize] = 100M
    php_admin_value[post_max_size] = 100M

    ; OPcache (highly recommended). These settings are process-wide and belong in
    ; php.ini or a conf.d file rather than in the pool section.
    opcache.enable=1
    opcache.enable_cli=1
    ; Adjust memory_consumption (in MB) based on available RAM
    opcache.memory_consumption=256
    opcache.interned_strings_buffer=16
    ; Magento ships tens of thousands of PHP files; a low limit forces recompilation
    opcache.max_accelerated_files=60000
    ; Set validate_timestamps to 1 in development, 0 in production for performance.
    ; With 0, revalidate_freq is ignored and OPcache must be reset on each deploy.
    opcache.validate_timestamps=0
    opcache.revalidate_freq=2
    opcache.save_comments=1
    ; opcache.load_comments and opcache.fast_shutdown were removed in PHP 7.0 and 7.2
    ; respectively and are omitted; leave opcache.optimization_level at its default
    ; unless profiling shows a benefit.
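A common way to sanity-check `pm.max_children` is to divide the memory left over for PHP by the memory one worker typically uses. The helper below is a back-of-the-envelope sketch; the function name, the 2 GB reserve for the OS and Nginx, and the per-worker figures are assumptions to be replaced with measured values.

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_process_mb):
    """Estimate pm.max_children so PHP-FPM workers fit in available RAM."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(1, available // avg_process_mb)


# An m5.xlarge has 16 GiB of RAM; the reserve and per-worker sizes are assumed.
typical = estimate_max_children(16384, 2048, 128)     # workers averaging 128 MB
worst_case = estimate_max_children(16384, 2048, 512)  # every worker at memory_limit
```

The gap between the two results shows why `pm.max_children` and `memory_limit` need to be chosen together: a worker count that is comfortable at average usage can exhaust RAM if many requests approach the limit at once.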
Database Optimization: Amazon Aurora and Read Replicas
Magento is notoriously database-intensive. Amazon Aurora (MySQL-compatible) offers superior performance and availability over standard RDS. We’ll leverage read replicas to offload read-heavy operations.
Configuration Strategy:
- Instance Sizing: Start with an appropriately sized Aurora instance (e.g., `db.r6g.2xlarge` or larger, depending on load) for the primary writer.
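- Read Replicas: Provision multiple read replicas (e.g., 3-5 initially) in different Availability Zones. Read queries have to be routed to them explicitly, for example through the cluster's reader endpoint, which balances connections across the replicas. Keep checkout and other read-after-write paths on the writer, because replicas can lag slightly behind it.
- Connection Pooling: PHP-FPM opens database connections per request, so at this scale a pooling layer in front of Aurora is needed to avoid exhausting `max_connections`. A proxy such as `ProxySQL` on the application servers, or Amazon RDS Proxy, can multiplex connections and reduce connection overhead.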
- Query Optimization: Regularly analyze slow queries using Aurora’s Performance Insights and `EXPLAIN` plans. Optimize critical Magento queries, especially those related to product listings, categories, and checkout.
- Indexing: Ensure Magento’s database indexes are correctly configured and maintained. Avoid full table scans.
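- Parameter Groups: Tune Aurora's parameter groups for optimal performance. Key parameters include `innodb_buffer_pool_size` and `max_connections`, both of which Aurora derives from the instance class by default. Some familiar MySQL settings do not carry over: Aurora's storage layer handles redo logging, so `innodb_log_file_size` is generally not tunable, and the query cache (`query_cache_size`) was removed in MySQL 8.0 and is therefore absent from Aurora MySQL version 3.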
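The read/write split described in the strategy above comes down to a routing decision per statement. The sketch below shows one conservative version of that rule, similar in spirit to what a proxy's query rules do; the endpoint host names and the `pick_endpoint` function are hypothetical.

```python
# Hypothetical endpoint names; substitute the cluster's real writer and reader endpoints.
WRITER = "writer.cluster-example.internal"
READER = "reader.cluster-example.internal"

READ_PREFIXES = ("select", "show", "explain")


def pick_endpoint(sql, in_transaction=False):
    """Route read-only statements to the reader, everything else to the writer."""
    statement = sql.lstrip().lower()
    if in_transaction:
        # Reads inside a transaction must see that transaction's own writes.
        return WRITER
    if statement.startswith(READ_PREFIXES) and "for update" not in statement:
        return READER
    return WRITER


catalog_read = pick_endpoint("SELECT * FROM catalog_product_entity WHERE entity_id = 1")
order_write = pick_endpoint("UPDATE sales_order SET state = 'processing' WHERE entity_id = 1")
```

Anything ambiguous falls through to the writer, which trades a little read capacity for correctness on paths such as checkout.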
Caching Layers: ElastiCache for Redis
A multi-layered caching strategy is non-negotiable. Amazon ElastiCache for Redis will serve as our primary caching backend.
Redis Configuration and Usage:
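- Cluster Mode: For high availability, deploy a replication group with Multi-AZ automatic failover. Cluster mode (sharding) adds horizontal scalability, but confirm that the Redis client library used by your Magento version supports it before enabling it.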
- Instance Sizing: Choose appropriate Redis instance types (e.g., `cache.r6g.xlarge` or larger) based on memory requirements and throughput.
- Magento Configuration: Configure Magento's cache types to use Redis. This includes:
  - Full Page Cache (FPC): Essential for serving cached HTML.
  - Session Storage: Offloads session management from the database.
  - Object Cache: Caches frequently accessed data like configuration, product attributes, etc.
- Persistence: Configure RDB snapshots or AOF persistence as needed for data durability, though for caching layers, durability might be less critical than performance.
- Eviction Policy: Set an appropriate eviction policy (e.g., `allkeys-lru`) to manage memory usage.
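To make the eviction policy concrete, here is a tiny exact-LRU cache. It is an illustration of the idea only: Redis approximates LRU by sampling keys rather than tracking exact order, and the `LruCache` class and key names are hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Tiny exact-LRU cache; Redis itself uses an approximated LRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


cache = LruCache(capacity=2)
cache.set("config", "c")
cache.set("product:1", "p1")
cache.get("config")           # touch "config" so "product:1" is now oldest
cache.set("product:2", "p2")  # over capacity: "product:1" is evicted
```

One consequence worth noting: under `allkeys-lru` any key can be evicted, so session data stored on the same instance competes with cache entries for memory.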
Magento `app/etc/env.php` snippet for Redis:
    <?php
    // Redis-related sections of app/etc/env.php. Merge these keys into the
    // existing array rather than replacing the whole file.
    return [
        'cache' => [
            'frontend' => [
                // Default cache: configuration, layouts, blocks, and other object data
                'default' => [
                    'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
                    'backend_options' => [
                        'server' => 'your-redis-primary-endpoint.xxxxxx.ng.0001.use1.cache.amazonaws.com', // Primary endpoint
                        'port' => '6379',
                        'database' => '0',
                        'password' => '', // If Redis requires a password
                        'compress_data' => '1',
                        'compression_lib' => '',
                        'force_standalone' => '0',
                        'connect_retries' => '1',
                        'read_timeout' => '10',
                        'automatic_cleaning_factor' => '0',
                    ],
                ],
                // Full page cache, kept in its own database so it can be flushed independently
                'page_cache' => [
                    'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
                    'backend_options' => [
                        'server' => 'your-redis-primary-endpoint.xxxxxx.ng.0001.use1.cache.amazonaws.com',
                        'port' => '6379',
                        'database' => '1',
                        'password' => '',
                        'compress_data' => '0',
                    ],
                ],
            ],
        ],
        // Session storage, again in a separate database
        'session' => [
            'save' => 'redis',
            'redis' => [
                'host' => 'your-redis-primary-endpoint.xxxxxx.ng.0001.use1.cache.amazonaws.com',
                'port' => '6379',
                'password' => '',
                'database' => '2',
            ],
        ],
    ];