Server Monitoring Best Practices: Keeping Your Magento 2 App and PostgreSQL Clusters Alive on Linode
Proactive PostgreSQL Monitoring with pg_monitor and Prometheus
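Maintaining the health and performance of a PostgreSQL cluster, especially one powering a high-traffic Magento 2 instance, demands more than basic CPU and memory checks. We need deep visibility into the database’s internal state. For this we use the Prometheus PostgreSQL exporter (the prometheuscommunity/postgres-exporter image), deployed here in a container named pg_monitor. It exposes a wealth of metrics directly from PostgreSQL, allowing us to build comprehensive dashboards and alerts. Note that pg_monitor is only the container name used in this guide; PostgreSQL 10 and later also ship a built-in role called pg_monitor, which is a separate thing.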
First, ensure you have Prometheus and Grafana installed and configured on your Linode instances or a dedicated monitoring server. Then, we’ll deploy pg_monitor. The easiest way is often via Docker, which isolates dependencies and simplifies management.
Deploying pg_monitor with Docker
Create a docker-compose.yml file to manage the pg_monitor container. This configuration connects it to your PostgreSQL instances and exposes the Prometheus metrics endpoint.
version: '3.7'
services:
  pg_monitor:
    image: prometheuscommunity/postgres-exporter:latest
    container_name: pg_monitor
    restart: unless-stopped
    environment:
      # Replace with your actual PostgreSQL connection string.
      # Some exporter versions accept several comma-separated DSNs in this one variable;
      # check the documentation for the image tag you deploy.
      - DATA_SOURCE_NAME=postgresql://monitor_user:your_secure_password@your_pg_host1:5432/your_database?sslmode=disable
      # A second DATA_SOURCE_NAME entry would override the first, not add an instance.
      # To monitor your_pg_host2, run a second exporter service or append its DSN after a comma.
      # NOTE: the three PG_EXPORTER_* variables below are not among the settings documented
      # for postgres_exporter; confirm them against your exporter version before relying on them.
      - PG_EXPORTER_EXTEND_METRICS=pg_stat_statements,pg_stat_replication,pg_stat_activity,pg_locks,pg_stat_database,pg_stat_user_tables,pg_stat_user_indexes
      - PG_EXPORTER_MAX_CLIENTS=100
      - PG_EXPORTER_MAX_PREPARED_STATEMENTS=100
    ports:
      - "9187:9187" # Expose Prometheus metrics endpoint
    labels:
      # Container labels; Prometheus can read these through docker_sd_configs relabeling
      # (file_sd_configs does not read Docker labels)
      - "com.centurylinklabs.com.docker.compose.project=magento_monitoring"
      - "com.centurylinklabs.com.docker.compose.service=pg_monitor"
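A password containing characters such as `@`, `/` or `#` will break the `DATA_SOURCE_NAME` URI unless it is percent-encoded. The following is a minimal sketch of building the connection string safely; the helper name `build_dsn` and the sample password are illustrative assumptions, not part of the exporter.

```python
from urllib.parse import quote


def build_dsn(user, password, host, database, port=5432, sslmode="disable"):
    """Return a postgresql:// URI with user, password and database percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(build_dsn("monitor_user", "p@ss/word#1", "your_pg_host1", "your_database"))
```

Paste the printed value after `DATA_SOURCE_NAME=` in the compose file. The default `sslmode="disable"` only mirrors the example above; use a stricter mode when the exporter reaches the database over an untrusted network.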
Before starting, create a dedicated PostgreSQL user with minimal privileges for monitoring. This user needs `CONNECT` on the database and read access to the statistics views. On PostgreSQL 10 and later, membership in the built-in pg_monitor role is what lets it see statistics for sessions owned by other users. Avoid granting superuser or excessive privileges.
-- Connect to your PostgreSQL instance as a superuser
CREATE USER monitor_user WITH PASSWORD 'your_secure_password';
GRANT CONNECT ON DATABASE your_database TO monitor_user;
-- PostgreSQL 10+: read statistics for all sessions without superuser rights
GRANT pg_monitor TO monitor_user;
-- Grant SELECT on specific system views. Adjust as needed.
GRANT SELECT ON pg_stat_activity TO monitor_user;
GRANT SELECT ON pg_stat_replication TO monitor_user;
GRANT SELECT ON pg_stat_database TO monitor_user;
GRANT SELECT ON pg_locks TO monitor_user;
GRANT SELECT ON pg_stat_user_tables TO monitor_user;
GRANT SELECT ON pg_stat_user_indexes TO monitor_user;
-- Only if pg_stat_statements is enabled; run this after the extension has been created
GRANT SELECT ON pg_stat_statements TO monitor_user;
Enable the pg_stat_statements extension if you intend to monitor query performance in detail. This requires superuser privileges.
-- Connect to your PostgreSQL instance as a superuser
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- The extension only collects data when its library is preloaded.
-- Set the following in postgresql.conf, then restart PostgreSQL
-- (a change to shared_preload_libraries needs a restart, not a reload):
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all
-- pg_stat_statements.max = 10000
-- pg_stat_statements.track_utility = off
Now, start the container:
docker-compose up -d
Verify that the exporter is running and accessible:
curl http://localhost:9187/metrics
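Reading the raw output by eye gets tedious, so here is a minimal sketch of checking it programmatically. It parses the Prometheus text format and looks at `pg_up`, the gauge the exporter sets to 1 when it can reach PostgreSQL. The function names and the sample payload are illustrative assumptions, and the parser ignores optional sample timestamps, which exporters rarely emit.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # Split on the last space: label values may themselves contain spaces.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def exporter_healthy(text):
    """True when the exporter reports a working database connection."""
    return parse_metrics(text).get("pg_up") == 1.0


# Invented sample payload, shaped like the exporter's output.
SAMPLE = """# HELP pg_up Whether the last scrape was able to connect to the server.
# TYPE pg_up gauge
pg_up 1
pg_stat_database_numbackends{datid="16384",datname="your_database"} 12
"""

if __name__ == "__main__":
    print(exporter_healthy(SAMPLE))
```

In practice the text would come from the same endpoint the `curl` command hits, for example through `urllib.request.urlopen`.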
Finally, configure Prometheus to scrape these metrics. Add a job to your prometheus.yml configuration:
scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['localhost:9187'] # Or the IP/hostname of your pg_monitor container
        labels:
          instance: 'your_pg_cluster_name' # e.g., 'magento_db_primary'
Reload your Prometheus configuration.
Key PostgreSQL Metrics for Magento 2
With pg_monitor and Prometheus in place, focus on these critical metrics:
- pg_stat_activity_count: Number of connections, broken down by state. High numbers can indicate connection pool exhaustion or slow queries.
- pg_stat_replication state: State of each WAL sender connection to a standby (e.g., ‘streaming’, ‘catchup’, ‘backup’). Essential for HA setups. The state is text, so exporters typically expose it as a label on pg_stat_replication_* series, not as a numeric metric.
- pg_stat_database_numbackends: Total number of backends connected to a database.
- pg_stat_user_tables_n_tup_ins, _n_tup_upd, _n_tup_del: Cumulative row insert, update, and delete counts. Their rates are useful for understanding data churn.
- pg_stat_user_tables_idx_scan: Cumulative index scans per table. Compare it with pg_stat_user_tables_seq_scan: frequent sequential scans on large tables might indicate missing indexes or inefficient queries.
- pg_stat_statements_calls: Number of times a query has been executed.
- pg_stat_statements_total_time: Total time spent executing a query (the underlying column is named total_exec_time in PostgreSQL 13 and later).
- pg_stat_statements_rows: Total rows returned or affected by a query.
- pg_locks_count: Number of locks currently held, by mode. A sustained high count often accompanies blocking and deadlocks.
- pg_stat_bgwriter_buffers_backend, _buffers_backend_fsync, _buffers_clean: Background writer activity. Helps diagnose I/O bottlenecks.
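Most of these metrics are cumulative counters, so the useful number is the change between two scrapes, not the raw value. As a worked sketch, the mean execution time of a query over an interval is the increase in `pg_stat_statements_total_time` divided by the increase in `pg_stat_statements_calls`. The function name and the sample figures are illustrative assumptions, and the unit of the result is whatever unit your exporter reports for total time.

```python
def mean_time_per_call(prev, curr):
    """Mean time per call between two (calls, total_time) counter samples.

    Returns None when no calls happened or a counter went backwards,
    which usually means the statistics were reset.
    """
    delta_calls = curr[0] - prev[0]
    delta_time = curr[1] - prev[1]
    if delta_calls <= 0 or delta_time < 0:
        return None
    return delta_time / delta_calls


if __name__ == "__main__":
    # Invented samples: 500 more calls and 4000 more time units between scrapes.
    print(mean_time_per_call((1000, 5000.0), (1500, 9000.0)))
```

With these sample values the increase is 4000 time units over 500 calls, so the mean is 8 units per call. In PromQL the same idea is the ratio of two `rate()` expressions over the same window.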
Magento 2 Application Monitoring with Blackfire.io
While PostgreSQL metrics are crucial, understanding the performance bottlenecks within the Magento 2 application itself is equally vital. Blackfire.io is a powerful PHP profiler that provides deep insights into code execution, memory usage, and I/O operations specific to your application.
Installing and Configuring Blackfire
Blackfire typically involves two components: the PHP extension and the companion agent. For Linode deployments, especially those using Docker or managed PHP-FPM, the installation varies slightly.
For PHP-FPM (non-Docker):
# Install the Blackfire PHP extension (the probe).
# Blackfire normally distributes the probe through its own package repository, so confirm
# in the Blackfire installation docs that this PECL command works for your setup.
pecl install blackfire
# Add the extension to a dedicated conf.d file for FPM and CLI (adjust PHP version as needed)
echo "extension=blackfire.so" >> /etc/php/7.4/fpm/conf.d/50-blackfire.ini
echo "extension=blackfire.so" >> /etc/php/7.4/cli/conf.d/50-blackfire.ini
# Restart PHP-FPM (adjust PHP version as needed)
systemctl restart php7.4-fpm
# Install the Blackfire agent (if not using Blackfire.io's cloud agent)
# This is less common for cloud deployments unless you have specific network needs
# Refer to Blackfire docs for agent installation if required.
You’ll need to obtain your Blackfire credentials (server ID and token) from your Blackfire.io account and configure them:
# Run this command on your web server
blackfire-cli config --server-id=YOUR_SERVER_ID --server-token=YOUR_SERVER_TOKEN --endpoint=https://blackfire.io
For Docker (PHP-FPM or CLI):
Modify your Dockerfile to include the Blackfire extension. You might need to build a custom image.
# Example Dockerfile snippet
FROM php:7.4-fpm
# Install dependencies for Blackfire
RUN apt-get update && apt-get install -y \
libcurl4-openssl-dev \
libssl-dev \
git \
unzip \
&& rm -rf /var/lib/apt/lists/*
# Install Blackfire PECL extension
RUN pecl install blackfire && docker-php-ext-enable blackfire
# Copy Blackfire configuration (optional, can be done via blackfire-cli later)
# COPY blackfire.ini /usr/local/etc/php/conf.d/50-blackfire.ini
# ... rest of your Dockerfile
After building and running the container, execute the blackfire-cli config command within the container or ensure the configuration is mounted correctly.
Profiling Magento 2 Requests
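Once Blackfire is installed, you can trigger profiles. The easiest way is via the Blackfire browser extension, which adds a signed X-Blackfire-Query header to the requests it profiles. Because that header carries a signature generated from your credentials, a hand-written header will not start a profile. From the command line, let the Blackfire client build the request for you.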
# Profile a Magento 2 CLI command
# (the client binary is named blackfire in current releases; substitute blackfire-cli
# if that is what your installation provides)
blackfire run php /path/to/your/magento/bin/magento module:status
# Profile a web request; the client wraps cURL and signs the request for you
blackfire curl https://your-magento-domain.com/some/product/page
The output will include a link to the Blackfire.io dashboard where you can analyze the profile. Look for:
- Longest-running functions: Identify specific PHP methods consuming the most time.
- Memory usage: Detect memory leaks or excessive consumption.
- I/O operations: Analyze file reads/writes and network calls.
- Database calls: While Blackfire shows *when* DB calls happen, correlating this with PostgreSQL metrics is key.
Linode Infrastructure Monitoring with Node Exporter and Alertmanager
Beyond the application and database layers, robust infrastructure monitoring on Linode is foundational. We’ll leverage the standard Prometheus Node Exporter for system-level metrics and Alertmanager for sophisticated alerting.
Deploying Node Exporter
Node Exporter is typically installed directly on each Linode instance you wish to monitor.
# Download the Node Exporter binary (check the releases page for the latest version)
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64
# Move the binary to /usr/local/bin
sudo mv node_exporter /usr/local/bin/
# Create a systemd service file for Node Exporter
# (on Debian/Ubuntu the nobody user's group is nogroup; change Group= accordingly)
sudo tee /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nobody
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.textfile.directory=/var/lib/node_exporter/textfile-collector

[Install]
WantedBy=multi-user.target
EOF
# Create the textfile collector directory
sudo mkdir -p /var/lib/node_exporter/textfile-collector
# Enable and start the Node Exporter service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
# Verify Node Exporter is running and accessible
sudo systemctl status node_exporter
curl http://localhost:9100/metrics
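The service above enables the textfile collector, which makes Node Exporter publish any `*.prom` file it finds in that directory. This gives cron jobs and scripts a simple way to expose their own values. The sketch below writes one gauge atomically, so the exporter never reads a half-written file. The function name and the metric `magento_cron_last_success_timestamp_seconds` are hypothetical examples, and label values are not escaped.

```python
import os
import tempfile


def write_textfile_metric(directory, name, value, help_text, labels=None):
    """Atomically write one gauge to <directory>/<name>.prom and return its path."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    body = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )
    # Write to a temporary file in the same directory, then rename it into place.
    # The collector only reads *.prom files, so the .tmp file is ignored.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter runs as another user
    final_path = os.path.join(directory, f"{name}.prom")
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()  # stand-in for the textfile collector directory
    path = write_textfile_metric(
        demo_dir,
        "magento_cron_last_success_timestamp_seconds",  # hypothetical metric name
        1700000000,
        "Unix time of the last successful Magento cron run",
        {"store": "default"},
    )
    print(open(path).read())
```

On a real host the directory argument would be `/var/lib/node_exporter/textfile-collector`, and the calling script needs write access to it.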
Add a scrape configuration to your Prometheus server’s prometheus.yml:
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['linode1_ip:9100', 'linode2_ip:9100', 'linode3_ip:9100'] # Replace with your Linode IPs
        labels:
          env: 'production'
          role: 'webserver' # Or 'dbserver' etc.
Reload Prometheus.
Configuring Alertmanager
Alertmanager handles alerts sent by Prometheus, deduplicating, grouping, and routing them to the correct receivers (e.g., Slack, PagerDuty, email). A basic alertmanager.yml might look like this:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific match
  routes:
    - receiver: 'critical-alerts'
      match:
        severity: 'critical'
      continue: true # Allows matching other routes if needed
receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts-general'
        send_resolved: true
  - name: 'critical-alerts'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts-critical'
        send_resolved: true
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
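It is easy to misread which receiver an alert ends up at, so here is a small sketch that walks a routing tree the way Alertmanager's documentation describes: child routes are tried in order, a matching child stops the search unless `continue` is set, and the parent's receiver is used only when no child matched. This is a simplified model, not Alertmanager's code. It handles equality `match` entries only and assumes every route names its receiver; the function name `resolve_receivers` is an assumption.

```python
def resolve_receivers(route, labels):
    """Return the receiver names an alert with these labels would be routed to."""
    match = route.get("match", {})
    if any(labels.get(key) != value for key, value in match.items()):
        return []  # this route does not apply to the alert
    receivers = []
    for child in route.get("routes", []):
        matched = resolve_receivers(child, labels)
        receivers.extend(matched)
        if matched and not child.get("continue", False):
            break  # first matching child wins unless it sets continue
    # No child matched: the current route handles the alert itself.
    return receivers or [route["receiver"]]


# The routing tree from the configuration above, written as a dict.
ROUTE = {
    "receiver": "default-receiver",
    "routes": [
        {
            "receiver": "critical-alerts",
            "match": {"severity": "critical"},
            "continue": True,
        },
    ],
}

if __name__ == "__main__":
    print(resolve_receivers(ROUTE, {"severity": "critical"}))
    print(resolve_receivers(ROUTE, {"severity": "warning"}))
```

Under this model a critical alert reaches only `critical-alerts`, because `continue: true` affects later sibling routes and there are none. If critical alerts should also land in the general channel, add a sibling route for it after the critical one.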
Ensure Prometheus is configured to send alerts to Alertmanager:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager_host:9093'] # IP/hostname of your Alertmanager instance
Essential Alerting Rules for Magento 2 on Linode
Define alerting rules in Prometheus (e.g., in a separate rules.yml file referenced by prometheus.yml). Here are some critical rules:
groups:
  - name: MagentoAlerts
    rules:
      # PostgreSQL Alerts
      - alert: HighPostgresConnections
        # Summed per instance because the exporter typically reports one series per database and state
        expr: sum by (instance) (pg_stat_activity_count) > 200 # Adjust threshold based on your pool size
        for: 5m
        labels:
          severity: warning
          service: postgresql
        annotations:
          summary: "High number of PostgreSQL connections on {{ $labels.instance }}"
          description: "PostgreSQL on {{ $labels.instance }} has {{ $value }} active connections, exceeding the threshold."
      - alert: PostgresReplicationLagging
        # PromQL cannot compare a metric with a string. This assumes the exporter exposes the
        # WAL sender state as a `state` label on pg_stat_replication_* series; verify the
        # metric and label names against your exporter's /metrics output.
        expr: pg_stat_replication_pg_wal_lsn_diff{state!~"streaming|catchup"}
        for: 10m
        labels:
          severity: critical
          service: postgresql
        annotations:
          summary: "PostgreSQL replication lag on {{ $labels.instance }}"
          description: "A replication connection on {{ $labels.instance }} is not in 'streaming' or 'catchup' state. Current state: {{ $labels.state }}."
      - alert: HighPostgresLockWait
        expr: sum by (instance) (pg_locks_count) > 50 # Adjust threshold
        for: 5m
        labels:
          severity: warning
          service: postgresql
        annotations:
          summary: "High number of PostgreSQL locks on {{ $labels.instance }}"
          description: "PostgreSQL on {{ $labels.instance }} has {{ $value }} active locks, potentially causing blocking."
      # Magento Application Alerts (example using Blackfire data if exposed, or custom exporters)
      # This requires custom exporters or specific Blackfire integrations not covered here.
      # Placeholder for application-specific alerts:
      - alert: MagentoFrontendErrorRate
        # Assumes a recording rule named job:magento_frontend_requests:rate5m that tracks
        # error responses per second; no such metric exists unless you expose it yourself.
        expr: job:magento_frontend_requests:rate5m > 5
        for: 5m
        labels:
          severity: warning
          service: magento_frontend
        annotations:
          summary: "High error rate on Magento frontend"
          description: "Magento frontend on {{ $labels.instance }} is experiencing a high rate of errors."
      # Linode Infrastructure Alerts
      - alert: InstanceHighCpuUsage
        expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 90
        for: 10m
        labels:
          severity: warning
          service: infrastructure
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "Instance {{ $labels.instance }} CPU usage is {{ $value | printf \"%.2f\" }}%."
      - alert: InstanceLowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 15m
        labels:
          severity: critical
          service: infrastructure
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Instance {{ $labels.instance }} has {{ $value | printf \"%.2f\" }}% disk space remaining on /."
      - alert: InstanceHighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 5m
        labels:
          severity: warning
          service: infrastructure
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Instance {{ $labels.instance }} is using {{ $value | printf \"%.2f\" }}% of total memory."
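The CPU expression is the least obvious of these rules, so here is its arithmetic as a worked sketch. `node_cpu_seconds_total{mode="idle"}` counts the seconds each core has spent idle, `rate()` turns that into the idle fraction per core, `avg by (instance)` averages the cores, and subtracting from 100 gives the busy percentage. The function name and sample values are illustrative assumptions, and the sketch uses a plain difference where Prometheus also extrapolates to the window edges.

```python
def cpu_usage_percent(idle_start, idle_end, window_seconds):
    """Busy CPU percentage from per-core idle counters at the start and end of a window."""
    idle_fractions = [
        (end - start) / window_seconds for start, end in zip(idle_start, idle_end)
    ]
    average_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - average_idle * 100


if __name__ == "__main__":
    # Invented samples for a 2-core instance over a 5-minute (300 s) window:
    # core 0 was idle for 30 s, core 1 for 15 s.
    usage = cpu_usage_percent([1000.0, 2000.0], [1030.0, 2015.0], 300)
    print(round(usage, 2))
```

With these samples the cores were idle 10% and 5% of the window, so usage comes out at about 92.5%, which is above the rule's threshold of 90 and would start the 10-minute `for` timer.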
Reload Prometheus and Alertmanager configurations. This layered approach—deep PostgreSQL insights, granular application profiling, and comprehensive infrastructure monitoring—provides the necessary visibility to keep your Magento 2 cluster stable and performant on Linode.