Server Monitoring Best Practices: Keeping Your Shopify App and PostgreSQL Clusters Alive on OVH
Proactive PostgreSQL Monitoring with Prometheus and Grafana on OVH
Maintaining the health and performance of PostgreSQL clusters, especially those backing critical Shopify applications, demands a robust monitoring strategy. On OVH infrastructure, this often involves self-managed PostgreSQL instances. We’ll focus on a Prometheus-based stack, leveraging exporters and Grafana for visualization and alerting.
Deploying the PostgreSQL Exporter
The postgres_exporter is essential for exposing PostgreSQL metrics to Prometheus. It requires a dedicated PostgreSQL user with specific privileges. Ensure this user has read-only access to relevant system catalogs and statistics views.
First, create a monitoring user in your PostgreSQL cluster:
-- Connect to your PostgreSQL instance as a superuser
CREATE USER monitor WITH PASSWORD 'your_strong_password';

-- pg_monitor (PostgreSQL 10+) is a predefined role that includes pg_read_all_stats
GRANT pg_monitor TO monitor;

-- For specific database monitoring, grant access to that database
GRANT CONNECT ON DATABASE your_app_db TO monitor;
GRANT USAGE ON SCHEMA pg_catalog TO monitor;
GRANT SELECT ON pg_stat_activity TO monitor;
GRANT SELECT ON pg_stat_database TO monitor;
GRANT SELECT ON pg_stat_replication TO monitor;
GRANT SELECT ON pg_locks TO monitor;
GRANT SELECT ON pg_settings TO monitor;
GRANT SELECT ON pg_stat_user_tables TO monitor;
GRANT SELECT ON pg_stat_user_indexes TO monitor;

-- Only valid once the pg_stat_statements extension has been created in this database
GRANT SELECT ON pg_stat_statements TO monitor;
Next, install and configure the postgres_exporter. This can be done via Docker or directly on a host. For this example, we’ll assume a Docker deployment on a dedicated monitoring host or within your OVH cloud environment.
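Create a .pgpass file for the user running the exporter to avoid embedding credentials directly in configuration. A container only sees this file if you mount it, so it mainly helps when the exporter or ad hoc psql checks run directly on the host:
# ~/.pgpass
# hostname:port:database:username:password
your_pg_host:5432:*:monitor:your_strong_password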
Set appropriate permissions for the .pgpass file:
chmod 0600 ~/.pgpass
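The two steps above (write the entry, then restrict permissions) can also be scripted. The following is a minimal Python sketch, not part of any exporter tooling: the helper names pgpass_line and write_pgpass are hypothetical, and the password is an example value. It relies on the .pgpass rule that a colon or backslash inside a field must be escaped with a backslash.

```python
import os


def pgpass_line(host, port, database, user, password):
    """Format one .pgpass entry, escaping backslashes and colons in each field."""
    def esc(value):
        # Escape backslashes first so the colon escapes are not doubled.
        return str(value).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(esc(v) for v in (host, port, database, user, password))


def write_pgpass(path, lines):
    """Create the file with mode 0600 from the start, then write the entries."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.chmod(path, 0o600)  # also tightens a file that already existed


# Example values only; this password is hypothetical.
entry = pgpass_line("your_pg_host", 5432, "*", "monitor", "p:ss\\word")
print(entry)
```

Creating the file with restrictive permissions up front avoids the short window in which a freshly written password file is readable by other users.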
Run the exporter using Docker:
docker run -d \
  --name postgres_exporter \
  -p 9187:9187 \
  -e DATA_SOURCE_NAME="postgresql://monitor:your_strong_password@your_pg_host:5432/postgres?sslmode=disable" \
  quay.io/prometheuscommunity/postgres-exporter:latest
Note: Replace your_pg_host with the actual hostname or IP of your PostgreSQL cluster. Adjust sslmode as per your PostgreSQL configuration. For production, using sslmode=verify-full with proper certificates is highly recommended.
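A connection string like the one in DATA_SOURCE_NAME breaks if the password contains characters such as @, / or :, because they have a meaning inside a URL. The sketch below shows one way to assemble the string with percent-encoding. The function name build_dsn is hypothetical, the password is an example value, and the sslrootcert path is an assumption about where your CA certificate lives.

```python
from urllib.parse import quote, urlencode


def build_dsn(user, password, host, port=5432, database="postgres",
              sslmode="verify-full", sslrootcert=None):
    """Build a postgresql:// URL with the user, password and database percent-encoded."""
    params = {"sslmode": sslmode}
    if sslrootcert:
        params["sslrootcert"] = sslrootcert
    return "postgresql://{}:{}@{}:{}/{}?{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(database, safe=""),
        urlencode(params),
    )


# Example values only: the password and certificate path are hypothetical.
dsn = build_dsn("monitor", "p@ss/w:rd", "your_pg_host",
                sslrootcert="/etc/ssl/certs/your_ca.pem")
print(dsn)
```

The default of sslmode=verify-full follows the production recommendation above; pass sslmode="disable" only for a local test setup.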
Configuring Prometheus to Scrape PostgreSQL Metrics
Edit your Prometheus configuration file (typically prometheus.yml) to include a scrape job for the PostgreSQL exporter.
scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['your_exporter_host:9187'] # Replace with your exporter's host and port
    metrics_path: /metrics
    params:
      collect[]:
        - pg_stat_activity
        - pg_stat_database
        - pg_stat_replication
        - pg_stat_statements
        - pg_locks
        - pg_settings
        - pg_stat_user_tables
        - pg_stat_user_indexes
        - pg_up
        - pg_postmaster_start_time
        - pg_database_size
        - pg_replication_lag
After updating the configuration, reload or restart your Prometheus server:
# If running Prometheus as a systemd service
sudo systemctl reload prometheus

# Or restart if needed
sudo systemctl restart prometheus
Verify that Prometheus is scraping the PostgreSQL exporter by navigating to its UI (usually http://your_prometheus_host:9090/targets) and checking the status of the ‘postgres’ job.
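The same check can be automated against the Prometheus HTTP API, which serves target status at /api/v1/targets. The sketch below is an illustration rather than a finished tool: the helper names are hypothetical, and the canned response stands in for a live server so the example runs offline.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload, job):
    """Return (instance, lastError) for every active target of `job` that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t["labels"].get("instance", "?"), t.get("lastError", ""))
        for t in targets
        if t.get("labels", {}).get("job") == job and t.get("health") != "up"
    ]


def fetch_targets(base_url):
    """Fetch the targets document from a Prometheus server (performs a network call)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Canned response in the shape returned by /api/v1/targets (values are made up).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "postgres", "instance": "your_exporter_host:9187"},
                "health": "down",
                "lastError": "connection refused",
            },
            {
                "labels": {"job": "node", "instance": "app_server_1_ip:9100"},
                "health": "up",
                "lastError": "",
            },
        ]
    },
}

print(unhealthy_targets(sample, "postgres"))
```

Against a real server you would call unhealthy_targets(fetch_targets("http://your_prometheus_host:9090"), "postgres") and treat a non-empty result as a failed check.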
Key PostgreSQL Metrics for Shopify Applications
When monitoring PostgreSQL for a Shopify app, prioritize metrics that indicate performance bottlenecks, resource contention, and potential failures. Here are some critical ones:
- pg_stat_activity_count: Number of connections, broken down by state (active, idle, idle in transaction). High numbers can indicate connection pool exhaustion or slow queries.
- pg_stat_database_numbackends: Number of backends currently connected to each database.
- pg_replication_lag_seconds: Replication lag for standby servers. Crucial for high availability and disaster recovery.
- pg_stat_statements_calls: Number of times a statement has been executed. Helps identify frequently run queries.
- pg_stat_statements_total_time_seconds: Total time spent executing a statement. Highlights performance-critical queries.
- pg_locks_count: Number of locks held, by lock mode. High counts point to contention that slows queries and raises the risk of deadlocks.
- pg_database_size_bytes: Size of each database. Important for capacity planning.
- pg_up: 1 if the exporter could reach the PostgreSQL instance on its last scrape, 0 otherwise.
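To make the connection metric concrete, the sketch below reads a few lines in the Prometheus text exposition format and computes how much of max_connections is in use. It is a deliberately naive parser for illustration (it assumes no } inside label values), the sample numbers are invented, and pg_settings_max_connections is assumed to be exposed by your exporter version.

```python
import re
from collections import defaultdict

# Matches "name{labels} value" or "name value"; label contents are ignored.
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def sum_by_metric(text):
    """Sum sample values per metric name, skipping comments and blank lines."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


def connection_saturation(totals):
    """Fraction of max_connections currently used, across all states."""
    return totals["pg_stat_activity_count"] / totals["pg_settings_max_connections"]


# Invented sample of what the exporter's /metrics endpoint might return.
exposition = """
# HELP pg_up Whether the last scrape could connect to PostgreSQL
pg_up 1
pg_stat_activity_count{datname="your_app_db",state="active"} 12
pg_stat_activity_count{datname="your_app_db",state="idle"} 30
pg_settings_max_connections 100
"""

totals = sum_by_metric(exposition)
print(f"connection saturation: {connection_saturation(totals):.0%}")
```

In practice the same ratio is better computed in PromQL so it can drive a dashboard panel or an alert; the script only shows what the numbers mean.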
Setting Up Grafana Dashboards and Alerts
Grafana provides a powerful interface for visualizing PostgreSQL metrics and setting up alerts. You can import pre-built dashboards or create custom ones.
Importing a Dashboard:
Grafana’s dashboard repository (grafana.com/grafana/dashboards/) has excellent PostgreSQL dashboards. Search for “PostgreSQL” and import a highly-rated one that is built for postgres_exporter metrics (e.g., ID 9628, “PostgreSQL Database”); confirm the ID on the dashboard’s page before importing. Ensure your Prometheus data source is configured in Grafana.
Creating Custom Dashboards:
For specific needs, create a new dashboard and add panels. For example, to visualize active connections:
- Query: sum(pg_stat_activity_count{job="postgres"}) by (datname)
- Visualization: Time series (called Graph in older Grafana versions) or Stat
- Title: Active Connections per Database
Alerting Rules:
Define alerting rules in Prometheus (via a separate alert.rules.yml file, which is then referenced under rule_files in prometheus.yml) or directly within Grafana. Here’s an example of a Prometheus alert rule for high replication lag:
groups:
  - name: postgresql.rules
    rules:
      - alert: PostgreSQLReplicationLagging
        expr: pg_replication_lag_seconds{job="postgres"} > 60 # Alert if lag is over 60 seconds
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL replication lag detected on {{ $labels.instance }}"
          description: "Replication lag on {{ $labels.instance }} is {{ $value }} seconds, exceeding the threshold."
Ensure your Prometheus server is configured to send alerts to Alertmanager, which then routes them to your preferred notification channels (Slack, PagerDuty, email, etc.).
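The for: 5m clause in the rule above is easy to misread: the alert does not fire the first time lag exceeds 60 seconds, only once the condition has held at every evaluation for five minutes. The sketch below simulates that behaviour on invented lag samples taken once per minute. The function name alert_states is hypothetical, and it is a simplified model of the pending/firing states, not Prometheus code.

```python
def alert_states(samples, threshold, hold_seconds):
    """Classify each (timestamp_seconds, value) sample as inactive, pending or firing.

    A breach starts the clock; the alert fires once the breach has lasted
    at least hold_seconds; any sample at or below the threshold resets it.
    """
    states = []
    breach_started = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp
            held = timestamp - breach_started
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            breach_started = None
            states.append("inactive")
    return states


# Invented replication lag readings, one per 60-second evaluation.
lag_values = [10, 75, 80, 90, 30, 70, 72, 74, 76, 78, 80]
samples = [(i * 60, lag) for i, lag in enumerate(lag_values)]

for (timestamp, lag), state in zip(samples, alert_states(samples, 60, 300)):
    print(f"t={timestamp:>3}s lag={lag:>2}s {state}")
```

The brief recovery at t=240s resets the timer, which is why a short spike in lag never pages anyone while a sustained one does.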
Monitoring Shopify App Performance with Prometheus Node Exporter
Beyond the database, your Shopify application servers themselves require monitoring. The Prometheus node_exporter is the standard for collecting hardware and OS metrics.
Install node_exporter on each of your application servers. This can be done by downloading the binary or using a package manager.
# Example for Debian/Ubuntu
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo rm -rf node_exporter-1.7.0.linux-amd64*
Create a systemd service file for node_exporter:
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
    --collector.cpu \
    --collector.diskstats \
    --collector.filesystem \
    --collector.meminfo \
    --collector.netdev \
    --collector.stat \
    --collector.time \
    --collector.loadavg \
    --collector.textfile \
    --collector.vmstat
Restart=on-failure

[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
Configure Prometheus to scrape these instances:
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'app_server_1_ip:9100' # Replace with your app server IPs
          - 'app_server_2_ip:9100'
          # ... add all your app servers
    metrics_path: /metrics
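Once the list of app servers grows, editing static_configs by hand becomes error-prone. One alternative is Prometheus file-based service discovery (file_sd_configs), which reads target groups from a JSON file. The sketch below generates such a file from a host list. The function name file_sd_groups and the role label are hypothetical, and where you store the output is up to your own prometheus.yml.

```python
import json


def file_sd_groups(hosts, port, labels):
    """Build one file_sd target group: a list of host:port targets plus shared labels."""
    return [{
        "targets": [f"{host}:{port}" for host in hosts],
        "labels": dict(labels),
    }]


# Placeholder hosts from the scrape config above; the label is illustrative.
groups = file_sd_groups(
    ["app_server_1_ip", "app_server_2_ip"],
    9100,
    {"role": "shopify_app"},
)

# Write this output to a file that a file_sd_configs entry points at.
print(json.dumps(groups, indent=2))
```

Prometheus picks up changes to such a file without a reload, so adding a server becomes a matter of regenerating the JSON.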
Application-Specific Metrics for Shopify Apps
For a Shopify application, you’ll likely need to instrument your code to expose custom metrics. This could include:
- API request latency (broken down by endpoint).
- Number of Shopify API calls (and their success/failure rates).
- Background job queue lengths and processing times.
- Cache hit/miss ratios.
- Error rates (e.g., exceptions caught).
You can use Prometheus client libraries for your application’s language (e.g., Python, Ruby, PHP) to expose these metrics via an HTTP endpoint (typically /metrics) on each application server. This endpoint will then be scraped by Prometheus.
Example (Python with Flask and Prometheus client):
from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest
import time
import random

app = Flask(__name__)

# Define custom metrics
shopify_api_calls = Counter('shopify_api_calls_total', 'Total number of Shopify API calls', ['endpoint', 'method', 'status'])
request_latency = Histogram('shopify_app_request_latency_seconds', 'Shopify app request latency', buckets=[.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10, float('inf')])
background_job_queue_size = Gauge('shopify_background_job_queue_size', 'Current size of the background job queue')

@app.route('/')
def index():
    # Simulate some work and record how long it takes
    with request_latency.time():
        time.sleep(random.uniform(0.01, 0.5))

    # Simulate a Shopify API call
    status_code = 200
    try:
        # Replace with actual Shopify API call logic
        pass
    except Exception as e:
        status_code = 500
        # Log the error
        print(f"Error calling Shopify API: {e}")
    # Count the call exactly once, labelled with its outcome
    shopify_api_calls.labels(endpoint='/admin/api/2023-10/products.json', method='GET', status=status_code).inc()

    # Simulate background job queue update
    background_job_queue_size.set(random.randint(0, 100))
    return "Hello, Shopify App!"

@app.route('/metrics')
def metrics():
    # CONTENT_TYPE_LATEST is the content type Prometheus expects for the text format
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

if __name__ == '__main__':
    # Run on a different port if your app runs on 80/443
    app.run(host='0.0.0.0', port=5001)
Add a scrape job for these application metrics in your prometheus.yml:
scrape_configs:
  - job_name: 'shopify_app'
    static_configs:
      - targets:
          - 'app_server_1_ip:5001' # Port your app metrics are exposed on
          - 'app_server_2_ip:5001'
    metrics_path: /metrics
OVH Specific Considerations
When operating on OVH, keep these points in mind:
- Networking: Ensure your Prometheus server can reach every exporter and application metrics endpoint, and that the postgres_exporter can reach your PostgreSQL instances. This might involve configuring OVH Security Groups or Firewall rules to allow traffic on specific ports (e.g., 9187 for postgres_exporter, 9100 for node_exporter, and your app’s metrics port from the Prometheus host; 5432 for PostgreSQL from the exporter host).
- Instance Types: Choose appropriate OVH instance types for your PostgreSQL clusters and application servers based on CPU, RAM, and I/O requirements. Monitoring helps validate these choices.
- Managed Databases: If you opt for OVH’s managed PostgreSQL services, the monitoring approach might differ. You’ll need to check what metrics are exposed by OVH’s managed service and if they integrate with Prometheus or require a different tool. Often, you can still deploy exporters within your application’s network space to monitor the managed endpoint.
- High Availability: For PostgreSQL, implement streaming replication and monitor replication lag closely. Ensure your monitoring setup can detect failover events and alert on them.
- Cost: Be mindful of data transfer costs between OVH regions or out to the internet if your monitoring infrastructure is external.
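The networking point in the list above is the one most likely to cause a silent failure, so it is worth testing from the monitoring host before blaming the exporters. The sketch below attempts a plain TCP connection to each required port. The function names are hypothetical and the hosts are the placeholders used throughout this article; replace them with your own before running check_all.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Placeholder endpoints to check from the Prometheus host.
REQUIRED_ENDPOINTS = [
    ("your_exporter_host", 9187),  # postgres_exporter
    ("app_server_1_ip", 9100),     # node_exporter
    ("app_server_1_ip", 5001),     # application metrics
]


def check_all(endpoints):
    """Map each (host, port) pair to whether it is reachable."""
    return {(host, port): port_open(host, port) for host, port in endpoints}
```

A successful TCP connection only proves that the firewall rules allow the traffic; the targets page in Prometheus remains the check that the scrape itself works.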
Conclusion
A comprehensive monitoring strategy using Prometheus and Grafana is crucial for keeping your Shopify application and its PostgreSQL backend healthy and performant on OVH. By focusing on key database and system metrics, and instrumenting your application for custom insights, you can proactively identify and resolve issues before they impact your users. Regularly review your dashboards and alert thresholds to adapt to your application’s evolving needs.