Server Monitoring Best Practices: Keeping Your Shopify App and MySQL Clusters Alive on Linode
Establishing a Robust Monitoring Foundation with Prometheus and Grafana
For any production environment, especially one hosting critical applications like a Shopify app and its associated MySQL clusters on Linode, a proactive and comprehensive monitoring strategy is non-negotiable. We’ll leverage Prometheus for metrics collection and alerting, and Grafana for visualization. This setup provides deep insights into system health, performance bottlenecks, and potential failure points before they impact end-users.
Deploying Prometheus on Linode
A common and effective way to deploy Prometheus is using Docker. This ensures isolation, easy updates, and consistent environments. We’ll start by creating a `docker-compose.yml` file to define our Prometheus service.
Prometheus Configuration (`prometheus.yml`)
The core of Prometheus is its configuration file, which dictates what targets to scrape and how to evaluate alerting rules. For our Shopify app and MySQL clusters, we’ll need to configure scrape jobs for:
- The Shopify app’s web server (e.g., Nginx or Apache).
- The application’s own metrics endpoint (if it exposes any, e.g., via a custom exporter or a library like php-prometheus-client).
- MySQL exporter for each MySQL instance in the cluster.
Here’s a sample `prometheus.yml` configuration. Note that the actual scrape targets will depend on your specific setup (e.g., IP addresses, ports, service discovery mechanisms).
Example `prometheus.yml`
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape the Shopify App's web server (assuming Nginx on port 80).
  # Prometheus cannot read Nginx's stub_status output directly, so this assumes
  # the status module is enabled and nginx-exporter is translating it.
  - job_name: 'shopify_app_nginx'
    static_configs:
      - targets: ['APP_SERVER_IP:9113'] # Assuming nginx-exporter is running on port 9113

  # Scrape the Shopify App's custom metrics endpoint (if available)
  # Example: If your PHP app exposes metrics at /metrics on port 8000
  - job_name: 'shopify_app_metrics'
    static_configs:
      - targets: ['APP_SERVER_IP:8000']

  # Scrape MySQL exporter for the primary MySQL instance
  - job_name: 'mysql_primary'
    static_configs:
      - targets: ['MYSQL_PRIMARY_IP:9104'] # Assuming mysqld_exporter on port 9104

  # Scrape MySQL exporter for a replica MySQL instance
  - job_name: 'mysql_replica_1'
    static_configs:
      - targets: ['MYSQL_REPLICA_1_IP:9104']

  # Add more jobs for other MySQL instances as needed
  # - job_name: 'mysql_replica_2'
  #   static_configs:
  #     - targets: ['MYSQL_REPLICA_2_IP:9104']

# Alertmanager targets are configured here; alerting rules live in separate rule files
# alerting:
#   alertmanagers:
#     - static_configs:
#         - targets: ['alertmanager:9093']
Replace APP_SERVER_IP, MYSQL_PRIMARY_IP, and MYSQL_REPLICA_1_IP with the actual IP addresses or hostnames of your Linode instances. The port 9113 is common for nginx-exporter, and 9104 for mysqld_exporter.
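A forgotten placeholder is an easy mistake to make here, and Prometheus will simply report the target as down. The following is a minimal Python sketch, not part of the Prometheus tooling, that scans a config for leftover tokens ending in `_IP`. The function name `find_placeholders` and the sample text are illustrative assumptions.

```python
import re

# Matches the placeholder style used in the sample config (e.g. MYSQL_PRIMARY_IP).
PLACEHOLDER = re.compile(r"\b[A-Z][A-Z0-9_]*_IP\b")


def find_placeholders(config_text: str) -> list[str]:
    """Return the sorted, de-duplicated placeholder tokens still present."""
    found = set()
    for line in config_text.splitlines():
        # Ignore anything after a YAML comment marker.
        active = line.split("#", 1)[0]
        found.update(PLACEHOLDER.findall(active))
    return sorted(found)


sample = """
scrape_configs:
  - job_name: 'mysql_primary'
    static_configs:
      - targets: ['MYSQL_PRIMARY_IP:9104']
  - job_name: 'shopify_app_nginx'
    static_configs:
      - targets: ['10.0.0.5:9113']
  # - targets: ['MYSQL_REPLICA_2_IP:9104']
"""
print(find_placeholders(sample))
```

In practice you would read `./prometheus/prometheus.yml` from disk and refuse to start the stack while the returned list is non-empty.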
Docker Compose for Prometheus
version: '3.7'

services:
  prometheus:
    image: prom/prometheus:v2.40.0 # Use a specific, stable version
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus/ # Mount your prometheus.yml and any rule files
      - prometheus_data:/prometheus # Persistent data volume
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/console_templates'
    restart: unless-stopped

volumes:
  prometheus_data:
To run this, save the content as docker-compose.yml in a directory, place your prometheus.yml in a subdirectory named prometheus, and then execute:
mkdir prometheus
# Paste your prometheus.yml content into ./prometheus/prometheus.yml
docker-compose up -d
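Once the container is running, it helps to confirm that every scrape target is actually reachable. The sketch below assumes Prometheus is listening on `localhost:9090` and uses the instant query `up` from the Prometheus HTTP API (`/api/v1/query`). The helper names `down_targets` and `fetch_up`, and the sample response, are illustrative.

```python
import json
import urllib.parse
import urllib.request


def down_targets(query_response: dict) -> list[tuple[str, str]]:
    """Return (job, instance) pairs whose `up` sample is not 1."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for series in query_response["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        if float(value) != 1.0:
            labels = series["metric"]
            down.append((labels.get("job", ""), labels.get("instance", "")))
    return sorted(down)


def fetch_up(base_url: str = "http://localhost:9090") -> dict:
    """Run the instant query `up` against a live Prometheus server."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Hand-written response in the shape the API returns, for illustration only.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "prometheus", "instance": "localhost:9090"},
             "value": [1700000000.0, "1"]},
            {"metric": {"job": "mysql_primary", "instance": "10.0.0.10:9104"},
             "value": [1700000000.0, "0"]},
        ],
    },
}
print(down_targets(sample_response))
```

Against a real server, `down_targets(fetch_up())` gives the same list that the Status > Targets page shows in red.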
Exposing MySQL Metrics with mysqld_exporter
mysqld_exporter is the standard Prometheus exporter for MySQL. It queries the MySQL server for metrics and exposes them via an HTTP endpoint. You’ll need to run this exporter on each Linode instance hosting a MySQL server.
Setting up mysqld_exporter
First, create a dedicated MySQL user for the exporter with minimal necessary privileges. This user should have PROCESS, REPLICATION CLIENT, and SELECT privileges.
-- Connect to your MySQL server as root or a privileged user
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'your_strong_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
Next, create a .my.cnf file in the home directory of the user that will run mysqld_exporter (e.g., /home/mysql_exporter/.my.cnf or ~/.my.cnf if running as a regular user). This file should contain the credentials for the exporter user.
[client]
user=exporter
password=your_strong_password
host=localhost
Now, you can run mysqld_exporter. The most straightforward way is again using Docker.
docker run -d \
  --name mysqld_exporter \
  --network="host" \
  -v /path/to/your/.my.cnf:/etc/mysqld_exporter/.my.cnf \
  prom/mysqld-exporter:v0.15.0 \
  --config.my-cnf="/etc/mysqld_exporter/.my.cnf" \
  --collect.global_status \
  --collect.info_schema.tables \
  --collect.slave_status \
  --collect.binlog_size \
  --collect.info_schema.processlist \
  --collect.auto_increment.columns \
  --collect.info_schema.innodb_metrics \
  --collect.perf_schema.tablelocks \
  --collect.perf_schema.file_events \
  --collect.perf_schema.eventsstatements \
  --collect.perf_schema.eventsstatements.limit=10 \
  --collect.perf_schema.eventswaits \
  --web.listen-address="0.0.0.0:9104"
Important Notes:
- Replace `/path/to/your/.my.cnf` with the actual path to your credentials file.
- `--network="host"` is used for simplicity to allow the exporter to directly access the MySQL server on the same host. If running in a different network, adjust accordingly.
- The `--collect.*` flags enable specific metric groups. Choose those relevant to your monitoring needs, and confirm the exact flag names for your exporter release with `--help`, because the exporter refuses to start on a flag it does not recognize. For a production MySQL cluster, collecting status, schema, replication, and performance schema metrics is highly recommended.
- Ensure the port `9104` is accessible by your Prometheus server.
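To check that the exporter can actually log in to MySQL, look at the `mysql_up` gauge on its `/metrics` endpoint: it is 1 when the last connection attempt succeeded. The sketch below parses the Prometheus text exposition format with the standard library only. The helper names and the sample payload are illustrative, and the parser is deliberately simple rather than a full implementation of the format.

```python
def metric_values(exposition_text: str, metric_name: str) -> list[float]:
    """Collect the values of every sample whose metric name matches exactly."""
    values = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # A sample looks like: name{labels} value [timestamp]
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        values.append(float(rest.split()[0]))
    return values


def exporter_is_up(exposition_text: str) -> bool:
    """True when the payload reports exactly one mysql_up sample equal to 1."""
    return metric_values(exposition_text, "mysql_up") == [1.0]


# Illustrative excerpt of what http://MYSQL_PRIMARY_IP:9104/metrics might return.
sample_metrics = """\
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
mysql_global_status_threads_connected 12
mysql_exporter_collector_duration_seconds{collector="collect.global_status"} 0.004
"""
print(exporter_is_up(sample_metrics))
```

To run it against a live exporter, fetch the endpoint (for example with `urllib.request.urlopen`) and pass the decoded body to `exporter_is_up`.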
Visualizing Metrics with Grafana
Grafana is the de facto standard for visualizing time-series data from Prometheus. We’ll set up Grafana and connect it to our Prometheus data source.
Deploying Grafana with Docker
version: '3.7'

services:
  grafana:
    image: grafana/grafana:10.0.0 # Use a specific, stable version
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped

volumes:
  grafana_data:
Save this as docker-compose.yml (or add it to your existing one) and run:
docker-compose up -d grafana
Access Grafana at http://YOUR_LINODE_IP:3000. The default credentials are admin/admin. You’ll be prompted to change the password on first login.
Configuring Prometheus Data Source in Grafana
1. In Grafana, navigate to Connections > Data sources (in older releases: Configuration (gear icon) > Data Sources).
2. Click Add data source.
3. Select Prometheus.
4. In the URL field, enter the address of your Prometheus server. If Grafana and Prometheus are on the same Linode and using Docker Compose, this would typically be http://prometheus:9090 (if using the service name) or http://YOUR_LINODE_IP:9090 (if using host networking).
5. Click Save & Test. You should see a confirmation that Grafana can query Prometheus (older releases show “Data source is working”).
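The same data source can be created without the UI through Grafana's HTTP API (`POST /api/datasources`), which is handy when rebuilding a Linode from scratch. The sketch below only builds the request. The URLs, the data source name, and the `admin`/`admin` credentials are assumptions that match the defaults used in this article, and `datasource_request` is an illustrative helper name.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url: str, prometheus_url: str,
                       user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # Grafana's backend performs the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = datasource_request(
    "http://localhost:3000", "http://prometheus:9090", "admin", "admin"
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` sends it. Change the default password first, and prefer a service account token over basic auth for anything beyond a first setup.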
Importing Pre-built Dashboards
Grafana has a rich community providing pre-built dashboards. For MySQL and general system metrics, these are invaluable.
- MySQL Dashboard: Search for “MySQL Prometheus” on Grafana.com/dashboards. A popular one is “MySQL Overview” (Dashboard ID: 7362 or similar).
- Node Exporter Dashboard: For general system metrics (CPU, RAM, Disk, Network) on your Linode instances, use a Node Exporter dashboard (e.g., ID: 1860). You’ll need to run `node_exporter` on each Linode instance.
- Nginx Dashboard: If using `nginx-exporter`, find a suitable Nginx dashboard (e.g., ID: 1282).
To import a dashboard:
- Go to Dashboards (four squares icon) > Import.
- Paste the Dashboard ID or upload the JSON file.
- Select your Prometheus data source.
- Click Import.
Key Metrics to Monitor for Shopify Apps and MySQL
Beyond general system health, focus on metrics critical to your application’s performance and stability.
Shopify App Metrics
- Request Latency: Track the time taken to respond to API requests from Shopify and to Shopify.
- Error Rates: Monitor HTTP 5xx and 4xx errors.
- Throughput: Requests per second.
- Queue Lengths: If your app uses background job queues (e.g., Redis queues), monitor queue sizes.
- Resource Utilization: CPU, memory, and network I/O of your application servers.
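Throughput and error rate are both derived from counters that only ever increase, so what you chart is the per-second increase between scrapes. The following sketch shows that arithmetic in plain Python. It is a simplification of what PromQL's `rate()` does, and the counter values are made up for illustration.

```python
def per_second_rate(earlier: float, later: float, seconds: float) -> float:
    """Average per-second increase of a counter between two readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # A counter that went backwards was reset (e.g. the app restarted),
    # so everything in `later` was counted since the reset.
    increase = later - earlier if later >= earlier else later
    return increase / seconds


def error_percentage(total_rate: float, error_rate: float) -> float:
    """Share of requests that failed, as a percentage."""
    return 0.0 if total_rate == 0 else 100.0 * error_rate / total_rate


# Hypothetical counter readings taken 300 seconds (5 minutes) apart.
total_rps = per_second_rate(120_000, 126_000, 300)
error_rps = per_second_rate(900, 1_260, 300)
print(round(total_rps, 2), round(error_rps, 2),
      round(error_percentage(total_rps, error_rps), 2))
```

With these sample numbers the app serves about 20 requests per second, of which roughly 6% fail.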
MySQL Cluster Metrics
- Replication Lag: Crucial for read replicas. Use `Seconds_Behind_Master` from `SHOW SLAVE STATUS` (exposed by `mysqld_exporter` as `mysql_slave_status_seconds_behind_master`).
- Connection Usage: `mysql_global_status_threads_connected`, `mysql_global_status_threads_running`. High connected threads can indicate connection leaks or insufficient pooling.
- Query Performance: `mysql_global_status_questions` (total queries), `mysql_global_status_slow_queries`. Use performance schema metrics for detailed query analysis if available.
- InnoDB Metrics: `mysql_global_status_innodb_buffer_pool_reads` vs `mysql_global_status_innodb_buffer_pool_read_requests` (buffer pool hit rate), `mysql_global_status_innodb_row_lock_waits` (lock contention).
- Disk I/O: `mysql_global_status_innodb_data_reads`, `mysql_global_status_innodb_data_writes`, and corresponding latency metrics if exposed.
- Temporary Tables: `mysql_global_status_created_tmp_tables`, `mysql_global_status_created_tmp_disk_tables`. A high number of disk-based temporary tables indicates inefficient queries or insufficient memory.
- Replication Errors: Monitor `mysql_slave_status_last_sql_errno` and `mysql_slave_status_last_io_errno` (the exporter publishes the numeric error codes, not the error text), together with `mysql_slave_status_slave_sql_running` and `mysql_slave_status_slave_io_running`.
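Two of the ratios above are worth working through once by hand. The buffer pool hit rate is one minus the share of logical read requests that had to go to disk (MySQL status variables `Innodb_buffer_pool_reads` and `Innodb_buffer_pool_read_requests`), and the temporary-table ratio is disk tables over all temporary tables. The sketch below uses made-up counter values, and the function names are illustrative.

```python
from typing import Optional


def buffer_pool_hit_ratio(disk_reads: float, read_requests: float) -> Optional[float]:
    """Fraction of logical reads served from memory, or None with no traffic."""
    if read_requests <= 0:
        return None
    return 1.0 - disk_reads / read_requests


def tmp_disk_table_ratio(disk_tables: float, all_tables: float) -> Optional[float]:
    """Fraction of temporary tables that spilled to disk, or None if none exist."""
    if all_tables <= 0:
        return None
    return disk_tables / all_tables


# Hypothetical counter increases over the same time window.
hit_ratio = buffer_pool_hit_ratio(disk_reads=1_500, read_requests=1_000_000)
spill_ratio = tmp_disk_table_ratio(disk_tables=40, all_tables=800)
print(f"buffer pool hit ratio: {hit_ratio:.4f}")
print(f"tmp tables on disk:    {spill_ratio:.2%}")
```

Compute both over a recent window (the increase of each counter, not its lifetime total), otherwise a long uptime hides a recent regression.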
Alerting with Prometheus Alertmanager
Proactive alerting is as important as monitoring. Prometheus Alertmanager handles alerts sent by Prometheus, deduplicates them, groups them, and routes them to the correct receiver (e.g., Slack, PagerDuty, email).
Configuring Alertmanager
Add an Alertmanager service to your `docker-compose.yml` and create a configuration file (`alertmanager.yml`).
# In your main docker-compose.yml
services:
  # ... prometheus and grafana services ...
  alertmanager:
    image: prom/alertmanager:v0.25.0 # Use a specific, stable version
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/etc/alertmanager/ # Mount your alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    restart: unless-stopped
# ./alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications' # Default receiver
  routes:
    - receiver: 'slack-notifications'
      matchers:
        - severity = "critical"
      continue: true # Allow further routing if needed

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#your-alerts-channel'
        send_resolved: true
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}'
        # Double-quoted so that YAML turns \n into a real line break
        text: "{{ range .Alerts }}*Alert:* {{ .Annotations.summary }}\n*Description:* {{ .Annotations.description }}\n*Details:* {{ range .Labels.SortedPairs }} {{ .Name }}={{ .Value }}{{ end }}\n{{ end }}"
Update your prometheus.yml to include the Alertmanager configuration:
# In your prometheus.yml
# ... other scrape configs ...
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093'] # Or YOUR_LINODE_IP:9093 if Prometheus is not on the same Compose network
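Before relying on real alerts, it is worth pushing a synthetic one through Alertmanager to confirm that routing and the Slack receiver work. The sketch below builds a request for Alertmanager's v2 API (`POST /api/v2/alerts`). The alert name `RoutingSmokeTest`, its labels, and the helper names are illustrative assumptions.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(now: datetime, minutes: int = 5) -> list[dict]:
    """Return a one-element alert list that expires after `minutes`."""
    return [{
        "labels": {
            "alertname": "RoutingSmokeTest",  # hypothetical name
            "job": "manual",
            "severity": "critical",           # matches the critical route
        },
        "annotations": {
            "summary": "Synthetic alert to verify routing",
            "description": "Safe to ignore; sent by hand.",
        },
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def alert_request(base_url: str, alerts: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the POST that hands alerts to Alertmanager."""
    return urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


alerts = build_test_alert(datetime(2024, 1, 1, tzinfo=timezone.utc))
request = alert_request("http://localhost:9093", alerts)
print(request.full_url, alerts[0]["endsAt"])
```

For a live test, pass `datetime.now(timezone.utc)` instead of the fixed date and send the request with `urllib.request.urlopen`. With `group_wait: 30s`, the Slack message should arrive about half a minute later.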
Example Prometheus Alerting Rules
Create a file (e.g., rules.yml) in your ./prometheus directory and reference it in prometheus.yml.
# In your prometheus.yml, add to the root level:
rule_files:
  - "/etc/prometheus/rules.yml" # Path inside the container
# ./prometheus/rules.yml
groups:
  - name: mysql_alerts
    rules:
      - alert: MySQLReplicationLagging
        expr: mysql_slave_status_seconds_behind_master > 60 # Lagging by more than 60 seconds
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication lag detected on {{ $labels.instance }}"
          description: "MySQL replica {{ $labels.instance }} is lagging behind the master by more than 60 seconds."

      - alert: HighMySQLConnections
        expr: mysql_global_status_threads_connected > 500 # Example threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of MySQL connections on {{ $labels.instance }}"
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} connected threads, exceeding the threshold."

  - name: app_alerts
    rules:
      - alert: HighAppErrorRate
        expr: sum(rate(http_requests_total{job="shopify_app_metrics", code=~"5.."}[5m])) / sum(rate(http_requests_total{job="shopify_app_metrics"}[5m])) * 100 > 5 # More than 5% 5xx errors
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP 5xx error rate for Shopify App"
          # Single-quoted so the inner "%.2f" does not end the YAML string
          description: 'Shopify App is experiencing a high rate of server errors ({{ $value | printf "%.2f" }}%).'

      # Requires node_exporter on the app servers. The expression yields the busy
      # percentage directly, because alert templates cannot do arithmetic.
      - alert: AppServerHighCPU
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80 # More than 80% CPU in use
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High CPU utilization on Shopify App server {{ $labels.instance }}"
          description: 'App server {{ $labels.instance }} has high CPU usage ({{ $value | printf "%.2f" }}%).'
Ensure your Prometheus container is restarted after updating its configuration to load the new rules and Alertmanager target.
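The `for:` clause in these rules is easy to misread: the condition must hold at every evaluation for the whole duration, and a single evaluation below the threshold resets the clock. The sketch below is a simplified Python model of that behaviour, not Prometheus's actual rule engine. The function name and the sample lag values are illustrative.

```python
from typing import Optional


def first_firing_time(samples: list[tuple[int, float]],
                      threshold: float, for_seconds: int) -> Optional[int]:
    """Return the evaluation time at which the alert starts firing, if ever.

    `samples` holds (evaluation_time_seconds, value) pairs in time order.
    """
    pending_since = None
    for eval_time, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = eval_time  # alert enters the pending state
            if eval_time - pending_since >= for_seconds:
                return eval_time           # held long enough: firing
        else:
            pending_since = None           # condition cleared: start over
    return None


# Replication lag in seconds, evaluated once a minute (made-up values).
lag_samples = [(0, 10), (60, 90), (120, 95), (180, 40), (240, 80),
               (300, 85), (360, 90), (420, 100), (480, 120), (540, 130)]
print(first_firing_time(lag_samples, threshold=60, for_seconds=300))
```

In this series the dip at 180 seconds discards the first pending period, so a `for: 5m` rule only fires once the lag has stayed above 60 from the 240-second mark through the 540-second mark.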
Advanced Considerations and Best Practices
Service Discovery: Hardcoding IPs in `prometheus.yml` is brittle in dynamic environments. If your Linode infrastructure is managed programmatically or uses orchestration, explore Prometheus’s service discovery mechanisms, such as `linode_sd_configs` (which queries the Linode API), Consul, Kubernetes SD, or file-based discovery.
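A low-effort first step is file-based discovery: Prometheus watches a JSON file listed under `file_sd_configs` and picks up changes without a restart. The sketch below generates such a file from an inventory. The host addresses, the `role` label, and the function name `file_sd_targets` are assumptions for illustration.

```python
import json


def file_sd_targets(hosts_by_role: dict[str, list[str]], port: int) -> str:
    """Render target groups in the JSON shape that file_sd_configs reads."""
    groups = [
        {
            "targets": [f"{host}:{port}" for host in hosts],
            "labels": {"role": role},  # attached to every series from the group
        }
        for role, hosts in sorted(hosts_by_role.items())
        if hosts
    ]
    return json.dumps(groups, indent=2)


# Hypothetical inventory of MySQL hosts running mysqld_exporter on 9104.
inventory = {
    "replica": ["10.0.0.11", "10.0.0.12"],
    "primary": ["10.0.0.10"],
}
print(file_sd_targets(inventory, 9104))
```

Write the output to a file inside the mounted `./prometheus` directory and reference it from a scrape job with `file_sd_configs` and a `files:` list. Adding a replica then means regenerating the file rather than editing `prometheus.yml`.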
High Availability: For critical monitoring, consider running multiple Prometheus instances (federation or remote write to a central store like Thanos or Cortex) and multiple Alertmanager instances in a cluster.
Security: Secure your Prometheus, Grafana, and Alertmanager endpoints. Use firewalls to restrict access to only necessary IPs. Consider authentication for Grafana and Prometheus if exposed publicly.
Data Retention: Configure Prometheus’s data retention policies (`--storage.tsdb.retention.time`) to balance storage costs with the need for historical data.
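The Prometheus documentation gives a rule of thumb for sizing the data volume: retention time in seconds, times ingested samples per second, times bytes per sample, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that formula. The series count is a made-up example, and the result is an estimate rather than a guarantee.

```python
def estimated_tsdb_bytes(retention_days: float, active_series: int,
                         scrape_interval_seconds: float,
                         bytes_per_sample: float = 2.0) -> float:
    """Rough disk usage: retention * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_seconds
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical setup: 50,000 active series scraped every 15 s, kept for 15 days.
needed = estimated_tsdb_bytes(retention_days=15, active_series=50_000,
                              scrape_interval_seconds=15)
print(f"{needed / 1e9:.2f} GB")
```

For this example the estimate comes to roughly 8.6 GB, which is a useful sanity check before choosing a Linode block storage size. Doubling the retention or halving the scrape interval doubles the figure.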
Application-Specific Metrics: Instrument your Shopify app code to expose business-level metrics (e.g., number of orders processed, failed API calls to Shopify, cache hit rates) using libraries like php-prometheus-client. This provides invaluable insight into the application’s functional health.
Log Aggregation: While metrics tell you *what* is happening, logs tell you *why*. Integrate a log aggregation system (e.g., ELK stack, Loki) for comprehensive observability.
By implementing this Prometheus, Grafana, and Alertmanager stack, you establish a robust, scalable, and proactive monitoring system essential for maintaining the health and performance of your Shopify application and MySQL clusters on Linode.