Server Monitoring Best Practices: Keeping Your Shopify App and MySQL Clusters Alive on Linode

Establishing a Robust Monitoring Foundation with Prometheus and Grafana

For any production environment, especially one hosting critical applications like a Shopify app and its associated MySQL clusters on Linode, a proactive and comprehensive monitoring strategy is non-negotiable. We’ll leverage Prometheus for metrics collection and alerting, and Grafana for visualization. This setup provides deep insights into system health, performance bottlenecks, and potential failure points before they impact end-users.

Deploying Prometheus on Linode

A common and effective way to deploy Prometheus is using Docker. This ensures isolation, easy updates, and consistent environments. We’ll start by creating a `docker-compose.yml` file to define our Prometheus service.

Prometheus Configuration (`prometheus.yml`)

The core of Prometheus is its configuration file, which dictates what targets to scrape and how to evaluate alerting rules. For our Shopify app and MySQL clusters, we’ll need to configure scrape jobs for:

The Shopify app’s web server (e.g., Nginx or Apache).
The application’s own metrics endpoint (if it exposes any, e.g., via a custom exporter or a client library such as promphp/prometheus_client_php).
MySQL exporter for each MySQL instance in the cluster.

Here’s a sample `prometheus.yml` configuration. Note that the actual scrape targets will depend on your specific setup (e.g., IP addresses, ports, service discovery mechanisms).

Example `prometheus.yml`

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape the Shopify App's web server (Nginx) through nginx-prometheus-exporter.
  # Nginx does not serve Prometheus-format metrics itself: enable its stub_status
  # module, point the exporter at it, and let Prometheus scrape the exporter.
  - job_name: 'shopify_app_nginx'
    static_configs:
      - targets: ['APP_SERVER_IP:9113'] # Assuming nginx-exporter is running on port 9113

  # Scrape the Shopify App's custom metrics endpoint (if available)
  # Example: If your PHP app exposes metrics at /metrics on port 8000
  - job_name: 'shopify_app_metrics'
    static_configs:
      - targets: ['APP_SERVER_IP:8000']

  # Scrape MySQL exporter for the primary MySQL instance
  - job_name: 'mysql_primary'
    static_configs:
      - targets: ['MYSQL_PRIMARY_IP:9104'] # Assuming mysqld_exporter on port 9104

  # Scrape MySQL exporter for a replica MySQL instance
  - job_name: 'mysql_replica_1'
    static_configs:
      - targets: ['MYSQL_REPLICA_1_IP:9104']

  # Add more jobs for other MySQL instances as needed
  # - job_name: 'mysql_replica_2'
  #   static_configs:
  #     - targets: ['MYSQL_REPLICA_2_IP:9104']

# Alerting rules can be defined here or in separate files
# alerting:
#   alertmanagers:
#     - static_configs:
#         - targets: ['alertmanager:9093']

Replace APP_SERVER_IP, MYSQL_PRIMARY_IP, and MYSQL_REPLICA_1_IP with the actual IP addresses or hostnames of your Linode instances. Port 9113 is the default for nginx-prometheus-exporter, and 9104 is the default for mysqld_exporter.
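A forgotten placeholder is an easy mistake to make here, and it only shows up later as a target that never comes up. The sketch below is one possible pre-flight check, assuming the placeholders follow the upper-case `..._IP` naming used in the sample; the function name and the regular expression are illustrative and not part of Prometheus.

```python
import re

# Assumed placeholder convention: upper-case tokens ending in _IP,
# such as APP_SERVER_IP or MYSQL_PRIMARY_IP in the sample configuration.
PLACEHOLDER = re.compile(r"\b[A-Z][A-Z0-9_]*_IP\b")


def find_placeholders(config_text):
    """Return (line_number, placeholder) pairs found on active lines."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out jobs are never scraped, so skip them
        # Drop a trailing comment so only the live part of the line is checked.
        live = stripped.split(" #", 1)[0]
        for match in PLACEHOLDER.finditer(live):
            hits.append((number, match.group(0)))
    return hits


sample = """scrape_configs:
  - job_name: 'mysql_primary'
    static_configs:
      - targets: ['MYSQL_PRIMARY_IP:9104'] # Assuming mysqld_exporter on port 9104
  # - job_name: 'mysql_replica_2'
  #     - targets: ['MYSQL_REPLICA_2_IP:9104']
"""
print(find_placeholders(sample))
```

Running it against your real `prometheus.yml` before starting the container should print an empty list once every placeholder has been replaced.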

Docker Compose for Prometheus

version: '3.7'

services:
  prometheus:
    image: prom/prometheus:v2.40.0 # Use a specific, stable version
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus/ # Mount your prometheus.yml and any rule files
      - prometheus_data:/prometheus # Persistent data volume
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/console_templates'
    restart: unless-stopped

volumes:
  prometheus_data:

To run this, save the content as docker-compose.yml in a directory, place your prometheus.yml in a subdirectory named prometheus, and then execute:

mkdir prometheus
# Paste your prometheus.yml content into ./prometheus/prometheus.yml
docker-compose up -d

Exposing MySQL Metrics with `mysqld_exporter`

mysqld_exporter is the standard Prometheus exporter for MySQL. It queries the MySQL server for metrics and exposes them via an HTTP endpoint. You’ll need to run this exporter on each Linode instance hosting a MySQL server.

Setting up `mysqld_exporter`

First, create a dedicated MySQL user for the exporter with minimal necessary privileges. This user should have PROCESS, REPLICATION CLIENT, and SELECT privileges.

-- Connect to your MySQL server as root or a privileged user
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'your_strong_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;

Next, create a .my.cnf file in the home directory of the user that will run mysqld_exporter (e.g., /home/mysql_exporter/.my.cnf or ~/.my.cnf if running as a regular user). This file should contain the credentials for the exporter user.

[client]
user=exporter
password=your_strong_password
host=localhost

Now, you can run mysqld_exporter. The most straightforward way is again using Docker.

docker run -d \
  --name mysqld_exporter \
  --network="host" \
  -v /path/to/your/.my.cnf:/etc/mysqld_exporter/.my.cnf \
  prom/mysqld-exporter:v0.15.0 \
  --config.my-cnf="/etc/mysqld_exporter/.my.cnf" \
  --collect.global_status \
  --collect.info_schema.tables \
  --collect.slave_status \
  --collect.binlog_size \
  --collect.info_schema.processlist \
  --collect.auto_increment.columns \
  --collect.info_schema.innodb_metrics \
  --collect.perf_schema.tablelocks \
  --collect.perf_schema.file_events \
  --collect.perf_schema.eventsstatements \
  --collect.perf_schema.eventsstatements.limit=10 \
  --collect.perf_schema.eventswaits \
  --web.listen-address="0.0.0.0:9104"

Important Notes:

Replace /path/to/your/.my.cnf with the actual path to your credentials file.
--network="host" is used for simplicity to allow the exporter to directly access the MySQL server on the same host. If running in a different network, adjust accordingly.
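The --collect.* flags enable specific metric groups; choose those relevant to your monitoring needs. For a production MySQL cluster, collecting status, schema, replication, and performance schema metrics is highly recommended. The exporter refuses to start when it is given a flag it does not recognize, so check the output of --help for the version you run.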
Ensure the port 9104 is accessible by your Prometheus server.
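To confirm that the exporter is reachable and actually connected to MySQL, you can read its `/metrics` output and look at `mysql_up`, which the exporter sets to 1 when its connection to the server works. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and the parser deliberately ignores labelled series.

```python
import urllib.request


def parse_samples(exposition_text):
    """Parse label-free samples from Prometheus text exposition format."""
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            continue  # labelled series are ignored in this sketch
        parts = line.split()
        if len(parts) >= 2:
            try:
                samples[parts[0]] = float(parts[1])
            except ValueError:
                continue  # not a numeric sample, ignore it
    return samples


def exporter_sees_mysql(exposition_text):
    """True when the exporter reports a working MySQL connection."""
    return parse_samples(exposition_text).get("mysql_up") == 1.0


def fetch_metrics(host, port=9104, timeout=5):
    """Fetch /metrics from an exporter; host is the address Prometheus scrapes."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Run from the Prometheus host, `exporter_sees_mysql(fetch_metrics("MYSQL_PRIMARY_IP"))` checks both the firewall path and the exporter's database credentials in one step.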

Visualizing Metrics with Grafana

Grafana is the de facto standard for visualizing time-series data from Prometheus. We’ll set up Grafana and connect it to our Prometheus data source.

Deploying Grafana with Docker

version: '3.7'

services:
  grafana:
    image: grafana/grafana:10.0.0 # Use a specific, stable version
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped

volumes:
  grafana_data:

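Add the grafana service and the grafana_data volume to your existing docker-compose.yml (so that Grafana shares a Compose network with Prometheus), or save this as a separate docker-compose.yml in its own directory, and run: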

docker-compose up -d grafana

Access Grafana at http://YOUR_LINODE_IP:3000. The default credentials are admin/admin. You’ll be prompted to change the password on first login.

Configuring Prometheus Data Source in Grafana

1. In Grafana 10, navigate to Connections > Data sources. In older releases this is under Configuration (gear icon) > Data Sources.

2. Click Add data source.

3. Select Prometheus.

4. In the URL field, enter the address of your Prometheus server. If Grafana and Prometheus are on the same Linode and using Docker Compose, this would typically be http://prometheus:9090 (if using the service name) or http://YOUR_LINODE_IP:9090 (if using host networking).

5. Click Save & Test. You should see a confirmation that Grafana can query Prometheus (older releases show “Data source is working”).
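The same data source can be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request, so you can inspect it before sending; the helper names are illustrative, and the URL and credentials in the usage note are placeholders for your own.

```python
import base64
import json
import urllib.request


def datasource_payload(prometheus_url, name="Prometheus"):
    """JSON body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "access": "proxy",  # Grafana's backend proxies the queries
        "url": prometheus_url,
        "isDefault": True,
    }


def build_request(grafana_url, user, password, payload):
    """Build (but do not send) the POST request using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_request("http://YOUR_LINODE_IP:3000", "admin", "your_password", datasource_payload("http://prometheus:9090")))`.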

Importing Pre-built Dashboards

Grafana has a rich community providing pre-built dashboards. For MySQL and general system metrics, these are invaluable.

MySQL Dashboard: Search for “MySQL Prometheus” on Grafana.com/dashboards. A popular one is “MySQL Overview” (Dashboard ID: 7362 or similar).
Node Exporter Dashboard: For general system metrics (CPU, RAM, Disk, Network) on your Linode instances, use a Node Exporter dashboard (e.g., ID: 1860). You’ll need to run node_exporter on each Linode instance and add a scrape job for it in prometheus.yml (its default port is 9100).
Nginx Dashboard: If using nginx-exporter, find a suitable Nginx dashboard (e.g., ID: 1282).

To import a dashboard:

Go to Dashboards (four squares icon) > Import.
Paste the Dashboard ID or upload the JSON file.
Select your Prometheus data source.
Click Import.

Key Metrics to Monitor for Shopify Apps and MySQL

Beyond general system health, focus on metrics critical to your application’s performance and stability.

Shopify App Metrics

Request Latency: Track how long your app takes to answer requests coming from Shopify, and how long its own calls to the Shopify API take.
Error Rates: Monitor HTTP 5xx and 4xx errors.
Throughput: Requests per second.
Queue Lengths: If your app uses background job queues (e.g., Redis queues), monitor queue sizes.
Resource Utilization: CPU, memory, and network I/O of your application servers.
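To make the error-rate and throughput items concrete, here is a rough illustration of what an application metrics endpoint returns. It is written in Python with the standard library only, purely to show the text exposition format; a real app would normally use a Prometheus client library for its own language. The class name and the `method` label are assumptions, while `http_requests_total` and the `code` label match what the alerting rules later in this article query.

```python
from collections import Counter


class RequestMetrics:
    """In-memory request counter rendered in Prometheus text format."""

    def __init__(self):
        self._requests = Counter()

    def observe(self, method, status_code):
        """Record one handled request."""
        self._requests[(method, str(status_code))] += 1

    def render(self):
        """Return the body an HTTP handler would serve at /metrics."""
        lines = [
            "# HELP http_requests_total Total HTTP requests handled.",
            "# TYPE http_requests_total counter",
        ]
        for (method, code), count in sorted(self._requests.items()):
            lines.append(
                f'http_requests_total{{method="{method}",code="{code}"}} {count}'
            )
        return "\n".join(lines) + "\n"


metrics = RequestMetrics()
metrics.observe("GET", 200)
metrics.observe("GET", 200)
metrics.observe("POST", 502)
print(metrics.render())
```

Because the counter only ever increases, Prometheus can derive both throughput and error rate from it with `rate()`.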

MySQL Cluster Metrics

Replication Lag: Crucial for read replicas. Use Seconds_Behind_Master from SHOW SLAVE STATUS (exposed by mysqld_exporter as mysql_slave_status_seconds_behind_master).
Connection Usage: mysql_global_status_threads_connected, mysql_global_status_threads_running. High connected threads can indicate connection leaks or insufficient pooling.
Query Performance: mysql_global_status_questions (total queries), mysql_global_status_slow_queries. Use performance schema metrics for detailed query analysis if available.
InnoDB Metrics: mysql_global_status_innodb_buffer_pool_reads vs mysql_global_status_innodb_buffer_pool_read_requests (buffer pool hit rate), mysql_global_status_innodb_row_lock_waits (lock contention).
Disk I/O: mysql_global_status_innodb_data_reads, mysql_global_status_innodb_data_writes, and corresponding latency metrics if exposed.
Temporary Tables: mysql_global_status_created_tmp_tables, mysql_global_status_created_tmp_disk_tables. A high number of disk-based temporary tables indicates inefficient queries or insufficient memory.
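Replication Errors: The error text from SHOW SLAVE STATUS is not exported as a metric, because Prometheus samples are numeric. Monitor mysql_slave_status_slave_io_running and mysql_slave_status_slave_sql_running (1 when the thread is running), together with mysql_slave_status_last_io_errno and mysql_slave_status_last_sql_errno.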
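Two of the items above are ratios rather than raw counters. The arithmetic can be sketched as follows, assuming the inputs are increases of the corresponding counters over the same time window (for example, the difference between two scrapes); the function names are illustrative.

```python
def buffer_pool_hit_rate(disk_reads_delta, read_requests_delta):
    """Fraction of logical reads served from the InnoDB buffer pool.

    disk_reads_delta: increase of Innodb_buffer_pool_reads (reads that hit disk)
    read_requests_delta: increase of Innodb_buffer_pool_read_requests
    """
    if read_requests_delta <= 0:
        return None  # no read activity in the window, ratio is undefined
    return 1.0 - (disk_reads_delta / read_requests_delta)


def tmp_disk_table_ratio(disk_tables_delta, tmp_tables_delta):
    """Fraction of temporary tables that were created on disk.

    disk_tables_delta: increase of Created_tmp_disk_tables
    tmp_tables_delta: increase of Created_tmp_tables
    """
    if tmp_tables_delta <= 0:
        return None
    return disk_tables_delta / tmp_tables_delta


print(buffer_pool_hit_rate(50, 10_000))
print(tmp_disk_table_ratio(25, 100))
```

In PromQL the same idea is expressed by dividing two `rate()` expressions over the matching metrics.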

Alerting with Prometheus Alertmanager

Proactive alerting is as important as monitoring. Prometheus Alertmanager handles alerts sent by Prometheus, deduplicates them, groups them, and routes them to the correct receiver (e.g., Slack, PagerDuty, email).
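Grouping is the part that most affects how many notifications you receive. As a simplified model, and not Alertmanager's actual implementation, alerts that share the same values for the grouping labels end up in one notification. The sketch assumes grouping by `alertname` and `job`, the labels used in the configuration in this section.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        # A missing label is treated as an empty value in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighMySQLConnections", "job": "mysql_primary", "instance": "db1:9104"},
    {"alertname": "MySQLReplicationLagging", "job": "mysql_replica_1", "instance": "db2:9104"},
    {"alertname": "MySQLReplicationLagging", "job": "mysql_replica_1", "instance": "db3:9104"},
]
grouped = group_alerts(alerts, ["alertname", "job"])
for key, members in grouped.items():
    print(key, len(members))
```

Here the two lagging replicas under the same job produce a single grouped notification instead of two separate ones.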

Configuring Alertmanager

Add an Alertmanager service to your `docker-compose.yml` and create a configuration file (`alertmanager.yml`).

# In your main docker-compose.yml
services:
  # ... prometheus and grafana services ...

  alertmanager:
    image: prom/alertmanager:v0.25.0 # Use a specific, stable version
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/etc/alertmanager/ # Mount your alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    restart: unless-stopped

# ./alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications' # Default receiver

  routes:
    - receiver: 'slack-notifications'
      matchers:
        - severity = "critical"
      continue: true # Allow further routing if needed

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#your-alerts-channel'
        send_resolved: true
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}'
        text: "{{ range .Alerts }}*Alert:* {{ .Annotations.summary }}\n*Description:* {{ .Annotations.description }}\n*Details:* {{ range .Labels.SortedPairs }} {{ .Name }}={{ .Value }}{{ end }}\n{{ end }}"

Update your prometheus.yml to include the Alertmanager configuration:

# In your prometheus.yml
# ... other scrape configs ...

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093'] # Or YOUR_LINODE_IP:9093 if Alertmanager is not on the same Docker network

Example Prometheus Alerting Rules

Create a file (e.g., rules.yml) in your ./prometheus directory and reference it in prometheus.yml.

# In your prometheus.yml, add to the root level:
rule_files:
  - "/etc/prometheus/rules.yml" # Path inside the container

# ./prometheus/rules.yml
groups:
  - name: mysql_alerts
    rules:
      - alert: MySQLReplicationLagging
        expr: mysql_slave_status_seconds_behind_master > 60 # Lagging by more than 60 seconds
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication lag detected on {{ $labels.instance }}"
          description: "MySQL replica {{ $labels.instance }} is lagging behind the master by more than 60 seconds."

      - alert: HighMySQLConnections
        expr: mysql_global_status_threads_connected > 500 # Example threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of MySQL connections on {{ $labels.instance }}"
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} connected threads, exceeding the threshold."

  - name: app_alerts
    rules:
      - alert: HighAppErrorRate
        expr: sum(rate(http_requests_total{job="shopify_app_metrics", code=~"5.."}[5m])) / sum(rate(http_requests_total{job="shopify_app_metrics"}[5m])) * 100 > 5 # More than 5% 5xx errors
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP 5xx error rate for Shopify App"
          description: 'Shopify App is experiencing a high rate of server errors ({{ $value | printf "%.2f" }}%).'

      - alert: AppServerHighCPU
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80 # More than 80% CPU in use
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High CPU utilization on Shopify App server {{ $labels.instance }}"
          description: 'App server {{ $labels.instance }} has high CPU usage ({{ $value | printf "%.2f" }}%).'

Ensure your Prometheus container is restarted after updating its configuration to load the new rules and Alertmanager target.
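The `for:` clause in these rules is what keeps short spikes from paging anyone: an alert stays pending until its expression has been true at every evaluation for the whole duration, and only then fires. The following is a simplified simulation of that behaviour, not Prometheus code; the function name and the 60-second evaluation spacing in the example are assumptions.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state after each rule evaluation.

    samples: list of (timestamp_seconds, value), one per evaluation.
    """
    states = []
    active_since = None
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true
            held = timestamp - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Replication lag in seconds, evaluated once a minute, with `for: 5m`.
lag = [(0, 10), (60, 90), (120, 90), (180, 90), (240, 90),
       (300, 90), (360, 90), (420, 30)]
print(alert_states(lag, threshold=60, for_seconds=300))
```

In this run the lag crosses the threshold at 60 seconds but the alert only fires at 360 seconds, once the condition has held for the full five minutes.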

Advanced Considerations and Best Practices

Service Discovery: For dynamic environments, hardcoding IPs in `prometheus.yml` is brittle. If your Linode infrastructure is managed programmatically or uses orchestration, explore Prometheus’s service discovery mechanisms, for example the built-in Linode SD (`linode_sd_configs`), file-based SD, Consul, or Kubernetes SD.
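One low-dependency option is file-based service discovery (`file_sd_configs`), where Prometheus watches a JSON or YAML file listing target groups and picks up changes without a restart. The sketch below generates such a file from an inventory; the inventory contents, the function name, and the idea of setting a `job` label per group are assumptions for illustration.

```python
import json


def file_sd_groups(inventory):
    """Convert {job_name: ["host:port", ...]} into file_sd target groups."""
    groups = []
    for job, targets in sorted(inventory.items()):
        groups.append({
            "targets": sorted(targets),
            # The label is attached to every series scraped from these targets.
            "labels": {"job": job},
        })
    return groups


# Hypothetical inventory, e.g. produced by your provisioning tooling.
inventory = {
    "mysql_replica": ["10.0.0.12:9104", "10.0.0.11:9104"],
    "mysql_primary": ["10.0.0.10:9104"],
}
print(json.dumps(file_sd_groups(inventory), indent=2))
```

Write the output to a file inside the mounted configuration directory and reference it from a scrape job through `file_sd_configs` with a `files` list.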

High Availability: For critical monitoring, consider running multiple Prometheus instances (federation or remote write to a central store like Thanos or Cortex) and multiple Alertmanager instances in a cluster.

Security: Secure your Prometheus, Grafana, and Alertmanager endpoints. Use firewalls to restrict access to only necessary IPs. Consider authentication for Grafana and Prometheus if exposed publicly.

Data Retention: Configure Prometheus’s data retention policies (`--storage.tsdb.retention.time`) to balance storage costs with the need for historical data.
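For a rough sizing estimate, the Prometheus storage documentation suggests multiplying retention time in seconds by ingested samples per second and by bytes per sample, with one to two bytes per sample as a typical figure. The helper below is a sketch of that arithmetic; the function name and the example series count are assumptions, and real usage varies with churn and compression.

```python
def estimated_tsdb_bytes(retention_days, active_series,
                         scrape_interval_seconds, bytes_per_sample=2.0):
    """Approximate on-disk size of the TSDB for a given retention period."""
    retention_seconds = retention_days * 24 * 60 * 60
    # Each active series contributes one sample per scrape interval.
    samples_per_second = active_series / scrape_interval_seconds
    return retention_seconds * samples_per_second * bytes_per_sample


# 15 days of retention, 15,000 active series, scraped every 15 seconds.
size = estimated_tsdb_bytes(15, 15_000, 15)
print(f"{size / 1e9:.2f} GB")
```

Treat the result as a lower bound when choosing the volume size, and leave headroom for the write-ahead log and compaction.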

Application-Specific Metrics: Instrument your Shopify app code to expose business-level metrics (e.g., number of orders processed, failed API calls to Shopify, cache hit rates) using a client library such as promphp/prometheus_client_php. This provides invaluable insight into the application’s functional health.

Log Aggregation: While metrics tell you *what* is happening, logs tell you *why*. Integrate a log aggregation system (e.g., ELK stack, Loki) for comprehensive observability.

By implementing this Prometheus, Grafana, and Alertmanager stack, you establish a robust, scalable, and proactive monitoring system essential for maintaining the health and performance of your Shopify application and MySQL clusters on Linode.