Server Monitoring Best Practices: Keeping Your Shopify App and PostgreSQL Clusters Alive on Linode
Proactive PostgreSQL Monitoring with Prometheus and Grafana on Linode
PostgreSQL clusters behind high-transaction workloads such as Shopify apps need closer monitoring than Linode’s basic instance metrics provide. Those metrics describe the instance (CPU, network, disk I/O) but say nothing about connections, replication, or query behavior, so we need granular, application-aware monitoring. This section details setting up Prometheus and Grafana to collect and visualize PostgreSQL-specific metrics, enabling proactive issue detection and resolution.
1. Deploying the PostgreSQL Exporter
Prometheus needs a way to scrape metrics from PostgreSQL. The standard tool for this is the postgres_exporter. We’ll deploy this as a separate service, ideally on a dedicated monitoring node or alongside Prometheus itself.
1.1. Installation and Configuration
The postgres_exporter is distributed both as release binaries and as a Docker image. We’ll use the Docker image, which simplifies dependency management and deployment.
1.1.1. Docker Compose Setup
Create a docker-compose.yml file to manage the exporter and its connection to your PostgreSQL instances. Ensure your PostgreSQL instances are accessible from the Docker host running this compose file. You might need to adjust network configurations or use Linode’s private networking.
version: '3.8'
services:
  postgres_exporter:
    image: prometheuscommunity/postgres-exporter:latest
    container_name: postgres_exporter
    restart: unless-stopped
    ports:
      - "9187:9187" # Exporter's default port
    environment:
      # The exporter reads its connection string from DATA_SOURCE_NAME.
      # Format: "postgresql://user:password@host:port/database?sslmode=require"
      # It's highly recommended to use environment variables or a secrets manager for credentials.
      # For demonstration, we'll use direct values, but this is NOT production-ready.
      DATA_SOURCE_NAME: "postgresql://monitor_user:your_secure_password@your_pg_host_1:5432/your_database?sslmode=require"
      # Optional: custom queries (marked deprecated in recent exporter releases)
      PG_EXPORTER_EXTEND_QUERY_PATH: "/etc/postgres_exporter/queries.yaml"
    volumes:
      - ./postgres_exporter_queries.yaml:/etc/postgres_exporter/queries.yaml # Optional: for custom queries
    networks:
      - monitoring_net
  # One exporter instance per cluster: cluster 2 is published on host port 9188.
  postgres_exporter_cluster2:
    image: prometheuscommunity/postgres-exporter:latest
    container_name: postgres_exporter_cluster2
    restart: unless-stopped
    ports:
      - "9188:9187"
    environment:
      DATA_SOURCE_NAME: "postgresql://monitor_user:your_secure_password@your_pg_host_2:5432/your_database?sslmode=require"
    networks:
      - monitoring_net
  # Add one more service per additional cluster
networks:
  monitoring_net:
    driver: bridge
Important Security Note: Storing credentials directly in docker-compose.yml is insecure. In production, use Docker secrets, environment files loaded by the orchestrator, or a dedicated secrets management system.
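One way to keep credentials out of the compose file is to assemble the connection string from environment variables at deploy time. The sketch below is illustrative only: the `PG_MONITOR_*` variable names and the `buildDsn` helper are assumptions, not part of postgres_exporter. It also percent-encodes the password, which matters because characters such as `@` or `/` would otherwise be read as URI delimiters.

```javascript
// Build a PostgreSQL connection URI from separate parts.
// buildDsn and the PG_MONITOR_* variable names are hypothetical.
function buildDsn({ user, password, host, port = 5432, database, sslmode = 'require' }) {
  // Percent-encode user and password so "@", ":" or "/" inside them
  // cannot be mistaken for URI delimiters.
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `postgresql://${auth}@${host}:${port}/${encodeURIComponent(database)}?sslmode=${sslmode}`;
}

const dsn = buildDsn({
  user: process.env.PG_MONITOR_USER || 'monitor_user',
  password: process.env.PG_MONITOR_PASSWORD || 'change-me',
  host: process.env.PG_MONITOR_HOST || 'your_pg_host_1',
  database: process.env.PG_MONITOR_DB || 'your_database',
});

// Mask the password before logging the result.
console.log(dsn.replace(/:[^:@]*@/, ':***@'));
```

The resulting string can then be handed to the exporter container through an environment file or a secret rather than being committed with the compose file.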
1.2. PostgreSQL User Permissions
Create a dedicated PostgreSQL user for monitoring with minimal necessary privileges. This user should have read-only access to relevant system catalogs and statistics views.
-- Connect to your PostgreSQL instance as a superuser
CREATE USER monitor_user WITH PASSWORD 'your_secure_password';

-- Grant necessary privileges
GRANT CONNECT ON DATABASE your_database TO monitor_user;

-- On PostgreSQL 10 and later, the built-in pg_monitor role gives read access to the
-- statistics views and settings, including other sessions' rows in pg_stat_activity.
GRANT pg_monitor TO monitor_user;

-- Explicit per-view grants, for older versions or stricter setups:
GRANT USAGE ON SCHEMA pg_catalog TO monitor_user;
GRANT SELECT ON pg_stat_database TO monitor_user;
GRANT SELECT ON pg_stat_replication TO monitor_user;
GRANT SELECT ON pg_stat_activity TO monitor_user;
GRANT SELECT ON pg_stat_statements TO monitor_user; -- If pg_stat_statements extension is enabled
GRANT SELECT ON pg_locks TO monitor_user;
GRANT SELECT ON pg_settings TO monitor_user;
GRANT SELECT ON pg_stat_bgwriter TO monitor_user;
GRANT SELECT ON pg_stat_user_tables TO monitor_user;
GRANT SELECT ON pg_stat_user_indexes TO monitor_user;
GRANT SELECT ON pg_stat_database_conflicts TO monitor_user;

-- For custom queries defined in queries.yaml, you might need to grant SELECT on specific tables/views.
-- Example:
-- GRANT SELECT ON your_application_table TO monitor_user;
If you plan to use custom queries (e.g., for application-specific performance metrics), ensure the monitor_user has SELECT privileges on those tables or views. For pg_stat_statements metrics to be available, the module must be listed in shared_preload_libraries in postgresql.conf (which requires a server restart) and the extension must be created in the database (CREATE EXTENSION pg_stat_statements;).
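Once the exporter is running, a quick check that it can actually reach the database is to look for the `pg_up` gauge in its /metrics output. The following is a minimal sketch of that check; it parses a hard-coded sample rather than fetching from a live exporter, and the `parseMetrics` and `exporterCanReachPostgres` helpers are illustrative names, not part of any library.

```javascript
// Parse Prometheus text exposition format into { name, labels, value } samples.
// Labels are kept as the raw "{...}" string; this sketch does not split them.
function parseMetrics(text) {
  const samples = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip HELP/TYPE lines and blanks
    // metric name, optional {labels}, value, optional timestamp
    const m = trimmed.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$/);
    if (!m) continue;
    samples.push({ name: m[1], labels: m[2] || '', value: Number(m[3]) });
  }
  return samples;
}

// True when the payload contains pg_up with value 1.
function exporterCanReachPostgres(text) {
  const up = parseMetrics(text).find((s) => s.name === 'pg_up');
  return up !== undefined && up.value === 1;
}

// Sample payload shaped like a fragment of the exporter's /metrics output.
const sample = [
  '# TYPE pg_up gauge',
  'pg_up 1',
  'pg_stat_database_numbackends{datid="16384",datname="your_database"} 7',
].join('\n');

console.log(exporterCanReachPostgres(sample));
```

If `pg_up` reports 0, the usual suspects are the connection string, the monitoring user's privileges, or a firewall between the exporter and PostgreSQL.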
2. Setting up Prometheus Server
Prometheus will be responsible for scraping metrics from the postgres_exporter and other services. We’ll configure it to discover and poll our PostgreSQL exporters.
2.1. Prometheus Configuration (prometheus.yml)
Here’s a sample prometheus.yml configuration. This assumes Prometheus is running on the same network as the Docker containers for the exporters. If Prometheus is on a separate Linode instance, adjust the targets accordingly (e.g., using Linode’s private IP addresses).
global:
  scrape_interval: 15s # How frequently to scrape targets
  evaluation_interval: 15s # How frequently to evaluate rules
scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Scrape Node Exporter for host metrics
  - job_name: 'node_exporter'
    static_configs:
      - targets:
          - 'your_linode_host_1_private_ip:9100'
          - 'your_linode_host_2_private_ip:9100'
          # Add all your Linode instance private IPs
  # Scrape PostgreSQL Exporter instances
  - job_name: 'postgres_exporter'
    static_configs:
      - targets:
          - 'your_docker_host_ip:9187' # If exporter is on a separate host
          # Or if running on the same host as Prometheus and using host networking:
          # - 'localhost:9187'
          # If a second exporter instance (e.g., for cluster 2) is published on another host port:
          # - 'your_docker_host_ip:9188'
          # It's better to use service discovery or explicitly list targets if they are on different hosts.
          # Example for multiple exporters on different hosts:
          # - 'pg_exporter_host_1:9187'
          # - 'pg_exporter_host_2:9187'
  # Example using service discovery (if you have a service registry like Consul)
  # - job_name: 'postgres_exporter_sd'
  #   consul_sd_configs:
  #     - server: 'consul.service.consul:8500'
  #   relabel_configs:
  #     - source_labels: [__meta_consul_tags]
  #       regex: '.*,postgresql,.*' # Tags arrive comma-joined with a leading and trailing comma
  #       action: keep
  #     - source_labels: [__address__]
  #       regex: '(.*):9187'
  #       target_label: instance
  #     - source_labels: [__meta_consul_service_id]
  #       target_label: service_id
# Alerting rules (optional but recommended)
rule_files:
  - "rules/*.yml"
Note on Targets: The targets in static_configs should point to the address and port where the postgres_exporter is reachable from Prometheus. If Prometheus runs directly on the host and the exporter’s port is mapped (e.g., 9187:9187), localhost:9187 is correct. If Prometheus itself runs in a container, localhost refers to the Prometheus container, so use the exporter’s service name on the shared Docker network instead (e.g., postgres_exporter:9187). If they are on different Linode instances, use the private IP of the instance running the exporter.
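When the list of exporter hosts changes often, maintaining static_configs by hand gets error-prone. Prometheus can also read targets from JSON files through `file_sd_configs`, where each entry has a `targets` array and an optional `labels` map. The sketch below generates such a file body from a small inventory; the inventory shape, the `cluster` label, and the `buildFileSdTargets` helper are assumptions made for illustration.

```javascript
// Build the JSON structure expected by Prometheus file-based service discovery.
// The { ip, cluster } inventory shape is a hypothetical input format.
function buildFileSdTargets(hosts, port = 9187) {
  // Group hosts by cluster so each group carries its own "cluster" label.
  const groups = new Map();
  for (const { ip, cluster } of hosts) {
    if (!groups.has(cluster)) groups.set(cluster, []);
    groups.get(cluster).push(`${ip}:${port}`);
  }
  return [...groups].map(([cluster, targets]) => ({ targets, labels: { cluster } }));
}

const inventory = [
  { ip: '192.168.128.10', cluster: 'primary' },
  { ip: '192.168.128.11', cluster: 'primary' },
  { ip: '192.168.128.20', cluster: 'reporting' },
];

// Write this output to a file referenced under file_sd_configs in prometheus.yml.
console.log(JSON.stringify(buildFileSdTargets(inventory), null, 2));
```

Prometheus re-reads files listed under `file_sd_configs`, so regenerating the file is enough to add or remove targets without editing prometheus.yml.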
2.2. Installing and Running Prometheus
You can install Prometheus directly on a Linode instance or run it in Docker. Using Docker Compose is often the easiest way to manage Prometheus and its configuration.
# docker-compose.yml for Prometheus
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    networks:
      - monitoring_net
volumes:
  prometheus_data:
networks:
  monitoring_net:
    driver: bridge
Place your prometheus.yml file in a directory named prometheus alongside this docker-compose.yml. Then run: docker-compose up -d.
3. Visualizing Metrics with Grafana
Grafana provides a powerful and flexible dashboarding solution. We’ll connect it to our Prometheus data source and import pre-built PostgreSQL dashboards.
3.1. Deploying Grafana
Again, Docker Compose is a convenient way to deploy Grafana.
version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    networks:
      - monitoring_net
# Named volumes must be declared at the top level
volumes:
  grafana_data:
networks:
  monitoring_net:
    driver: bridge
Run: docker-compose up -d. Access Grafana at http://your_linode_ip:3000. Default credentials are admin/admin (you’ll be prompted to change the password on first login).
3.2. Configuring Prometheus Data Source in Grafana
1. Log in to Grafana.
2. Navigate to Configuration (gear icon) > Data Sources.
3. Click Add data source.
4. Select Prometheus.
5. In the URL field, enter the address of your Prometheus server (e.g., http://your_prometheus_host_ip:9090, or http://prometheus:9090 if Grafana and Prometheus run as containers on the same Docker network). http://localhost:9090 works only when Grafana runs directly on the Prometheus host rather than in its own container.
6. Click Save & Test. You should see “Data source is working”.
3.3. Importing PostgreSQL Dashboards
Grafana has a rich community dashboard repository. We can import pre-built dashboards for PostgreSQL.
- Go to Dashboards (four squares icon) > Browse.
- Click Import.
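- You can import by Dashboard ID from grafana.com/grafana/dashboards/. Search for “PostgreSQL” and find popular ones like “PostgreSQL Overview” (ID: 1222) or “PostgreSQL by Percona” (ID: 721). Check on the dashboard’s page that the ID is current and that it is built on postgres_exporter metrics; dashboards written for a different exporter will import but show empty panels.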
- Alternatively, if you have a dashboard JSON file, you can upload it.
- When prompted, select your Prometheus data source.
- Click Import.
These dashboards will visualize key PostgreSQL metrics such as:
- Connection counts
- Replication lag
- Query performance (if pg_stat_statements is enabled)
- Cache hit ratios
- Disk I/O
- Transaction rates
- Lock contention
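Of these, the cache hit ratio is the one most often misread, so it helps to spell out the arithmetic. It is derived from the `blks_hit` and `blks_read` counters in pg_stat_database as hit / (hit + read), computed over the change between two scrapes rather than over lifetime totals. The sketch below works through one example with made-up counter values; the helper names are illustrative.

```javascript
// Change in a monotonically increasing counter between two scrapes.
// If the counter went backwards (statistics reset), treat the current value as the delta.
function counterDelta(previous, current) {
  return current >= previous ? current - previous : current;
}

// Fraction of block requests served from shared buffers in the window.
// Returns null when there was no block access at all, since the ratio is undefined.
function cacheHitRatio(blksHit, blksRead) {
  const total = blksHit + blksRead;
  if (total === 0) return null;
  return blksHit / total;
}

// Two hypothetical scrapes of the blks_hit and blks_read counters.
const previousScrape = { blksHit: 1000, blksRead: 100 };
const currentScrape = { blksHit: 1990, blksRead: 110 };

const ratio = cacheHitRatio(
  counterDelta(previousScrape.blksHit, currentScrape.blksHit),
  counterDelta(previousScrape.blksRead, currentScrape.blksRead)
);
console.log(ratio); // 990 hits out of 1000 block requests
```

Dashboards typically express the same calculation in PromQL with rate() over both counters; a sustained drop is usually more informative than any single absolute value.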
4. Setting Up Alerting Rules
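Dashboards are great for visualization, but alerts are crucial for proactive intervention. Prometheus evaluates the alerting rules; Alertmanager then deduplicates, groups, and routes the resulting alerts to notification channels.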
4.1. Defining Alerting Rules
Create a rules file (e.g., rules/postgres_alerts.yml, in the same directory as prometheus.yml so that the rules/*.yml pattern in rule_files matches it) and add the following rules.
groups:
  - name: postgresql.rules
    rules:
      - alert: PostgreSQLHighReplicationLag
        expr: |
          pg_replication_lag_seconds{job="postgres_exporter"} > 60 # Lag greater than 60 seconds
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High replication lag detected on {{ $labels.instance }}"
          description: "PostgreSQL instance {{ $labels.instance }} has a replication lag of {{ $value }} seconds, exceeding the 60-second threshold for 5 minutes."
      - alert: PostgreSQLTooManyConnections
        # pg_stat_activity_count is reported per database and state, so sum it per instance.
        expr: |
          sum by (instance) (pg_stat_activity_count{job="postgres_exporter"}) > 100 # Example threshold, adjust based on your max_connections
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of PostgreSQL connections on {{ $labels.instance }}"
          description: "PostgreSQL instance {{ $labels.instance }} has {{ $value }} connections, approaching the configured limit."
      - alert: PostgreSQLDeadlocks
        expr: |
          rate(pg_stat_database_deadlocks{job="postgres_exporter"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Deadlock detected in PostgreSQL on {{ $labels.instance }}"
          description: "A deadlock has occurred on PostgreSQL instance {{ $labels.instance }} within the last 5 minutes."
      - alert: PostgreSQLHighBlockReadTime
        # This metric measures block read time, not lock waits, and is only populated when track_io_timing = on.
        # Its exact name and unit depend on the exporter version; confirm against the exporter's /metrics output.
        # increase() gives the total over the window; rate() would give a per-second value.
        expr: |
          increase(pg_stat_database_blk_read_time_seconds{job="postgres_exporter"}[5m]) > 300 # Example: total block read time > 5 minutes in 5 min interval
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High block read time on {{ $labels.instance }}"
          description: "PostgreSQL instance {{ $labels.instance }} is spending significant time reading data file blocks."
      # Add more rules for cache hit ratio, disk space, etc.
Ensure your prometheus.yml includes the rule_files directive pointing to this file.
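The `for:` clause in these rules is easy to misjudge: an alert whose expression becomes true first enters a pending state and only fires once the expression has stayed true for the whole duration, and a single false evaluation resets the clock. The sketch below is a simplified model of that behavior under the assumption of regular evaluations; it is not Prometheus code, and `alertStates` is an illustrative name.

```javascript
// Simplified model of how a rule with a "for" duration moves between states.
// samples: [{ t: <seconds>, conditionTrue: <boolean> }] in evaluation order.
function alertStates(samples, forSeconds) {
  let activeAt = null; // time at which the condition most recently became true
  return samples.map(({ t, conditionTrue }) => {
    if (!conditionTrue) {
      activeAt = null; // any false evaluation resets the pending timer
      return 'inactive';
    }
    if (activeAt === null) activeAt = t;
    return t - activeAt >= forSeconds ? 'firing' : 'pending';
  });
}

// Evaluations every 15 seconds with one dip at t=30, and a 60-second "for" duration.
const evaluations = [
  { t: 0, conditionTrue: true },
  { t: 15, conditionTrue: true },
  { t: 30, conditionTrue: false },
  { t: 45, conditionTrue: true },
  { t: 60, conditionTrue: true },
  { t: 75, conditionTrue: true },
  { t: 90, conditionTrue: true },
  { t: 105, conditionTrue: true },
];
console.log(alertStates(evaluations, 60));
```

In this model the dip at t=30 delays firing until t=105, which is why a flapping metric combined with a long `for:` duration can leave an alert pending indefinitely.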
4.2. Configuring Alertmanager
Alertmanager receives alerts from Prometheus and routes them to various receivers (email, Slack, PagerDuty, etc.).
# alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific match
  routes:
    - receiver: 'slack-notifications'
      match:
        severity: 'critical'
      continue: true # Allows matching other routes if needed
receivers:
  - name: 'default-receiver'
    webhook_configs:
      - url: 'http://your-webhook-receiver:5001' # Example: a custom webhook
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX' # Replace with your Slack webhook URL
        channel: '#alerts'
        send_resolved: true
        text: "{{ range .Alerts }}*Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`\n*Description:* {{ .Annotations.description }}\n*Details:* {{ range .Labels.SortedPairs }} `{{ .Name }}={{ .Value }}` {{ end }}\n{{ end }}"
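To see which receiver an alert ends up at under this configuration, it can help to model the routing decision. The sketch below is a deliberately simplified approximation: it handles one level of child routes and equality matching only, and `resolveReceivers` is an illustrative helper, not Alertmanager's implementation. Child routes are tried in order, the first match wins unless `continue` is set, and the top-level receiver is used only when no child matched.

```javascript
// True when every label in the route's match block equals the alert's label.
function routeMatches(route, labels) {
  return Object.entries(route.match || {}).every(([name, value]) => labels[name] === value);
}

// Simplified, single-level model of route resolution.
function resolveReceivers(rootRoute, labels) {
  const receivers = [];
  for (const child of rootRoute.routes || []) {
    if (!routeMatches(child, labels)) continue;
    receivers.push(child.receiver);
    if (!child.continue) break; // stop at the first match unless continue is set
  }
  // The root receiver is the fallback when no child route matched.
  return receivers.length > 0 ? receivers : [rootRoute.receiver];
}

// Mirrors the route section of the alertmanager.yml above.
const route = {
  receiver: 'default-receiver',
  routes: [{ receiver: 'slack-notifications', match: { severity: 'critical' }, continue: true }],
};

console.log(resolveReceivers(route, { alertname: 'PostgreSQLDeadlocks', severity: 'critical' }));
console.log(resolveReceivers(route, { alertname: 'PostgreSQLTooManyConnections', severity: 'warning' }));
```

Under this model critical alerts go to Slack only, and warnings fall through to the default webhook; `continue: true` matters only once further sibling routes are added after the Slack route.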
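You’ll need to run Alertmanager, typically via Docker Compose, and configure Prometheus to send alerts to it. Compose scopes networks to the project by default, so services started from separate docker-compose.yml files in different directories do not share monitoring_net. If Prometheus is to reach Alertmanager by service name, define both in the same compose file or declare monitoring_net as an external network.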
# docker-compose.yml for Alertmanager
version: '3.8'
services:
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/etc/alertmanager/
      - alertmanager_data:/data
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/data' # Alertmanager's data directory flag (not --storage.tsdb.path)
    networks:
      - monitoring_net
volumes:
  alertmanager_data:
networks:
  monitoring_net:
    driver: bridge
Update your prometheus.yml to include the Alertmanager configuration:
# ... (previous prometheus.yml content) ...
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093' # If Alertmanager is in the same Docker network
            # - 'your_alertmanager_host_ip:9093' # If on a different host
5. Monitoring Your Shopify App (Node.js/Ruby/PHP)
Beyond the database, your application instances themselves need monitoring. This involves application performance monitoring (APM) and infrastructure-level metrics.
5.1. Node Exporter for System Metrics
As shown in the Prometheus config, node_exporter is essential. Install it on every Linode instance running your Shopify app. It exposes hardware and OS metrics like CPU, memory, disk I/O, and network traffic.
# On each application Linode instance:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
sudo mv node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter
# Directory read by the textfile collector configured in the unit file
sudo mkdir -p /var/lib/node_exporter/textfile_collector
sudo chown -R node_exporter:node_exporter /var/lib/node_exporter

# Create a systemd service file (/etc/systemd/system/node_exporter.service)
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude='^/(sys|proc|dev|host|etc)($$|/)' \
    --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
Ensure the Linode’s firewall allows access to port 9100 from your Prometheus server’s IP address.
5.2. Application-Specific Metrics (Examples)
This is highly dependent on your app’s stack.
5.2.1. Node.js Apps
Use libraries like prom-client to expose custom metrics via an HTTP endpoint (e.g., /metrics).
// Example using prom-client for Node.js
const express = require('express');
const client = require('prom-client');

const app = express();
const register = new client.Registry();

// Enable default metrics
client.collectDefaultMetrics({ register });

// Custom metric: Number of Shopify API calls
const shopifyApiCallCounter = new client.Counter({
  name: 'shopify_api_calls_total',
  help: 'Total number of Shopify API calls made',
  labelNames: ['endpoint', 'method'],
  registers: [register], // prom-client expects an array of registries here
});

// Middleware to increment counter for API calls
app.use((req, res, next) => {
  if (req.path.startsWith('/api/shopify')) { // Example path
    shopifyApiCallCounter.labels(req.path, req.method).inc();
  }
  next();
});

// Endpoint to expose metrics
app.get('/metrics', async (req, res) => {
  res.setHeader('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Your application routes here...
app.get('/', (req, res) => {
  res.send('Hello World!');
});

// Start the metrics server (or integrate with your existing server)
const PORT = 8080; // Or any other port
app.listen(PORT, () => {
  console.log(`Metrics server listening on port ${PORT}`);
});

// In prometheus.yml, add a job for this:
// - job_name: 'my_shopify_app_nodejs'
//   static_configs:
//     - targets: ['your_app_host_private_ip:8080']
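One caveat with labeling a counter by the raw request path, as the middleware above does: every distinct order or product ID produces a new time series, and Prometheus keeps each one. A common mitigation is to collapse variable path segments before using the path as a label value. The sketch below shows one way to do that; `normalizePath` and its placeholder names are illustrative choices, not part of prom-client or Express.

```javascript
// Replace variable path segments with placeholders so the "endpoint" label
// has a small, bounded set of values.
function normalizePath(path) {
  const uuidPattern = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return path
    .split('?')[0] // drop any query string
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id'; // purely numeric IDs
      if (uuidPattern.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizePath('/api/shopify/orders/450789469/fulfillments?limit=5'));
console.log(normalizePath('/api/shopify/products'));
```

In the middleware, the counter would then be incremented with `normalizePath(req.path)` instead of `req.path`.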
5.2.2. Ruby on Rails Apps
Use the prometheus-client gem.
# Gemfile
gem 'prometheus-client'

# Register custom metrics (e.g., in config/initializers/prometheus.rb)
require 'prometheus/client'

# With a multi-process server (e.g., Puma in clustered mode), switch to a shared data store
# so that all worker processes report into the same metrics:
# Prometheus::Client.config.data_store =
#   Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_direct_file_store')

SHOPIFY_API_CALLS = Prometheus::Client.registry.counter(
  :shopify_api_calls_total,
  docstring: 'Total number of Shopify API calls made',
  labels: [:endpoint, :method]
)

# Add middleware to Rails application (config/application.rb or config/environments/*.rb)
require 'prometheus/middleware/exporter'
config.middleware.use Prometheus::Middleware::Exporter, path: '/metrics'

# In a controller or service object, increment the counter:
# SHOPIFY_API_CALLS.increment(labels: { endpoint: '/admin/api/2023-10/orders.json', method: 'GET' })

# In prometheus.yml, add a job for this:
# - job_name: 'my_shopify_app_rails'
#   static_configs:
#     - targets: ['your_app_host_private_ip:3000'] # Assuming Rails default port
5.2.3. PHP Apps (e.g., Laravel)
Use libraries like prometheus_client_php.
// composer.json
// "require": { "promphp/prometheus_client_php": "^2.0" }

// In a service provider or bootstrap file (e.g., app/Providers/AppServiceProvider.php)
use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\InMemory;

// Initialize registry.
// InMemory storage is discarded at the end of each PHP request, so it only suits demos and
// long-running workers; under PHP-FPM use a persistent adapter such as Prometheus\Storage\Redis
// or Prometheus\Storage\APC.
$adapter = new InMemory();
$registry = new CollectorRegistry($adapter);

// Custom metric: Number of Shopify API calls (exposed as my_app_shopify_api_calls_total)
$counter = $registry->getOrRegisterCounter(
    'my_app', // Namespace
    'shopify_api_calls_total',
    'Total number of Shopify API calls made',
    ['endpoint', 'method'] // Labels
);

// Example usage in a controller or service that holds the shared registry in $this->registry
public function callShopifyApi() {
    // ... API call logic ...
    $endpoint = '/admin/api/2023-10/orders.json';
    $method = 'POST';
    $counter = $this->registry->getOrRegisterCounter(
        'my_app',
        'shopify_api_calls_total',
        'Total number of Shopify API calls made',
        ['endpoint', 'method']
    );
    $counter->incBy(1, [$endpoint, $method]);
    // ...
}

// Create a route to expose metrics (e.g., routes/web.php)
Route::get('/metrics', function () use ($registry) {
    $renderer = new RenderTextFormat();
    // render() takes the collected samples, not the registry itself
    return response($renderer->render($registry->getMetricFamilySamples()), 200)
        ->header('Content-Type', RenderTextFormat::MIME_TYPE);
});

// In prometheus.yml, add a job for this:
// - job_name: 'my_shopify_app_php'
//   static_configs:
//     - targets: ['your_app_host_private_ip:80'] # Assuming web server port
5.3. APM Tools
For deeper insights into request tracing, error tracking, and performance bottlenecks within your application code, consider dedicated APM tools. Many integrate with Prometheus or offer their own dashboards:
- New Relic
- Datadog
- Sentry (primarily error tracking, but has performance monitoring)
- OpenTelemetry (an open-source standard for instrumentation, can send data to various backends including Prometheus)
Instrument your application code with the chosen APM agent. Configure it to send data to its respective backend. You can then correlate APM data with Prometheus metrics in Grafana for a holistic view.
6. Linode Specific Considerations
6.1. Network Configuration
Use Linode’s private networking feature to allow your Prometheus, Grafana, and Alertmanager instances to communicate securely and efficiently with your application and database servers without exposing them to the public internet. Ensure your Linode firewall rules are configured to allow traffic only from necessary sources (e.g., Prometheus scraping your app’s metrics endpoint only from the Prometheus server’s private IP).
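A small check over the scrape targets can catch a public address that slipped into the configuration. The sketch below flags any target whose host is not an RFC 1918 private IPv4 address; it is an illustrative helper rather than a Linode or Prometheus feature, and it assumes targets are written as `host:port`. Hostnames are reported as well, since they cannot be classified without resolving them.

```javascript
// True for addresses in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
function isPrivateIPv4(address) {
  const parts = address.split('.');
  if (parts.length !== 4) return false;
  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  if (octets.some((octet) => Number.isNaN(octet) || octet > 255)) return false;
  const [a, b] = octets;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Return the targets whose host part is not a private IPv4 address.
function nonPrivateTargets(targets) {
  return targets.filter((target) => !isPrivateIPv4(target.split(':')[0]));
}

// Hypothetical target list; 203.0.113.7 is a documentation-range public address.
const scrapeTargets = ['192.168.130.5:9100', '203.0.113.7:9187', '10.0.0.4:8080'];
console.log(nonPrivateTargets(scrapeTargets));
```

A check like this fits naturally into a pre-deploy script that runs before prometheus.yml is rolled out.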
6.2. Resource Allocation
Monitoring infrastructure (Prometheus, Grafana, Alertmanager) consumes resources. Allocate adequate CPU, RAM, and disk space for these services, especially if you have a large number of targets or a long retention period for metrics. Consider using dedicated Linode instances for your monitoring stack if resource contention becomes an issue.
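For disk sizing, the Prometheus documentation gives a rough formula: needed disk space is approximately retention time in seconds multiplied by ingested samples per second multiplied by bytes per sample, with samples averaging about 1 to 2 bytes after compression. The sketch below applies that formula; the target and series counts are made-up inputs, and `estimateTsdbBytes` is an illustrative helper.

```javascript
// Rough TSDB disk estimate:
//   retention_seconds * samples_per_second * bytes_per_sample
// where samples_per_second = targets * seriesPerTarget / scrapeIntervalSeconds.
function estimateTsdbBytes({
  targets,
  seriesPerTarget,
  scrapeIntervalSeconds,
  retentionDays,
  bytesPerSample = 2, // upper end of the documented 1-2 byte average
}) {
  const retentionSeconds = retentionDays * 24 * 60 * 60;
  return Math.ceil(
    (targets * seriesPerTarget * retentionSeconds * bytesPerSample) / scrapeIntervalSeconds
  );
}

// Hypothetical setup: 10 targets, about 1000 series each, 15s scrape interval, 15 days retention.
const estimatedBytes = estimateTsdbBytes({
  targets: 10,
  seriesPerTarget: 1000,
  scrapeIntervalSeconds: 15,
  retentionDays: 15,
});
console.log(`${(estimatedBytes / 2 ** 30).toFixed(2)} GiB`);
```

Treat the result as a lower bound for planning: the write-ahead log, compaction, and growth in series count all add to it.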
6.3. High Availability for Monitoring
For critical applications, a single point of failure in your monitoring system is unacceptable. Consider:
- Running Prometheus in a high-availability setup (federation or Thanos/Cortex).
- Deploying multiple Grafana instances behind a load balancer.
- Running Alertmanager as a cluster of several instances, using its built-in clustering so that notifications are deduplicated across peers.
This adds complexity but ensures your monitoring remains operational even if one component fails.
Conclusion
Implementing a robust monitoring strategy using Prometheus and Grafana is essential for maintaining the stability and performance of your Shopify app and its underlying PostgreSQL clusters on Linode. By combining system-level metrics, application-specific instrumentation, and proactive alerting, you can detect and resolve issues before they impact your users, ensuring a seamless e-commerce experience.