Server Monitoring Best Practices: Keeping Your WooCommerce App and MongoDB Clusters Alive on DigitalOcean
Proactive MongoDB Cluster Health Checks with `mongosh` and Custom Scripts
Maintaining the health of a distributed MongoDB cluster, especially one powering a high-traffic WooCommerce store, requires more than basic uptime checks. You also need to track the operational metrics that signal performance degradation or impending failure, which means using the `mongosh` shell and scripting custom checks for key indicators.
A fundamental check is the status of the replica set members: every member should be in a healthy state, and the primary should be stable. A script that connects to the replica set and queries its status covers both.
Replica Set Status Script
This Python script uses `pymongo` to connect to the MongoDB replica set and retrieve its status. It confirms that a PRIMARY exists, warns about secondaries that lag behind it, and reports any remaining member whose state is not one you expect, such as `SECONDARY`, `ARBITER`, or `STARTUP2` (depending on your configuration). Alerts should be triggered for any anomalies.
First, ensure you have `pymongo` installed:
pip install pymongo
Next, create the Python script (e.g., `check_mongo_rs.py`):
import logging
import sys
from datetime import timedelta

import pymongo
from pymongo.errors import ConnectionFailure, OperationFailure

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- Configuration ---
# Replace with your MongoDB connection string or host/port
MONGO_URI = "mongodb://user:password@mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/?replicaSet=myReplicaSet&authSource=admin"
MAX_LAG = timedelta(seconds=60)  # 60 second lag threshold
# States accepted for members that are neither PRIMARY nor SECONDARY; adjust as needed
ALLOWED_OTHER_STATES = {'ARBITER', 'STARTUP2'}
# --- End Configuration ---


def check_replica_set_health(mongo_uri):
    client = None
    try:
        client = pymongo.MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)  # 5 second timeout
        # ping is cheap and forces server selection, so connection problems surface here
        client.admin.command('ping')
        logging.info("Successfully connected to MongoDB.")

        rs_status = client.admin.command('replSetGetStatus')

        primary_member = None
        secondary_members = []
        other_members = []
        for member in rs_status['members']:
            state_str = member['stateStr']
            if state_str == 'PRIMARY':
                primary_member = member
            elif state_str == 'SECONDARY':
                secondary_members.append(member)
            else:
                other_members.append(member)

        if not primary_member:
            logging.error("No PRIMARY member found in the replica set!")
            return False
        logging.info(f"Primary: {primary_member['name']} (State: {primary_member['stateStr']})")

        # Check for lag on secondaries by comparing each optimeDate with the primary's
        for member in secondary_members:
            lag = primary_member['optimeDate'] - member['optimeDate']
            if lag > MAX_LAG:
                logging.warning(
                    f"Secondary {member['name']} is lagging by {lag.total_seconds():.0f}s. "
                    f"Optime: {member['optimeDate']}, Primary optime: {primary_member['optimeDate']}"
                )

        # Any other member must be in an allowed state. DOWN, ROLLBACK, REMOVED and
        # UNKNOWN fail the check; relax this if your tolerance for DOWN members differs.
        healthy = True
        for member in other_members:
            if member['stateStr'] not in ALLOWED_OTHER_STATES:
                logging.error(f"Member {member['name']} is in an unhealthy state: {member['stateStr']}")
                healthy = False
        return healthy

    except ConnectionFailure as e:
        logging.error(f"Could not connect to MongoDB: {e}")
        return False
    except OperationFailure as e:
        logging.error(f"MongoDB operation failed: {e}")
        return False
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")
        return False
    finally:
        if client:
            client.close()


if __name__ == "__main__":
    if check_replica_set_health(MONGO_URI):
        logging.info("MongoDB replica set health check passed.")
        sys.exit(0)
    else:
        logging.error("MongoDB replica set health check failed.")
        sys.exit(1)
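The replication-lag idea can be tried without a live cluster. The following is a minimal sketch, assuming the status document has the shape `replSetGetStatus` returns through `pymongo` (a `members` list whose entries carry `name`, `stateStr`, and `optimeDate`). The function name `replication_lag_seconds` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(rs_status):
    """Return {secondary_name: lag_in_seconds} relative to the primary's optimeDate."""
    members = rs_status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY in replica set status")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample shaped like a replSetGetStatus reply
now = datetime(2024, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "mongo1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "mongo2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=2)},
        {"name": "mongo3:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=90)},
    ]
}
lags = replication_lag_seconds(sample_status)
lagging = [name for name, lag in lags.items() if lag > 60]
print(lagging)
```

Keeping the calculation in a pure function like this makes the 60-second threshold easy to unit test separately from the connection handling.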
To automate the health check, schedule `check_mongo_rs.py` with `cron` on a dedicated monitoring server or on one of your application servers (if it has network access to the MongoDB cluster). Running it every 1 to 5 minutes is a reasonable starting point.
Cron Job Example
Edit your crontab:
crontab -e
Add a line like this to run the script every 5 minutes and redirect output to a log file:
*/5 * * * * /usr/bin/python3 /path/to/your/scripts/check_mongo_rs.py >> /var/log/mongo_health_check.log 2>&1
The script exits non-zero and logs an `ERROR` line on failure. To turn that into notifications, add a separate alerting layer that watches `/var/log/mongo_health_check.log` for error messages: for example a Nagios log check, a log exporter feeding Prometheus Alertmanager, or even a simple `logwatch` configuration.
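As an illustration of what such log parsing involves, here is a minimal sketch that extracts alert-worthy lines. It assumes the log uses the `%(asctime)s - %(levelname)s - %(message)s` layout configured in the script; the function name `find_alerts` and the sample lines are hypothetical, and a real setup would hand the results to your alerting tool instead of printing them.

```python
import re

# Matches the "timestamp - LEVEL - message" layout assumed from the logging format
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def find_alerts(lines, levels=("ERROR", "CRITICAL")):
    """Return (timestamp, level, message) tuples for lines at an alerting level."""
    alerts = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            alerts.append((match.group("ts"), match.group("level"), match.group("msg")))
    return alerts


# Hypothetical log excerpt
sample_log = [
    "2024-01-01 12:00:00,123 - INFO - Successfully connected to MongoDB.",
    "2024-01-01 12:05:00,456 - ERROR - No PRIMARY member found in the replica set!",
    "2024-01-01 12:05:00,457 - ERROR - MongoDB replica set health check failed.",
]
alerts = find_alerts(sample_log)
for ts, level, msg in alerts:
    print(f"ALERT {ts} {level}: {msg}")
```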
WooCommerce Application Performance Monitoring (APM) with New Relic
For the WooCommerce application itself, understanding performance bottlenecks is crucial. This involves monitoring request latency, error rates, database query times, and external service calls. New Relic is a powerful APM tool that provides deep insights into your PHP application's performance.
New Relic PHP Agent Installation and Configuration
The first step is to install the New Relic PHP agent. This typically involves downloading the agent installer script and running it on your web servers.
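# Download and unpack the agent archive (take the current file name from New Relic's download site)
wget https://download.newrelic.com/php/newrelic-php5-9.17.0.290-x64.tar.gz
tar -zxvf newrelic-php5-9.17.0.290-x64.tar.gz
cd newrelic-php5-9.17.0.290-x64
# Run the bundled installer script; it detects your PHP installations and asks for the license key
sudo ./newrelic-install install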
After installation, confirm that the agent is enabled in your PHP configuration. The installer usually reports which file it wrote, typically a `newrelic.ini` in each SAPI's configuration directory, so CLI and FPM may each have their own copy. Make sure the following settings are present and uncommented:
extension = "newrelic.so"
[newrelic]
newrelic.enabled = true
newrelic.license = "YOUR_LICENSE_KEY"
newrelic.appname = "YourWooCommerceAppName"
; Optional: Specify log level for agent logs
; newrelic.loglevel = "info"
; Optional: Specify log file path
; newrelic.logfile = "/var/log/newrelic/newrelic-php5.log"
Restart your web server (e.g., Nginx/Apache) and PHP-FPM service for the changes to take effect.
sudo systemctl restart nginx
sudo systemctl restart php8.1-fpm  # Adjust PHP version as needed
Key WooCommerce Metrics to Monitor in New Relic
Once the agent is active, New Relic will start collecting data. Focus on these critical metrics within the New Relic UI:
- Transaction Traces: Identify slow-loading pages, API endpoints, or background processes. Look for traces with high "Time spent in other code" or excessive database query times.
- Database Queries: Monitor the slowest and most frequent database queries. Inefficient queries (e.g., missing indexes, N+1 problems) are common performance killers in WooCommerce.
- External Services: Track the performance of calls to third-party APIs (payment gateways, shipping providers, etc.). High latency or error rates here can directly impact user experience.
- Error Rates: Monitor PHP errors, exceptions, and HTTP error codes (5xx). Set up alerts for spikes in error rates.
- Throughput: Observe the number of transactions per minute. Sudden drops can indicate an outage or severe performance issue.
- Response Time: Track the average response time for your web transactions.
For WooCommerce specifically, pay close attention to transactions related to product pages, cart operations, checkout, and order processing. Use New Relic's custom instrumentation if needed to pinpoint specific plugin or theme performance issues.
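The "sudden drop" in throughput mentioned in the list can be expressed as a simple rule. This is a hedged sketch and not New Relic's alerting logic: it assumes you have per-minute transaction counts available as a plain list, and the function name `throughput_dropped`, the window size, and the 50% ratio are all hypothetical choices.

```python
from statistics import mean


def throughput_dropped(samples, window=5, drop_ratio=0.5):
    """Return True if the latest sample is below drop_ratio times the mean of the preceding window."""
    if len(samples) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(samples[-(window + 1):-1])
    return baseline > 0 and samples[-1] < baseline * drop_ratio


# Hypothetical transactions-per-minute series
steady = [1200, 1180, 1220, 1210, 1190, 1205]
outage = [1200, 1180, 1220, 1210, 1190, 300]
print(throughput_dropped(steady), throughput_dropped(outage))
```

Comparing against a trailing baseline, instead of a fixed number, keeps the rule usable across quiet nights and busy sale days.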
DigitalOcean Droplet and Load Balancer Health Monitoring
Beyond the application and database, the underlying infrastructure on DigitalOcean needs constant vigilance. This includes Droplets (your servers) and Load Balancers.
Droplet Resource Utilization
DigitalOcean provides basic metrics (CPU, RAM, Disk I/O, Network) through its API and control panel. For more granular control and alerting, integrate with a dedicated monitoring solution like Prometheus and Grafana.
Prometheus Node Exporter: Install `node_exporter` on each Droplet to expose system-level metrics.
# Download and extract node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
# Run node_exporter (consider setting up as a systemd service for production)
./node_exporter --web.listen-address=":9100"
Prometheus Configuration: Configure Prometheus to scrape metrics from your Droplets.
# prometheus.yml
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['droplet1.example.com:9100', 'droplet2.example.com:9100', 'droplet3.example.com:9100']
Grafana Dashboards: Use pre-built Grafana dashboards (e.g., Node Exporter Full dashboard) to visualize CPU, memory, disk, and network usage. Set up alerting rules in Grafana or Prometheus Alertmanager for thresholds like:
- CPU Usage > 90% for 5 minutes
- Memory Usage > 90% for 5 minutes
- Disk Usage > 95%
- High I/O Wait times
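Thresholds phrased as "above 90% for 5 minutes" only fire when the condition holds continuously, which filters out short spikes. The sketch below illustrates that semantics on plain `(timestamp, value)` samples; it is not Prometheus or Grafana code, and the function name `sustained_breach` and the sample series are hypothetical.

```python
def sustained_breach(samples, threshold, duration_s):
    """samples: list of (unix_seconds, value) sorted by time.

    Return True if the value has stayed above threshold, without interruption,
    for at least duration_s seconds up to the latest sample.
    """
    if not samples:
        return False
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # start of the current unbroken breach
        else:
            breach_start = None  # any sample at or below the threshold resets the timer
    latest_ts = samples[-1][0]
    return breach_start is not None and latest_ts - breach_start >= duration_s


# Hypothetical CPU percentages sampled once a minute
sustained = [(0, 50), (60, 95), (120, 96), (180, 97), (240, 93), (300, 94), (360, 92)]
spike = [(0, 95), (60, 50), (120, 95)]
print(sustained_breach(sustained, 90, 300), sustained_breach(spike, 90, 300))
```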
DigitalOcean Load Balancer Health Checks
DigitalOcean Load Balancers have built-in health checks. Ensure these are configured correctly to direct traffic only to healthy Droplets. The key parameters to monitor and configure are:
- Protocol: Typically HTTP or HTTPS.
- Port: The port your application listens on (e.g., 80 or 443).
- Path: A specific URL path to check for a healthy response. A common practice is to create a simple health check endpoint in your WooCommerce application (e.g., `/healthz` or `/status`).
- Interval: How often to check.
- Timeout: How long to wait for a response.
- Unhealthy Threshold: Number of consecutive failed checks before marking a Droplet as unhealthy.
- Healthy Threshold: Number of consecutive successful checks before marking a Droplet as healthy again.
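The two thresholds interact as a small state machine: a backend only changes state after enough consecutive results contradict its current state. The following sketch models that behavior under stated assumptions. It is not DigitalOcean's implementation, and the class name `BackendHealth` and the threshold values are hypothetical.

```python
class BackendHealth:
    """Tracks one backend's state from a stream of health check results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=5):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Record one check result and return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
        else:
            self._streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= limit:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


backend = BackendHealth(unhealthy_threshold=3, healthy_threshold=2)
results = [True, False, False, True, False, False, False, True, True]
states = [backend.record(r) for r in results]
print(states)
```

In this run the two isolated failures do not remove the backend; only the third consecutive failure does, and it returns to service after two consecutive passes.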
Custom Health Check Endpoint Example (PHP):
<?php
// healthcheck.php
header('Content-Type: application/json');

// Basic check: is the database accessible?
$db_connected = false;
try {
    // Replace the placeholder DSN and credentials with your actual DB connection details
    $pdo = new PDO('mysql:host=localhost;dbname=your_db', 'user', 'password', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 2, // fail fast so the health check itself does not hang
    ]);
    $pdo->query('SELECT 1'); // a trivial query proves the connection is usable
    $db_connected = true;
} catch (PDOException $e) {
    error_log("DB Connection Error: " . $e->getMessage());
    $db_connected = false;
}

// Add more checks as needed (e.g., Redis connection, external API status)

if ($db_connected) {
    http_response_code(200);
    echo json_encode(['status' => 'ok', 'database' => 'connected']);
} else {
    http_response_code(503); // Service Unavailable
    echo json_encode(['status' => 'error', 'database' => 'disconnected']);
}
exit;
?>
Ensure this `healthcheck.php` file is accessible via your web server and configured in the Load Balancer's health check settings. Monitor the Load Balancer's own metrics in DigitalOcean for dropped connections or unhealthy backend counts.
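If you also probe the endpoint from your own monitoring, the response can be evaluated the same way the Load Balancer would, plus a check of the JSON body. This is a minimal sketch assuming the endpoint returns the `status` field shown in the PHP example; the function name `evaluate_health` is hypothetical, and fetching the response (for example with `urllib.request`) is left out.

```python
import json


def evaluate_health(status_code, body):
    """Return (healthy, reason) for a health endpoint response."""
    if status_code != 200:
        return False, f"unexpected HTTP status {status_code}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not isinstance(payload, dict):
        return False, "body is not a JSON object"
    if payload.get("status") != "ok":
        return False, f"status field is {payload.get('status')!r}"
    return True, "ok"


ok_result = evaluate_health(200, '{"status": "ok", "database": "connected"}')
down_result = evaluate_health(503, '{"status": "error", "database": "disconnected"}')
print(ok_result, down_result)
```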
Log Aggregation and Analysis with ELK Stack (Elasticsearch, Logstash, Kibana)
Centralized logging is indispensable for diagnosing issues across distributed systems. The ELK stack (or its managed equivalent like Elastic Cloud) provides a robust solution for collecting, storing, and analyzing logs from your WooCommerce application, MongoDB, and Droplets.
Logstash Configuration for WooCommerce and MongoDB
You'll need Logstash agents running on your application servers and database servers (or a dedicated log shipper like Filebeat). Configure Logstash to ingest logs from various sources.
Logstash Input for Nginx/Apache Access & Error Logs:
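# /etc/logstash/conf.d/nginx.conf
input {
  file {
    path => "/var/log/nginx/*.log"
    start_position => "beginning"
    # "/dev/null" discards read positions, so every restart re-reads the files from the start.
    # Point this at a writable file in production to avoid duplicate events.
    sincedb_path => "/dev/null"
    type => "nginx"
  }
}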
Logstash Input for MongoDB Logs:
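# /etc/logstash/conf.d/mongodb.conf
input {
  file {
    path => "/var/log/mongodb/mongod.log"
    start_position => "beginning"
    # "/dev/null" discards read positions, so every restart re-reads the file from the start.
    # Point this at a writable file in production to avoid duplicate events.
    sincedb_path => "/dev/null"
    type => "mongodb"
  }
}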
Logstash Filter for Nginx (Grok Pattern):
filter {
  if [type] == "nginx" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    # Add GeoIP lookup for client IP if desired
    # geoip { source => "clientip" }
  }
}
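To see what the grok and date filters extract from an access log line, the same parsing can be approximated with a regular expression. This is a rough sketch, not the actual `COMBINEDAPACHELOG` definition: the field names follow grok's classic naming, the function name `parse_access_line` and the sample line are hypothetical, and the timestamp parsing assumes English month abbreviations.

```python
import re
from datetime import datetime

# Approximation of the combined log format; real grok patterns are more permissive
COMBINED_RE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["response"] = int(record["response"])
    # Same layout as the date filter's "dd/MMM/yyyy:HH:mm:ss Z"
    record["timestamp"] = datetime.strptime(record["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return record


line = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /checkout/ HTTP/1.1" 502 157 "-" "Mozilla/5.0"'
record = parse_access_line(line)
print(record["clientip"], record["verb"], record["request"], record["response"])
```

Lines that do not match, such as entries from the Nginx error log, return `None` here; in Logstash the equivalent outcome is a `_grokparsefailure` tag.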
Logstash Filter for MongoDB:
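filter {
  if [type] == "mongodb" {
    if [message] =~ /^\{/ {
      # MongoDB 4.4 and later write one JSON document per log line
      json { source => "message" target => "mongodb" }
    } else {
      # Older plain-text format: timestamp, severity letter, component, [context], message
      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:severity} %{NOTSPACE:component}\s+\[%{DATA:context}\] %{GREEDYDATA:log_message}" }
      }
    }
    # More specific patterns for different MongoDB log message types can be added here
  }
}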
Logstash Output to Elasticsearch:
output {
  elasticsearch {
    hosts => ["http://elasticsearch-host:9200"]
    index => "%{type}-%{+YYYY.MM.dd}"
  }
}
Kibana for Analysis and Alerting
In Kibana, create dashboards to visualize key log data. For example:
- Nginx 5xx error rates over time.
- MongoDB slow query logs (if configured to log).
- Application error counts.
- Droplet resource metrics (if using Metricbeat).
Use Kibana's alerting features (or integrate with Elasticsearch Watcher/Alerting) to trigger notifications based on log patterns, such as repeated application errors, specific MongoDB error messages, or unusual Nginx status codes.
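The logic behind an alert on unusual Nginx status codes is a ratio per time bucket compared against a limit. The sketch below shows that calculation on already-parsed records; it is not Kibana or Watcher syntax, and the function names, the minute keys, and the 5% limit are hypothetical.

```python
from collections import Counter


def five_xx_ratio_by_minute(records):
    """records: iterable of (minute_key, status_code). Return {minute_key: 5xx share}."""
    totals, errors = Counter(), Counter()
    for minute, status in records:
        totals[minute] += 1
        if 500 <= status <= 599:
            errors[minute] += 1
    return {minute: errors[minute] / totals[minute] for minute in totals}


def minutes_over(records, max_ratio=0.05):
    """Return the sorted minute keys whose 5xx share exceeds max_ratio."""
    ratios = five_xx_ratio_by_minute(records)
    return sorted(minute for minute, ratio in ratios.items() if ratio > max_ratio)


# Hypothetical parsed access log entries
records = [
    ("13:55", 200), ("13:55", 200), ("13:55", 200), ("13:55", 502),
    ("13:56", 200), ("13:56", 200), ("13:56", 200), ("13:56", 200),
]
print(minutes_over(records))
```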
Conclusion: A Multi-Layered Approach
Effective server monitoring for a critical WooCommerce application on DigitalOcean is not a single tool or technique. It requires a multi-layered strategy encompassing:
- Database Health: Proactive checks on MongoDB replica set status, replication lag, and resource utilization.
- Application Performance: Deep insights into WooCommerce request latency, errors, and dependencies using APM tools like New Relic.
- Infrastructure Health: Monitoring Droplet resources (CPU, RAM, Disk, Network) and Load Balancer health.
- Log Aggregation: Centralized collection and analysis of logs from all components to quickly diagnose issues.
By implementing these practices, you move from reactive firefighting to proactive system management, ensuring the stability and performance of your WooCommerce store.