Server Monitoring Best Practices: Keeping Your WooCommerce App and MongoDB Clusters Alive on DigitalOcean

Proactive MongoDB Cluster Health Checks with `mongosh` and Custom Scripts

Maintaining the health of a distributed MongoDB cluster, especially one powering a high-traffic WooCommerce store, takes more than basic uptime checks. You also need to watch the operational metrics that signal performance degradation or impending failure, which means using the `mongosh` shell and scripting custom checks for key indicators.

A fundamental check is the status of replica set members. All members should be in a healthy state and the primary should be stable. A script that connects to the replica set and queries its status can verify both.

Replica Set Status Script

This Python script uses `pymongo` to connect to the MongoDB replica set and retrieve its status. It checks that a `PRIMARY` exists, then flags any member whose state is not `PRIMARY`, `SECONDARY`, `ARBITER`, or the transitional `STARTUP2` (adjust the list to your configuration). Alerts should be triggered for any anomalies.

First, ensure you have `pymongo` installed:

pip install pymongo

Next, create the Python script (e.g., `check_mongo_rs.py`):

import pymongo
import sys
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- Configuration ---
# Replace with your MongoDB connection string or host/port
MONGO_URI = "mongodb://user:password@mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/?replicaSet=myReplicaSet&authSource=admin"
# --- End Configuration ---

def check_replica_set_health(mongo_uri):
    client = None
    try:
        client = pymongo.MongoClient(mongo_uri, serverSelectionTimeoutMS=5000) # 5 second timeout
        # The ping command is cheap and does not require auth; it confirms a server is reachable.
        # (ismaster is deprecated in recent MongoDB releases.)
        client.admin.command('ping')
        logging.info("Successfully connected to MongoDB.")

        rs_status = client.admin.command('replSetGetStatus')
        
        primary_member = None
        secondary_members = []
        other_members = []

        for member in rs_status['members']:
            state_str = member['stateStr']

            if state_str == 'PRIMARY':
                primary_member = member
            elif state_str == 'SECONDARY':
                secondary_members.append(member)
            else:
                other_members.append(member)

        if not primary_member:
            logging.error("No PRIMARY member found in the replica set!")
            return False

        logging.info(f"Primary: {primary_member['name']} (State: {primary_member['stateStr']})")

        if len(secondary_members) + (1 if primary_member else 0) + len(other_members) != len(rs_status['members']):
            logging.warning("Member count mismatch in status output. This is unusual.")

        # Check for lag on secondaries (optional but recommended).
        # replSetGetStatus reports optimeDate as a datetime for every member,
        # so lag is the difference between the primary's value and each secondary's.
        try:
            primary_optime = primary_member['optimeDate']
            for member in secondary_members:
                lag_seconds = (primary_optime - member['optimeDate']).total_seconds()
                if lag_seconds > 60:  # 60 second lag threshold
                    logging.warning(f"Secondary {member['name']} is lagging by {lag_seconds:.0f}s. Optime: {member['optimeDate']}, Primary optime: {primary_optime}")
        except Exception as e:
            logging.warning(f"Could not check for replication lag: {e}")

        # Check the remaining members. Arbiters are expected and STARTUP2 is a
        # transitional sync state; anything else is treated as unhealthy.
        for member in other_members:
            state_str = member['stateStr']
            if member.get('health', 1) != 1:
                # Unreachable members report stateStr "(not reachable/healthy)"
                logging.warning(f"Member {member['name']} is DOWN.")
                # Depending on your tolerance, you might want to alert on DOWN members.
            elif state_str not in ('STARTUP2', 'ARBITER'):  # Adjust allowed states as needed
                logging.error(f"Member {member['name']} is in an unhealthy state: {state_str}")
                return False

        return True

    except pymongo.errors.ConnectionFailure as e:
        logging.error(f"Could not connect to MongoDB: {e}")
        return False
    except pymongo.errors.OperationFailure as e:
        logging.error(f"MongoDB operation failed: {e}")
        return False
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")
        return False
    finally:
        if client:
            client.close()

if __name__ == "__main__":
    if check_replica_set_health(MONGO_URI):
        logging.info("MongoDB replica set health check passed.")
        sys.exit(0)
    else:
        logging.error("MongoDB replica set health check failed.")
        sys.exit(1)

To automate this, you can use `cron` on a dedicated monitoring server or one of your application servers (if it has network access to the MongoDB cluster). Schedule this script to run every 1-5 minutes.

Cron Job Example

Edit your crontab:

crontab -e

Add a line like this to run the script every 5 minutes and redirect output to a log file:

*/5 * * * * /usr/bin/python3 /path/to/your/scripts/check_mongo_rs.py >> /var/log/mongo_health_check.log 2>&1

You'll then need a separate monitoring or alerting tool (for example Nagios, or a simple `logwatch` configuration) to scan `/var/log/mongo_health_check.log` for ERROR entries and trigger alerts. Because the script exits with a non-zero status on failure, checkers that work from exit codes can also run it directly.
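
As a minimal sketch of that log-scanning step, the Python function below collects recent ERROR messages. It assumes the `timestamp - LEVEL - message` layout that `logging.basicConfig` produces in the script above. The name `recent_errors` and the 10-minute window are illustrative choices, not part of any monitoring tool.

```python
from datetime import datetime, timedelta


def recent_errors(lines, now, window=timedelta(minutes=10)):
    """Return the ERROR messages logged within `window` before `now`."""
    errors = []
    for line in lines:
        # Expected shape: "2024-01-15 10:30:00,123 - ERROR - message"
        parts = line.rstrip("\n").split(" - ", 2)
        if len(parts) != 3 or parts[1] != "ERROR":
            continue
        try:
            logged_at = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if now - logged_at <= window:
            errors.append(parts[2])
    return errors
```

To use it, pass an open file handle for `/var/log/mongo_health_check.log` together with `datetime.now()`, and raise an alert whenever the returned list is not empty.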

WooCommerce Application Performance Monitoring (APM) with New Relic

For the WooCommerce application itself, understanding performance bottlenecks is crucial. This involves monitoring request latency, error rates, database query times, and external service calls. New Relic is a powerful APM tool that provides deep insights into your PHP application's performance.

New Relic PHP Agent Installation and Configuration

The first step is to install the New Relic PHP agent. This typically involves downloading the agent installer script and running it on your web servers.

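# The version and file name below are examples; check New Relic's download
# site for the current release before running this.
wget https://download.newrelic.com/php/newrelic-php5-9.17.0.290-x64.tar.gz
tar -zxvf newrelic-php5-9.17.0.290-x64.tar.gz
cd newrelic-php5-9.17.0.290-x64
# Run the bundled installer (NR_INSTALL_SILENT skips the interactive prompts)
sudo NR_INSTALL_SILENT=1 NR_INSTALL_KEY=YOUR_LICENSE_KEY ./newrelic-install install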

After installation, enable the agent in your PHP configuration. The installer usually reports which ini file it wrote or which one to modify. The CLI and FPM SAPIs typically load separate `php.ini` files, so make sure the one used by PHP-FPM is updated. Add or uncomment the following lines:

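[newrelic]
; All agent settings carry the "newrelic." prefix
extension = "newrelic.so"
newrelic.enabled = true
newrelic.license = "YOUR_LICENSE_KEY"
newrelic.appname = "YourWooCommerceAppName"
; Optional: Specify log level for agent logs
; newrelic.loglevel = "info"
; Optional: Specify log file path
; newrelic.logfile = "/var/log/newrelic/newrelic-php5.log"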

Restart your web server (e.g., Nginx/Apache) and PHP-FPM service for the changes to take effect.

sudo systemctl restart nginx
sudo systemctl restart php8.1-fpm  # Adjust PHP version as needed

Key WooCommerce Metrics to Monitor in New Relic

Once the agent is active, New Relic will start collecting data. Focus on these critical metrics within the New Relic UI:

Transaction Traces: Identify slow-loading pages, API endpoints, or background processes. Look for traces with high "Time spent in other code" or excessive database query times.
Database Queries: Monitor the slowest and most frequent database queries. Inefficient queries (e.g., missing indexes, N+1 problems) are common performance killers in WooCommerce.
External Services: Track the performance of calls to third-party APIs (payment gateways, shipping providers, etc.). High latency or error rates here can directly impact user experience.
Error Rates: Monitor PHP errors, exceptions, and HTTP error codes (5xx). Set up alerts for spikes in error rates.
Throughput: Observe the number of transactions per minute. Sudden drops can indicate an outage or severe performance issue.
Response Time: Track the average response time for your web transactions.

For WooCommerce specifically, pay close attention to transactions related to product pages, cart operations, checkout, and order processing. Use New Relic's custom instrumentation if needed to pinpoint specific plugin or theme performance issues.

DigitalOcean Droplet and Load Balancer Health Monitoring

Beyond the application and database, the underlying infrastructure on DigitalOcean needs constant vigilance. This includes Droplets (your servers) and Load Balancers.

Droplet Resource Utilization

DigitalOcean provides basic metrics (CPU, RAM, Disk I/O, Network) through its API and control panel. For more granular control and alerting, integrate with a dedicated monitoring solution like Prometheus and Grafana.

Prometheus Node Exporter: Install `node_exporter` on each Droplet to expose system-level metrics.

# Download and extract node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64

# Run node_exporter (consider setting up as a systemd service for production)
./node_exporter --web.listen-address=":9100"

Prometheus Configuration: Configure Prometheus to scrape metrics from your Droplets.

# prometheus.yml
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['droplet1.example.com:9100', 'droplet2.example.com:9100', 'droplet3.example.com:9100']

Grafana Dashboards: Use pre-built Grafana dashboards (e.g., Node Exporter Full dashboard) to visualize CPU, memory, disk, and network usage. Set up alerting rules in Grafana or Prometheus Alertmanager for thresholds like:

CPU Usage > 90% for 5 minutes
Memory Usage > 90% for 5 minutes
Disk Usage > 95%
High I/O Wait times
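
The "for 5 minutes" qualifier in these thresholds means the condition must hold continuously before an alert fires, so a single spike does not page anyone. This is roughly what the `for` clause of a Prometheus alerting rule expresses. The Python sketch below shows the idea on plain `(timestamp, value)` samples. The function name `breached_for` and the data layout are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def breached_for(samples, threshold, duration):
    """Return True if every sample in the trailing `duration` exceeds `threshold`.

    `samples` is a list of (timestamp, value) tuples sorted by time.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window_start = latest - duration
    # Require history that reaches back to the start of the window; otherwise
    # a single fresh high reading would count as a sustained breach.
    if samples[0][0] > window_start:
        return False
    window = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in window)
```

For example, calling `breached_for(cpu_samples, 90, timedelta(minutes=5))` with one sample per minute only returns True once every reading across the last five minutes is above 90.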

DigitalOcean Load Balancer Health Checks

DigitalOcean Load Balancers have built-in health checks. Ensure these are configured correctly to direct traffic only to healthy Droplets. The key parameters to monitor and configure are:

Protocol: Typically HTTP or HTTPS.
Port: The port your application listens on (e.g., 80 or 443).
Path: A specific URL path to check for a healthy response. A common practice is to create a simple health check endpoint in your WooCommerce application (e.g., `/healthz` or `/status`).
Interval: How often to check.
Timeout: How long to wait for a response.
Unhealthy Threshold: Number of consecutive failed checks before marking a Droplet as unhealthy.
Healthy Threshold: Number of consecutive successful checks before marking a Droplet as healthy again.
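
The two thresholds work as a small state machine: a backend only changes status after enough consecutive results contradict its current status. The Python sketch below models that behavior to make the parameters concrete. It is not DigitalOcean's implementation, and the class name `BackendHealth` and its default values are illustrative assumptions.

```python
class BackendHealth:
    """Tracks one backend's status from a stream of health check results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=5):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current status

    def record(self, check_passed):
        """Feed one check result and return the (possibly updated) status."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current status
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy
```

A lower unhealthy threshold removes failing Droplets from rotation sooner, at the cost of reacting to brief blips. A higher healthy threshold keeps a flapping Droplet out of rotation until it has proven stable.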

Custom Health Check Endpoint Example (PHP):

<?php
// healthcheck.php
header('Content-Type: application/json');

// Basic check: is the database accessible?
$db_connected = false;
try {
    // Replace the DSN and credentials with your actual values
    $pdo = new PDO('mysql:host=localhost;dbname=your_db', 'user', 'password', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 2, // fail fast so the health check does not hang
    ]);
    $pdo->query('SELECT 1'); // trivial round trip to confirm the server answers
    $db_connected = true;
} catch (PDOException $e) {
    error_log("DB Connection Error: " . $e->getMessage());
    $db_connected = false;
}

// Add more checks as needed (e.g., Redis connection, external API status)

if ($db_connected) {
    http_response_code(200);
    echo json_encode(['status' => 'ok', 'database' => 'connected']);
} else {
    http_response_code(503); // Service Unavailable
    echo json_encode(['status' => 'error', 'database' => 'disconnected']);
}
exit;
?>

Ensure this `healthcheck.php` file is accessible via your web server and configured in the Load Balancer's health check settings. Monitor the Load Balancer's own metrics in DigitalOcean for dropped connections or unhealthy backend counts.
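
It can also be useful to probe the endpoint from your monitoring host, independently of the Load Balancer. The following Python sketch assumes the JSON body produced by the `healthcheck.php` example above. The function names `interpret` and `probe` and the 3-second timeout are illustrative.

```python
import json
import urllib.error
import urllib.request


def interpret(status_code, body):
    """Return True only for HTTP 200 with a JSON body whose status is "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url, timeout=3.0):
    """Fetch the health endpoint and report whether it looks healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx responses; the 503 body is still readable
        return interpret(exc.code, exc.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, timeout
```

Checking the body as well as the status code guards against a misconfigured server that answers 200 with an error page.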

Log Aggregation and Analysis with ELK Stack (Elasticsearch, Logstash, Kibana)

Centralized logging is indispensable for diagnosing issues across distributed systems. The ELK stack (or its managed equivalent like Elastic Cloud) provides a robust solution for collecting, storing, and analyzing logs from your WooCommerce application, MongoDB, and Droplets.

Logstash Configuration for WooCommerce and MongoDB

You'll need Logstash agents running on your application servers and database servers (or a dedicated log shipper like Filebeat). Configure Logstash to ingest logs from various sources.

Logstash Input for Nginx/Apache Access & Error Logs:

# /etc/logstash/conf.d/nginx.conf
input {
  file {
    path => "/var/log/nginx/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null" # Disables position tracking, so files are re-read from the start on every restart; use a real path in production
    type => "nginx"
  }
}

Logstash Input for MongoDB Logs:

# /etc/logstash/conf.d/mongodb.conf
input {
  file {
    path => "/var/log/mongodb/mongod.log"
    start_position => "beginning"
    sincedb_path => "/dev/null" # Disables position tracking, so files are re-read from the start on every restart; use a real path in production
    type => "mongodb"
  }
}

Logstash Filter for Nginx (Grok Pattern):

filter {
  if [type] == "nginx" {
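    # COMBINEDAPACHELOG matches access-log lines only. Error-log lines picked up
    # by the *.log glob are tagged _grokparsefailure unless given their own pattern.
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }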
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    # Add GeoIP lookup for client IP if desired
    # geoip { source => "clientip" }
  }
}
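
To make clear what the grok pattern extracts, here is a rough Python equivalent for a single access-log line. It is a simplified sketch, not the actual `COMBINEDAPACHELOG` definition. The field names follow the legacy (non-ECS) grok output, and `parse_access_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Simplified stand-in for the combined log format
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Same layout the Logstash date filter expects: dd/MMM/yyyy:HH:mm:ss Z
    event["@timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["response"] = int(event["response"])
    return event
```

Lines from the Nginx error log do not fit this layout and return `None`, which is why error logs need their own pattern.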

Logstash Filter for MongoDB:

filter {
  if [type] == "mongodb" {
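    # MongoDB 4.4+ writes each log line as a JSON document (fields such as
    # "t", "s", "c", "ctx", "msg", "attr"), so parse it as JSON.
    json {
      source => "message"
      target => "mongodb"
    }
    # For pre-4.4 plain-text logs, severity is a single letter (F, E, W, I, D),
    # so use a grok pattern along these lines instead:
    # grok { match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:severity} %{NOTSPACE:component} +\[%{DATA:context}\] %{GREEDYDATA:log_message}" } }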
  }
}
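
For a quick look at MongoDB logs outside Logstash, the sketch below parses a single line in Python. It assumes MongoDB 4.4 or later, where `mongod` writes each log entry as a JSON document. The helper names `parse_mongod_line` and `is_slow_query` and the 100 ms default are illustrative.

```python
import json


def parse_mongod_line(line):
    """Parse one structured mongod log line; return None if it is not a JSON object."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    return {
        "timestamp": doc.get("t", {}).get("$date"),
        "severity": doc.get("s"),   # single letter: F, E, W, I or D
        "component": doc.get("c"),  # e.g. NETWORK, COMMAND, REPL
        "context": doc.get("ctx"),
        "message": doc.get("msg"),
        "attr": doc.get("attr", {}),
    }


def is_slow_query(event, min_millis=100):
    """True for "Slow query" entries that took at least `min_millis`."""
    return (
        event is not None
        and event["message"] == "Slow query"
        and event["attr"].get("durationMillis", 0) >= min_millis
    )
```

Filtering on `is_slow_query` gives a rough equivalent of a slow-query dashboard for a single log file.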

Logstash Output to Elasticsearch:

output {
  elasticsearch {
    hosts => ["http://elasticsearch-host:9200"]
    index => "%{type}-%{+YYYY.MM.dd}"
  }
}

Kibana for Analysis and Alerting

In Kibana, create dashboards to visualize key log data. For example:

Nginx 5xx error rates over time.
MongoDB slow query logs (if configured to log).
Application error counts.
Droplet resource metrics (if using Metricbeat).

Use Kibana's alerting features (or integrate with Elasticsearch Watcher/Alerting) to trigger notifications based on log patterns, such as repeated application errors, specific MongoDB error messages, or unusual Nginx status codes.

Conclusion: A Multi-Layered Approach

Effective server monitoring for a critical WooCommerce application on DigitalOcean is not a single tool or technique. It requires a multi-layered strategy encompassing:

Database Health: Proactive checks on MongoDB replica set status, replication lag, and resource utilization.
Application Performance: Deep insights into WooCommerce request latency, errors, and dependencies using APM tools like New Relic.
Infrastructure Health: Monitoring Droplet resources (CPU, RAM, Disk, Network) and Load Balancer health.
Log Aggregation: Centralized collection and analysis of logs from all components to quickly diagnose issues.

By implementing these practices, you move from reactive firefighting to proactive system management, ensuring the stability and performance of your WooCommerce store.