Server Monitoring Best Practices: Keeping Your PHP App and Redis Clusters Alive on Google Cloud

Proactive PHP Application Health Checks with Cloud Monitoring

Effective monitoring of PHP applications on Google Cloud Platform (GCP) goes beyond basic uptime checks. We need to instrument our applications to expose internal health metrics that Cloud Monitoring can ingest and alert on. This involves creating custom metrics and leveraging the Cloud Monitoring API or the Ops Agent.

For a typical PHP application, critical health indicators include:

Request latency (average, p95, p99)
Error rates (HTTP 5xx, application-level exceptions)
Database connection pool status
Cache hit/miss ratios
Background job queue lengths
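As a rough illustration of the first two indicators, the sketch below derives average, p95, and p99 latency plus a 5xx error rate from one window of request samples. The function names, the tuple layout, and the sample data are assumptions made for this example; they are not part of any GCP library.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def summarize_requests(requests):
    """requests: list of (latency_ms, http_status) tuples for one window."""
    latencies = sorted(latency for latency, _ in requests)
    server_errors = sum(1 for _, status in requests if 500 <= status <= 599)
    return {
        "avg_ms": sum(latencies) / len(latencies),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": server_errors / len(requests),
    }

# Hypothetical window: 100 requests with latencies 1..100 ms, four of them 5xx
sample = [(ms, 500 if ms > 96 else 200) for ms in range(1, 101)]
summary = summarize_requests(sample)
print(summary)
```

In practice the percentiles are usually computed by the monitoring backend from a distribution metric; computing them in the application, as here, is mainly useful for a quick sanity check.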

Implementing Custom Metrics in PHP

The most robust way to expose custom metrics is to write them with the Cloud Monitoring client libraries. For simpler scenarios, we can instead expose metrics via a dedicated health check endpoint that returns a JSON payload. A custom collector can poll this payload and forward the values; the Ops Agent itself scrapes Prometheus text format rather than JSON, so an agent-based setup needs the same values in that format.
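To make the API route more concrete, here is a minimal sketch of the JSON body that a collector could POST to the Cloud Monitoring `projects.timeSeries.create` REST method for a single gauge point. The helper name, the project ID, and the metric name `php/active_db_connections` are assumptions chosen for illustration; authentication and the HTTP call itself are left out.

```python
import json
from datetime import datetime, timezone

def build_time_series_body(project_id, metric_name, value, when=None):
    """Build the request body for one int64 gauge point of a custom metric."""
    when = when or datetime.now(timezone.utc)
    return {
        "timeSeries": [{
            # Custom metric types live under the custom.googleapis.com/ prefix
            "metric": {"type": f"custom.googleapis.com/{metric_name}"},
            "resource": {"type": "global", "labels": {"project_id": project_id}},
            "points": [{
                "interval": {"endTime": when.strftime("%Y-%m-%dT%H:%M:%SZ")},
                # int64 values are encoded as strings in the JSON mapping
                "value": {"int64Value": str(int(value))},
            }],
        }]
    }

body = build_time_series_body(
    "demo-project",                      # hypothetical project ID
    "php/active_db_connections",         # hypothetical metric name
    12,
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

The client libraries wrap the same structure in typed objects, so the field names above carry over when switching to them.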

Let’s consider a scenario where we want to track the number of active database connections and the count of pending background jobs. We’ll create a simple PHP script that exposes this data.

Health Check Endpoint Example

Assume you have a mechanism to track active database connections (e.g., through your PDO or mysqli connection pool) and a queue for background jobs (e.g., Redis or a dedicated queue system).

`/healthz.php`

<?php
// Assume these functions are implemented elsewhere in your application
// to retrieve actual metrics.
function get_active_db_connections() {
    // Replace with actual logic to get connection count
    // e.g., querying connection pool status or a dummy value
    return rand(5, 20);
}

function get_pending_background_jobs() {
    // Replace with actual logic to get pending job count from Redis/queue
    // e.g., using Redis `LLEN` command
    // For demonstration, returning a random value
    return rand(0, 50);
}

// Set content type to JSON
header('Content-Type: application/json');

// Prepare the metrics payload
$metrics = [
    'active_db_connections' => get_active_db_connections(),
    'pending_background_jobs' => get_pending_background_jobs(),
    'timestamp' => (new DateTime('now', new DateTimeZone('UTC')))->format(DateTime::ATOM), // ATOM is ISO 8601 compliant; DateTime::ISO8601 is not
    'status' => 'ok' // Basic status indicator
];

// Output the JSON
echo json_encode($metrics);
exit;
?>
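Scrapers that speak the Prometheus text exposition format cannot read this JSON payload directly. The sketch below shows one possible conversion: numeric fields become gauges and the textual status becomes a 1/0 gauge. The `php_app` prefix, the function name, and the sample payload are assumptions made for this example.

```python
import json

def to_prometheus_text(payload, prefix="php_app"):
    """Render the numeric fields of a health payload as Prometheus gauges."""
    lines = []
    for name, value in payload.items():
        # bool is a subclass of int; skip it along with strings such as "timestamp"
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        metric = f"{prefix}_{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    # Expose the textual status as a 1/0 gauge so it can be alerted on
    up = 1 if payload.get("status") == "ok" else 0
    lines.append(f"# TYPE {prefix}_up gauge")
    lines.append(f"{prefix}_up {up}")
    return "\n".join(lines) + "\n"

# Hypothetical payload in the shape produced by the endpoint above
sample = json.loads(
    '{"active_db_connections": 12, "pending_background_jobs": 3,'
    ' "timestamp": "2024-01-01T00:00:00+00:00", "status": "ok"}'
)
text = to_prometheus_text(sample)
print(text)
```

The same loop is short enough to port into the PHP endpoint itself if a second process is undesirable.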

Configuring the Ops Agent for Metric Collection

The Ops Agent is the recommended way to collect logs and metrics from your Compute Engine instances; GKE clusters rely on GKE's own built-in collection instead. We’ll configure it to scrape our custom health endpoint.

Ops Agent Configuration (`/etc/google-cloud-ops-agent/config.yaml`)

The Ops Agent does not parse arbitrary JSON. Its prometheus receiver scrapes endpoints that serve the Prometheus text exposition format, and the agent exports collected metrics to Cloud Monitoring on its own, so no separate exporter section is needed. The configuration below assumes the same values are also served in that format at /metrics.php, a hypothetical path that is not defined above. Ensure your agent is installed and running.

metrics:
  # Define receivers for scraping metrics.
  receivers:
    # Receiver for our custom PHP health metrics.
    php_health_check:
      type: prometheus
      config:
        scrape_configs:
          - job_name: "php_health_check"
            scrape_interval: 60s # Scrape every 60 seconds
            metrics_path: /metrics.php # Hypothetical path serving Prometheus text format
            static_configs:
              - targets: ["localhost:80"] # Adjust port if your web server is different

  # Define metrics pipelines.
  # A pipeline connects receivers to the agent's built-in Cloud Monitoring export.
  service:
    pipelines:
      # Pipeline for our custom PHP metrics.
      php_metrics_pipeline:
        receivers:
          - php_health_check

# Log collection is configured separately, under a top-level "logging:" section.

After updating the configuration, restart the Ops Agent:

sudo systemctl restart google-cloud-ops-agent

Monitoring Redis Clusters with Cloud Monitoring

Redis clusters, whether managed (Memorystore for Redis) or self-hosted on Compute Engine/GKE, require specific monitoring. Key metrics include:

Memory usage (used_memory, used_memory_rss)
CPU utilization
Network traffic (total_net_input_bytes, total_net_output_bytes)
Cache hit/miss ratio (derived from the keyspace_hits and keyspace_misses counters in INFO stats)
Latency of Redis commands
Number of connected clients
Replication status (for master/replica setups)

Memorystore for Redis Metrics

Memorystore for Redis automatically exposes a comprehensive set of metrics to Cloud Monitoring. You can view these directly in the Cloud Console under “Monitoring” > “Metrics Explorer”. Common metrics include:

redis.googleapis.com/stats/network_traffic
redis.googleapis.com/stats/memory/usage
redis.googleapis.com/stats/memory/usage_ratio
redis.googleapis.com/stats/cpu_utilization
redis.googleapis.com/clients/connected

For Memorystore, the primary focus is on setting up appropriate alerting policies based on these built-in metrics.

Monitoring Self-Hosted Redis Clusters

For self-hosted Redis, we can leverage the Ops Agent again, this time using its built-in Redis receiver, or query Redis ourselves from a small script and expose the results.

Option 1: Using the Ops Agent’s Redis Receiver

The Ops Agent has a built-in receiver for Redis that can scrape metrics directly from a running Redis instance. This is often the simplest approach for self-hosted Redis.

Ops Agent Configuration Snippet (`/etc/google-cloud-ops-agent/config.yaml`)

metrics:
  receivers:
    redis_metrics:
      type: redis
      # Address of the Redis instance this agent monitors.
      # Each receiver takes a single address; for a multi-node cluster,
      # run the agent on every node (or define one receiver per node).
      address: "localhost:6379"
      collection_interval: "30s" # Collect every 30 seconds
      # Optional: authentication
      # password: "your_redis_password"

  # The agent exports to Cloud Monitoring itself; pipelines only list receivers.
  service:
    pipelines:
      redis_pipeline:
        receivers:
          - redis_metrics

Remember to restart the Ops Agent after applying this configuration.

Option 2: Custom Script with redis-py and Ops Agent

If the built-in receiver doesn’t cover specific metrics you need, or for more granular control, you can write a custom script that queries Redis and exposes the metrics over HTTP (e.g., a JSON endpoint as shown for PHP).

Example Custom Script (`/opt/redis_metrics_exporter.py`)

import redis
import json
from datetime import datetime, timezone

# Configuration
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
# REDIS_PASSWORD = 'your_redis_password' # Uncomment if authentication is needed

def get_redis_metrics():
    try:
        # Connect to Redis
        r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, decode_responses=True) #, password=REDIS_PASSWORD)
        r.ping() # Check connection

        # Fetch the default INFO sections once; redis-py returns them as one flat dict
        info = r.info()

        metrics = {}
        # Basic metrics
        metrics['used_memory_bytes'] = int(info['used_memory'])
        metrics['used_memory_rss_bytes'] = int(info['used_memory_rss'])
        metrics['connected_clients'] = int(info['connected_clients'])
        metrics['instantaneous_ops_per_sec'] = int(info['instantaneous_ops_per_sec'])
        # Hits and misses are server-wide counters in the "stats" section, not per-database
        metrics['keyspace_hits'] = int(info['keyspace_hits'])
        metrics['keyspace_misses'] = int(info['keyspace_misses'])

        # Calculate hit ratio
        hits = metrics['keyspace_hits']
        misses = metrics['keyspace_misses']
        if (hits + misses) > 0:
            metrics['keyspace_hit_ratio'] = (hits / (hits + misses)) * 100
        else:
            metrics['keyspace_hit_ratio'] = 0.0

        # Add timestamp
        metrics['timestamp'] = datetime.now(timezone.utc).isoformat()
        metrics['status'] = 'ok'

        return metrics

    except redis.exceptions.ConnectionError as e:
        return {'status': 'error', 'message': str(e), 'timestamp': datetime.now(timezone.utc).isoformat()}
    except Exception as e:
        return {'status': 'error', 'message': str(e), 'timestamp': datetime.now(timezone.utc).isoformat()}

if __name__ == "__main__":
    # Running this file directly starts Flask's development server, which
    # serves the metrics as JSON at /redis_healthz.
    # In a production setup, you'd run the app behind a WSGI server or use a dedicated exporter.

    # Example of serving via Flask (requires Flask installed: pip install Flask)
    from flask import Flask, jsonify
    app = Flask(__name__)

    @app.route('/redis_healthz')
    def healthz():
        metrics = get_redis_metrics()
        return jsonify(metrics)

    # To run this script directly for testing:
    # python /opt/redis_metrics_exporter.py
    # Then access http://localhost:5000/redis_healthz
    # For production, use a proper WSGI server like Gunicorn.
    app.run(host='127.0.0.1', port=5000)  # Loopback only; the local agent or collector is the sole client

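You would then point your collector at this endpoint (e.g., http://localhost:5000/redis_healthz). Because the Ops Agent’s prometheus receiver expects Prometheus text format rather than JSON, either have the endpoint emit that format or run a small collector that reads the JSON and writes the values through the Cloud Monitoring API.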
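One caveat with the hit ratio computed in the script: keyspace_hits and keyspace_misses are cumulative since the server started, so the resulting ratio reacts slowly to recent changes. A windowed ratio based on the difference between two successive samples is often a better alerting signal. The sketch below assumes two payloads in the shape returned by get_redis_metrics(); the function name and sample numbers are illustrative.

```python
def windowed_hit_ratio(prev, curr):
    """Hit ratio (percent) for the interval between two cumulative samples.

    Returns None when there was no traffic or the counters went backwards,
    which happens after a restart or CONFIG RESETSTAT.
    """
    d_hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    d_misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    if d_hits < 0 or d_misses < 0:
        return None
    total = d_hits + d_misses
    if total == 0:
        return None
    return 100.0 * d_hits / total

# Hypothetical samples taken one scrape interval apart
previous = {"keyspace_hits": 9000, "keyspace_misses": 1000}
current = {"keyspace_hits": 9300, "keyspace_misses": 1700}

recent = windowed_hit_ratio(previous, current)
lifetime = 100.0 * current["keyspace_hits"] / (
    current["keyspace_hits"] + current["keyspace_misses"]
)
print(f"recent={recent:.1f}% lifetime={lifetime:.1f}%")
```

With these numbers the lifetime ratio still looks healthy while the most recent interval has dropped sharply, which is the situation a cache alert is meant to catch.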

Alerting Strategies in Cloud Monitoring

Once metrics are flowing into Cloud Monitoring, the next critical step is setting up alerts. Alerts should be actionable and tuned to prevent false positives while catching genuine issues early.

PHP Application Alerting Examples

Based on the metrics discussed above:

High Error Rate: Alert if the rate of HTTP 5xx errors (available from load balancer or App Engine request metrics, or from a log-based metric on your web server logs) exceeds a threshold (e.g., > 5% over 5 minutes).
High Latency: Alert if p95 or p99 request latency exceeds a threshold (e.g., > 2 seconds for 10 minutes).
Database Connection Saturation: Alert if active_db_connections (from our custom metric) is consistently high (e.g., > 90% of max pool size for 15 minutes).
Background Job Backlog: Alert if pending_background_jobs exceeds a critical threshold (e.g., > 1000 jobs for 30 minutes), indicating a potential processing bottleneck.
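Each of these rules combines a threshold with a duration, which is what keeps a brief spike from paging anyone. The sketch below illustrates that evaluation logic outside of Cloud Monitoring; the function name and sample series are assumptions, and the real evaluation is performed by the alerting policy itself.

```python
def sustained_breach(samples, threshold, duration_s):
    """samples: list of (unix_ts, value) pairs, oldest first.

    True only if the series covers the whole trailing window and every
    sample inside that window is above the threshold.
    """
    if not samples:
        return False
    window_start = samples[-1][0] - duration_s
    # Not enough history to cover the window yet
    if samples[0][0] > window_start:
        return False
    return all(value > threshold for ts, value in samples if ts >= window_start)

# Hypothetical pending_background_jobs series: one sample per minute for 30 minutes
backlog = [(minute * 60, 1200) for minute in range(31)]
print(sustained_breach(backlog, threshold=1000, duration_s=1800))

# A single dip below the threshold inside the window resets the condition
recovered = list(backlog)
recovered[20] = (20 * 60, 900)
print(sustained_breach(recovered, threshold=1000, duration_s=1800))
```

Longer durations reduce false positives at the cost of slower detection, so the window is worth tuning per metric.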

Redis Cluster Alerting Examples

For both Memorystore and self-hosted Redis:

High Memory Usage: Alert if redis.googleapis.com/stats/memory/usage_ratio (or a custom equivalent) exceeds 85% for 20 minutes.
High CPU Usage: Alert if CPU utilization derived from redis.googleapis.com/stats/cpu_utilization (or a custom equivalent) exceeds 75% for 15 minutes.
Low Cache Hit Ratio: Alert if redis.googleapis.com/stats/cache_hit_ratio, or the keyspace_hit_ratio calculated by the custom script, drops below 70% for 10 minutes.
Excessive Clients: Alert if redis.googleapis.com/clients/connected (or a custom equivalent) exceeds a predefined limit (e.g., > 10000 clients).
Replication Lag: For self-hosted master/replica setups, monitor replication lag and alert if it exceeds a few seconds.
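For the self-hosted case, one way to approximate replication lag is to compare each replica's acknowledged offset with master_repl_offset in the output of INFO replication on the master. The sketch below parses that text and reports how many bytes each replica is behind. The function name and the sample output (including the IP addresses) are assumptions for illustration.

```python
def replica_bytes_behind(info_text):
    """Map 'ip:port' of each replica to its byte offset behind the master."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value

    master_offset = int(fields["master_repl_offset"])
    behind = {}
    for key, value in fields.items():
        # Replica entries look like: slave0:ip=...,port=...,state=...,offset=...,lag=...
        if key.startswith("slave") and "=" in value:
            attrs = dict(item.split("=", 1) for item in value.split(","))
            behind[f"{attrs['ip']}:{attrs['port']}"] = master_offset - int(attrs["offset"])
    return behind

# Hypothetical INFO replication output from a master with two replicas
sample_info = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=1150,lag=1
master_repl_offset:1200
"""
lag_bytes = replica_bytes_behind(sample_info)
print(lag_bytes)
```

The lag field in each replica entry reports seconds since the replica last acknowledged, which maps more directly onto an alert phrased in seconds; the byte offset is a useful complement when write volume is bursty.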

Advanced Considerations: Distributed Tracing and SLOs

For complex microservice architectures, relying solely on host-level metrics might not be sufficient. Consider:

Distributed Tracing: Implement OpenTelemetry or use GCP’s operations suite (Cloud Trace) to trace requests across multiple services. This helps pinpoint latency bottlenecks in distributed systems.
Service Level Objectives (SLOs): Define SLOs for critical user journeys (e.g., “99.9% of login requests complete within 500ms”). Cloud Monitoring can help track Service Level Indicators (SLIs) that feed into SLO compliance.
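The arithmetic behind an SLO is simple enough to sketch. Given a target such as 99.9% and counts of total and good requests over the compliance period, the remaining error budget is the share of allowed failures not yet used. The function name and the request counts below are assumptions for illustration.

```python
def error_budget(slo_target, total_requests, good_requests):
    """Return SLO compliance and the fraction of error budget still unspent."""
    allowed_bad = (1 - slo_target) * total_requests
    actual_bad = total_requests - good_requests
    remaining = 1 - actual_bad / allowed_bad if allowed_bad > 0 else 0.0
    return {
        "compliance": good_requests / total_requests,
        "budget_remaining": remaining,
    }

# Hypothetical month: 1,000,000 login requests, 400 slower than 500 ms
report = error_budget(0.999, 1_000_000, 999_600)
print(f"compliance={report['compliance']:.4%} "
      f"budget_remaining={report['budget_remaining']:.0%}")
```

A negative budget_remaining means the objective has already been missed for the period, which is a common trigger for pausing risky releases.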

By combining granular application-level metrics, robust infrastructure monitoring for Redis, and well-defined alerting policies, you can build a resilient and observable PHP application environment on Google Cloud.
