Server Monitoring Best Practices: Keeping Your PHP App and DynamoDB Clusters Alive on DigitalOcean
Establishing a Robust Monitoring Foundation
Effective server monitoring for a PHP application hosted on DigitalOcean and backed by Amazon DynamoDB hinges on a multi-layered approach. We need to monitor the infrastructure (Droplets, load balancers), the application itself (PHP-FPM, web server), and the critical data store (DynamoDB, which runs in AWS and is reached over the network). This isn’t about setting up a single dashboard; it’s about establishing a system that provides actionable insights and alerts before issues impact users.
Monitoring DigitalOcean Droplets and Resources
DigitalOcean provides basic metrics through its control panel, but for deeper insights and automated alerting, we’ll leverage a combination of `node_exporter` for system-level metrics and a centralized time-series database like Prometheus. This setup allows us to track CPU, memory, disk I/O, and network traffic with granular detail.
Deploying Node Exporter
On each Droplet hosting your PHP application or any supporting services, install and configure `node_exporter`. This can be done via a simple download and systemd service.
Installation and Service Setup (Ubuntu/Debian)
First, download the latest release from the Prometheus GitHub repository. Replace `X.Y.Z` with the current version.
Downloading Node Exporter
Run these commands on each relevant Droplet:
Shell Commands
wget https://github.com/prometheus/node_exporter/releases/download/vX.Y.Z/node_exporter-X.Y.Z.linux-amd64.tar.gz
tar xvfz node_exporter-X.Y.Z.linux-amd64.tar.gz
sudo mv node_exporter-X.Y.Z.linux-amd64/node_exporter /usr/local/bin/
sudo rm -rf node_exporter-X.Y.Z.linux-amd64*
Creating a Systemd Service
Create a systemd service file to manage `node_exporter`.
Systemd Service File
sudo nano /etc/systemd/system/node_exporter.service
Paste the following content into the file:
Node Exporter Systemd Unit
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
Enabling and Starting the Service
Enable and start the service, then check its status.
Shell Commands
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
Verify that `node_exporter` is running and accessible on port 9100. You should be able to access http://YOUR_DROPLET_IP:9100/metrics from a machine that can reach your Droplet.
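The check above can also be scripted. The following is a minimal sketch, not part of the original setup: the function names `parse_metric_names` and `check_node_exporter` and the sample text are illustrative assumptions. It reads the Prometheus text format and confirms that an expected metric such as `node_cpu_seconds_total` is present.

```python
from urllib.request import urlopen


def parse_metric_names(exposition_text: str) -> set:
    """Return the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is either `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def check_node_exporter(base_url: str, required=("node_cpu_seconds_total",)) -> bool:
    """Fetch <base_url>/metrics and confirm the required metric names exist."""
    with urlopen(f"{base_url}/metrics", timeout=5) as response:
        body = response.read().decode("utf-8")
    return set(required) <= parse_metric_names(body)


# Hypothetical sample of what node_exporter returns (values are made up).
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_memory_MemAvailable_bytes 2.1e+09
"""

if __name__ == "__main__":
    print(sorted(parse_metric_names(SAMPLE)))
```

Against a live Droplet you would call `check_node_exporter("http://YOUR_DROPLET_IP:9100")` from a host that is allowed to reach port 9100.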
Configuring Prometheus to Scrape Node Exporter
Your Prometheus server needs to be configured to scrape these metrics. Assuming you have a Prometheus instance running (e.g., on a dedicated Droplet or within a Kubernetes cluster), modify its `prometheus.yml` configuration.
Prometheus Configuration Snippet
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['DROPLET_IP_1:9100', 'DROPLET_IP_2:9100', 'DROPLET_IP_3:9100'] # Add all your Droplet IPs
    # If using service discovery (e.g., Consul, Kubernetes), this section would differ.
After updating prometheus.yml, reload the Prometheus configuration (usually via a `SIGHUP` signal or by restarting the Prometheus service).
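If the list of Droplets changes often, maintaining `static_configs` by hand gets tedious. One alternative is Prometheus's file-based service discovery (`file_sd_configs`), which rereads a JSON target file when it changes, so no reload is needed. The sketch below generates such a file; the function name `build_file_sd`, the `env` label, and the IP addresses are assumptions for illustration.

```python
import json
from typing import Dict, List, Optional


def build_file_sd(
    droplet_ips: List[str],
    port: int = 9100,
    labels: Optional[Dict[str, str]] = None,
) -> str:
    """Render a file_sd target file: a JSON list of target groups."""
    group = {
        "targets": [f"{ip}:{port}" for ip in droplet_ips],
        "labels": labels or {},
    }
    return json.dumps([group], indent=2)


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range 203.0.113.0/24.
    print(build_file_sd(["203.0.113.10", "203.0.113.11"], labels={"env": "production"}))
```

The output would be written to a file that a `file_sd_configs` entry in the `node_exporter` job lists under `files`.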
Application-Level Monitoring: PHP and Web Server
Monitoring the application layer is crucial for understanding user experience and identifying bottlenecks within your PHP code or web server configuration. We’ll focus on PHP-FPM, Nginx/Apache, and application-specific metrics.
PHP-FPM Monitoring
PHP-FPM exposes its status through a dedicated status page. This requires enabling the `status_path` in your PHP-FPM pool configuration and making it accessible.
Enabling PHP-FPM Status
Edit your PHP-FPM pool configuration file (e.g., /etc/php/8.1/fpm/pool.d/www.conf or similar). Uncomment and set the following directives:
PHP-FPM Pool Configuration
pm.status_path = /fpm-status
ping.path = /fpm-ping
ping.response = pong
Restart PHP-FPM for changes to take effect.
Shell Command
sudo systemctl restart php8.1-fpm # Adjust version as needed
Exposing PHP-FPM Status via Web Server
You need to configure your web server (Nginx in this example) to proxy requests to the PHP-FPM status page. This is typically done within your site’s Nginx configuration file.
Nginx Configuration Snippet
location ~ ^/(fpm-status|fpm-ping)$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php8.1-fpm.sock; # Adjust path if necessary
    allow 127.0.0.1; # Restrict access; add your monitoring host's IP if it scrapes remotely
    deny all;
}
Reload Nginx after applying changes.
Shell Command
sudo systemctl reload nginx
From an allowed address you can now request the PHP-FPM status page, for example with curl -H "Host: YOUR_APP_DOMAIN" http://127.0.0.1/fpm-status on the Droplet itself. Avoid the `internal` directive for this location: it makes Nginx answer every client request, including an exporter’s, with 404. For Prometheus integration, you’ll use a dedicated exporter like php-fpm_exporter, which can scrape this status page and expose metrics in Prometheus format.
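The status page can also return JSON when `?json` is appended to the URL, which makes ad hoc checks easy. The sketch below derives a few saturation indicators from that JSON; the function name `fpm_saturation` and the sample values are assumptions, while the key names follow PHP-FPM's JSON status output.

```python
import json


def fpm_saturation(status_json: str) -> dict:
    """Summarise pool pressure from PHP-FPM's JSON status output."""
    status = json.loads(status_json)
    active = status["active processes"]
    total = status["total processes"]
    return {
        "pool": status["pool"],
        # Share of worker processes currently handling a request.
        "busy_ratio": active / total if total else 0.0,
        # Requests waiting for a free worker; anything above 0 deserves a look.
        "listen_queue": status["listen queue"],
        # How often the pool hit pm.max_children since it started.
        "max_children_reached": status["max children reached"],
    }


# Hypothetical response body from /fpm-status?json (values are made up).
SAMPLE_STATUS = json.dumps({
    "pool": "www",
    "process manager": "dynamic",
    "accepted conn": 1200,
    "listen queue": 2,
    "idle processes": 1,
    "active processes": 3,
    "total processes": 4,
    "max children reached": 5,
    "slow requests": 0,
})

if __name__ == "__main__":
    print(fpm_saturation(SAMPLE_STATUS))
```

A busy ratio that stays near 1.0 together with a non-zero listen queue usually suggests that `pm.max_children` is too low for the current load.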
Web Server (Nginx/Apache) Metrics
Both Nginx and Apache can expose metrics. For Nginx, the nginx-prometheus-exporter is a common choice. For Apache, apache_exporter can be used.
Nginx Prometheus Exporter
Install and configure the exporter. This often involves running a Docker container or a standalone binary. The open-source exporter reads Nginx’s stub_status endpoint (or the NGINX Plus API); it does not parse log files.
Example Docker Compose for Nginx Exporter
version: '3'
services:
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter:latest
    network_mode: host # Lets the container reach stub_status on 127.0.0.1; the exporter then listens on host port 9113
    command:
      - "--nginx.scrape-uri=http://127.0.0.1:8080/nginx_status" # Must match your stub_status location
      - "--web.listen-address=:9113"
    restart: always
Ensure your Nginx configuration has stub_status enabled, since the exporter depends on it.
Nginx Stub Status Configuration
http {
    # ... other http settings ...
    server {
        listen 8080; # Or another port
        location /nginx_status {
            stub_status;
            allow 127.0.0.1; # Restrict access
            deny all;
        }
    }
    # ...
}
Add the exporter’s endpoint (e.g., YOUR_DROPLET_IP:9113) to your Prometheus configuration.
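To see what the stub_status page provides, it helps to parse its plain-text output by hand once. The sketch below is illustrative only: `parse_stub_status` is a hypothetical helper and the sample numbers are made up. The gap between accepted and handled connections is a useful signal because it counts connections Nginx dropped.

```python
import re

# Layout of the stub_status page: a header line, three counters, then three gauges.
STUB_STATUS_RE = re.compile(
    r"Active connections:\s+(?P<active>\d+)\s+"
    r"server accepts handled requests\s+"
    r"(?P<accepts>\d+)\s+(?P<handled>\d+)\s+(?P<requests>\d+)\s+"
    r"Reading:\s+(?P<reading>\d+)\s+Writing:\s+(?P<writing>\d+)\s+Waiting:\s+(?P<waiting>\d+)"
)


def parse_stub_status(text: str) -> dict:
    """Turn stub_status output into a dict of integers."""
    match = STUB_STATUS_RE.search(text)
    if match is None:
        raise ValueError("unexpected stub_status output")
    stats = {key: int(value) for key, value in match.groupdict().items()}
    # Connections accepted but not handled were dropped (e.g. worker limits reached).
    stats["dropped"] = stats["accepts"] - stats["handled"]
    return stats


SAMPLE_PAGE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630940 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE_PAGE))
```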
Application-Specific Metrics (PHP)
For application-specific metrics, such as request latency, error rates, or custom business logic counters, instrument your PHP code. Libraries like Prometheus client for PHP are invaluable.
Example: Instrumenting a PHP Endpoint
Install the client library via Composer:
Shell Command
composer require promphp/prometheus_client_php
Create a metrics endpoint in your PHP application (e.g., /metrics.php) that exposes the collected metrics.
PHP Metrics Endpoint Example
<?php
require __DIR__ . '/vendor/autoload.php';

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\APC;

// Initialize registry and storage.
// APC storage (requires the APCu extension) keeps values across PHP-FPM requests;
// InMemory storage would start empty on every request.
$adapter = new APC();
$registry = new CollectorRegistry($adapter);

// Define a counter for successful API requests.
// Arguments: namespace, name, help text, label names.
// The exposed metric name is my_app_api_requests_total.
$counter = $registry->getOrRegisterCounter(
    'my_app',
    'api_requests_total',
    'Total number of API requests',
    ['method', 'endpoint']
);

// --- In your API route handler ---
// Example: When a GET request to /users is successful
// $counter->incBy(1, ['GET', '/users']);

// --- Expose metrics ---
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
$renderer = new RenderTextFormat();
echo $renderer->render($registry->getMetricFamilySamples());
?>
You’ll need to manually increment these counters within your application logic, using a storage adapter that is shared between requests (APCu or Redis) so that increments made in route handlers are visible when the metrics endpoint is scraped. Then, configure Prometheus to scrape this endpoint (e.g., http://YOUR_APP_DROPLET_IP/metrics.php).
DynamoDB Monitoring Strategies
Monitoring DynamoDB is critical for performance and cost management. AWS CloudWatch is the primary source for DynamoDB metrics. We’ll focus on key metrics and how to integrate them into our monitoring system.
Key DynamoDB Metrics to Monitor
- ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits: Essential for understanding throughput usage and identifying potential throttling.
- ProvisionedReadCapacityUnits and ProvisionedWriteCapacityUnits: For tables in provisioned mode, this shows your allocated capacity.
- ThrottledRequests: Indicates requests that were rejected due to exceeding provisioned capacity.
- SuccessfulRequestLatency: Average latency for successful requests. High latency can signal performance issues.
- SystemErrors: Number of internal server errors.
- ReturnedItemCount: Number of items returned by queries/scans.
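- ItemCount and TableSizeBytes: For understanding table size and growth. These two values come from the DescribeTable API rather than CloudWatch, and DynamoDB refreshes them roughly every six hours.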
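The consumed and provisioned capacity metrics are easiest to interpret as a ratio. CloudWatch reports consumed capacity as a Sum over the period, while provisioned capacity is expressed per second, so the Sum has to be divided by the period length first. The sketch below shows that arithmetic; the function name and the numbers are illustrative assumptions.

```python
def capacity_utilization(consumed_sum: float, period_seconds: int, provisioned_units: float) -> float:
    """Fraction of provisioned capacity used during one CloudWatch period.

    consumed_sum      -- Sum statistic of ConsumedRead/WriteCapacityUnits for the period
    period_seconds    -- length of the CloudWatch period (e.g. 300)
    provisioned_units -- ProvisionedRead/WriteCapacityUnits (units per second)
    """
    if period_seconds <= 0 or provisioned_units <= 0:
        raise ValueError("period_seconds and provisioned_units must be positive")
    consumed_per_second = consumed_sum / period_seconds
    return consumed_per_second / provisioned_units


if __name__ == "__main__":
    # 27,000 read units consumed in 5 minutes against 100 provisioned RCU.
    print(capacity_utilization(27000, 300, 100))
```

A sustained ratio close to 1.0 means the table is near its limit and throttling is likely to follow; on-demand tables have no provisioned value, so this ratio applies only to provisioned mode.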
Integrating DynamoDB Metrics with Prometheus
Directly scraping DynamoDB metrics into Prometheus isn’t standard. The typical approach involves using a service that can query CloudWatch and expose metrics in Prometheus format. The cloudwatch_exporter (part of the Prometheus community) is a popular choice.
Setting up CloudWatch Exporter
This exporter runs as a separate service, configured to query specific AWS CloudWatch metrics and expose them via an HTTP endpoint.
Configuration File (config.yml)
region: us-east-1 # Your AWS region
metrics:
  # One entry per CloudWatch metric. The exporter names the result
  # aws_dynamodb_<metric_in_snake_case>_<statistic> and turns each dimension
  # into a snake_case label (e.g. table_name).
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedReadCapacityUnits
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300 # 5 minutes
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedWriteCapacityUnits
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ThrottledRequests
    aws_dimensions: [TableName, Operation]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: SuccessfulRequestLatency
    aws_dimensions: [TableName, Operation]
    aws_statistics: [Average, Maximum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: SystemErrors
    aws_dimensions: [TableName, Operation]
    aws_statistics: [Sum]
    period_seconds: 300
  # To limit collection to specific tables, add to an entry, for example:
  #   aws_dimension_select:
  #     TableName: [YourDynamoDBTableName]
  # ItemCount and TableSizeBytes are not published to CloudWatch, so they cannot be collected here.
Running CloudWatch Exporter
You can run this exporter using Docker or as a standalone binary. Ensure the AWS credentials used have permissions to read CloudWatch metrics for DynamoDB.
Example Docker Command
docker run -d \
  -p 9106:9106 \
  --name cloudwatch-exporter \
  -v /path/to/your/config.yml:/config/config.yml \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  prom/cloudwatch-exporter
The image listens on port 9106 and reads /config/config.yml by default; the two -e options pass AWS credentials through from the host environment. Add the exporter’s endpoint (e.g., YOUR_MONITORING_DROPLET_IP:9106) to your Prometheus configuration.
Alerting with Alertmanager
Metrics are only useful if they trigger alerts when thresholds are breached. Prometheus integrates with Alertmanager for sophisticated alerting rules and routing.
Example Alerting Rules
Define alerting rules in Prometheus (e.g., in a file like alerts.yml).
Prometheus Alerting Rules
groups:
  - name: monitoring_alerts # Group name is arbitrary
    rules:
      - alert: HighCPUUsage
        # node_cpu_seconds_total is a counter, so derive usage from the rate of idle time
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle", job="node_exporter"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 90% for the last 5 minutes on {{ $labels.instance }}."
      - alert: HighDynamoDBThrottling
        # Names follow the cloudwatch_exporter convention (statistic suffix, snake_case labels); adjust to what your exporter exposes
        expr: sum by (table_name) (aws_dynamodb_throttled_requests_sum{job="cloudwatch_exporter"}) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "DynamoDB throttling detected for {{ $labels.table_name }}"
          description: "Throttled requests detected for table {{ $labels.table_name }} for the last minute."
      - alert: PHPFPMHighProcessCount
        # Metric name as exposed by hipages/php-fpm_exporter; adjust the name and threshold to your exporter and pool config
        expr: phpfpm_active_processes{job="php_fpm_exporter"} > 20
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High PHP-FPM process count on {{ $labels.instance }}"
          description: "PHP-FPM is running with {{ $value }} active processes, potentially indicating a bottleneck."
      - alert: NginxHighErrorRate
        # More than 0.1 5xx responses per second over 5 minutes. Requires an exporter that
        # adds a status label; the stub_status-based exporter does not provide one.
        expr: rate(nginx_http_requests_total{job="nginx_exporter", status=~"5.."}[5m]) > 0.1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High Nginx 5xx error rate on {{ $labels.instance }}"
          description: 'Nginx is experiencing a high rate of 5xx errors ({{ $value | printf "%.2f" }} req/sec).'
Configure Prometheus to load these rules and ensure Alertmanager is set up to receive alerts from Prometheus and route them to your desired notification channels (Slack, PagerDuty, email, etc.).
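When writing CPU alerts, keep in mind that `node_cpu_seconds_total` is a counter of seconds spent in each mode, so usage has to be derived from how fast the idle counter grows, not from its raw value. The sketch below reproduces that calculation by hand for two samples; the function name and the sample values are assumptions for illustration.

```python
def cpu_busy_percent(idle_start: float, idle_end: float, elapsed_seconds: float, cpu_count: int = 1) -> float:
    """CPU usage in percent between two readings of the idle-seconds counter.

    idle_start / idle_end -- idle seconds summed over all CPUs at each reading
    elapsed_seconds       -- wall-clock time between the two readings
    cpu_count             -- number of CPUs contributing to the counter
    """
    if elapsed_seconds <= 0 or cpu_count <= 0:
        raise ValueError("elapsed_seconds and cpu_count must be positive")
    # Each CPU can accumulate at most `elapsed_seconds` of idle time.
    idle_fraction = (idle_end - idle_start) / (elapsed_seconds * cpu_count)
    return 100.0 * (1.0 - idle_fraction)


if __name__ == "__main__":
    # Two CPUs gained only 60 idle seconds in total over a 5-minute window.
    print(cpu_busy_percent(idle_start=10_000.0, idle_end=10_060.0, elapsed_seconds=300, cpu_count=2))
```

This is the same idea that `rate()` over the idle series expresses in PromQL: the per-second increase of idle time is the idle fraction, and one minus that is the busy fraction.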
Log Aggregation and Analysis
While metrics tell you *what* is happening, logs tell you *why*. A centralized logging system is indispensable for debugging and incident response.
Choosing a Logging Solution
Options include ELK Stack (Elasticsearch, Logstash, Kibana), Grafana Loki, or cloud-native solutions. For this setup, we’ll consider using Fluentd or Fluent Bit as log forwarders from your Droplets to a central logging store.
Forwarding Logs with Fluent Bit
Fluent Bit is lightweight and efficient. Install it on your application Droplets and configure it to tail application logs (PHP error logs, Nginx access/error logs) and forward them.
Fluent Bit Configuration Example (fluent-bit.conf)
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf
    HTTP_Server  On
    HTTP_Listen  127.0.0.1
    HTTP_Port    2020
[INPUT]
    Name   tail
    Path   /var/log/nginx/access.log
    Tag    nginx.access
    Parser nginx
[INPUT]
    # The error log has a different layout than the access log, so it is forwarded unparsed
    Name   tail
    Path   /var/log/nginx/error.log
    Tag    nginx.error
[INPUT]
    # Adjust Path to your PHP error log (comments must be on their own line)
    Name   tail
    Path   /var/log/php/error.log
    Tag    php.error
    Parser php_error
[OUTPUT]
    Name        forward
    Host        YOUR_LOGGING_AGGREGATOR_HOST
    Port        24224
    Retry_Limit False
Parsers Configuration (parsers.conf)
[PARSER]
    Name        nginx
    Format      regex
    Regex       ^(?<remote>[\d\.]+) - (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<protocol>\S+)" (?<status>\d+) (?<bytes>\d+) "(?<referer>[^\"]*)" "(?<user_agent>[^\"]*)"
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
    # Assumes lines such as: 2024-01-01 12:00:00 [error] message. Adjust the regex and time format to your PHP log layout.
    Name        php_error
    Format      regex
    Regex       ^(?<time>[^ ]+ [^ ]+) \[(?<level>[^\]]+)\] (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
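The forward output speaks the Fluentd forward protocol, so the host listening on port 24224 should be a Fluentd or Fluent Bit aggregator, which in turn writes to your central log store (e.g., Elasticsearch, Loki). This allows you to search, filter, and visualize logs, correlating them with metrics for faster troubleshooting.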
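Parser regexes are easy to get subtly wrong, so it is worth testing them against a real log line before deploying. The sketch below does this for the Nginx access-log pattern. Two assumptions apply: the pattern is rewritten with Python's `(?P<name>...)` group syntax (Fluent Bit itself uses `(?<name>...)`), and the sample log line is made up.

```python
import re
from datetime import datetime

# Same pattern as the nginx parser, with Python-style named groups.
NGINX_ACCESS_RE = re.compile(
    r'^(?P<remote>[\d\.]+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d+) (?P<bytes>\d+) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # matches Time_Format in the parser


def parse_access_line(line: str) -> dict:
    """Split one access-log line into fields and parse its timestamp."""
    match = NGINX_ACCESS_RE.match(line)
    if match is None:
        raise ValueError("line does not match the access-log pattern")
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(fields["time"], TIME_FORMAT)
    return fields


SAMPLE_LINE = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /users HTTP/1.1" 200 512 "-" "curl/8.0.1"'
)

if __name__ == "__main__":
    parsed = parse_access_line(SAMPLE_LINE)
    print(parsed["method"], parsed["path"], parsed["status"], parsed["timestamp"].isoformat())
```

If a line from your own logs raises `ValueError` here, the Fluent Bit parser would most likely skip it as well, which usually points to a custom `log_format` in Nginx.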
Conclusion: A Proactive Stance
Implementing comprehensive monitoring for a PHP application on DigitalOcean and the DynamoDB tables behind it requires a layered strategy. By combining infrastructure metrics, application performance monitoring, detailed web server insights, and robust data store observability, coupled with effective alerting and log aggregation, you move from reactive firefighting to proactive system management. This ensures higher availability, better performance, and a more stable user experience.