Server Monitoring Best Practices: Keeping Your Ruby App and PostgreSQL Clusters Alive on Google Cloud

Proactive PostgreSQL Health Checks with pg_cron

Maintaining the health of a PostgreSQL cluster, especially in a distributed cloud environment like Google Cloud, requires more than just reactive alerts. Proactive, automated checks are crucial for identifying and mitigating potential issues before they impact your Ruby application. We’ll leverage pg_cron, a PostgreSQL extension, to schedule routine maintenance and health checks directly within the database.

First, ensure pg_cron is installed and enabled on your PostgreSQL instances. This typically involves modifying your postgresql.conf file and restarting the PostgreSQL service. On Google Cloud SQL, this is often managed via instance flags.

Configuring pg_cron for PostgreSQL Maintenance

Once installed, you can schedule jobs using SQL. A common and vital task is to regularly run VACUUM VERBOSE on your databases. This process reclaims storage occupied by dead tuples and can prevent table bloat, which directly impacts query performance for your Ruby app.

Scheduling a Daily VACUUM

The following SQL command schedules a daily VACUUM VERBOSE for all databases on the instance. It’s important to run this during off-peak hours to minimize performance impact.

-- Connect to a database that has pg_cron installed (e.g., postgres)
-- Schedule a job to run VACUUM VERBOSE on all databases daily at 3 AM
SELECT cron.schedule(
    'daily-vacuum',
    '0 3 * * *', -- Cron syntax: Minute Hour DayOfMonth Month DayOfWeek
    $$
        DO $$
        BEGIN
            FOR db_name IN (SELECT datname FROM pg_database WHERE datistemplate = false) LOOP
                RAISE NOTICE 'Running VACUUM VERBOSE on database: %', db_name;
                EXECUTE format('VACUUM VERBOSE %I', db_name);
            END LOOP;
        END;
        $$;
    $$
);

To monitor the execution of these cron jobs, you can query the cron.job and cron.job_run_details tables.

-- View scheduled jobs
SELECT * FROM cron.job;

-- View recent job run history
SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;

For more advanced scenarios, consider scheduling ANALYZE to update table statistics, which is critical for the query planner’s efficiency. You might also want to schedule checks for replication lag if you’re using read replicas.

Monitoring Ruby Application Performance with Prometheus and Grafana

Your Ruby application’s performance is directly tied to the health of your PostgreSQL cluster and its own resource utilization. We’ll set up Prometheus to scrape metrics from your Ruby application and Grafana for visualization and alerting.

Instrumenting Your Ruby Application

The prometheus-client-ruby gem is the standard for instrumenting Ruby applications. You’ll need to add it to your Gemfile and initialize it.

# Gemfile
gem 'prometheus-client-ruby'

# config/initializers/prometheus.rb (for Rails applications)
require 'prometheus/client'

Prometheus::Client.configure do |config|
  config.logger = Rails.logger
end

# Define custom metrics
$metrics = {}
$metrics[:http_requests_total] = Prometheus::Client::Counter.new(
  :http_requests_total,
  'Total number of HTTP requests processed'
)
$metrics[:http_request_duration_seconds] = Prometheus::Client::Histogram.new(
  :http_request_duration_seconds,
  'HTTP request latency in seconds'
)
$metrics[:db_query_duration_seconds] = Prometheus::Client::Histogram.new(
  :db_query_duration_seconds,
  'Database query latency in seconds'
)

# Register metrics with the default registry
Prometheus::Client.default.register($metrics[:http_requests_total])
Prometheus::Client.default.register($metrics[:http_request_duration_seconds])
Prometheus::Client.default.register($metrics[:db_query_duration_seconds])

# Expose metrics endpoint (e.g., /metrics)
# In Rails, you can use a Rack middleware or a dedicated controller action.
# For simplicity, let's assume a Rack middleware is configured in config/application.rb
# or config/routes.rb for non-Rails apps.
# Example for Rails:
# config.middleware.use Prometheus::Client::Rack::Metrics, path: '/metrics'

You’ll also want to instrument your database queries. This can be done by monkey-patching your ORM (like ActiveRecord) or by manually recording durations around your query execution.

# Example for ActiveRecord (in an initializer or a custom instrumentation file)
module ActiveRecord
  module ConnectionAdapters
    module PostgreSQLAdapter
      def execute(sql, name = nil, parameters = nil)
        start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        result = super
        duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
        $metrics[:db_query_duration_seconds].observe({ query: name || 'unknown' }, duration)
        result
      rescue => e
        duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
        $metrics[:db_query_duration_seconds].observe({ query: name || 'unknown', error: e.class.to_s }, duration)
        raise e
      end
    end
  end
end

Configuring Prometheus Scraper

On your Google Cloud Compute Engine instances running your Ruby application, configure Prometheus to scrape the /metrics endpoint. If you’re using Kubernetes Engine (GKE), you’ll use service discovery.

# prometheus.yml
scrape_configs:
  - job_name: 'ruby_app'
    static_configs:
      - targets: ['YOUR_APP_INSTANCE_IP:9090'] # Replace with your app's IP and port where metrics are exposed
        labels:
          environment: 'production'
          app: 'my-ruby-app'
  - job_name: 'postgres'
    static_configs:
      - targets: ['YOUR_POSTGRES_INSTANCE_IP:9187'] # Assuming you're using pg_exporter on port 9187
        labels:
          environment: 'production'
          cluster: 'main-pg-cluster'

For PostgreSQL, you’ll typically use the postgres_exporter. Ensure it’s installed and configured to connect to your PostgreSQL instances. The exporter will expose metrics on a specific port (default is 9187).

Visualizing and Alerting with Grafana

Grafana is your go-to for visualizing the metrics collected by Prometheus and for setting up alerts. Connect Grafana to your Prometheus data source.

Key Dashboards and Alerts for Ruby/PostgreSQL

Here are some essential metrics and corresponding Grafana queries/alerts:

Application Error Rate: Monitor the rate of exceptions.

Grafana Query (PromQL):

rate(http_requests_total{status=~"5..", environment="production"}[5m])

Alert: Trigger if the error rate exceeds a threshold (e.g., 1% of total requests) for more than 5 minutes.

Request Latency: Track the duration of HTTP requests.

Grafana Query (PromQL):

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, environment, app))

Alert: Notify if the 95th percentile latency exceeds a defined SLO (e.g., 500ms) for a sustained period.

Database Query Latency: Identify slow database queries.

Grafana Query (PromQL):

histogram_quantile(0.95, sum(rate(db_query_duration_seconds_bucket[5m])) by (le, query, environment))

Alert: Alert on high 95th percentile query latency, especially for critical queries identified by the query label.

PostgreSQL Connection Count: Monitor active connections to prevent exhaustion.

Grafana Query (PromQL – assuming pg_exporter):

pg_stat_activity_count{environment="production"}

Alert: Trigger if the connection count approaches the configured max_connections limit.

PostgreSQL Replication Lag: Crucial for high availability and disaster recovery.

Grafana Query (PromQL – assuming pg_exporter):

pg_replication_lag_seconds{environment="production"}

Alert: Notify immediately if replication lag exceeds a few seconds (depending on your RPO/RTO requirements).

Google Cloud Operations Suite (formerly Stackdriver) Integration

While Prometheus and Grafana provide deep visibility into your application and database metrics, Google Cloud Operations Suite offers centralized logging and monitoring for your GCP resources. It’s essential for capturing system-level metrics and logs from your Compute Engine instances and Cloud SQL instances.

Leveraging Cloud Logging and Monitoring

Ensure the Cloud Logging and Cloud Monitoring agents are installed and running on your Compute Engine instances. For Cloud SQL, logs and metrics are automatically collected.

# On Compute Engine instances, verify agent status
sudo systemctl status google-fluentd
sudo systemctl status google-cloud-monitoring-agent

You can create custom metrics in Cloud Monitoring based on log entries. For example, if your Ruby application logs specific error messages, you can create a metric to count these occurrences.

Creating Log-Based Metrics in Cloud Monitoring

Navigate to Cloud Monitoring > Logs-based Metrics in the Google Cloud Console. Create a new metric:

Metric Type: Counter
Log Filter: Example: resource.type="gce_instance" AND jsonPayload.message:"FATAL" AND jsonPayload.database:"your_db_name" (for PostgreSQL FATAL errors)
Metric Name: e.g., postgresql_fatal_errors

This allows you to correlate application-level metrics from Prometheus with infrastructure-level metrics and logs from Google Cloud Operations Suite, providing a holistic view of your system’s health.

High Availability and Disaster Recovery Considerations

Monitoring is a cornerstone of HA/DR. Ensure your monitoring setup itself is resilient. For critical alerts, consider using multiple notification channels (e.g., PagerDuty, Slack, email).

For PostgreSQL, leverage Cloud SQL’s built-in HA features (regional replicas) and configure automated backups. Monitor replication lag closely, as discussed earlier, to ensure failover readiness. Your Ruby application should be designed to handle temporary database unavailability gracefully, perhaps with retry mechanisms and circuit breakers.

Regularly test your failover procedures and disaster recovery plans. Monitoring provides the data to validate that these processes are working as expected.

Server Monitoring Best Practices: Keeping Your Ruby App and PostgreSQL Clusters Alive on Google Cloud

Proactive PostgreSQL Health Checks with pg_cron

Configuring pg_cron for PostgreSQL Maintenance

Scheduling a Daily VACUUM

Monitoring Ruby Application Performance with Prometheus and Grafana

Instrumenting Your Ruby Application

Configuring Prometheus Scraper

Visualizing and Alerting with Grafana

Key Dashboards and Alerts for Ruby/PostgreSQL

Google Cloud Operations Suite (formerly Stackdriver) Integration

Leveraging Cloud Logging and Monitoring

Creating Log-Based Metrics in Cloud Monitoring

High Availability and Disaster Recovery Considerations

Recent Posts

Top Categories

Our Products

Our Services