How We Audited a High-Traffic Ruby Enterprise Stack on OVH and Mitigated Unsafe YAML Loading Allowing Remote Code Execution

Deep Dive: Auditing a High-Traffic Ruby Enterprise Stack on OVH

This post details a critical security audit performed on a high-traffic Ruby on Rails enterprise application hosted on OVH. The primary objective was to identify and mitigate vulnerabilities, with a specific focus on unsafe deserialization patterns that could lead to Remote Code Execution (RCE).

Initial Reconnaissance and Stack Identification

The target environment was a complex Ruby on Rails monolith surrounded by several microservices, running on a mix of dedicated servers and OVH’s public cloud instances. Key components identified included:

Ruby on Rails (versions 4.x and 5.x)
PostgreSQL (v9.x)
Redis (v3.x)
Nginx (as a reverse proxy)
Sidekiq (for background job processing)
Custom-built Ruby gems
Various third-party gems with known vulnerabilities

The OVH infrastructure utilized a combination of bare-metal servers for core services and virtual machines for supporting components. Network segmentation was present but not strictly enforced at the application layer, posing a risk of lateral movement.

Focus Area: Unsafe YAML Deserialization

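A common attack vector in Ruby applications is the unsafe loading of YAML data. On Psych versions before 4.0, `YAML.load` will instantiate arbitrary Ruby objects named in the document, and with untrusted input that can be escalated to arbitrary code execution. Psych 4.0 (bundled with Ruby 3.1) made `YAML.load` safe by default and moved the permissive behavior to `YAML.unsafe_load`, but the Ruby versions typically paired with Rails 4.x and 5.x predate that change. The risk is greatest when data is loaded from user-controlled sources, such as API requests, cookies, or files uploaded by users.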

Our audit specifically looked for instances where `YAML.load` was being used without proper sanitization or validation of the input source. This often occurs in older codebases or when developers are not fully aware of the security implications.

Identifying Vulnerable Code Patterns

We employed a combination of static analysis tools and manual code review to pinpoint potential vulnerabilities. Tools like Brakeman were invaluable for an initial scan, but manual inspection was crucial for understanding context and identifying subtle flaws.
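The manual pass can be seeded with a simple text search. The following is a minimal sketch of such a helper, not the tooling used in the audit; the pattern list and the names `RISKY_CALLS`, `Finding`, `scan_source`, and `scan_tree` are assumptions made for illustration. It flags `YAML.load` even on newer Psych versions, where the call is safe by default, so every hit still needs human review.

```ruby
# Hypothetical audit helper: flags calls that can deserialize arbitrary objects.
# The pattern list is an assumption and is not exhaustive.
RISKY_CALLS = /\b(?:YAML|Psych)\.(?:load|load_file|unsafe_load|load_stream)\b|\bMarshal\.load\b/

Finding = Struct.new(:path, :line_number, :source)

# Scans a string of Ruby source and returns one Finding per matching line.
def scan_source(source, path: "(inline)")
  source.each_line.with_index(1).map do |line, number|
    next if line.lstrip.start_with?("#") # skip whole-line comments
    Finding.new(path, number, line.strip) if line.match?(RISKY_CALLS)
  end.compact
end

# Scans every .rb file under a directory tree.
def scan_tree(root)
  Dir.glob(File.join(root, "**", "*.rb")).flat_map do |path|
    scan_source(File.read(path), path: path)
  end
end
```

Calling `scan_tree("app")` from the application root returns the findings for everything under `app/`; calls to `YAML.safe_load` are deliberately not matched.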

Example of Vulnerable Code

A typical vulnerable pattern is a controller action that accepts a YAML payload:

# app/controllers/api/v1/data_controller.rb
class Api::V1::DataController < ApplicationController
  def create
    # WARNING: Unsafe YAML loading from request body
    data = YAML.load(request.body.read)
    process_data(data)
    render json: { status: "success" }
  end

  private

  def process_data(data)
    # ... business logic ...
  end
end

An attacker could craft a malicious YAML payload to achieve RCE. For instance, they could send a request with a `Content-Type: application/yaml` header and a body like this:

Malicious YAML Payload Example

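--- !ruby/object:Some::GadgetClass
# Schematic only, not a working exploit. `Some::GadgetClass` is a placeholder
# for a class that is already loaded in the target process.
some_attribute: "attacker-controlled value"

When an unsafe `YAML.load` processes a `!ruby/object:` tag, Psych allocates an instance of the named class and sets its instance variables directly from the document, bypassing `initialize`. That step alone does not run a command. Published exploits chain "gadget" classes that are already loaded in the process (historically found in RubyGems and Rails internals) so that a method invoked during or after deserialization ends up passing attacker-controlled data to something like `eval` or `Kernel#system`. This is a classic exploit pattern.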
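The underlying mechanism can be observed harmlessly. The sketch below uses a hypothetical class, `AuditProbe`, standing in for any class already loaded in an application; it shows that a tagged document creates an instance and sets its instance variables without ever running the constructor. Nothing here is an exploit.

```ruby
require "yaml"

# Hypothetical class standing in for any class already loaded in the application.
class AuditProbe
  attr_reader :name, :initialized

  def initialize(name)
    @name = name
    @initialized = true
  end
end

doc = <<~YAML
  --- !ruby/object:AuditProbe
  name: set-by-document
YAML

# Psych 4 moved the permissive behavior to unsafe_load; older versions
# expose it as plain YAML.load.
probe = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(doc) : YAML.load(doc)

puts probe.class               # AuditProbe
puts probe.name                # set-by-document
puts probe.initialized.inspect # nil, because initialize never ran
```

Because the object's state comes entirely from the document, any invariant the constructor would normally enforce is skipped, which is what gadget chains rely on.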

Mitigation Strategy: Safe YAML Loading

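The recommended approach is to use `YAML.safe_load`, provided by Psych, Ruby's default YAML parser (it is present in the versions bundled with Ruby 2.1 and later). By default `YAML.safe_load` deserializes only basic types (nil, booleans, numbers, strings, arrays, and hashes) and raises `Psych::DisallowedClass` for anything else, which prevents the instantiation of arbitrary classes and closes off the gadget-chain route to code execution.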
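A short sketch of how those restrictions behave in practice follows. `Widget` and `try_safe_load` are hypothetical names introduced for the demonstration, and the `permitted_classes:` keyword assumes Psych 3.1 or newer (Ruby 2.6+); older versions take the class list as a positional argument instead.

```ruby
require "yaml"

# Hypothetical application class used only for this demonstration.
class Widget
  attr_reader :name
end

# Returns the loaded value, or the name of the Psych error that blocked it.
def try_safe_load(doc, **options)
  YAML.safe_load(doc, **options)
rescue Psych::DisallowedClass, Psych::BadAlias => e
  e.class.name
end

plain  = "user_id: 42\ntags: [a, b]\n"
tagged = "--- !ruby/object:Widget\nname: demo\n"
symbol = "--- :admin\n"

p try_safe_load(plain)                                      # a Hash with String keys
p try_safe_load(tagged)                                     # "Psych::DisallowedClass"
p try_safe_load(symbol)                                     # "Psych::DisallowedClass"
p try_safe_load(symbol, permitted_classes: [Symbol])        # :admin
p try_safe_load(tagged, permitted_classes: [Widget]).class  # Widget
```

Each class added to `permitted_classes` widens the attack surface again, so the list is best kept to simple value types such as `Symbol`, `Date`, or `Time`.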

Implementing Safe Loading

The vulnerable code snippet above should be refactored to use `YAML.safe_load`:

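# app/controllers/api/v1/data_controller.rb
class Api::V1::DataController < ApplicationController
  def create
    # Safely load YAML data. The symbolize_names keyword needs Psych 3.0+ (Ruby 2.5+).
    data = YAML.safe_load(request.body.read, symbolize_names: true)
    process_data(data)
    render json: { status: "success" }
  rescue Psych::SyntaxError => e
    render json: { error: "Invalid YAML format: #{e.message}" }, status: :bad_request
  rescue Psych::DisallowedClass, Psych::BadAlias
    # Raised when the document contains object tags, symbols, or aliases
    render json: { error: "Unsupported YAML content" }, status: :bad_request
  rescue ArgumentError => e
    # Structurally invalid payloads are a client error, not a server error
    render json: { error: e.message }, status: :unprocessable_entity
  rescue StandardError => e
    # Log the details server-side instead of returning them to the client
    Rails.logger.error("DataController#create failed: #{e.class}: #{e.message}")
    render json: { error: "An unexpected error occurred" }, status: :internal_server_error
  end

  private

  def process_data(data)
    # Ensure data has the expected structure before running business logic
    unless data.is_a?(Hash) && data.key?(:user_id) && data.key?(:payload)
      raise ArgumentError, "Invalid data structure received"
    end
    # ... business logic ...
  end
end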

Key improvements:

Replaced `YAML.load` with `YAML.safe_load`.
Added `symbolize_names: true` for consistent key handling (e.g., `:user_id` instead of `"user_id"`).
Included error handling for `Psych::SyntaxError` to gracefully manage malformed YAML.
Added a basic type check within `process_data` to ensure the deserialized data conforms to expected structures, further hardening against unexpected inputs.
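The parsing and structure checks can also live outside the controller, where they are easier to test. The following is a sketch under stated assumptions: `parse_untrusted_yaml`, `PayloadError`, and the 64 KiB cap are hypothetical, and the requirement that `user_id` is an integer and `payload` a mapping goes beyond what the controller above checks. It uses string keys so that it does not depend on `symbolize_names`.

```ruby
require "yaml"

MAX_YAML_BYTES = 64 * 1024 # assumed limit; tune to the largest legitimate payload

class PayloadError < StandardError; end

# Parses untrusted YAML into a Hash with the expected keys, or raises PayloadError.
def parse_untrusted_yaml(raw)
  raise PayloadError, "payload too large" if raw.bytesize > MAX_YAML_BYTES

  data = YAML.safe_load(raw) # defaults: no custom classes, no symbols, no aliases
  raise PayloadError, "expected a mapping" unless data.is_a?(Hash)
  raise PayloadError, "user_id must be an integer" unless data["user_id"].is_a?(Integer)
  raise PayloadError, "payload must be a mapping" unless data["payload"].is_a?(Hash)

  data
rescue Psych::SyntaxError, Psych::DisallowedClass, Psych::BadAlias => e
  # Report the category only; parser messages can echo attacker-supplied input.
  raise PayloadError, "rejected by YAML parser (#{e.class.name})"
end
```

A controller would then rescue the single `PayloadError` class and answer with a 4xx status, leaving the 500 path for genuine server faults.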

Beyond YAML: Broader Security Audit Findings

While the YAML vulnerability was critical, the audit uncovered other areas requiring attention:

1. Outdated Dependencies

A significant number of gems were outdated, some with known CVEs. `bundle outdated` only reports that newer versions exist, so we cross-referenced its output with published security advisories.

# Check for outdated gems
bundle outdated

# Example of a gem with a known vulnerability (hypothetical)
# gem 'nokogiri', '1.8.0' # Vulnerable to CVE-2020-XXXX

# Mitigation: Update to the latest secure version
# Gemfile
# gem 'nokogiri', '>= 1.10.9' # Or a specific secure version

A systematic update process, including thorough testing, was implemented. For critical applications, pinning to specific secure versions is often preferred over using broad version ranges.
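Version policy checks like this are easy to script with RubyGems' own comparison classes. The sketch below is illustrative only: `MINIMUM_SAFE` and `below_minimum` are hypothetical names, and the single policy entry mirrors the illustrative nokogiri lines above rather than a real advisory.

```ruby
require "rubygems" # loaded by default; provides Gem::Version and Gem::Requirement

# Hypothetical policy table: gem name => lowest version the team treats as patched.
MINIMUM_SAFE = { "nokogiri" => "1.10.9" }.freeze

# Returns the names of gems whose locked version is below the policy minimum.
def below_minimum(locked_versions, policy = MINIMUM_SAFE)
  locked_versions.select do |name, version|
    minimum = policy[name]
    minimum && Gem::Version.new(version) < Gem::Version.new(minimum)
  end.keys
end

# A pessimistic constraint accepts later patch releases but not the next minor version.
PINNED = Gem::Requirement.new("~> 1.10.9")

p below_minimum({ "nokogiri" => "1.8.0", "rack" => "2.2.3" }) # ["nokogiri"]
p PINNED.satisfied_by?(Gem::Version.new("1.10.10"))           # true
p PINNED.satisfied_by?(Gem::Version.new("1.11.0"))            # false
```

`Gem::Version` compares segments numerically, so `1.8.0` correctly sorts below `1.10.9`, which a plain string comparison would get wrong.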

2. Insecure Direct Object References (IDOR)

Several API endpoints allowed access to resources without proper authorization checks. For example, one endpoint fetched user data by the ID in the URL without verifying that the logged-in user was permitted to view that user’s data.

# app/controllers/api/v1/users_controller.rb (Vulnerable)
class Api::V1::UsersController < ApplicationController
  def show
    user = User.find(params[:id]) # No authorization check
    render json: user
  end
end

# app/controllers/api/v1/users_controller.rb (Mitigated)
class Api::V1::UsersController < ApplicationController
  before_action :authenticate_user!
  before_action :set_user, only: [:show, :edit, :update]
  after_action :verify_authorized, only: [:show, :edit, :update]

  def show
    render json: @user
  end

  private

  def set_user
    @user = User.find(params[:id])
    authorize @user # Using Pundit or similar gem for authorization
  end
end

Implementing a robust authorization framework (like Pundit or CanCanCan) and ensuring all sensitive actions are protected by `before_action` filters is crucial.
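With Pundit, the `authorize @user` call in the `show` action resolves to a `show?` method on a policy class named after the record's class. A minimal sketch of such a policy follows; the rule itself (administrators and the account owner may view a record) is an assumption, and the `User` struct is a stand-in for the real model so the example runs without Rails.

```ruby
# Stand-in for the application's user model (hypothetical fields).
User = Struct.new(:id, :admin) do
  def admin?
    admin == true
  end
end

# Pundit-style policy: a plain Ruby class built from (current_user, record).
class UserPolicy
  attr_reader :user, :record

  def initialize(user, record)
    @user = user     # the authenticated user
    @record = record # the user record being requested
  end

  # Assumed rule: admins can view anyone, everyone else only themselves.
  def show?
    return false if user.nil?

    user.admin? || user.id == record.id
  end
end
```

Because policies are ordinary objects, the rule can be unit-tested directly, for example `UserPolicy.new(current_user, other_user).show?`, without going through a controller.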

3. Insufficient Input Validation

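Beyond YAML, other forms of input were not sufficiently validated, leaving room for injection attacks (SQL, command). We enforced strong validation using Rails’ built-in Active Record validations and custom validators. Validation constrains what the application accepts; it complements, rather than replaces, parameterized queries and avoiding shell interpolation of user input.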

# app/models/post.rb
class Post < ApplicationRecord
  validates :title, presence: true, length: { maximum: 255 }
  validates :body, presence: true
  validates :author_email, format: { with: URI::MailTo::EMAIL_REGEXP } # Basic email format validation
end

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def create
    @post = Post.new(post_params)
    if @post.save
      redirect_to @post, notice: 'Post was successfully created.'
    else
      render :new
    end
  end

  private

  def post_params
    params.require(:post).permit(:title, :body, :author_email) # Strong parameters
  end
end

Strong parameters are essential to prevent mass assignment vulnerabilities and ensure only permitted attributes are processed.

OVH Specific Considerations

The OVH environment presented specific challenges and opportunities:

1. Network Security Groups and Firewalls

Ensuring that OVH’s network security groups and firewall rules were correctly configured to restrict access to only necessary ports and IP ranges was a priority. For instance, PostgreSQL should only be accessible from application servers, not the public internet.

# Example: OVH Public Cloud Security Group Rule (Conceptual)
# Allow traffic on port 5432 (PostgreSQL) only from specific internal IP ranges
ACTION: ALLOW
PROTOCOL: TCP
DIRECTION: INBOUND
PORT: 5432
SOURCE_IP: 10.0.0.0/16 # Internal application subnet
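
The intent of such a rule can be checked in code, for instance in a test that runs against an exported rule set. The sketch below does not call any OVH API; `Rule`, `POSTGRES_RULE`, and `allowed?` are hypothetical names, and the subnet is the one from the conceptual rule above.

```ruby
require "ipaddr"

# Mirrors the conceptual rule: TCP 5432 inbound from the application subnet only.
Rule = Struct.new(:protocol, :port, :source)
POSTGRES_RULE = Rule.new("tcp", 5432, IPAddr.new("10.0.0.0/16"))

# Returns true when a connection attempt would be allowed by the rule.
def allowed?(rule, protocol:, port:, source_ip:)
  rule.protocol == protocol &&
    rule.port == port &&
    rule.source.include?(IPAddr.new(source_ip))
end

p allowed?(POSTGRES_RULE, protocol: "tcp", port: 5432, source_ip: "10.0.12.34")  # true
p allowed?(POSTGRES_RULE, protocol: "tcp", port: 5432, source_ip: "203.0.113.9") # false
```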

2. Server Hardening

Dedicated servers require manual hardening. This included disabling unnecessary services, configuring `sshd` securely, and ensuring regular security patching.

# Example: Secure SSH configuration
# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers your_user admin_user
MaxAuthTries 3
LoginGraceTime 30s
ClientAliveInterval 300
ClientAliveCountMax 2
UsePAM yes
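
Hardening settings tend to drift across a fleet of manually managed servers, so it helps to check them mechanically. The following is a rough sketch, with hypothetical helper names, that compares the text of an `sshd_config` against three of the settings above. It ignores `Match` blocks, `Include` directives, and the `keyword=value` form, so it is a starting point rather than a full parser.

```ruby
# Expected values taken from the example configuration above.
EXPECTED_SSHD = {
  "permitrootlogin"        => "no",
  "passwordauthentication" => "no",
  "pubkeyauthentication"   => "yes"
}.freeze

# Parses sshd_config text into { lowercase keyword => value }.
# sshd uses the first value it sees for a keyword, so later duplicates are ignored.
def parse_sshd_config(text)
  text.each_line.with_object({}) do |line, settings|
    stripped = line.strip
    next if stripped.empty? || stripped.start_with?("#")

    keyword, value = stripped.split(/\s+/, 2)
    settings[keyword.downcase] ||= value.to_s
  end
end

# Returns { keyword => [expected, actual] } for every mismatch or missing setting.
def sshd_drift(text, expected = EXPECTED_SSHD)
  actual = parse_sshd_config(text)
  expected.each_with_object({}) do |(keyword, want), drift|
    drift[keyword] = [want, actual[keyword]] unless actual[keyword] == want
  end
end
```

Running `sshd_drift(File.read("/etc/ssh/sshd_config"))` on a server returns an empty hash when all three settings match.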

3. Logging and Monitoring

Centralized logging (e.g., using ELK stack or OVH’s logging services) and robust monitoring were implemented. This is crucial for detecting suspicious activity, especially after mitigating RCE vulnerabilities.

# Example: Nginx access log format to capture relevant details
# /etc/nginx/nginx.conf
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for" '
                '"$request_body"'; # Including request body for deeper analysis (use with caution for PII)

access_log /var/log/nginx/access.log main;

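Care must be taken when logging request bodies due to potential PII and performance implications. Note also that nginx only populates `$request_body` in locations handled by `proxy_pass`, `fastcgi_pass`, `uwsgi_pass`, or `scgi_pass`, and only when the body was read into a memory buffer. Often, logging only specific headers or a sanitized version of the body is more practical.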
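
One way to produce a sanitized version is to redact at the application layer before the body is written to a log. The sketch below assumes JSON request bodies; the list of sensitive key names and the helpers `redact` and `loggable_body` are hypothetical and would need to match the application's actual data model.

```ruby
require "json"

# Assumed list of sensitive field names; extend it to match the application's data.
SENSITIVE_KEYS = %w[password token secret email authorization].freeze
REDACTED = "[REDACTED]"

# Recursively replaces the values of sensitive keys in nested Hashes and Arrays.
def redact(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      out[key] = SENSITIVE_KEYS.include?(key.to_s.downcase) ? REDACTED : redact(inner)
    end
  when Array
    value.map { |item| redact(item) }
  else
    value
  end
end

# Returns a single log-safe line for a JSON body, or a size-only note if it does not parse.
def loggable_body(raw)
  JSON.generate(redact(JSON.parse(raw)))
rescue JSON::ParserError
  "[unparseable body, #{raw.bytesize} bytes]"
end
```

Bodies that fail to parse are reduced to their size rather than logged verbatim, since malformed input is exactly where unexpected data tends to appear.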

Conclusion and Ongoing Security Posture

The audit successfully identified and mitigated a critical RCE vulnerability stemming from unsafe YAML loading, along with other significant security weaknesses. The mitigation involved replacing `YAML.load` with `YAML.safe_load`, updating dependencies, enforcing authorization, and strengthening input validation.

Maintaining a strong security posture requires continuous effort. This includes regular audits, automated vulnerability scanning, dependency management, and ongoing security training for development teams. The OVH infrastructure, while powerful, demands diligent configuration and maintenance to ensure security at all layers.