Code Auditing Guidelines: Detecting and Fixing Unsafe YAML Loading That Allows Remote Code Execution in Your Ruby Monolith
Understanding the YAML Deserialization Vulnerability
Many Ruby applications, especially older monoliths, rely on YAML for configuration, data storage, and inter-process communication. Ruby’s YAML library, Psych, extends plain YAML with Ruby-specific tags such as `!ruby/object:ClassName`, so a document can describe instances of arbitrary Ruby classes, not just strings, numbers, arrays, and hashes. In Psych versions before 4.0 (the default up to Ruby 3.0), `YAML.load` honors those tags. Passing it untrusted input therefore lets an attacker choose which classes are instantiated and with what state. This is a classic deserialization vulnerability, often leading to Remote Code Execution (RCE). From Psych 4.0 (bundled with Ruby 3.1) onward, `YAML.load` applies the `safe_load` restrictions by default and the permissive behavior moves to `YAML.unsafe_load`, so on newer stacks that is the method to audit.
The core of the problem lies in YAML’s ability to represent complex data structures, including custom classes and their state. Psych does not call `initialize` when it revives an object: it allocates the instance and then either sets the instance variables named in the document or calls the class’s `init_with` hook if one is defined. Exploits chain together "gadget" classes that are already loaded in the process, relying on hooks such as `init_with` or on methods the application later calls on the revived objects, until the chain reaches something like `eval` or `system`. This is particularly dangerous in web applications where user-supplied data is passed directly to `YAML.load`.
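The revival mechanism can be observed with a harmless class. The following is a minimal sketch, assuming a hypothetical `AuditDemo` class that exists only for this illustration; it selects `YAML.unsafe_load` where Psych provides it and falls back to `YAML.load` on older versions.

```ruby
require 'yaml'

# Hypothetical class used only for this demonstration.
class AuditDemo
  attr_reader :level, :initialized

  def initialize
    @initialized = true
    @level = 'default'
  end
end

payload = "--- !ruby/object:AuditDemo\nlevel: attacker-chosen\n"

# Newer Psych versions expose the permissive loader as unsafe_load;
# older versions only have the permissive YAML.load.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
obj = YAML.public_send(loader, payload)

puts obj.class                # => AuditDemo
puts obj.level                # => attacker-chosen
puts obj.initialized.inspect  # => nil, because initialize never ran
```

The document, not the application, decided both the class and its internal state, which is the property gadget chains depend on.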
Identifying Risky `YAML.load` Usage
The first step in auditing is to locate every permissive load call in your codebase. A simple `grep` helps, but search for the whole family (`YAML.load`, `YAML.load_file`, `YAML.load_stream`, `YAML.unsafe_load`, and the equivalent `Psych.` methods) and then review the context of each hit. Prioritize calls that process data originating from external sources, such as:
- HTTP request parameters (POST bodies, query strings)
- File uploads
- Database entries populated by external systems
- Environment variables
- External API responses
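As a first pass at locating candidates for this review, a small script can flag the permissive loader family while skipping `safe_load`. This is only a sketch: the `risky_yaml_calls` helper and its regular expression are assumptions made for illustration, and a text match cannot see aliased constants or calls made through metaprogramming.

```ruby
# Matches YAML.load, YAML.load_file, YAML.load_stream, YAML.unsafe_load and
# their Psych.* equivalents, but not YAML.safe_load.
RISKY_YAML_CALL = /\b(?:YAML|Psych)\.(?:unsafe_)?load(?:_file|_stream)?\b/

# Returns [line_number, matched_call] pairs for non-comment lines.
def risky_yaml_calls(source)
  source.each_line.with_index(1).filter_map do |line, number|
    next if line.strip.start_with?('#')

    match = line[RISKY_YAML_CALL]
    [number, match] if match
  end
end

sample = <<~RUBY
  config = YAML.load_file('config/app.yml')
  data = YAML.load(params[:data])
  safe = YAML.safe_load(params[:data])
  # YAML.load(commented_out)
RUBY

risky_yaml_calls(sample).each { |number, call| puts "line #{number}: #{call}" }
# line 1: YAML.load_file
# line 2: YAML.load
```

Each hit still needs a manual look at where its argument comes from; the script only narrows down where to look.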
Consider this example of a vulnerable pattern:
require 'yaml'

# Assume 'user_input' comes from an HTTP POST request body
user_input = params[:data] # In a Rails controller, for example

# VULNERABLE: Directly loading untrusted YAML
data = YAML.load(user_input)

# Further processing of 'data' which might be compromised
process_data(data)
In this snippet, if `user_input` contains a malicious YAML payload and the application runs a Psych version older than 4.0, `YAML.load` can trigger attacker-controlled code before `process_data` is even called.
Exploitation Techniques
Attackers can leverage Psych’s object tags to achieve RCE. The `!ruby/object:ClassName` tag tells the loader to instantiate the named class and populate it with the mapping that follows. Instantiating a single built-in class this way does not run a command by itself (and `Process`, for instance, is a module that cannot be instantiated at all), so practical exploits nest several objects from the standard library, RubyGems, or Rails into a gadget chain that ends in a call such as `Kernel#system` or `eval`.
Here’s a simplified, deliberately non-working illustration of the shape of such a payload (`SomeGadget` is a placeholder, not a real class):
!ruby/object:SomeGadget
args:
- /bin/bash
- -c
- "echo 'PWNED' >> /tmp/hacked.txt"
When a permissive loader processes this, it resolves `SomeGadget`, allocates an instance, and assigns `@args` without validating anything. Whether a command actually runs depends on the gadget classes available in the process, which varies with the Ruby version and the loaded libraries. Published chains are more elaborate than this sketch, but they rely on the same mechanism.
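To see how loading alone can run application code, consider a class that defines Psych’s `init_with` hook. The sketch below is benign, and `HookDemo` with its `seen` list is hypothetical; it only records that the hook fired during deserialization.

```ruby
require 'yaml'

# Hypothetical class: Psych calls init_with(coder) on revived instances
# when the class defines it, instead of setting instance variables.
class HookDemo
  @seen = []

  class << self
    attr_reader :seen
  end

  def init_with(coder)
    # coder['key'] reads a value from the mapping in the YAML document.
    self.class.seen << coder['first']
  end
end

# Use the permissive loader: unsafe_load where available, load otherwise.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
YAML.public_send(loader, "--- !ruby/object:HookDemo\nfirst: from-document\n")

puts HookDemo.seen.inspect # => ["from-document"]
```

Nothing in the calling code invoked `init_with`; the loader did, with arguments taken from the document. Gadget chains look for hooks like this whose bodies do something more dangerous than appending to a list.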
Mitigation Strategies: `YAML.safe_load`
The most direct and recommended mitigation is to replace `YAML.load` with `YAML.safe_load` wherever the input is not fully trusted. `safe_load` was introduced in Psych 2.0, which ships with Ruby 2.1; it is part of the `psych` gem, not a separate `yaml` gem. It restricts the types that can be deserialized: by default only `nil`, `true`, `false`, integers, floats, strings, arrays, and hashes are allowed, plus any classes you explicitly permit. Symbols and YAML aliases are rejected unless you opt in. This prevents the deserialization of arbitrary Ruby objects and thus mitigates the RCE risk.
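Because the behavior depends on the Psych version rather than on the Ruby version alone, it can help to print what the running process actually provides. A minimal sketch, assuming RubyGems is loaded (the default) so that `Gem::Version` is available; the `capabilities` hash is just an illustrative structure:

```ruby
require 'yaml'

psych_version = Gem::Version.new(Psych::VERSION)

capabilities = {
  # permitted_classes:/aliases: keyword arguments arrived in Psych 3.1
  safe_load_keywords: psych_version >= Gem::Version.new('3.1.0'),
  # from Psych 4.0, YAML.load applies the safe_load restrictions by default
  load_is_restricted: psych_version >= Gem::Version.new('4.0.0'),
  # the explicitly permissive loader, where the installed Psych provides it
  has_unsafe_load: YAML.respond_to?(:unsafe_load)
}

puts "Psych #{psych_version}"
capabilities.each { |name, value| puts "#{name}: #{value}" }
```

Running this in each service of a monolith (for example from a console) shows which of the call sites found during the audit are exploitable as written.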
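Any currently supported Ruby already provides `YAML.safe_load`. The keyword arguments used in this guide (`permitted_classes:`, `permitted_symbols:`, `aliases:`) require Psych 3.1 or newer, which ships with Ruby 2.6; earlier versions take the same options as positional arguments. If you’re pinned to an older Ruby, you might need to add `gem 'psych'` to your `Gemfile` to pull in a newer release. A controller action using `safe_load` might look like this: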
require 'yaml'

# Assume 'user_input' comes from an HTTP POST request body
user_input = params[:data]

# SECURE: Using safe_load to prevent arbitrary object deserialization
begin
  data = YAML.safe_load(user_input)
  process_data(data)
rescue Psych::SyntaxError => e
  # Handle malformed YAML gracefully
  Rails.logger.error "YAML parsing error: #{e.message}"
  render json: { error: "Invalid YAML format" }, status: :bad_request
rescue Psych::DisallowedClass, Psych::BadAlias => e
  # Raised when the document uses a class, symbol, or alias that is not permitted
  Rails.logger.error "YAML safe_load error: #{e.message}"
  render json: { error: "Unsafe YAML content detected" }, status: :bad_request
end
It’s also good practice to wrap `YAML.safe_load` in a `begin…rescue` block that catches `Psych::SyntaxError` (malformed YAML), `Psych::DisallowedClass` (a class or symbol outside the permitted set), and `Psych::BadAlias` (an alias while aliases are disabled). These all inherit from `Psych::Exception`; none of them is an `ArgumentError`, so a `rescue ArgumentError` clause would let them propagate.
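The exception classes involved can be checked directly. The following sketch uses a hypothetical `safe_load_outcome` helper that returns either the parsed value or the class of the exception `safe_load` raised; the four sample documents are made up for illustration.

```ruby
require 'yaml'

# Returns the parsed value, or the class of the exception safe_load raised.
def safe_load_outcome(yaml)
  YAML.safe_load(yaml)
rescue Psych::SyntaxError, Psych::DisallowedClass, Psych::BadAlias => e
  e.class
end

outcomes = {
  plain: safe_load_outcome("name: app\nports: [80, 443]\n"),
  malformed: safe_load_outcome("key: [unclosed\n"),
  tagged: safe_load_outcome("--- !ruby/object:Object {}\n"),
  aliased: safe_load_outcome("base: &b [1]\ncopy: *b\n")
}

outcomes.each { |label, outcome| puts "#{label}: #{outcome.inspect}" }
```

Plain data comes back as ordinary hashes and arrays, while the other three documents raise. The alias case raises `Psych::BadAlias` or a subclass of it, depending on the Psych version, which is why rescuing the parent class is the more portable choice.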
Advanced Mitigation: Custom Whitelisting with `safe_load`
While `YAML.safe_load` provides a strong default security posture, there might be scenarios where you need to deserialize specific custom classes from YAML. In such cases, pass them through the optional `permitted_classes:` keyword argument; `permitted_symbols:` and `aliases: true` similarly opt in to symbols and YAML aliases. Only the classes you list are allowed in addition to the defaults.
For example, if you have a `Configuration` class that you expect to load:
require 'yaml'

class Configuration
  attr_accessor :setting1, :setting2

  # Note: Psych does not call initialize when loading; it sets the
  # instance variables named in the document directly.
  def initialize(setting1: nil, setting2: nil)
    @setting1 = setting1
    @setting2 = setting2
  end
end

user_input = "--- !ruby/object:Configuration\nsetting1: value1\nsetting2: value2\n"

# SECURE with explicit whitelisting (keyword arguments require Psych 3.1+)
begin
  data = YAML.safe_load(user_input, permitted_classes: [Configuration], aliases: true)
  # 'data' will be an instance of Configuration
  puts "Setting 1: #{data.setting1}"
rescue Psych::SyntaxError => e
  warn "YAML parsing error: #{e.message}"
rescue Psych::DisallowedClass, Psych::BadAlias => e
  warn "YAML safe_load error: #{e.message}"
end
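The `aliases: true` option is only needed if your YAML uses anchors and aliases (e.g., `&anchor` and `*alias`); leave it at its default of `false` otherwise. Be cautious when whitelisting classes: permit only those that are absolutely necessary and whose deserialization behavior, including any `init_with` hook, you thoroughly understand. Psych matches permitted classes by exact name, so permitting a base class does not admit its subclasses and each concrete class must be listed. There is rarely a reason to permit generic classes such as `Object` or `BasicObject`, and doing so widens the attack surface for no benefit.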
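The exact-name matching of `permitted_classes:` can be verified with two throwaway classes. This is a sketch under the assumption of Psych 3.1 or newer (for the keyword argument); `BaseSettings` and `AdminSettings` are hypothetical names.

```ruby
require 'yaml'

# Hypothetical classes for the demonstration.
class BaseSettings
  attr_reader :name
end

class AdminSettings < BaseSettings
end

doc = "--- !ruby/object:AdminSettings\nname: root\n"

# Permitting only the parent class does not admit the subclass.
rejected = begin
  YAML.safe_load(doc, permitted_classes: [BaseSettings])
  false
rescue Psych::DisallowedClass
  true
end

# Listing the concrete class makes the same document load.
loaded = YAML.safe_load(doc, permitted_classes: [AdminSettings])

puts "subclass rejected when only the parent is permitted: #{rejected}"
puts "loaded #{loaded.class} with name=#{loaded.name}"
```

This keeps an allow-list narrow by construction: adding one class never silently admits a family of related classes.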
Auditing Workflow and Tooling
A systematic approach to auditing is essential for large codebases:
- Static Analysis: Use `grep` or more sophisticated static analysis tools (like Brakeman for Rails, or custom linters) to identify all `YAML.load` calls.
- Contextual Review: For each identified `YAML.load` call, determine the source of the input. If the input is untrusted, flag it for immediate remediation.
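- Dependency Check: Confirm which Psych version each application actually runs (`Psych::VERSION`). It determines whether `YAML.load` is permissive and which `YAML.safe_load` signature is available; upgrade the `psych` gem or Ruby itself where the version is too old.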
- Automated Remediation (with caution): For simple cases, you might consider automated refactoring to replace `YAML.load` with `YAML.safe_load`. However, always review automated changes thoroughly, especially if custom whitelisting was involved.
- Testing: Write unit and integration tests that specifically target YAML loading. Include tests with known malicious payloads to ensure your `safe_load` implementations (and any custom whitelisting) correctly reject them.
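The testing step above could start from a small table of hostile document shapes. The sketch below assumes Psych 3.1 or newer and a hypothetical `load_untrusted_yaml` entry point that stands in for whatever wrapper your application uses; none of the sample documents is a working exploit.

```ruby
require 'yaml'

# Hypothetical application entry point for untrusted YAML. The empty
# allow-list and disabled aliases are assumptions for this sketch.
def load_untrusted_yaml(text)
  YAML.safe_load(text, permitted_classes: [], aliases: false)
end

# Document shapes the loader is expected to refuse.
HOSTILE_SHAPES = {
  'top-level object tag' => "--- !ruby/object:Object {}\n",
  'nested object tag' => "user:\n  role: !ruby/object:Object {}\n",
  'symbol value' => "role: :admin\n",
  'alias expansion' => "a: &x [1]\nb: *x\n"
}.freeze

def rejected?(yaml)
  load_untrusted_yaml(yaml)
  false
rescue Psych::DisallowedClass, Psych::BadAlias
  true
end

# Labels of any documents that were accepted when they should not be.
accepted = HOSTILE_SHAPES.reject { |_label, yaml| rejected?(yaml) }.keys

if accepted.empty?
  puts 'all hostile shapes rejected'
else
  warn "accepted unexpectedly: #{accepted.join(', ')}"
end
```

In a real suite each entry would become its own test case, and the table would grow whenever the allow-list changes, so that a newly permitted class does not quietly widen what the loader accepts.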
Consider integrating security linters into your CI/CD pipeline to catch regressions. Tools like:
- Brakeman: A static analysis tool for Ruby on Rails applications that can detect various security vulnerabilities, including insecure YAML deserialization.
- RuboCop: While primarily a style guide enforcer, it ships a built-in `Security/YAMLLoad` cop that flags `YAML.load`. Custom RuboCop rules can be written to cover project-specific wrappers or related methods.
Example of a custom RuboCop rule to detect `YAML.load`:
# .rubocop/unsafe_yaml_load.rb
require 'rubocop'

module RuboCop
  module Cop
    module MyCustomCop
      # Flags YAML.load / ::YAML.load calls regardless of argument count.
      class UnsafeYamlLoad < Base
        MSG = "Use `YAML.safe_load` instead of `YAML.load` for untrusted input."
        # Only visit send nodes whose method name is :load
        RESTRICT_ON_SEND = %i[load].freeze

        def_node_matcher :yaml_load_call?, <<~PATTERN
          (send (const {nil? cbase} :YAML) :load ...)
        PATTERN

        def on_send(node)
          add_offense(node) if yaml_load_call?(node)
        end
      end
    end
  end
end
You would then configure RuboCop to load this file through the `require:` key in your `.rubocop.yml` and enable `MyCustomCop/UnsafeYamlLoad` there.
Conclusion
The `YAML.load` vulnerability is a critical security flaw that can lead to RCE in Ruby applications. By systematically auditing your codebase for `YAML.load` usage, understanding the context of data sources, and migrating to `YAML.safe_load` (with appropriate whitelisting if necessary), you can significantly harden your application against this common attack vector. Continuous vigilance through automated tooling and testing is key to maintaining a secure posture.