How We Audited a High-Traffic Ruby Enterprise Stack on AWS and Mitigated Unsafe YAML Loading That Allowed Remote Code Execution

Initial Triage: Identifying the Attack Vector

Our engagement began with a critical alert from a client’s security monitoring system, flagging unusual outbound network traffic originating from several Ruby on Rails application servers hosted on AWS. The traffic patterns were indicative of data exfiltration and potential command-and-control (C2) communication. The stack in question handled a high volume of user-generated content, making it a prime target for injection-based attacks. The initial hypothesis pointed towards a deserialization vulnerability, a common pitfall in many web frameworks.

The first step was to gain access to the production environment’s logs and running processes. We leveraged AWS CloudWatch Logs for application and system logs, and AWS Systems Manager (SSM) Run Command for immediate introspection of the live instances. The goal was to pinpoint the exact process and code path responsible for the suspicious activity.

Deep Dive into Application Logs and Codebase

We started by examining the Rails application logs, specifically looking for any unusual requests or errors that coincided with the flagged outbound traffic. The key was to correlate timestamps. We filtered logs for keywords like “YAML,” “load,” “parse,” and any unexpected exceptions.

The following CloudWatch Logs Insights query was instrumental in this phase:

fields @timestamp, @message
| filter @message like /YAML|load|parse/
| sort @timestamp desc
| limit 100

This query helped us narrow down potential areas. We observed a pattern of requests that seemed to be submitting malformed or unusually structured data, often associated with file uploads or API endpoints that processed complex data structures. The logs revealed that a specific controller action was frequently invoked with these suspicious payloads.

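Next, we performed a static code analysis of the relevant controller and model files. The primary concern was the use of `YAML.load` on untrusted input. On Psych versions before 4.0 (the default up to Ruby 3.0), `YAML.load` can deserialize arbitrary Ruby objects, including those that execute code when they are instantiated or when methods are later called on them. From Psych 4.0 (bundled with Ruby 3.1) onward, `YAML.load` is restricted by default and the old behavior is available as `YAML.unsafe_load`. `YAML.safe_load`, by contrast, only builds a small set of basic types unless additional classes are explicitly permitted.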
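
A first pass over a codebase can be automated with a small scanner. The sketch below is a minimal illustration rather than the tooling used in the engagement: `UNSAFE_YAML_CALL` and `find_unsafe_yaml_calls` are hypothetical names, and a regular expression will miss calls made through aliases or metaprogramming, so the results still need manual review.

```ruby
# Matches YAML.load, YAML.load_file, YAML.load_stream, YAML.unsafe_load and
# their Psych.* equivalents, but not YAML.safe_load.
UNSAFE_YAML_CALL = /\b(?:YAML|Psych)\.(?:unsafe_)?load(?:_file|_stream)?\b/

# Returns [line_number, stripped_line] pairs for every suspicious call.
def find_unsafe_yaml_calls(source)
  source.each_line.with_index(1).filter_map do |line, number|
    next if line.strip.start_with?('#') # skip comment-only lines
    [number, line.strip] if line.match?(UNSAFE_YAML_CALL)
  end
end
```

Running it over each file, for example with `Dir.glob('app/**/*.rb')` and `File.read`, gives a list of call sites to inspect by hand.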

We identified the following vulnerable code snippet within a legacy service object:

# app/services/legacy_data_importer.rb

require 'yaml'

class LegacyDataImporter
  def self.import(yaml_string)
    data = YAML.load(yaml_string) # <<< VULNERABLE LINE
    # ... process data ...
  end
end

The `YAML.load` call directly deserialized user-provided input without any form of validation. This is a classic Remote Code Execution (RCE) vulnerability. An attacker could craft a YAML payload that, when loaded, would instantiate a Ruby object capable of executing shell commands.

Crafting and Testing the Exploit

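To confirm the vulnerability, we needed to craft a proof-of-concept (PoC) exploit. Ruby YAML RCE payloads rely on the `Psych` library (which `YAML.load` uses internally) honoring Ruby-specific tags such as `!ruby/object:ClassName`. Each tag tells Psych to allocate an instance of the named class and populate its instance variables from the document. Attackers chain classes that are already loaded in the process (a "gadget chain") so that a method invoked during or shortly after deserialization ends up running a system command. The exact chain depends on the Ruby and gem versions in use.

Here’s a simplified, harmless payload that shows the underlying primitive, which is instantiating an arbitrary object from attacker-controlled input:

--- !ruby/object:Object
probe: deserialized

When passed to an unsafe `YAML.load`, this payload yields an `Object` instance whose `@probe` instance variable is set by the sender, rather than a plain Hash. A weaponized payload has the same shape but names gadget classes whose methods eventually reach a call such as `Kernel#system`, for example to run the `id` command on the server’s operating system.

We simulated sending the payload via an HTTP POST request to the vulnerable endpoint. The request would look something like this:

curl -X POST \
  http://your-app-domain.com/legacy_data/import \
  -H 'Content-Type: application/x-yaml' \
  -d '--- !ruby/object:Object
probe: deserialized'

If successful, the gadget-chain variant of the payload runs `id` on the server (producing output such as `uid=1000(deploy) gid=1000(deploy) groups=1000(deploy)`), confirming the RCE. That output only appears in the HTTP response if the application echoes the deserialized result; otherwise it has to be confirmed out of band, for example through logs or a callback. The observed outbound traffic in the production environment was consistent with attackers using similar payloads to establish reverse shells or exfiltrate sensitive data.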
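
The difference between permissive and restricted loading can be reproduced locally without any dangerous payload. The following is a minimal sketch, assuming a harmless `!ruby/object:Object` document as the probe; `PROBE`, `permissive_load`, and `restricted_load` are illustrative names that do not come from the audited codebase.

```ruby
require 'yaml'

# Harmless probe: asks Psych to build a plain Object with one instance variable.
PROBE = "--- !ruby/object:Object\nprobe: deserialized\n"

# Psych 4+ exposes the old permissive behavior as unsafe_load; older versions
# only have load, which is permissive by default.
def permissive_load(source)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(source) : YAML.load(source)
end

# safe_load refuses the tag and raises instead of instantiating anything.
# The exception is returned here so that it can be inspected.
def restricted_load(source)
  YAML.safe_load(source)
rescue Psych::DisallowedClass => e
  e
end

object = permissive_load(PROBE)
puts object.class
puts object.instance_variable_get(:@probe)
puts restricted_load(PROBE).class
```

The permissive path hands back an object whose state was chosen by the sender, which is the primitive that gadget chains build on. The restricted path raises before any object is created.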

Mitigation Strategy: Secure Deserialization and Input Validation

The immediate mitigation was to replace all instances of `YAML.load` with `YAML.safe_load`. However, `YAML.safe_load` is not a silver bullet: it only prevents the deserialization of arbitrary Ruby objects. It still accepts basic types such as Hashes, Arrays, Strings, and Numbers, so an attacker continues to control the shape and content of the resulting data. It is therefore crucial to combine `YAML.safe_load` with strict validation of the deserialized data structure.

The corrected code snippet using `YAML.safe_load` and basic type checking:

# app/services/legacy_data_importer.rb

require 'yaml'

class LegacyDataImporter
  def self.import(yaml_string)
    # Use safe_load to prevent arbitrary object instantiation.
    # aliases: true is often needed for complex YAML, but be cautious:
    # alias expansion can be abused to build very large structures.
    data = YAML.safe_load(yaml_string, permitted_classes: [Symbol], aliases: true)

    # Further validation: Ensure data is a Hash and contains expected keys
    unless data.is_a?(Hash) && data.key?('user_id') && data.key?('data_payload')
      raise ArgumentError, "Invalid data structure received"
    end

    # Validate specific fields within the data
    user_id = data['user_id']
    data_payload = data['data_payload']

    unless user_id.is_a?(Integer) && data_payload.is_a?(String)
      raise ArgumentError, "Invalid data types for user_id or data_payload"
    end

    # ... process validated data ...
    Rails.logger.info "Successfully imported data for user_id: #{user_id}"
  end
end

Key improvements:

Replaced `YAML.load` with `YAML.safe_load`.
Specified `permitted_classes` to restrict deserialization to only allowed types (e.g., `Symbol`).
Enabled `aliases: true` if necessary for legitimate complex YAML structures, but with awareness of potential risks.
Added explicit type checking (`is_a?`) for the top-level structure (expecting a Hash).
Validated the presence and types of expected keys within the Hash.
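
Where imports need neither symbols nor aliases, a stricter wrapper is possible. This is a sketch under that assumption, not the code deployed for the client: `parse_untrusted_yaml` and `MAX_YAML_BYTES` are hypothetical names, and the size cap is an arbitrary example value.

```ruby
require 'yaml'

MAX_YAML_BYTES = 64 * 1024 # hypothetical cap; tune to the largest legitimate import

# Parses untrusted YAML into basic types only, turning every parser-level
# failure into a single ArgumentError that the caller can handle.
def parse_untrusted_yaml(source)
  raise ArgumentError, 'payload too large' if source.bytesize > MAX_YAML_BYTES

  YAML.safe_load(source, permitted_classes: [], aliases: false)
rescue Psych::Exception => e
  # Covers syntax errors, disallowed classes, and rejected aliases.
  raise ArgumentError, "rejected YAML payload: #{e.class}"
end
```

Collapsing the Psych exceptions into one error type keeps the controller's error handling simple and avoids leaking parser details to the client.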

For more complex data structures, consider using a dedicated schema validation library such as `dry-schema`, or the `json-schema` gem if the payload can be described with JSON Schema, to enforce a strict contract on the deserialized data.
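
The idea of a strict contract can be illustrated without any gem. The sketch below is a dependency-free stand-in for what a schema library does, not the API of any of those libraries: `IMPORT_SCHEMA` and `schema_errors` are hypothetical names, and the two keys are taken from the importer example.

```ruby
# Hypothetical contract for the import payload: key name => required class.
IMPORT_SCHEMA = {
  'user_id'      => Integer,
  'data_payload' => String
}.freeze

# Returns a list of human-readable violations; an empty list means valid.
def schema_errors(data, schema = IMPORT_SCHEMA)
  return ['top-level value must be a Hash'] unless data.is_a?(Hash)

  errors = []
  schema.each do |key, type|
    if !data.key?(key)
      errors << "missing key: #{key}"
    elsif !data[key].is_a?(type)
      errors << "#{key} must be of type #{type}"
    end
  end

  # Reject anything the contract does not mention.
  unexpected = data.keys - schema.keys
  errors << "unexpected keys: #{unexpected.join(', ')}" unless unexpected.empty?
  errors
end
```

Rejecting unexpected keys, rather than ignoring them, keeps attacker-supplied fields from reaching later processing steps.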

AWS-Specific Hardening and Monitoring

Beyond code-level fixes, we reviewed the AWS infrastructure for further hardening and improved monitoring capabilities.

Security Groups and Network ACLs

We audited the Security Group rules associated with the EC2 instances running the Rails application. The principle of least privilege was applied: only necessary inbound ports (e.g., 80, 443) were open, and outbound traffic was restricted to essential services (e.g., database, external APIs, AWS services). Any unexpected outbound connections were flagged.

We also reviewed Network ACLs at the subnet level for an additional layer of defense, ensuring no overly permissive rules were in place.

AWS WAF Integration

To proactively block malicious requests, we configured AWS Web Application Firewall (WAF) rules. Specifically, we created custom rules to:

Detect and block requests containing known Ruby YAML object tags (e.g., `!ruby/object`, `!ruby/hash`).
Rate-limit suspicious endpoints that might be targeted for brute-force or fuzzing attacks.
Block requests with malformed Content-Type headers or unexpected data formats.

An example of a WAF rule to block common YAML RCE indicators:

{
    "Name": "BlockYAML_RCE_Payloads",
    "Priority": 1,
    "Action": {
        "Block": {}
    },
    "Statement": {
        "OrStatement": {
            "Statements": [
                {
                    "ByteMatchStatement": {
                        "SearchString": "!ruby/object",
                        "FieldToMatch": {
                            "Body": {
                                "OversizeHandling": "CONTINUE"
                            }
                        },
                        "TextTransformations": [
                            {
                                "Priority": 0,
                                "Type": "NONE"
                            }
                        ],
                        "PositionalConstraint": "CONTAINS"
                    }
                },
                {
                    "ByteMatchStatement": {
                        "SearchString": "!ruby/hash",
                        "FieldToMatch": {
                            "Body": {
                                "OversizeHandling": "CONTINUE"
                            }
                        },
                        "TextTransformations": [
                            {
                                "Priority": 0,
                                "Type": "NONE"
                            }
                        ],
                        "PositionalConstraint": "CONTAINS"
                    }
                }
            ]
        }
    },
    "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "BlockYAML_RCE_Payloads"
    }
}

Enhanced Logging and Alerting

We implemented more granular logging for all deserialization attempts, including the source IP, user agent, and a sanitized version of the input payload (if feasible and safe). This data was fed into CloudWatch Logs, where metric filters turn matching log events into metrics and CloudWatch Alarms on those metrics trigger notifications (via SNS) for:

Any successful `YAML.load` calls (indicating a potential bypass or missed vulnerability).
Multiple failed deserialization attempts from a single IP address.
Specific error patterns related to data validation failures.
Any outbound traffic from application servers to unexpected destinations.
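
Structured log lines make these conditions easier to match. The following is a minimal sketch of such a log event, assuming JSON output and field names chosen purely for illustration (`deserialization_event`, `payload_sha256`, and the others are not taken from the client's code). It records a digest and size instead of the raw payload.

```ruby
require 'json'
require 'logger'
require 'digest'
require 'time'

# Builds one JSON log line per deserialization attempt. The raw payload is
# never logged; a digest and size are enough to correlate repeated attempts.
def deserialization_event(source_ip:, user_agent:, payload:, outcome:)
  {
    event: 'yaml_deserialization',
    outcome: outcome,                    # e.g. "accepted" or "rejected"
    source_ip: source_ip,
    user_agent: user_agent.to_s[0, 200], # truncate attacker-controlled text
    payload_sha256: Digest::SHA256.hexdigest(payload),
    payload_bytes: payload.bytesize,
    at: Time.now.utc.iso8601
  }.to_json
end

# Emit the bare JSON so that log tooling can parse each line directly.
logger = Logger.new($stdout)
logger.formatter = ->(_severity, _time, _progname, message) { "#{message}\n" }
logger.warn(
  deserialization_event(
    source_ip: '203.0.113.5',
    user_agent: 'curl/8.0',
    payload: "user_id: 1\n",
    outcome: 'rejected'
  )
)
```

With lines in this shape, a CloudWatch Logs metric filter pattern along the lines of `{ $.event = "yaml_deserialization" && $.outcome = "rejected" }` could count rejected attempts and feed an alarm.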

This proactive alerting mechanism allows for rapid detection of ongoing or new attack attempts.

Post-Mitigation Verification and Ongoing Security Posture

After deploying the code changes and AWS configurations, we performed a series of verification steps. This included re-testing the exploit payloads against the hardened application and monitoring logs for any anomalies. We also conducted a broader security audit of other parts of the application that might handle serialized data, ensuring consistency in the application of secure practices.

The incident highlighted the critical importance of understanding the security implications of deserialization functions in any programming language. For Ruby, `YAML.load` on untrusted input is a known danger, particularly on Psych versions before 4.0, and it should be strictly avoided in favor of `YAML.safe_load` coupled with robust data validation. On AWS, leveraging services like WAF, Security Groups, and enhanced CloudWatch monitoring provides essential layers of defense against such threats.