How We Audited a High-Traffic Ruby Enterprise Stack on AWS and Mitigated Server-Side Request Forgery (SSRF) in Webhook Parsers
Initial Stack Assessment and Vulnerability Discovery
Our engagement began with a deep dive into a high-traffic Ruby on Rails enterprise application hosted on AWS. The primary objective was to identify and remediate security vulnerabilities, with a specific focus on Server-Side Request Forgery (SSRF) within webhook processing logic. The stack comprised several key components: a fleet of EC2 instances running Ruby 3.x, PostgreSQL managed by RDS, Redis for caching and job queues, and an API Gateway for external integrations. Load balancing was handled by AWS ELB, and deployment was orchestrated via a CI/CD pipeline leveraging Jenkins and Docker.
The initial assessment involved static code analysis using tools like Brakeman and RuboCop, complemented by dynamic analysis through targeted penetration testing. We focused on areas where external input was processed and potentially used to construct network requests. Webhook parsers, by their nature, are prime candidates for SSRF due to their reliance on external, often untrusted, data sources.
Deep Dive into Webhook Parsers and SSRF Vectors
The application utilized a common pattern for handling webhooks: incoming POST requests with JSON payloads were deserialized, and specific fields within these payloads were used to trigger subsequent actions. A critical vulnerability was identified in a controller action responsible for processing incoming payment gateway notifications. The payload contained a URL field, which was intended to be a callback URL for a third-party service. However, the application logic directly used this URL to make an outbound HTTP request to confirm the callback, without proper validation.
Consider the following simplified (and vulnerable) controller snippet:
# app/controllers/webhooks_controller.rb
class WebhooksController < ApplicationController
  def process_payment_notification
    payload = JSON.parse(request.body.read)
    callback_url = payload['callback_url']
    payment_status = payload['status']
    # Vulnerable: Directly uses untrusted callback_url for an outbound request
    if payment_status == 'completed'
      begin
        # This is the SSRF vector. An attacker could provide a malicious callback_url
        # pointing to internal AWS metadata endpoints or other sensitive internal services.
        response = Net::HTTP.get_response(URI.parse(callback_url))
        Rails.logger.info "Callback confirmation for #{callback_url}: #{response.code}"
      rescue StandardError => e
        Rails.logger.error "Error confirming callback for #{callback_url}: #{e.message}"
      end
    end
    render json: { message: 'Notification processed' }, status: :ok
  end
end
The immediate risk was that an attacker could craft a webhook payload with a `callback_url` pointing to internal AWS endpoints, such as the EC2 instance metadata service (IMDS) at `http://169.254.169.254/latest/meta-data/`. This would allow an attacker to exfiltrate sensitive information like IAM role credentials, instance IDs, and other metadata, potentially leading to further compromise of the AWS environment.
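To make the attack concrete, the following sketch builds a hypothetical malicious notification body and shows that the host the application would contact falls inside the link-local range used by IMDS. The payload values are illustrative assumptions, not traffic captured during the audit, and only the Ruby standard library is used.

```ruby
require 'json'
require 'uri'
require 'ipaddr'

# Hypothetical attacker-supplied webhook body (illustrative values only).
malicious_body = {
  status: 'completed',
  callback_url: 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'
}.to_json

payload = JSON.parse(malicious_body)
uri = URI.parse(payload['callback_url'])

# 169.254.0.0/16 is the IPv4 link-local range that contains the IMDS address.
link_local = IPAddr.new('169.254.0.0/16')
targets_imds_range = link_local.include?(IPAddr.new(uri.host))

puts "host=#{uri.host} link_local=#{targets_imds_range}"
```

Nothing in the vulnerable controller distinguishes this URL from a legitimate callback, which is why the checks have to happen before the outbound request is made.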
Mitigation Strategy: Input Validation and Network Egress Control
Our mitigation strategy involved a multi-layered approach, focusing on robust input validation and enforcing network egress policies. The primary fix involved validating the `callback_url` before initiating any outbound requests.
1. Strict URL Validation and Whitelisting
The most effective defense against SSRF is to strictly validate any user-supplied URL. This typically involves:
- Scheme Validation: Ensure the URL uses an allowed scheme (e.g., `https`).
- Domain Whitelisting: Only allow requests to a predefined list of trusted domains.
- IP Address Restrictions: Prevent requests to private IP ranges (RFC 1918), loopback addresses, and the link-local range `169.254.0.0/16` that contains the IMDS endpoint.
- Port Restrictions: Limit outbound requests to specific, necessary ports.
We refactored the vulnerable code to incorporate these checks. Instead of passing the URL straight to `Net::HTTP`, we introduced a validator class that performs these checks first. The list of trusted domains comes from a configuration setting.
# app/services/url_validator.rb
require 'uri'

class UrlValidator
  ALLOWED_SCHEMES = %w[https].freeze
  # In a real scenario, this would be loaded from configuration (e.g., ENV vars, YAML)
  TRUSTED_DOMAINS = %w[api.example.com webhook.thirdparty.com].freeze
  # Default port for https is 443. Any other port must be added explicitly.
  ALLOWED_PORTS = [443].freeze
  # Literal IPv4 hosts in loopback, RFC 1918, or link-local (IMDS) ranges.
  # Simplified check: it does not resolve hostnames or cover IPv6.
  # \A anchors to the start of the string; ^ would only anchor to a line start.
  PRIVATE_HOST_PATTERN = /\A(127\.|10\.|192\.168\.|169\.254\.|172\.(1[6-9]|2\d|3[01])\.)/

  def self.valid?(url_string)
    return false unless url_string.present?

    uri = URI.parse(url_string)
    host = uri.host.to_s.downcase
    return false if host.empty?

    # 1. Scheme validation
    return false unless ALLOWED_SCHEMES.include?(uri.scheme)

    # 2. Domain whitelisting: exact match or a true subdomain.
    # A bare suffix match would also accept hosts like "evilapi.example.com".
    return false unless TRUSTED_DOMAINS.any? { |domain| host == domain || host.end_with?(".#{domain}") }

    # 3. IP address and loopback restrictions
    return false if host.match?(PRIVATE_HOST_PATTERN)

    # 4. Port restrictions (optional, but good practice)
    return false unless ALLOWED_PORTS.include?(uri.port)

    true
  rescue URI::InvalidURIError
    false
  rescue StandardError => e
    Rails.logger.error "Unexpected error during URL validation: #{e.message}"
    false
  end
end
# app/controllers/webhooks_controller.rb (modified)
class WebhooksController < ApplicationController
  def process_payment_notification
    payload = JSON.parse(request.body.read)
    callback_url = payload['callback_url']
    payment_status = payload['status']
    if payment_status == 'completed'
      if UrlValidator.valid?(callback_url)
        begin
          # Now using the validated URL
          response = Net::HTTP.get_response(URI.parse(callback_url))
          Rails.logger.info "Callback confirmation for #{callback_url}: #{response.code}"
        rescue StandardError => e
          Rails.logger.error "Error confirming callback for #{callback_url}: #{e.message}"
        end
      else
        Rails.logger.warn "Invalid callback URL provided: #{callback_url}. Skipping confirmation."
        # Optionally, return an error to the sender or log more details.
      end
    end
    render json: { message: 'Notification processed' }, status: :ok
  end
end
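The validator's IP check is described as simplified, and it only inspects the hostname string. A trusted-looking name can still resolve to an internal address, so a stricter variant resolves the host and tests every returned address against blocked CIDR ranges. The sketch below is one way to do that with the standard `ipaddr` and `resolv` libraries. The `EgressGuard` module, its method names, and the exact list of ranges are assumptions made for illustration and are not part of the audited codebase.

```ruby
require 'ipaddr'
require 'resolv'

# Hypothetical helper; names and range list are illustrative assumptions.
module EgressGuard
  # Ranges an outbound webhook callback should never reach.
  BLOCKED_RANGES = %w[
    0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16
    172.16.0.0/12 192.168.0.0/16 ::1/128 fc00::/7 fe80::/10
  ].map { |cidr| IPAddr.new(cidr) }.freeze

  # True when the literal address falls inside a blocked range.
  def self.blocked_ip?(address)
    ip = IPAddr.new(address)
    ip = ip.native if ip.ipv4_mapped? # unwrap ::ffff:a.b.c.d to plain IPv4
    BLOCKED_RANGES.any? { |range| range.include?(ip) }
  rescue IPAddr::InvalidAddressError
    true # treat anything unparseable as unsafe
  end

  # The resolver is injectable so the check can be exercised without real DNS.
  def self.safe_host?(host, resolver: ->(name) { Resolv.getaddresses(name) })
    addresses = resolver.call(host)
    return false if addresses.empty?

    addresses.none? { |address| blocked_ip?(address) }
  end
end

# Usage with a stubbed resolver (203.0.113.0/24 is a documentation range):
puts EgressGuard.safe_host?('api.example.com', resolver: ->(_) { ['203.0.113.10'] })
puts EgressGuard.safe_host?('rebind.example.com', resolver: ->(_) { ['169.254.169.254'] })
```

A check like this still resolves the name separately from the HTTP client, so a DNS answer can change between validation and connection. Closing that gap means connecting to the address that was vetted rather than resolving the name a second time.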
2. Network Egress Filtering with Security Groups and Network ACLs
While code-level validation is crucial, it’s not foolproof. An additional layer of defense is network-level egress filtering. We reviewed and tightened the AWS Security Groups associated with the EC2 instances and the Network ACLs (NACLs) for the subnets.
The goal was to restrict outbound traffic from the application servers to only the necessary destinations and ports. Security groups and NACLs match on IP ranges rather than hostnames, so for the webhook processing servers this meant allowing outbound connections only to the CIDR blocks serving the hosts in our `TRUSTED_DOMAINS` list (e.g., the addresses behind `api.example.com`) on port 443. All other outbound traffic was denied by default.
Example Security Group Rule (AWS Console/CLI):
# Outbound Rule for Webhook Processing Instances
Type: Outbound
Protocol: TCP
Port Range: 443
Destination: CIDR block for trusted webhook endpoints (e.g., specific IPs, or a prefix list if the provider publishes its ranges)
Description: Allow outbound HTTPS to trusted webhook providers
Example Network ACL Rule (AWS Console/CLI):
# Outbound NACL Rule for Subnet hosting Webhook Processors
Rule Number: 100
Type: Outbound
Protocol: TCP
Port Range: 443
Destination: CIDR block for trusted webhook endpoints
Allow/Deny: ALLOW
Description: Allow outbound HTTPS to trusted webhook providers

# Default Deny Rule (implicit or explicit)
Rule Number: * (or a high number like 32700)
Type: Outbound
Protocol: ALL
Port Range: ALL
Destination: 0.0.0.0/0
Allow/Deny: DENY
Description: Deny all other outbound traffic

# NACLs are stateless: return traffic also needs a matching inbound ALLOW rule for ephemeral ports.
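This network-level control acts as a safety net. Even if a code-level bypass were discovered, the network configuration would prevent the server from initiating connections to unauthorized internal or external IP addresses. One exception matters for SSRF: security groups and NACLs do not filter traffic to the instance metadata service at `169.254.169.254`, so that path has to be closed by the code-level checks and by instance settings such as requiring IMDSv2.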
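NACL entries are evaluated in ascending rule-number order, and the first matching entry decides the outcome. The small simulation below illustrates how the allow rule and the catch-all deny interact for a few destinations. It is only a model for reasoning about rule order, not a call into any AWS API. The `NaclRule` struct, the `evaluate_outbound` method, and the `203.0.113.0/24` documentation range standing in for a provider's CIDR block are all assumptions.

```ruby
require 'ipaddr'

# Hypothetical in-memory model of outbound NACL entries.
NaclRule = Struct.new(:number, :cidr, :ports, :action)

OUTBOUND_RULES = [
  NaclRule.new(32_700, IPAddr.new('0.0.0.0/0'), 0..65_535, :deny),
  NaclRule.new(100, IPAddr.new('203.0.113.0/24'), 443..443, :allow)
].freeze

# Lowest rule number first; the first matching rule wins.
# Traffic that matches nothing is denied.
def evaluate_outbound(rules, dest_ip, dest_port)
  ip = IPAddr.new(dest_ip)
  match = rules.sort_by(&:number).find do |rule|
    rule.cidr.include?(ip) && rule.ports.cover?(dest_port)
  end
  match ? match.action : :deny
end

puts evaluate_outbound(OUTBOUND_RULES, '203.0.113.10', 443)  # trusted provider over HTTPS
puts evaluate_outbound(OUTBOUND_RULES, '203.0.113.10', 8080) # trusted range, wrong port
puts evaluate_outbound(OUTBOUND_RULES, '10.0.3.7', 5432)     # internal database address
```

Because only rule 100 allows anything, a new provider requires an explicit new entry, which is the default-deny posture described above.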
3. IAM Role Least Privilege
A fundamental security principle is least privilege. We reviewed the IAM roles attached to the EC2 instances and found that they had been configured with broad permissions, including the ability to interact with various AWS services. For the instances handling webhook processing, we created a dedicated IAM role with only the permissions those instances need. This role cannot access sensitive services such as S3 buckets containing PII, and it cannot launch new EC2 instances. This limits the blast radius if an attacker compromises the instance and uses its IAM credentials.
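As a rough illustration of what such a narrowly scoped role might look like, the sketch below builds an IAM policy document as a Ruby hash and prints it as JSON. The actual permissions granted in the audited environment are not given here, so the statement shown (writing to a single CloudWatch Logs group) and the ARN with its placeholder account ID and log group name are purely hypothetical.

```ruby
require 'json'

# Hypothetical least-privilege policy for the webhook role.
# The account ID and log group name are placeholders, not real values.
webhook_role_policy = {
  'Version' => '2012-10-17',
  'Statement' => [
    {
      'Sid' => 'WriteWebhookLogs',
      'Effect' => 'Allow',
      'Action' => ['logs:CreateLogStream', 'logs:PutLogEvents'],
      'Resource' => 'arn:aws:logs:us-east-1:123456789012:log-group:webhook-processor:*'
    }
  ]
}

# IAM denies anything not explicitly allowed, so S3 and EC2 actions
# are unavailable to this role without any explicit Deny statement.
puts JSON.pretty_generate(webhook_role_policy)
```

Listing specific actions and a single resource, rather than wildcards, is what keeps stolen instance credentials from being useful elsewhere in the account.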
Testing and Verification
After implementing the code changes and network configurations, a rigorous testing phase was conducted. This included:
- Positive Testing: Sending valid webhook payloads with legitimate callback URLs to ensure the system functions as expected.
- Negative Testing (SSRF Attempts):
  - Crafting payloads with callback URLs pointing to `http://169.254.169.254/latest/meta-data/`.
  - Using URLs with private IP addresses (e.g., `http://10.0.0.1/`).
  - Attempting to use different schemes (e.g., `ftp://`, `file://`).
  - Using untrusted external domains.
- Network Egress Verification: Using tools like `tcpdump` or `netcat` on the instances (in a controlled test environment) to confirm that only allowed outbound connections were being established.
The validation logic blocked every SSRF attempt listed above. Network logs and VPC Flow Logs confirmed that no unauthorized outbound connections were permitted.
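The negative cases above lend themselves to a table-driven check that can run in CI. The sketch below is a standalone approximation: `callback_allowed?` is a simplified stand-in for the application's validator so that the example runs without Rails, and the look-alike hostnames and the non-standard port case are additions assumed here rather than cases taken from the original test plan.

```ruby
require 'uri'

TRUSTED = %w[api.example.com webhook.thirdparty.com].freeze

# Simplified stand-in for the application's validator (assumption).
def callback_allowed?(url)
  uri = URI.parse(url.to_s)
  host = uri.host.to_s.downcase
  uri.scheme == 'https' && uri.port == 443 &&
    TRUSTED.any? { |domain| host == domain || host.end_with?(".#{domain}") }
rescue URI::InvalidURIError
  false
end

# URL => whether the callback should be permitted.
CASES = {
  'https://api.example.com/confirm' => true,
  'http://169.254.169.254/latest/meta-data/' => false,
  'http://10.0.0.1/' => false,
  'ftp://api.example.com/file' => false,
  'file:///etc/passwd' => false,
  'https://evilapi.example.com/' => false,
  'https://api.example.com.attacker.test/' => false,
  'https://api.example.com:8443/' => false
}.freeze

failures = CASES.reject { |url, expected| callback_allowed?(url) == expected }
puts failures.empty? ? 'all cases passed' : "unexpected results: #{failures.keys.inspect}"
```

Keeping the cases in one table makes it cheap to add a new bypass attempt whenever one is reported, and a failing entry identifies the exact URL that slipped through.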
Ongoing Monitoring and Maintenance
Security is not a one-time fix. To maintain the security posture, we implemented several ongoing measures:
- Automated Security Scans: Ensured Brakeman and other static analysis tools were integrated into the CI/CD pipeline to catch regressions.
- Runtime Monitoring: Configured AWS CloudWatch Alarms and GuardDuty to detect suspicious network activity or API calls originating from the EC2 instances.
- Regular Audits: Scheduled periodic reviews of security group rules, NACLs, and IAM policies.
- Dependency Management: Continuously monitored and updated Ruby gems and system packages to patch known vulnerabilities.
- Threat Intelligence: Stayed informed about emerging SSRF techniques and attack vectors.
By combining robust code-level validation, strict network egress controls, and continuous monitoring, we significantly hardened the application against SSRF attacks and improved the overall security of the AWS environment.