How We Audited a High-Traffic Ruby Enterprise Stack on Google Cloud and Mitigated Server-Side Request Forgery (SSRF) in Webhook Parsers

Initial Audit Scope and Methodology

Our engagement focused on a high-traffic Ruby on Rails enterprise application hosted on Google Cloud Platform (GCP). The primary objective was to identify and mitigate security vulnerabilities, with a specific emphasis on Server-Side Request Forgery (SSRF) within webhook processing logic. Our methodology involved a multi-pronged approach: static code analysis, dynamic application security testing (DAST), infrastructure configuration review, and targeted penetration testing.

The application architecture comprised several key components: a fleet of Ruby on Rails web servers managed by Kubernetes (GKE), a PostgreSQL database (Cloud SQL), Redis for caching and job queues (Memorystore), and various GCP services for logging, monitoring, and storage (Cloud Logging, Cloud Monitoring, Cloud Storage). The webhook ingestion pipeline was particularly complex, involving external service integrations, asynchronous processing via Sidekiq, and data persistence.

Static Code Analysis: Uncovering SSRF Vectors in Webhook Parsers

We began with a deep dive into the codebase, specifically targeting modules responsible for receiving and processing incoming webhooks. Our primary concern was how the application handled URLs provided within webhook payloads. A common SSRF vulnerability arises when an application fetches a resource from a user-controlled URL without proper validation, allowing an attacker to force the server to make requests to internal network resources or arbitrary external hosts.

A critical area of investigation was the use of HTTP client libraries. We looked for patterns where user-supplied input was directly incorporated into request URLs. For instance, a naive implementation might look like this:

# app/controllers/webhooks_controller.rb
def process_payload
  payload = JSON.parse(request.body.read)
  external_resource_url = payload['data']['resource_url']

  # Vulnerable: Directly using user-supplied URL
  response = RestClient.get(external_resource_url)
  # ... process response ...
end

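This code is highly susceptible to SSRF. An attacker could supply a URL such as http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token to target the GCP metadata server and the node's service account token, or http://localhost:8080/admin to probe internal services. The metadata server's v1 API only answers requests that carry a Metadata-Flavor: Google header, which RestClient.get does not send on its own, but that header check is only a partial safeguard and does nothing to protect other internal endpoints.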

We employed static analysis tools like Brakeman and RuboCop with custom security rules to automate this search. Beyond direct URL construction, we also looked for insecure deserialization of data that might contain URLs, or improper handling of redirects.
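
For a quick first pass before running the heavier tooling, the same search can be roughed out in plain Ruby. The sketch below is an illustrative assumption rather than the rule set used in the audit: the `suspicious_http_calls` name and the regular expression are hypothetical, and a line-based regex will miss multi-line calls and produce false positives that an AST-based tool would not.

```ruby
# Heuristic triage only: flags HTTP client calls whose first argument is not
# a plain string literal. Not a substitute for Brakeman or an AST-based cop.
HTTP_CALL = /\b(RestClient\.(?:get|post|put|delete|head)\b|Net::HTTP\.(?:get_response|get|post)\b|HTTParty\.(?:get|post)\b|Faraday\.(?:get|post)\b|URI\.open\b)\s*\(?\s*([^\s,)]+)/

# Returns one hash per suspicious line: { line:, call:, argument: }.
def suspicious_http_calls(source)
  source.each_line.with_index(1).filter_map do |line, lineno|
    next if line.strip.start_with?('#') # skip comment lines

    match = HTTP_CALL.match(line)
    next unless match

    first_arg = match[2]
    # A quoted literal without interpolation is not attacker-controlled.
    literal = first_arg.start_with?("'", '"') && !first_arg.include?('#{')
    next if literal

    { line: lineno, call: match[1], argument: first_arg }
  end
end
```

Each finding still needs manual review to decide whether the argument can actually be influenced by a webhook payload.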

Dynamic Analysis and Infrastructure Configuration Review

Complementing static analysis, we performed dynamic testing. This involved sending crafted webhook payloads to a staging environment that mirrored production. We used tools like Burp Suite and custom scripts to:

Inject internal IP addresses (e.g., 10.0.0.1, 192.168.1.1) and metadata service endpoints (169.254.169.254) in URL fields.
Test for DNS rebinding vulnerabilities.
Analyze HTTP headers for potential leakage of internal information.
Observe network traffic originating from the application servers.
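
The probes above are easy to generate in bulk. The following sketch builds one JSON body per probe URL, using the data.resource_url field from the earlier controller example; the `SSRF_PROBE_URLS` list and the `ssrf_probe_payloads` helper are hypothetical and should be adapted to the address space of the environment under test.

```ruby
require 'json'

# Hypothetical probe targets for a staging environment.
SSRF_PROBE_URLS = [
  'http://10.0.0.1/',
  'http://192.168.1.1/',
  'http://127.0.0.1:8080/admin',
  'http://169.254.169.254/computeMetadata/v1/',
  'http://metadata.google.internal/computeMetadata/v1/',
  'http://[::1]/',
  'http://2130706433/', # decimal encoding of 127.0.0.1
  'file:///etc/passwd'
].freeze

# Builds one webhook body per probe URL.
def ssrf_probe_payloads(urls = SSRF_PROBE_URLS)
  urls.map { |url| JSON.generate('data' => { 'resource_url' => url }) }
end
```

Each body can then be sent to the staging webhook endpoint with Burp Suite or any HTTP client while outbound traffic from the application servers is observed.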

Crucially, we reviewed the GCP network configuration. The principle of least privilege dictates that application instances should not have broad network access. We examined:

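VPC Firewall Rules: Were there overly permissive ingress or egress rules? Specifically, we checked for egress rules that allowed outbound connections to RFC 1918 private IP address ranges. Traffic to the metadata server (169.254.169.254) is always allowed by GCP regardless of firewall rules, so we assessed that exposure separately through pod-level controls and service account permissions.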
Network Policies (Kubernetes): Within GKE, Kubernetes Network Policies can restrict pod-to-pod communication. We verified that these policies were in place and correctly configured to limit the blast radius of a compromised pod.
Service Accounts and IAM Roles: The service account attached to the GKE nodes and pods was reviewed. Did it have excessive permissions, particularly those that could be leveraged if metadata was exfiltrated (e.g., broad access to Cloud Storage, Compute Engine, or other sensitive services)?

A common oversight is allowing egress to 0.0.0.0/0 without granular control. For webhook processing, egress should ideally be restricted to only the known, necessary external endpoints.

Mitigation Strategy: Input Validation and Network Segmentation

Based on our findings, we implemented a layered mitigation strategy. The most effective approach to prevent SSRF is robust input validation at the application layer, combined with strict network controls.

Application-Level Validation

The primary fix involved validating all URLs provided in webhook payloads before making any external requests. This validation should:

Allowlist Domains: The most secure approach is to maintain an explicit allowlist of domains the application is permitted to connect to. Any URL not matching this list should be rejected.
Disallow Private IPs: Explicitly reject URLs whose host is, or resolves to, an address in the private ranges (RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback (127.0.0.0/8, ::1), or link-local space (169.254.0.0/16, which contains the GCP metadata service IP 169.254.169.254).
Validate Scheme: Ensure only expected schemes (e.g., http, https) are allowed.
Handle Redirects Carefully: If redirects are necessary, implement logic to prevent following redirects to disallowed destinations.
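
The "resolves to" part of the private-IP rule means the check has to run on DNS answers, not only on the host string. The sketch below shows one way to do that with Ruby's standard Resolv library. The names `BLOCKED_RANGES`, `blocked_ip?`, and `resolve_public_addresses` are hypothetical, and the range list is a reasonable starting point rather than an exhaustive one.

```ruby
require 'ipaddr'
require 'resolv'

# Ranges that should never be reachable from a webhook fetch (assumed list).
BLOCKED_RANGES = %w[
  0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16
  172.16.0.0/12 192.168.0.0/16 ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }.freeze

def blocked_ip?(ip)
  addr = IPAddr.new(ip.to_s)
  # Treat IPv4-mapped IPv6 (::ffff:a.b.c.d) as the IPv4 address it wraps.
  addr = IPAddr.new(addr.to_i & 0xffffffff, Socket::AF_INET) if addr.ipv4_mapped?
  BLOCKED_RANGES.any? { |range| range.include?(addr) }
rescue IPAddr::InvalidAddressError
  true # fail closed on anything that does not parse as an address
end

# Resolves host and returns its addresses. Raises ArgumentError if the name
# does not resolve or if ANY resolved address is in a blocked range.
# The resolver is injectable so the check can be tested without real DNS.
def resolve_public_addresses(host, resolver: Resolv)
  addresses = resolver.getaddresses(host)
  raise ArgumentError, "Host did not resolve: #{host}" if addresses.empty?

  blocked = addresses.select { |address| blocked_ip?(address) }
  unless blocked.empty?
    raise ArgumentError, "Host #{host} resolves to blocked address(es): #{blocked.join(', ')}"
  end

  addresses
end
```

This check only defeats DNS rebinding if the request is then sent to one of the returned addresses. If the HTTP client resolves the name again on its own, the second answer can differ from the one that was validated.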

Here’s an example of a more secure Ruby implementation using a gem like addressable for URL parsing and custom logic for validation:

require 'addressable/uri'
require 'ipaddr'
require 'rest-client'

# Allowed domains and blocked IP ranges
ALLOWED_DOMAINS = %w[api.example.com cdn.example.org].freeze
DISALLOWED_IPS = [
  '10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16', # RFC 1918 private ranges
  '127.0.0.0/8', '::1/128',                        # Loopback
  '169.254.0.0/16'                                 # Link-local, incl. GCP metadata service (169.254.169.254)
].map { |cidr| IPAddr.new(cidr) }.freeze

def safe_get_external_resource(url_string)
  uri = Addressable::URI.parse(url_string)

  # 1. Validate scheme
  unless %w[http https].include?(uri.scheme)
    raise ArgumentError, "Invalid URL scheme: #{uri.scheme}"
  end

  # 2. Validate hostname/IP (hostnames are case-insensitive, so normalize first)
  host = uri.host&.downcase
  raise ArgumentError, 'URL has no host' if host.nil? || host.empty?

  begin
    # Host is an IP literal: reject it if it falls in a blocked range.
    # Note: IP literals outside those ranges pass without an allowlist check;
    # reject them here as well if the allowlist must be strict.
    ip_addr = IPAddr.new(host)
    if DISALLOWED_IPS.any? { |disallowed_range| disallowed_range.include?(ip_addr) }
      raise ArgumentError, "URL points to a disallowed IP address: #{host}"
    end
  rescue IPAddr::InvalidAddressError
    # Host is a name, not an IP. For simplicity this example relies on the
    # domain allowlist alone. A real app should also resolve the name and
    # reject it if any resolved address falls in DISALLOWED_IPS.
    unless ALLOWED_DOMAINS.include?(host)
      raise ArgumentError, "Disallowed domain: #{host}"
    end
  end

  # 3. Redirects: RestClient follows redirects for GET by default, which
  # would bypass the checks above. max_redirects: 0 makes a 3xx response
  # raise instead. The timeouts are example values.
  RestClient::Request.execute(
    method: :get,
    url: uri.to_s,
    max_redirects: 0,
    open_timeout: 5,
    read_timeout: 10
  )
rescue Addressable::URI::InvalidURIError => e
  raise ArgumentError, "Invalid URI: #{e.message}"
rescue RestClient::Exception => e
  # Network errors, timeouts, non-2xx responses, and refused redirects
  raise "Failed to fetch resource: #{e.message}"
end

# Example usage within a controller action
def process_payload
  payload = JSON.parse(request.body.read)
  external_resource_url = payload.dig('data', 'resource_url')

  if external_resource_url.present?
    response = safe_get_external_resource(external_resource_url)
    # ... process response ...
  else
    render json: { error: "resource_url not provided" }, status: :bad_request
  end
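rescue JSON::ParserError
  # A malformed body is the sender's fault, not an internal error
  render json: { error: "Malformed JSON payload" }, status: :bad_request
rescue ArgumentError => e
  render json: { error: "Invalid webhook payload: #{e.message}" }, status: :bad_request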
rescue => e # Catch other potential errors
  render json: { error: "An internal error occurred" }, status: :internal_server_error
end
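
Where redirects genuinely have to be followed, every hop needs the same validation as the original URL. The sketch below is one possible shape for that, using Net::HTTP from the standard library instead of RestClient so that each hop is an explicit step. `fetch_with_checked_redirects`, `MAX_REDIRECTS`, and the `validate` and `fetch` parameters are hypothetical names introduced for illustration; `validate` stands for any callable that raises ArgumentError on a disallowed URL.

```ruby
require 'net/http'
require 'uri'

MAX_REDIRECTS = 3 # assumed limit; tune to what the integrations really need

# Follows redirects manually, re-validating the target before every request.
# validate: callable that raises ArgumentError for a disallowed URL string.
# fetch:    callable that performs ONE request without following redirects
#           (Net::HTTP.get_response does not follow them).
def fetch_with_checked_redirects(url, validate:,
                                 fetch: ->(uri) { Net::HTTP.get_response(uri) },
                                 limit: MAX_REDIRECTS)
  uri = URI.parse(url)

  (limit + 1).times do
    validate.call(uri.to_s)
    response = fetch.call(uri)
    return response unless response.is_a?(Net::HTTPRedirection)

    location = response['location']
    raise ArgumentError, 'Redirect without Location header' if location.nil?

    # Location may be relative, so resolve it against the current URL.
    uri = URI.join(uri.to_s, location)
  end

  raise ArgumentError, "Too many redirects (limit #{limit})"
end
```

Injecting `fetch` keeps the redirect logic testable without network access. In production the validator would wrap the scheme, host, and IP checks rather than a simple string match.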

Network-Level Controls (GCP/GKE)

Application-level validation is the first line of defense, but network controls provide a crucial safety net. We implemented the following GCP and GKE configurations:

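Egress Firewall Rules: In GCP, we configured VPC firewall rules to deny all egress traffic by default and then explicitly allow egress only to the specific IP addresses or CIDR blocks of required external services, with no allow rule covering the RFC 1918 ranges. VPC firewall rules do not apply to traffic between a VM and the metadata server at 169.254.169.254, which GCP always allows, so metadata access has to be restricted at the pod level instead, for example with the Network Policies described next.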

Kubernetes Network Policies: Within GKE, we deployed Network Policies to restrict egress from the webhook processing pods. This policy ensured that these pods could only initiate connections to specific external endpoints and not to other pods within the cluster or internal GCP services.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: webhook-egress-policy
  namespace: production # Or your application's namespace
spec:
  podSelector:
    matchLabels:
      app: your-rails-app # Label identifying your webhook processing pods
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24 # Example: Allow egress to a specific external API CIDR
      ports:
        - protocol: TCP
          port: 443
    - to:
        - ipBlock:
            cidr: 198.51.100.0/24 # Example: Allow egress to another required service
      ports:
        - protocol: TCP
          port: 80
    # NetworkPolicy rules are allow-only; there is no explicit "deny" rule.
    # Once a policy selects a pod and lists Egress in policyTypes, any egress
    # not matched by a rule above is dropped (including the RFC 1918 ranges),
    # unless another policy selecting the same pods allows it.
    # This requires network policy enforcement to be enabled on the cluster.
    # Note: no DNS rule is included. If the pods resolve hostnames, add an
    # egress rule for the cluster DNS service (typically UDP/TCP port 53).

By combining strict application-level validation with granular network controls, we significantly reduced the attack surface for SSRF vulnerabilities.

Ongoing Monitoring and Incident Response

Security is not a one-time fix. We established ongoing monitoring to detect and alert on suspicious outbound network activity. This included:

VPC Flow Logs: Enabled VPC Flow Logs in GCP to capture network traffic metadata. We configured Cloud Logging to ingest these logs and set up alerts for any egress traffic to unexpected destinations or internal IP ranges.
Application Logs: Enhanced application logging to record any instances where the SSRF validation logic rejected a request, along with the offending URL. This helps in identifying potential probing attempts.
Security Scanning: Integrated automated security scanning tools into the CI/CD pipeline to catch regressions and new vulnerabilities early.
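
For the application-log item above, one structured entry per rejection makes the events straightforward to filter and alert on. The following is a minimal sketch, assuming a standard Ruby Logger; the `log_ssrf_rejection` helper, the `ssrf_validation_rejected` event name, and the field names are all hypothetical.

```ruby
require 'json'
require 'logger'
require 'time'
require 'uri'

# Writes one JSON-encoded warning per rejected URL and returns the entry.
def log_ssrf_rejection(logger, url:, reason:, source_ip: nil)
  host = begin
    URI.parse(url.to_s).host
  rescue URI::InvalidURIError
    nil # unparseable URLs are still logged, just without a host field
  end

  entry = {
    event: 'ssrf_validation_rejected',
    url: url.to_s[0, 512], # truncate to keep hostile input from bloating logs
    host: host,
    reason: reason,
    source_ip: source_ip,
    timestamp: Time.now.utc.iso8601
  }
  logger.warn(JSON.generate(entry))
  entry
end
```

A call such as `log_ssrf_rejection(Rails.logger, url: external_resource_url, reason: e.message, source_ip: request.remote_ip)` would fit in the controller's ArgumentError handler. Repeated rejections from one source address are a useful signal of probing.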

A well-defined incident response plan for security events, including SSRF, ensures that the team can react swiftly and effectively should a new threat emerge.
