How We Audited a High-Traffic C++ Enterprise Stack on AWS and Mitigated XML External Entity (XXE) Injection in Old SOAP Integrations

Auditing a High-Traffic C++ Enterprise Stack on AWS

Our recent engagement involved a critical, high-traffic enterprise application stack built primarily on C++ services, deployed across a complex AWS infrastructure. The primary objective was a comprehensive security audit, with a specific focus on identifying and mitigating vulnerabilities within legacy SOAP integrations. These integrations, while functional, represented a significant attack surface, particularly concerning XML External Entity (XXE) injection.

The stack comprised several microservices written in C++, communicating via SOAP APIs. These services were hosted on EC2 instances, managed by Auto Scaling Groups, and load-balanced by an Application Load Balancer (ALB). Data persistence was handled by RDS instances (primarily PostgreSQL), and configuration management relied on AWS Systems Manager Parameter Store. The sheer volume of requests and the sensitive nature of the data processed necessitated a rigorous and methodical auditing approach.

Identifying the XXE Vulnerability in SOAP Integrations

The core of the XXE vulnerability lies in how XML parsers handle external entities. When an XML parser is configured to process external entities, an attacker can craft malicious XML input that references external resources. This can lead to:

Information Disclosure: Reading arbitrary files from the server’s filesystem (e.g., /etc/passwd, configuration files).
Server-Side Request Forgery (SSRF): Forcing the server to make requests to internal or external resources on behalf of the attacker.
Denial of Service (DoS): Exploiting entity expansion to consume excessive resources.
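
To give a sense of scale for the entity-expansion case, the commonly cited nested-entity payload (often called "billion laughs") declares a short base entity and then several layers of entities that each reference the previous layer multiple times. The sketch below is only arithmetic, not a parser. The function name expandedBytes and the figures used (a 3-byte base entity, 10 references per layer, 9 layers) are illustrative assumptions based on the usual form of that payload, not measurements from our audit.

```cpp
#include <cassert>
#include <cstdint>

// Bytes produced when a base entity of `baseBytes` is wrapped in `levels`
// layers, each layer referencing the previous one `fanout` times.
// Hypothetical helper for illustration only; it does no XML parsing.
std::uint64_t expandedBytes(std::uint64_t baseBytes,
                            std::uint64_t fanout,
                            int levels) {
    assert(levels >= 0);
    std::uint64_t total = baseBytes;
    for (int i = 0; i < levels; ++i) {
        total *= fanout;  // each layer multiplies the output by the fanout
    }
    return total;
}
```

With those figures, expandedBytes(3, 10, 9) is 3,000,000,000: a document well under a kilobyte asks the parser to materialize roughly 3 GB if entity substitution is enabled and unbounded.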

In our C++ stack, the SOAP integrations were implemented using a third-party XML parsing library. A common pitfall is the configuration of these libraries, which often enables external entity resolution either by default or through options copied from old sample code. We began by analyzing the C++ code responsible for parsing incoming SOAP requests. The goal was to locate the XML parsing functions and inspect their configuration.

Code Analysis: Locating the Vulnerable Parsing Logic

The critical section of code typically looked something like this (simplified for illustration):

Example Vulnerable C++ XML Parsing Snippet

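#include <libxml/parser.h>
#include <libxml/tree.h>

#include <string>

// ...

void parseSoapRequest(const std::string& xmlString) {
    // XML_PARSE_NOENT substitutes entity references with their values and
    // XML_PARSE_DTDLOAD loads the external DTD subset. Passing them on
    // untrusted input is what makes this call exploitable.
    xmlDocPtr doc = xmlReadMemory(xmlString.c_str(),
                                  static_cast<int>(xmlString.length()),
                                  NULL,  // URL (base URI of the document)
                                  NULL,  // encoding (auto-detect)
                                  XML_PARSE_NOENT | XML_PARSE_DTDLOAD);
    if (doc == NULL) {
        // Handle parsing error
        return;
    }

    // ... process the XML document ...

    xmlFreeDoc(doc);
}

The function xmlReadMemory (from libxml2, a common C XML parsing library) takes its behavior from the options bitmask in its last argument. XML_PARSE_NOENT substitutes entity references with their declared values, which includes fetching external entities, and XML_PARSE_DTDLOAD loads the external DTD subset. With these set, a SYSTEM entity in attacker-supplied XML is resolved by the server. libxml2 releases before 2.9.0 were also more permissive about loading external entities by default, so an old library build can be exposed even when no options are passed. We needed to find where and how this parsing was being invoked and which options, if any, were applied.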

Crafting Malicious XML Payloads

To confirm the vulnerability, we crafted several test payloads. The first aimed to read a local file:

Payload 1: File Disclosure via XXE

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<request>
  <data>&xxe;</data>
</request>

If the C++ service processed this XML and returned the parsed content (or an error message revealing the content), it would confirm the XXE vulnerability. A second payload focused on SSRF, attempting to probe internal AWS metadata endpoints:

Payload 2: SSRF via XXE to AWS Metadata Endpoint

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME">
]>
<request>
  <data>&xxe;</data>
</request>

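Successful exfiltration of IAM role credentials from this endpoint would be a critical security breach, allowing an attacker to act with the permissions of the EC2 instance’s IAM role. This request shape only works where IMDSv1 is still enabled; IMDSv2 requires a session token that is obtained with a PUT request and sent in a header, which an entity fetch cannot do.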

Mitigation Strategies: Securing XML Parsing in C++

The most effective way to mitigate XXE vulnerabilities is to disable external entity resolution in the XML parser. For libxml2, this involves setting parser options.

Implementing Secure Parsing Options

The fix involves modifying the C++ code to explicitly configure the libxml2 parser so that it never loads external resources or substitutes entities. With xmlReadMemory and xmlReaderForMemory this is controlled by the options bitmask passed as the last argument. Because a SOAP message must not contain a Document Type Declaration, the service can go one step further and reject any request that carries a DOCTYPE at all.

Example Secure C++ XML Parsing Snippet

#include <libxml/parser.h>
#include <libxml/xmlreader.h> // For xmlReaderForMemory and xmlTextReader*

#include <climits>
#include <string>

// ...

// Returns true if the request parsed cleanly and carried no DOCTYPE.
bool parseSoapRequestSecurely(const std::string& xmlString) {
    // Parser options relevant to XXE (all are xmlParserOption flags):
    // XML_PARSE_NOENT:    substitute entities with their values. Despite the
    //                     name, this ENABLES expansion, so it is left unset.
    // XML_PARSE_DTDLOAD:  load the external DTD subset. Left unset.
    // XML_PARSE_DTDATTR:  apply default attributes from the DTD. Left unset.
    // XML_PARSE_DTDVALID: validate against the DTD. Left unset.
    // XML_PARSE_XINCLUDE: perform XInclude substitution. Left unset.
    // XML_PARSE_NONET:    forbid network access while parsing. Set.
    const int options = XML_PARSE_NONET;

    // libxml2 takes the buffer size as an int, so refuse oversized bodies.
    if (xmlString.size() > static_cast<std::size_t>(INT_MAX)) {
        return false;
    }

    xmlTextReaderPtr reader = xmlReaderForMemory(
        xmlString.c_str(),
        static_cast<int>(xmlString.size()),
        NULL, // URL (base URI of the document)
        NULL, // encoding (auto-detect)
        options
    );

    if (reader == NULL) {
        // Handle error
        return false;
    }

    bool rejected = false;
    int ret;
    while ((ret = xmlTextReaderRead(reader)) == 1) {
        // SOAP messages must not contain a DTD, so treat any DOCTYPE as hostile.
        if (xmlTextReaderNodeType(reader) == XML_READER_TYPE_DOCUMENT_TYPE) {
            rejected = true;
            break;
        }

        // Process the XML node by node
        // Example: get node name
        const xmlChar* nodeName = xmlTextReaderConstName(reader);
        if (nodeName) {
            // std::cout << "Node: " << nodeName << std::endl;
        }
        // ... further processing ...
    }

    xmlFreeTextReader(reader);

    // ret < 0 signals a read or parse error; ret == 0 is a normal end of document.
    return !rejected && ret == 0;
}

The key changes are that none of the entity-related options (XML_PARSE_NOENT, XML_PARSE_DTDLOAD, XML_PARSE_DTDATTR, XML_PARSE_DTDVALID) are passed, XML_PARSE_NONET blocks network fetches, and any request containing a DOCTYPE is rejected outright. Together these ensure that no external DTDs or entities are fetched or expanded, which neutralizes the payloads shown earlier. The same options argument applies to xmlReadMemory if the tree API is preferred. xmlTextReader is a reasonable fit here because its streaming model lets the service stop at the first DOCTYPE node instead of processing the rest of the document.

AWS-Level Mitigations and Monitoring

While code-level fixes are paramount, a defense-in-depth strategy involves AWS-specific configurations and monitoring.

Web Application Firewall (WAF) Rules

AWS WAF can be configured to inspect incoming HTTP requests for patterns indicative of XXE attacks. It is not foolproof: a payload can evade string matching by using an alternative encoding such as UTF-16, or by sitting beyond the portion of the body that WAF inspects. It does block common attempts, so we deployed custom WAF rules to detect suspicious DOCTYPE declarations and entity references within SOAP requests targeting our ALB.

Example WAF Rule Logic (Conceptual)

// Rule: Block requests with suspicious DOCTYPE declarations
If the request body contains:
  - Pattern: "<!DOCTYPE" followed by any characters, then "SYSTEM" or "PUBLIC"
  - Pattern: "<!ENTITY" followed by any characters, then "SYSTEM" or "PUBLIC"

// Rule: Block requests targeting internal metadata endpoints
If the request body contains:
  - Pattern: "http://169.254.169.254/"

These rules were applied to the ALB, providing an initial layer of defense before requests even reached the C++ services.
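
Before committing patterns like these to a WAF rule, it can help to exercise them locally against known-good and known-bad bodies. The following is a minimal sketch using std::regex from the C++ standard library. The function names looksLikeXxe and selfCheck and the exact expressions are illustrative assumptions, not AWS WAF syntax, and WAF's own regex engine may differ in details.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical local check mirroring the conceptual rules above.
// Returns true when the body matches any of the suspicious patterns.
bool looksLikeXxe(const std::string& body) {
    // [\s\S] matches any character, including newlines (ECMAScript grammar).
    static const std::regex doctypeExternal(
        R"(<!DOCTYPE[\s\S]*?\b(SYSTEM|PUBLIC)\b)");
    static const std::regex entityExternal(
        R"(<!ENTITY[\s\S]*?\b(SYSTEM|PUBLIC)\b)");
    static const std::regex metadataHost(
        R"(http://169\.254\.169\.254/)");

    return std::regex_search(body, doctypeExternal) ||
           std::regex_search(body, entityExternal) ||
           std::regex_search(body, metadataHost);
}

// Usage sketch: one hostile and one benign body.
void selfCheck() {
    assert(looksLikeXxe(
        "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
        "<request><data>&xxe;</data></request>"));
    assert(!looksLikeXxe("<request><data>ok</data></request>"));
}
```

A check like this only sees the bytes it is given, so it shares the limitation of the WAF rules themselves: a body in a different encoding would pass. It is a way to test the patterns, not a substitute for the parser configuration.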

Logging and Alerting

Comprehensive logging is crucial for detecting and responding to security incidents. We enhanced logging for:

ALB Access Logs: To capture incoming requests and identify suspicious patterns.
Application Logs (C++ services): To log parsing errors, malformed requests, and any detected security anomalies.
AWS CloudTrail: To monitor API calls related to EC2, IAM, and WAF, looking for unauthorized access attempts.

We configured CloudWatch Alarms to trigger notifications (via SNS) for specific events, such as:

High rate of WAF rule matches.
Application errors related to XML parsing.
Unusual API calls to AWS metadata services from EC2 instances.

Deployment and Verification

The code changes were deployed through our standard CI/CD pipeline. After deployment, we re-ran our test payloads against the updated services to verify that the XXE vulnerabilities were successfully mitigated. We also performed regression testing to ensure that legitimate SOAP requests were still processed correctly.
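
The re-test step lends itself to a small table-driven check that can run in CI. The sketch below is one way to structure it, assuming the service's request handling can be wrapped as a callable that returns true when a body is accepted, and assuming the fixed service rejects any body carrying a DOCTYPE. The names ReplayCase, Handler, failedCases, and auditCases are hypothetical and not part of our codebase.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One replayed request and the outcome expected from the fixed service.
struct ReplayCase {
    std::string name;
    std::string body;
    bool shouldBeAccepted;
};

// Hypothetical adapter around the service under test: returns true if the
// body was accepted and processed, false if it was rejected.
using Handler = std::function<bool(const std::string&)>;

// Runs every case and returns the names of those whose outcome differed
// from the expectation. An empty result means the re-test passed.
std::vector<std::string> failedCases(const std::vector<ReplayCase>& cases,
                                     const Handler& handler) {
    assert(handler && "handler must be callable");
    std::vector<std::string> failures;
    for (const ReplayCase& c : cases) {
        if (handler(c.body) != c.shouldBeAccepted) {
            failures.push_back(c.name);
        }
    }
    return failures;
}

// The two audit payloads plus one benign request for regression coverage.
std::vector<ReplayCase> auditCases() {
    return {
        {"file-disclosure",
         "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
         "<request><data>&xxe;</data></request>",
         false},
        {"imds-ssrf",
         "<!DOCTYPE foo [<!ENTITY xxe SYSTEM "
         "\"http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME\">]>"
         "<request><data>&xxe;</data></request>",
         false},
        {"benign", "<request><data>ok</data></request>", true},
    };
}
```

Keeping the benign case in the same table as the hostile ones means a fix that is too aggressive (rejecting everything) fails the run just as a missing fix does.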

The audit and mitigation process for this C++ enterprise stack on AWS highlighted the persistent risks associated with legacy integrations and the importance of a layered security approach. By combining secure coding practices for XML parsing with AWS-native security controls and robust monitoring, we significantly reduced the attack surface and enhanced the overall security posture of the application.