How We Audited a High-Traffic C++ Enterprise Stack on OVH and Mitigated XML External Entity (XXE) Injection in Old SOAP Integrations

Initial Assessment: Identifying the Attack Surface

Our engagement began with a deep dive into a high-traffic C++ enterprise stack hosted on OVH. The primary concern was a potential XML External Entity (XXE) injection vulnerability, specifically within legacy SOAP integrations. These integrations, often developed years prior and maintained by different teams, represented a significant attack surface. The initial phase involved cataloging all SOAP endpoints, understanding their request/response schemas, and identifying the underlying XML parsing libraries used within the C++ applications. Many older C++ XML parsers, by default, are configured to resolve external entities, making them susceptible to XXE attacks. This can lead to information disclosure (reading local files), Server-Side Request Forgery (SSRF), and denial-of-service (DoS) attacks.
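The cataloging step can be partly automated. The following is a minimal sketch, not the tooling used in the audit: it flags source lines that mention `libxml2` symbols which opt a parser in to entity substitution or DTD loading. The function name `findRiskySymbols` and the symbol list are illustrative assumptions, and the list is not exhaustive.

```cpp
#include <string>
#include <vector>

// libxml2 symbols that opt a parser in to entity substitution or DTD loading.
// Illustrative list only; extend it for the libraries found in the codebase.
const char* const kRiskySymbols[] = {
    "XML_PARSE_NOENT",
    "XML_PARSE_DTDLOAD",
    "XML_PARSE_DTDVALID",
    "xmlSubstituteEntitiesDefault",
    "xmlLoadExtDtdDefaultValue",
};

// Returns every risky symbol mentioned in one line of source text.
// Hypothetical helper: a plain substring match, so it also reports
// mentions inside comments and string literals.
std::vector<std::string> findRiskySymbols(const std::string& sourceLine) {
    std::vector<std::string> hits;
    for (const char* symbol : kRiskySymbols) {
        if (sourceLine.find(symbol) != std::string::npos) {
            hits.push_back(symbol);
        }
    }
    return hits;
}
```

Running such a helper over every file that includes an XML parser header gives a first list of call sites to review by hand; it is a triage aid, not a substitute for reading the code.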

Deep Dive into C++ XML Parsers and XXE Vulnerabilities

The core of the vulnerability lies in how XML parsers handle DTDs (Document Type Definitions) and external entities. An attacker can craft a malicious XML payload that includes a DTD referencing an external resource. When the vulnerable parser processes this XML, it will attempt to fetch and process the external entity, potentially revealing sensitive information or executing unintended actions.

Consider a common scenario using `libxml2`, a widely used C library that many C++ codebases call directly. Since the 2.9 series, `libxml2` no longer loads external entities when parsing with default options. The exposure appears when code opts in to entity substitution with `XML_PARSE_NOENT` (or to DTD loading with `XML_PARSE_DTDLOAD`), or when an older library build is still in use. A simplified, vulnerable C++ code snippet might look like this:

#include <cstring>
#include <libxml/parser.h>
#include <libxml/tree.h>

// ... other includes and setup ...

xmlDocPtr parseXmlString(const char* xmlString) {
    // Vulnerable: XML_PARSE_NOENT turns entity substitution ON, so external
    // entities declared in the DTD are loaded and expanded into the tree.
    xmlDocPtr doc = xmlReadMemory(xmlString, static_cast<int>(strlen(xmlString)), "noname.xml", NULL, XML_PARSE_NOENT);
    if (doc == NULL) {
        // Handle parsing error
        return NULL;
    }
    return doc;
}

int main() {
    const char* maliciousXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                               "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
                               "<root>&xxe;</root>";
    xmlDocPtr document = parseXmlString(maliciousXml);
    if (document) {
        // Process document...
        xmlFreeDoc(document);
    }
    xmlCleanupParser();
    return 0;
}

In this example, `xmlReadMemory` is called with `XML_PARSE_NOENT`, so the parser reads `/etc/passwd` and substitutes its content for `&xxe;` in the document tree. Despite its name, `XML_PARSE_NOENT` enables entity substitution rather than disabling it. Any response or log line that echoes the `<root>` element then discloses the file, which is a classic XXE leading to local file disclosure.

Mitigation Strategy: Disabling External Entity Resolution

The most effective mitigation is to keep external entity resolution disabled at the parser configuration level and to make that choice explicit in code. For `libxml2`, this is done through the parser options passed to the parsing function.

A common mistake is to add `XML_PARSE_NOENT` in the belief that it means "no entities": the flag does the opposite and turns entity substitution on. The corrected C++ code snippet omits `XML_PARSE_NOENT` and the DTD-loading flags (`XML_PARSE_DTDLOAD`, `XML_PARSE_DTDVALID`) and adds `XML_PARSE_NONET` to forbid network access.

#include <cstring>
#include <libxml/parser.h>
#include <libxml/tree.h>

// ... other includes and setup ...

xmlDocPtr parseXmlStringSecurely(const char* xmlString) {
    // Create a parser context
    xmlParserCtxtPtr ctxt = xmlNewParserCtxt();
    if (!ctxt) {
        // Handle context creation error
        return NULL;
    }

    // Pass the options through the read call: xmlCtxtReadMemory applies its
    // options argument to the context, so flags set directly on ctxt->options
    // beforehand are not reliably honored.
    // XML_PARSE_NONET: forbid network access while parsing (SSRF prevention).
    // XML_PARSE_NOENT, XML_PARSE_DTDLOAD and XML_PARSE_DTDVALID are left out
    // on purpose, so entities are not substituted and no external DTD is loaded.
    const int options = XML_PARSE_NONET;

    xmlDocPtr doc = xmlCtxtReadMemory(ctxt, xmlString, static_cast<int>(strlen(xmlString)), "noname.xml", NULL, options);

    if (!doc) {
        // Handle parsing error, check ctxt->lastError for details
        xmlFreeParserCtxt(ctxt);
        return NULL;
    }

    xmlFreeParserCtxt(ctxt);
    return doc;
}

int main() {
    const char* maliciousXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                               "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
                               "<root>&xxe;</root>";
    xmlDocPtr document = parseXmlStringSecurely(maliciousXml);
    if (document) {
        // Process document...
        xmlFreeDoc(document);
    }
    xmlCleanupParser();
    return 0;
}

By omitting `XML_PARSE_NOENT` and the DTD-loading flags and passing `XML_PARSE_NONET`, we instruct `libxml2` to leave entity references unexpanded, not to load external DTDs or entities, and not to access network resources. The `&xxe;` reference stays in the tree as an unexpanded entity reference node, and `/etc/passwd` is never read. Recent `libxml2` releases (2.13 and later) also offer `XML_PARSE_NO_XXE`, which blocks loading of external content even if another flag enables substitution. Together these settings neutralize XXE and SSRF vectors originating from XML parsing.
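Parser configuration can be backed by a check that runs before any parser sees the payload. Both SOAP 1.1 and SOAP 1.2 forbid a document type declaration in a message, so a SOAP endpoint can reject any payload that carries one. The following is a minimal sketch of such a guard, not code from the audited stack: the names `PayloadVerdict` and `screenSoapPayload` and the size limit parameter are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

enum class PayloadVerdict { Accept, RejectTooLarge, RejectDoctype };

// Screens a raw SOAP payload before it is handed to the XML parser.
// Hypothetical helper. The scan is byte-level, so it assumes UTF-8 or another
// ASCII-compatible encoding; UTF-16 payloads need transcoding first.
PayloadVerdict screenSoapPayload(const std::string& payload, std::size_t maxBytes) {
    // Bounding the size limits the cost of oversized or entity-bloated input.
    if (payload.size() > maxBytes) {
        return PayloadVerdict::RejectTooLarge;
    }
    // XML keywords are case-sensitive, so only the upper-case form is valid.
    // This is deliberately conservative: the string is also rejected when it
    // appears inside a comment or CDATA section.
    if (payload.find("<!DOCTYPE") != std::string::npos) {
        return PayloadVerdict::RejectDoctype;
    }
    return PayloadVerdict::Accept;
}
```

The guard complements the parser options rather than replacing them: it is a second layer that keeps working if a future code change reintroduces an unsafe flag.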

Implementation Across Diverse C++ Libraries

Our audit revealed that different teams had adopted various XML parsing libraries over time, including Xerces-C++, TinyXML-2, and Boost.PropertyTree (which can parse XML). Each library has its own defaults and, where it supports external entities at all, its own configuration for disabling them.

Xerces-C++:

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>   // XMLPlatformUtils
#include <xercesc/util/XMLException.hpp>
#include <xercesc/sax/SAXParseException.hpp>

// ...

try {
    xercesc::XMLPlatformUtils::Initialize();

    xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser;

    // Disable external entity resolution
    parser->setDisableDefaultEntityResolution(true); // parser will not resolve external entities itself
    parser->setDoXInclude(false); // XInclude can also pull in external content
    parser->setLoadExternalDTD(false);

    // ... parse XML ...

    delete parser;
    xercesc::XMLPlatformUtils::Terminate();
} catch (const xercesc::XMLException& e) {
    // Handle exceptions
}

TinyXML-2:

TinyXML-2 does not parse DTDs or resolve external entities by design, making it inherently safer in this regard. Where TinyXML-2 is in use, the risk of XXE through its parsing mechanism is low. However, it is still important to verify that no custom extensions or workarounds have been implemented that could reintroduce the vulnerability.

Boost.PropertyTree:

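Boost.PropertyTree's XML reader (`boost::property_tree::xml_parser::read_xml`) is built on a bundled copy of RapidXML rather than on `libxml2` or Xerces-C++. RapidXML does not load external DTDs or expand entities declared in a DTD, so the XXE risk through `read_xml` itself is low. The same caveat as for TinyXML-2 applies: if a module pre-processes documents with `libxml2` or Xerces-C++ before handing them to a property tree, that upstream parser is the one that needs the secure configuration.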

Configuration Management and Deployment on OVH

The enterprise stack was deployed across numerous OVH virtual machines and bare-metal servers. A critical part of the mitigation was ensuring consistent configuration across all environments. This involved:

Centralized Configuration Management: Leveraging tools like Ansible or Chef to push updated C++ application configurations and build flags. This ensured that the secure parsing options were applied uniformly.
Build System Integration: Modifying the build system (e.g., CMake, Makefiles) to ensure that the necessary compiler flags or library configurations for secure parsing were enabled by default for all XML parsing modules.
Runtime Checks: Implementing runtime checks or assertions within the C++ applications to verify that the XML parser options were set correctly. This provided an additional layer of defense against misconfigurations.
OVH Network Security: While not directly mitigating XXE within the application, reviewing OVH firewall rules and security groups to restrict outbound network access from application servers. This limits the potential impact of SSRF attacks if an XXE vulnerability were to be exploited in a different context. For instance, restricting outbound traffic to only necessary ports and destinations.
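The runtime check mentioned above can be as small as a bitmask test on the option value that is about to be passed to the parser. The following is a minimal sketch under that assumption. The function name `parserOptionsAreSafe` is hypothetical, and the function is library-agnostic: the caller supplies the forbidden and required masks.

```cpp
// Returns true when `options` contains none of the `forbidden` bits and all
// of the `required` bits. Hypothetical helper; the masks come from the caller.
//
// With libxml2, a caller might pass (assumed policy, adjust as needed):
//   forbidden = XML_PARSE_NOENT | XML_PARSE_DTDLOAD | XML_PARSE_DTDATTR |
//               XML_PARSE_DTDVALID | XML_PARSE_XINCLUDE
//   required  = XML_PARSE_NONET
bool parserOptionsAreSafe(int options, int forbidden, int required) {
    const bool hasForbiddenBit = (options & forbidden) != 0;
    const bool hasAllRequired = (options & required) == required;
    return !hasForbiddenBit && hasAllRequired;
}
```

Calling the check at the single place where parser options are assembled, and refusing to parse when it fails, turns a silent misconfiguration into an immediate and visible error.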

Testing and Validation

Post-mitigation, rigorous testing was essential. This included:

Fuzzing: Employing fuzzing techniques with specially crafted XML payloads designed to trigger XXE vulnerabilities. Tools like `xml-fuzzer` or custom Python scripts using libraries like `lxml` (for generating malformed XML) were used.
Penetration Testing: Conducting targeted penetration tests against the SOAP endpoints, attempting to exploit XXE by requesting sensitive files (e.g., `/etc/passwd`, application configuration files) or attempting SSRF by pointing to internal OVH services or external malicious servers.
Code Review: Performing targeted code reviews of all modules that handle XML parsing to ensure that the secure configurations were correctly implemented and that no new vulnerabilities were introduced.
Log Analysis: Monitoring application and server logs for any unusual network activity or parsing errors that might indicate attempted exploitation.
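For the fuzzing and penetration-testing steps, probe payloads can be generated rather than written by hand. The following is a minimal sketch of such a generator, not the tooling used in the audit. The function name `buildXxeProbe` and the element name `probe` are illustrative assumptions; the envelope namespace is the standard SOAP 1.1 one.

```cpp
#include <string>

// Builds a SOAP 1.1 envelope whose body references an external entity that
// points at `systemId`, for example "file:///etc/hostname" or the URL of a
// canary server controlled by the tester.
// Hypothetical helper: `systemId` is inserted verbatim and must not contain
// a double quote.
std::string buildXxeProbe(const std::string& systemId) {
    return
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        "<!DOCTYPE soap:Envelope [ <!ENTITY xxe SYSTEM \"" + systemId + "\"> ]>\n"
        "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n"
        "  <soap:Body><probe>&xxe;</probe></soap:Body>\n"
        "</soap:Envelope>\n";
}
```

A hardened endpoint should either reject the probe outright or return a response that contains neither the file content nor any trace of an outbound request to the canary address. Such probes should only be sent to systems the tester is authorized to assess.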

Conclusion: A Proactive Security Posture

Auditing and securing a high-traffic C++ enterprise stack on a platform like OVH requires a multi-faceted approach. For XXE vulnerabilities in legacy SOAP integrations, the key was to identify the specific XML parsing libraries in use and systematically apply secure configurations to disable external entity resolution. This, combined with robust configuration management and thorough testing, allowed us to significantly reduce the risk of exploitation. The proactive identification and remediation of such vulnerabilities are paramount in maintaining the security and integrity of enterprise systems.
