How We Audited a High-Traffic C++ Enterprise Stack on OVH and Mitigated XML External Entity (XXE) Injection in Old SOAP Integrations
Initial Assessment: Identifying the Attack Surface
Our engagement began with a deep dive into a high-traffic C++ enterprise stack hosted on OVH. The primary concern was a potential XML External Entity (XXE) injection vulnerability, specifically within legacy SOAP integrations. These integrations, often developed years prior and maintained by different teams, represented a significant attack surface. The initial phase involved cataloging all SOAP endpoints, understanding their request/response schemas, and identifying the underlying XML parsing libraries used within the C++ applications. Many older C++ XML parsers, by default, are configured to resolve external entities, making them susceptible to XXE attacks. This can lead to information disclosure (reading local files), Server-Side Request Forgery (SSRF), and denial-of-service (DoS) attacks.
Deep Dive into C++ XML Parsers and XXE Vulnerabilities
The core of the vulnerability lies in how XML parsers handle DTDs (Document Type Definitions) and external entities. An attacker can craft a malicious XML payload that includes a DTD referencing an external resource. When the vulnerable parser processes this XML, it will attempt to fetch and process the external entity, potentially revealing sensitive information or executing unintended actions.
Consider a common scenario using `libxml2`, a widely adopted C library that many C++ codebases call directly. Releases before 2.9 fetched external entities by default, and current releases still do so whenever the caller passes `XML_PARSE_NOENT`, a flag that, despite its name, turns entity substitution on. A simplified, vulnerable C++ code snippet might look like this:

```cpp
#include <cstring>
#include <libxml/parser.h>
#include <libxml/tree.h>
// ... other includes and setup ...

xmlDocPtr parseXmlString(const char* xmlString) {
    // Vulnerable: XML_PARSE_NOENT enables entity substitution, so external
    // entities declared in the DTD are loaded and expanded into the tree.
    xmlDocPtr doc = xmlReadMemory(xmlString, static_cast<int>(strlen(xmlString)),
                                  "noname.xml", NULL, XML_PARSE_NOENT);
    if (doc == NULL) {
        // Handle parsing error
        return NULL;
    }
    return doc;
}

int main() {
    const char* maliciousXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                               "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
                               "<root>&xxe;</root>";
    xmlDocPtr document = parseXmlString(maliciousXml);
    if (document) {
        // Process document...
        xmlFreeDoc(document);
    }
    xmlCleanupParser();
    return 0;
}
```

In this example, `XML_PARSE_NOENT` makes `xmlReadMemory` replace `&xxe;` with the contents of `/etc/passwd`, which then appear in the parsed tree as the text of `<root>`. This is a classic XXE leading to local file disclosure.
Mitigation Strategy: Disabling External Entity Resolution
The most effective mitigation is to explicitly disable the resolution of external entities at the parser configuration level. For `libxml2`, this means controlling the options passed to the parsing function.

The corrected C++ snippet never passes `XML_PARSE_NOENT` (which enables entity substitution), `XML_PARSE_DTDLOAD`, or `XML_PARSE_DTDVALID`. It sets `XML_PARSE_NONET` to block network access and, because SOAP messages must not contain a DTD, rejects any document that declares one. `libxml2` 2.13 and later also provide `XML_PARSE_NO_XXE`, which can be added where available.

```cpp
#include <cstring>
#include <libxml/parser.h>
#include <libxml/tree.h>
// ... other includes and setup ...

xmlDocPtr parseXmlStringSecurely(const char* xmlString) {
    // Create a parser context
    xmlParserCtxtPtr ctxt = xmlNewParserCtxt();
    if (!ctxt) {
        // Handle context creation error
        return NULL;
    }
    // Options go through the read call; writing to ctxt->options directly
    // bypasses the library's option handling.
    // XML_PARSE_NONET: forbid network access (SSRF prevention).
    // Deliberately absent: XML_PARSE_NOENT, XML_PARSE_DTDLOAD, XML_PARSE_DTDVALID.
    const int options = XML_PARSE_NONET;
    xmlDocPtr doc = xmlCtxtReadMemory(ctxt, xmlString, static_cast<int>(strlen(xmlString)),
                                      "noname.xml", NULL, options);
    if (!doc) {
        // Handle parsing error, check ctxt->lastError for details
        xmlFreeParserCtxt(ctxt);
        return NULL;
    }
    xmlFreeParserCtxt(ctxt);
    // SOAP forbids DTDs: refuse any document that carries an internal subset.
    if (xmlGetIntSubset(doc) != NULL) {
        xmlFreeDoc(doc);
        return NULL;
    }
    return doc;
}

int main() {
    const char* maliciousXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                               "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
                               "<root>&xxe;</root>";
    xmlDocPtr document = parseXmlStringSecurely(maliciousXml);
    if (document) {
        // Process document...
        xmlFreeDoc(document);
    }
    xmlCleanupParser();
    return 0;
}
```

By leaving out `XML_PARSE_NOENT` and the DTD-loading flags, passing `XML_PARSE_NONET`, and rejecting documents that carry a DOCTYPE, we ensure that `libxml2` neither substitutes external entities nor reaches out to the network, and that the sample payload is refused outright. This closes the XXE and SSRF vectors that originate from XML parsing.
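Because a SOAP message is not supposed to carry a DOCTYPE at all, a cheap guard in front of the parser can complement the parser options. The sketch below is a minimal illustration, not the code from the audited stack: `looksLikeDtdPayload` and `containsIgnoreCase` are hypothetical names, and the check assumes the request body is UTF-8 or another ASCII-compatible encoding (a UTF-16 body would slip past it, so it cannot replace the parser-level settings).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive substring search over ASCII text.
static bool containsIgnoreCase(const std::string& haystack, const std::string& needle) {
    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end(),
                          [](char a, char b) {
                              return std::tolower(static_cast<unsigned char>(a)) ==
                                     std::tolower(static_cast<unsigned char>(b));
                          });
    return it != haystack.end();
}

// Hypothetical helper: returns true when the raw request body declares a
// DOCTYPE or an entity and should be refused before it reaches any parser.
// Assumption: the body is ASCII-compatible (e.g. UTF-8).
bool looksLikeDtdPayload(const std::string& body) {
    return containsIgnoreCase(body, "<!DOCTYPE") ||
           containsIgnoreCase(body, "<!ENTITY");
}
```

The match is deliberately conservative: it ignores case and will also flag a legitimate message that merely quotes `<!DOCTYPE` inside a comment or CDATA section, which is usually an acceptable trade-off for SOAP endpoints.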
Implementation Across Diverse C++ Libraries
Our audit revealed that different teams had adopted various XML parsing libraries over time, including Xerces-C++, TinyXML-2, and Boost.PropertyTree (which can parse XML). Each library treats DTDs and external entities differently, so each needed its own review and, where applicable, its own configuration.
Xerces-C++:
```cpp
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLException.hpp>
#include <xercesc/sax/SAXParseException.hpp>
// ...
try {
    xercesc::XMLPlatformUtils::Initialize();
    xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser;
    // Stop the default resolver from fetching external entities
    parser->setDisableDefaultEntityResolution(true);
    // Never validate against, or load, an external DTD
    parser->setValidationScheme(xercesc::XercesDOMParser::Val_Never);
    parser->setLoadExternalDTD(false);
    parser->setDoXInclude(false); // XInclude can also pull in external content
    // ... parse XML ...
    delete parser;
    xercesc::XMLPlatformUtils::Terminate();
} catch (const xercesc::XMLException& e) {
    // Handle initialization and parser exceptions
}
```
TinyXML-2:
TinyXML-2 does not support DTDs or external entities by design, making it inherently safer in this regard. If TinyXML-2 was in use, the risk was lower for XXE specifically through its parsing mechanism. However, it’s crucial to verify that no custom extensions or workarounds were implemented that could reintroduce this vulnerability.
Boost.PropertyTree:
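Boost.PropertyTree does not delegate to `libxml2` or Xerces-C++. Its `boost::property_tree::xml_parser::read_xml` function relies on a bundled copy of RapidXML, which neither fetches external entities nor expands entities declared in a DTD. The XXE risk through this path is therefore low. Since `read_xml` exposes no parser hardening options of its own, it is still worth confirming that no module pre-processes its input with a different, entity-resolving parser before handing it to PropertyTree.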
Configuration Management and Deployment on OVH
The enterprise stack was deployed across numerous OVH virtual machines and bare-metal servers. A critical part of the mitigation was ensuring consistent configuration across all environments. This involved:
- Centralized Configuration Management: Leveraging tools like Ansible or Chef to push updated C++ application configurations and build flags. This ensured that the secure parsing options were applied uniformly.
- Build System Integration: Modifying the build system (e.g., CMake, Makefiles) to ensure that the necessary compiler flags or library configurations for secure parsing were enabled by default for all XML parsing modules.
- Runtime Checks: Implementing runtime checks or assertions within the C++ applications to verify that the XML parser options were set correctly. This provided an additional layer of defense against misconfigurations.
- OVH Network Security: While not directly mitigating XXE within the application, reviewing OVH firewall rules and security groups to restrict outbound network access from application servers. This limits the potential impact of SSRF attacks if an XXE vulnerability were to be exploited in a different context. For instance, restricting outbound traffic to only necessary ports and destinations.
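The runtime checks mentioned in the list above can be as small as a guard that inspects the option bitmask before it is handed to the parser. The following is a sketch under stated assumptions rather than the audited code: the `flags` constants are local stand-ins assumed to mirror `libxml2`'s `xmlParserOption` values (real code should use the constants from `<libxml/parser.h>`), and `optionsAreSafe` and `requireSafeOptions` are hypothetical names. Note that `XML_PARSE_NOENT` is on the forbidden list because, despite its name, it enables entity substitution.

```cpp
#include <stdexcept>
#include <string>

// Local stand-ins for parser flags. Assumption: the values mirror libxml2's
// xmlParserOption enum; production code should use the library's constants.
namespace flags {
constexpr int NoEnt    = 1 << 1;   // substitute entities (unsafe for untrusted input)
constexpr int DtdLoad  = 1 << 2;   // load the external DTD subset (unsafe)
constexpr int DtdAttr  = 1 << 3;   // apply DTD default attributes (implies DTD loading)
constexpr int DtdValid = 1 << 4;   // validate against the DTD (unsafe)
constexpr int NoNet    = 1 << 11;  // forbid network access (required)
}  // namespace flags

// Returns true when no entity- or DTD-related flag is set and network
// access is explicitly forbidden.
bool optionsAreSafe(int options) {
    const int forbidden = flags::NoEnt | flags::DtdLoad | flags::DtdAttr | flags::DtdValid;
    return (options & forbidden) == 0 && (options & flags::NoNet) != 0;
}

// Throws instead of parsing when the configuration has drifted.
void requireSafeOptions(int options) {
    if (!optionsAreSafe(options)) {
        throw std::invalid_argument("unsafe XML parser options: " + std::to_string(options));
    }
}
```

Calling `requireSafeOptions` at the single place where parser options are assembled turns a silent misconfiguration into a startup or request-time failure that shows up in logs.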
Testing and Validation
Post-mitigation, rigorous testing was essential. This included:
- Fuzzing: Employing fuzzing techniques with specially crafted XML payloads designed to trigger XXE vulnerabilities. Tools like `xml-fuzzer` or custom Python scripts using libraries like `lxml` (for generating malformed XML) were used.
- Penetration Testing: Conducting targeted penetration tests against the SOAP endpoints, attempting to exploit XXE by requesting sensitive files (e.g., `/etc/passwd`, application configuration files) or attempting SSRF by pointing to internal OVH services or external malicious servers.
- Code Review: Performing targeted code reviews of all modules that handle XML parsing to ensure that the secure configurations were correctly implemented and that no new vulnerabilities were introduced.
- Log Analysis: Monitoring application and server logs for any unusual network activity or parsing errors that might indicate attempted exploitation.
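For the penetration tests described above, it helps to generate probes programmatically and to look for a known marker in the response instead of eyeballing output. The sketch below assumes a canary file with a unique string has been planted on a test host; `buildXxeProbe`, `responseLeaksCanary`, and the element name `probe` are illustrative names, not part of any tool named in this article.

```cpp
#include <string>

// Builds a SOAP 1.1 envelope preceded by a DOCTYPE that declares an external
// entity. `systemUri` is the resource the probe asks the parser to load,
// for example a canary file on a test host.
std::string buildXxeProbe(const std::string& systemUri) {
    return std::string("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n") +
           "<!DOCTYPE probe [ <!ENTITY xxe SYSTEM \"" + systemUri + "\"> ]>\n" +
           "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n" +
           "  <soap:Body><probe>&xxe;</probe></soap:Body>\n" +
           "</soap:Envelope>";
}

// True when the response echoes the canary string, meaning the endpoint
// resolved the external entity. An empty canary never counts as a leak.
bool responseLeaksCanary(const std::string& response, const std::string& canary) {
    return !canary.empty() && response.find(canary) != std::string::npos;
}
```

A regression test can send `buildXxeProbe("file:///path/to/canary")` to each endpoint and fail if `responseLeaksCanary` returns true. A clean result only shows that the content was not echoed back, so blind XXE and SSRF still need to be checked through outbound traffic logs.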
Conclusion: A Proactive Security Posture
Auditing and securing a high-traffic C++ enterprise stack on a platform like OVH requires a multi-faceted approach. For XXE vulnerabilities in legacy SOAP integrations, the key was to identify the specific XML parsing libraries in use and systematically apply secure configurations to disable external entity resolution. This, combined with robust configuration management and thorough testing, allowed us to significantly reduce the risk of exploitation. The proactive identification and remediation of such vulnerabilities are paramount in maintaining the security and integrity of enterprise systems.