Mitigating XML External Entity (XXE) Injection in Legacy SOAP Integrations with Custom C++ Implementations
Understanding the XXE Threat in Legacy C++ SOAP Services
Many organizations still rely on custom C++ implementations for critical SOAP integrations, often built years ago. These systems, while functional, can harbor significant security vulnerabilities, with XML External Entity (XXE) injection being a prime example. XXE attacks exploit the XML parser’s ability to process external entities, allowing an attacker to read sensitive files from the server, perform denial-of-service attacks, or even conduct server-side request forgery (SSRF).
The core of the problem lies in how XML parsers, particularly older or improperly configured ones, handle Document Type Definitions (DTDs). A DTD can declare external entities that the parser then resolves. If an attacker can control parts of the XML input, they can inject malicious DTD declarations that point to sensitive local files or external resources.
Identifying Vulnerable C++ XML Parsers
In C++, common XML parsing libraries include libxml2, Xerces-C++, and TinyXML. The specific configuration and usage patterns of these libraries determine their susceptibility to XXE. For instance, older libxml2 releases were more permissive by default. Releases from 2.9 onward are generally documented as not expanding external entities unless the caller opts in, but legacy code frequently opts in with flags such as XML_PARSE_NOENT or XML_PARSE_DTDLOAD, or is still linked against an older release.
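A key indicator of vulnerability is code that parses untrusted XML with entity substitution or DTD loading switched on, or that never states its parser options at all. In libxml2 this shows up as calls to xmlReadDoc or xmlReadMemory that pass XML_PARSE_NOENT or XML_PARSE_DTDLOAD, as a call to xmlSubstituteEntitiesDefault(1), or as use of the older xmlParseDoc and xmlParseMemory functions, which take no options argument and therefore depend on global defaults. Other libraries have equivalent switches.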
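As a starting point for a code audit, the search for risky call sites can be partly automated. The sketch below is a hypothetical helper (findRiskyLibxml2Usage is not part of libxml2) that scans source text for the libxml2 identifiers that turn on entity substitution or DTD loading. It is a plain substring search, so a hit only means "review this call site", and it will not catch flags passed as numeric literals.

```cpp
#include <array>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical audit helper: returns the risky libxml2 identifiers that
// appear in a piece of C or C++ source text.
std::vector<std::string> findRiskyLibxml2Usage(std::string_view source) {
    // Each of these enables entity substitution or DTD loading in libxml2.
    static constexpr std::array<std::string_view, 4> kRisky = {
        "XML_PARSE_NOENT",
        "XML_PARSE_DTDLOAD",
        "XML_PARSE_DTDATTR",
        "xmlSubstituteEntitiesDefault",
    };

    std::vector<std::string> hits;
    for (std::string_view name : kRisky) {
        if (source.find(name) != std::string_view::npos) {
            hits.emplace_back(name);
        }
    }
    return hits;
}
```

Running this over each translation unit that includes libxml/parser.h gives a short review list. XML_PARSE_NONET is intentionally absent from the table because it restricts the parser rather than widening it.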
Mitigation Strategy 1: Disabling DTDs and External Entities
The most effective way to prevent XXE is to disable the processing of DTDs and external entities entirely. For libxml2, this comes down to the option flags passed to the parser: the flags that enable entity substitution and DTD loading must stay unset.
libxml2: Disabling DTDs and External Entities
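When using libxml2, parser options are passed directly to parsing functions such as xmlReadMemory, or applied to an existing context with xmlCtxtUseOptions. (xmlSetGenericErrorFunc only installs an error callback; it does not configure parsing.) The entity-related flags enable behavior rather than disable it, so a hardened configuration leaves the first three unset and sets only the last:
- XML_PARSE_NOENT: Despite its name, this option substitutes entities with their content. Leave it unset.
- XML_PARSE_DTDLOAD: This option loads the external DTD subset. Leave it unset.
- XML_PARSE_DTDATTR: This option applies default attribute values from the DTD, which requires loading it. Leave it unset.
- XML_PARSE_NONET: This option forbids network access while loading DTDs and entities. Set it.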
Here’s a C++ code snippet demonstrating how to apply these options when parsing XML using libxml2:
Example: Secure libxml2 Parsing in C++
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <cstddef>
#include <iostream>
#include <limits>
#include <string>

// Parse XML from memory without expanding entities or loading DTDs.
// The caller owns the returned document and must release it with xmlFreeDoc.
xmlDocPtr parseXmlSecure(const std::string& xmlString) {
    // xmlReadMemory takes the buffer size as an int, so reject oversized input.
    if (xmlString.size() > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
        std::cerr << "XML input is too large to parse." << std::endl;
        return nullptr;
    }

    // XML_PARSE_NONET: forbid network access while loading DTDs/entities.
    // Deliberately NOT set, because each one enables behavior XXE relies on:
    //   XML_PARSE_NOENT   (substitutes entities with their content)
    //   XML_PARSE_DTDLOAD (loads the external DTD subset)
    //   XML_PARSE_DTDATTR (applies default attributes from the DTD)
    const int options = XML_PARSE_NONET;

    // xmlReadMemory accepts the options directly, so no separate
    // parser context has to be created or freed.
    xmlDocPtr doc = xmlReadMemory(xmlString.data(),
                                  static_cast<int>(xmlString.size()),
                                  nullptr, nullptr, options);
    if (!doc) {
        std::cerr << "Failed to parse XML string." << std::endl;
        return nullptr;
    }
    return doc;
}

int main() {
    // Example of a potentially malicious XML string
    // This would attempt to read /etc/passwd if external entities were enabled
    std::string maliciousXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                               "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
                               "<root><element>&xxe;</element></root>";
    std::cout << "Attempting to parse malicious XML..." << std::endl;
    xmlDocPtr doc = parseXmlSecure(maliciousXml);
    if (doc) {
        std::cout << "XML parsed successfully (XXE should be mitigated)." << std::endl;
        // Process the document here...
        xmlFreeDoc(doc); // Free the document
    } else {
        std::cerr << "XML parsing failed as expected or due to other errors." << std::endl;
    }

    // Example of a valid XML string
    std::string validXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><element>Hello</element></root>";
    std::cout << "\nAttempting to parse valid XML..." << std::endl;
    doc = parseXmlSecure(validXml);
    if (doc) {
        std::cout << "Valid XML parsed successfully." << std::endl;
        xmlFreeDoc(doc); // Free the document
    } else {
        std::cerr << "Valid XML parsing failed." << std::endl;
    }

    xmlCleanupParser(); // Release global libxml2 state at program exit
    return 0;
}
Important Note: The xmlReadMemory function directly accepts the options parameter, making it the preferred method for parsing XML from memory with specific security configurations. Always consult the libxml2 documentation for the most up-to-date API usage and available options.
Xerces-C++: Disabling DTDs and External Entities
For Xerces-C++, the approach involves configuring the parser before use. With XercesDOMParser this is done through setter methods: you need to turn off loading of the external DTD and the default resolution of external entities.
#include <xercesc/framework/MemBufInputSource.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/sax/HandlerBase.hpp>
#include <xercesc/sax/SAXException.hpp>
#include <xercesc/util/OutOfMemoryException.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLException.hpp>
#include <xercesc/util/XMLString.hpp>
#include <iostream>
#include <string>

// Version-independent namespace macro; avoids hard-coding xercesc_2_8.
XERCES_CPP_NAMESPACE_USE

// Convert a Xerces XMLCh string to std::string using the local code page.
std::string XMLChToString(const XMLCh* const toConvert) {
    char* charPtr = XMLString::transcode(toConvert);
    std::string output(charPtr ? charPtr : "");
    XMLString::release(&charPtr);
    return output;
}

int main() {
    try {
        // Initialize Xerces
        XMLPlatformUtils::Initialize();
    } catch (const XMLException& e) {
        std::cerr << "Xerces initialization failed: " << XMLChToString(e.getMessage()) << std::endl;
        return 1;
    }

    int exitCode = 0;
    {
        // Inner scope: all Xerces objects must be destroyed before Terminate().
        XercesDOMParser parser;
        HandlerBase errorHandler; // rethrows fatal parse errors as SAXParseException
        parser.setErrorHandler(&errorHandler);

        // Configure the parser to ignore DTDs and external entities.
        // These are crucial for XXE prevention.
        parser.setValidationScheme(XercesDOMParser::Val_Never);
        parser.setLoadExternalDTD(false);               // Do not load the external DTD subset
        parser.setDisableDefaultEntityResolution(true); // Do not fetch external entities
                                                        // (Xerces-C++ 3.x; verify for older releases)
        parser.setDoNamespaces(true);                   // SOAP relies on namespaces

        // Example of a potentially malicious XML string
        std::string maliciousXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                                   "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
                                   "<root><element>&xxe;</element></root>";
        std::cout << "Attempting to parse malicious XML with Xerces..." << std::endl;

        try {
            // Hand the raw bytes to the parser; transcoding to XMLCh first and
            // measuring with strlen would truncate the UTF-16 buffer.
            MemBufInputSource source(
                reinterpret_cast<const XMLByte*>(maliciousXml.data()),
                maliciousXml.size(),
                "maliciousXML", // A name for the input source
                false);         // Do not adopt the buffer
            parser.parse(source);
            std::cout << "XML parsed successfully (XXE should be mitigated)." << std::endl;
        } catch (const OutOfMemoryException&) {
            std::cerr << "Out of memory exception!" << std::endl;
            exitCode = 1;
        } catch (const XMLException& e) {
            std::cerr << "Xerces Exception: " << XMLChToString(e.getMessage()) << std::endl;
            exitCode = 1;
        } catch (const SAXException& e) {
            std::cerr << "Parse error: " << XMLChToString(e.getMessage()) << std::endl;
            exitCode = 1;
        } catch (...) {
            std::cerr << "Unknown exception!" << std::endl;
            exitCode = 1;
        }
    }

    // Terminate Xerces
    XMLPlatformUtils::Terminate();
    return exitCode;
}
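Note on Xerces Features: The available controls vary between Xerces-C++ versions, so always refer to the documentation for the version you ship. XercesDOMParser is configured through setter methods such as setLoadExternalDTD and setDisableDefaultEntityResolution. The setFeature call belongs to SAX2XMLReader, where the corresponding constants are XMLUni::fgXercesLoadExternalDTD and XMLUni::fgXercesDisableDefaultEntityResolution. Feature names in the style of external-general-entities and external-parameter-entities come from the Java Xerces parser and should not be assumed to exist in the C++ library. The key is to disable everything that allows external DTDs and external entity resolution.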
Mitigation Strategy 2: Input Validation and Sanitization
While disabling external entities is the primary defense, robust input validation and sanitization serve as a secondary layer of defense. This involves:
- Whitelisting Allowed Elements and Attributes: Only permit known, expected XML structures. Reject any XML that deviates from the defined schema.
- Sanitizing User-Controlled Data: If parts of the XML are generated or influenced by user input, ensure that characters or sequences that could be interpreted as XML markup (such as '<', '>', and '&') are properly escaped or removed.
- Limiting XML Depth and Size: Prevent denial-of-service attacks by setting limits on the recursion depth of the XML document and the total size of the parsed XML.
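The escaping step in the list above can be sketched as follows. This is a minimal illustration, assuming the service assembles XML by string concatenation. escapeXmlText is a hypothetical helper name, and it covers only the five predefined XML entities, not control characters that are invalid in XML 1.0.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: escapes the five XML-reserved characters so that
// user-supplied text is treated as character data, never as markup.
std::string escapeXmlText(std::string_view input) {
    std::string out;
    out.reserve(input.size());
    for (char c : input) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Escaping the ampersand is what neutralizes an injected entity reference: a value such as &xxe; is emitted as &amp;xxe; and reaches the parser as literal text.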
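Strict schema validation (e.g., against an XSD) can also be highly effective, provided it runs on a parser that is already hardened as described under Mitigation Strategy 1. Validation itself requires parsing, so it complements rather than replaces disabling DTDs and external entities. If the incoming XML does not conform to the expected schema, it should be rejected immediately so that unexpected structures never reach the business logic.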
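A cheap check can also run before the parser sees the message at all. Both SOAP 1.1 and SOAP 1.2 forbid a document type declaration in a message, so a legitimate SOAP request never contains one. The sketch below combines that rule with a size limit. The names precheckSoapMessage and kMaxSoapBytes are hypothetical, the 1 MiB limit is an arbitrary placeholder, and the byte search assumes an ASCII-compatible encoding such as UTF-8. A UTF-16 payload would slip past it, so it is defense in depth and not a replacement for a hardened parser.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical limit; tune it to the largest legitimate message.
constexpr std::size_t kMaxSoapBytes = 1024 * 1024;

enum class PrecheckResult { Ok, TooLarge, ContainsDoctype };

// Rejects oversized messages and any message that carries a DOCTYPE.
// XML keywords are case-sensitive, so an exact match is sufficient.
PrecheckResult precheckSoapMessage(std::string_view xml) {
    if (xml.size() > kMaxSoapBytes) {
        return PrecheckResult::TooLarge;
    }
    if (xml.find("<!DOCTYPE") != std::string_view::npos) {
        return PrecheckResult::ContainsDoctype;
    }
    return PrecheckResult::Ok;
}
```

The check is deliberately strict: it also rejects a message that merely mentions the string inside a comment or CDATA section, which is an acceptable trade-off for a SOAP endpoint that never expects it.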
Mitigation Strategy 3: Using a Secure XML Parser Configuration
If upgrading or modifying the C++ code to explicitly disable features is not immediately feasible, consider the environment in which the parser is running. Some libraries might have configuration files or environment variables that can influence their behavior. However, relying on external configuration for security-critical features is generally discouraged; direct code-level configuration is preferred.
Testing and Verification
After implementing mitigation strategies, thorough testing is crucial. This involves:
- Penetration Testing: Actively attempt to exploit XXE vulnerabilities using known attack vectors. Tools like xmltoxxe or custom scripts can be used.
- Code Review: Manually inspect the XML parsing code to ensure that security options are correctly applied and that no unhandled parsing paths exist.
- Fuzzing: Use fuzzing tools to generate a wide range of malformed or unexpected XML inputs to uncover potential parser vulnerabilities or unexpected behavior.
A simple test case for XXE involves sending an XML payload that attempts to reference a local file (e.g., file:///etc/passwd) or an external URL. If the application responds with the content of the file or makes an external request, the XXE vulnerability is present and the mitigation has failed.
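The test case just described can be scripted. The sketch below is one hedged way to do it: instead of /etc/passwd it assumes you plant a file containing a unique canary string on the test server, which makes a leak unambiguous. buildXxeProbe and responseLeaksCanary are hypothetical helper names, and sending the probe to the service is left to whatever HTTP client the test suite already uses.

```cpp
#include <string>
#include <string_view>

// Hypothetical test helper: builds a probe document whose external entity
// points at the given system identifier (a file: or http: URL).
std::string buildXxeProbe(std::string_view systemId) {
    std::string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    xml += "<!DOCTYPE probe [ <!ENTITY xxe SYSTEM \"";
    xml += systemId;
    xml += "\"> ]>\n";
    xml += "<probe>&xxe;</probe>";
    return xml;
}

// Returns true when the service response echoes the canary string,
// which means the external entity was resolved and the mitigation failed.
bool responseLeaksCanary(std::string_view response, std::string_view canary) {
    return !canary.empty() && response.find(canary) != std::string_view::npos;
}
```

An absent canary does not prove safety on its own, because a blind XXE may resolve the entity without echoing it. Pairing the file probe with a URL probe against a listener you control covers that case.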
Conclusion
Securing legacy C++ SOAP integrations against XXE requires a proactive approach. By understanding the risks associated with XML parsing and implementing robust mitigation strategies—primarily by disabling DTDs and external entity resolution at the parser level—you can significantly reduce the attack surface. Supplementing these measures with strict input validation and regular security testing provides a comprehensive defense against this common and dangerous vulnerability.