How We Audited a High-Traffic PHP Enterprise Stack on OVH and Mitigated XML External Entity (XXE) Injection in Legacy SOAP Integrations

Initial Assessment: Identifying the Attack Surface

Our engagement began with a deep dive into a high-traffic PHP enterprise stack hosted on OVH. The primary concern was a potential vulnerability stemming from legacy SOAP integrations, a common vector for XML External Entity (XXE) injection. The stack comprised several monolithic PHP applications, a cluster of MySQL databases, and a load-balanced Nginx front-end. The sheer volume of traffic, coupled with the age of some SOAP services, presented a significant risk.

The initial reconnaissance phase focused on enumerating all SOAP endpoints exposed by the applications. We leveraged tools like `curl` and custom PHP scripts to probe for WSDL files and identify the available operations. Understanding the exact XML structures accepted by these endpoints was paramount.
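As a minimal sketch of what such a custom PHP script might look like (not the script used in the audit), the helpers below build candidate WSDL URLs for an endpoint and list the operations a WSDL 1.1 document declares. The function names, the two URL conventions, and the example host are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/** Candidate WSDL locations for a SOAP endpoint (two common conventions, assumed here). */
function candidateWsdlUrls(string $endpoint): array
{
    $base = rtrim($endpoint, '/');
    return [$base . '?wsdl', $base . '.wsdl'];
}

/** Operation names declared in a WSDL 1.1 document; [] when the body is not a WSDL. */
function wsdlOperations(string $body): array
{
    $dom = new DOMDocument();
    // LIBXML_NONET: never fetch remote resources while parsing a response we do not control.
    if ($body === '' || !@$dom->loadXML($body, LIBXML_NONET)) {
        return [];
    }

    $root = $dom->documentElement;
    $wsdlNs = 'http://schemas.xmlsoap.org/wsdl/';
    if ($root === null || $root->localName !== 'definitions' || $root->namespaceURI !== $wsdlNs) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('wsdl', $wsdlNs);

    $names = [];
    foreach ($xpath->query('/wsdl:definitions/wsdl:portType/wsdl:operation/@name') as $attr) {
        $names[] = $attr->nodeValue;
    }
    return array_values(array_unique($names));
}

// Usage (performs network requests; the host is a placeholder):
// foreach (candidateWsdlUrls('https://app.example.com/soap/UserService') as $url) {
//     $body = @file_get_contents($url) ?: '';
//     print_r(wsdlOperations($body));
// }
```

Keeping the parsing separate from the HTTP fetch makes the helper easy to run against WSDL files saved during earlier reconnaissance.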

Probing for XXE Vulnerabilities

XXE vulnerabilities arise when an XML parser processes untrusted XML input and is configured to allow external entity expansion. This can lead to information disclosure (reading local files), denial-of-service attacks, or server-side request forgery (SSRF).

We developed a series of payloads designed to trigger XXE. Each payload declares an external entity that points at a local file or an internal network resource, then references that entity inside the request body so that a vulnerable parser substitutes the resource's contents. A common first test is to try to read the `/etc/passwd` file.

Example XXE Payload for File Disclosure

Consider a SOAP endpoint that accepts an XML request for user data. A vulnerable parser might process the following malicious request:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUserData>
      <userId>&xxe;</userId>
    </getUserData>
  </soap:Body>
</soap:Envelope>

If the parser behind the SOAP service resolves external entities, the contents of `/etc/passwd` come back in the SOAP response, either embedded in an error message or as part of the expected data structure. We automated this probing across all identified SOAP endpoints using a Python script that iterated through a list of common sensitive file paths and internal network targets.
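The probing logic can be sketched as follows. The audit's automation was written in Python; this version is in PHP only to match the other snippets in this article, and the function names and marker strings are illustrative assumptions. It is intended for systems you are authorized to test.

```php
<?php
declare(strict_types=1);

/** Build the SOAP envelope from the example above with the entity pointing at $target. */
function buildXxeProbe(string $target): string
{
    // A double quote would terminate the SYSTEM literal early, so refuse it.
    if (strpos($target, '"') !== false) {
        throw new InvalidArgumentException('Target must not contain a double quote');
    }

    return <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "{$target}"> ]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUserData>
      <userId>&xxe;</userId>
    </getUserData>
  </soap:Body>
</soap:Envelope>
XML;
}

/** Heuristic check: does the response body contain any marker typical of the probed resource? */
function responseLeaks(string $body, array $markers): bool
{
    foreach ($markers as $marker) {
        if ($marker !== '' && strpos($body, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical probe list: target => strings whose presence suggests disclosure.
$probes = [
    'file:///etc/passwd' => ['root:x:0:0'],
    'file:///etc/hosts'  => ['localhost'],
];
```

Each payload would be POSTed to an endpoint with `Content-Type: text/xml`, and any response for which `responseLeaks()` returns true flagged for manual review, since marker matching can produce false positives.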

Deep Dive: PHP XML Parsers and Their Pitfalls

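PHP offers several ways to parse XML. The most common are SimpleXML, DOMDocument, and XMLReader, all of which are built on libxml2. Before libxml2 2.9.0, external entities were loaded by default, so any of these extensions could be exploited unless the application explicitly called libxml_disable_entity_loader(true).

From libxml2 2.9.0 onward, external entities are no longer loaded by default, but DOMDocument::loadXML() and DOMDocument::load() become vulnerable again as soon as the code passes LIBXML_NOENT, LIBXML_DTDLOAD, or LIBXML_DTDVALID. The same applies to SimpleXML and XMLReader, which accept the same option flags.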

Vulnerable PHP Code Snippet (Illustrative)

<?php
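// This is a simplified, VULNERABLE example
$xml_string = $_POST['xml_data']; // Untrusted input

$dom = new DOMDocument();
// Vulnerable on libxml2 < 2.9.0, where external entities are loaded by default.
// On any version, adding LIBXML_NOENT or LIBXML_DTDLOAD here makes it vulnerable.
$dom->loadXML($xml_string);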

// ... process $dom ...
?>

The critical realization was that many of these integrations were built years ago, likely using default PHP versions and configurations that did not prioritize XML security. The OVH environment, while robust, did not inherently mitigate application-level vulnerabilities like XXE.
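Because the safe default depends on the libxml2 build rather than on the PHP code, it is worth recording the version on every application server. The snippet below is a small sketch of such a check; the helper name is an assumption, while LIBXML_DOTTED_VERSION is the constant PHP exposes for the linked libxml2 version.

```php
<?php
declare(strict_types=1);

/** True when this libxml2 version no longer loads external entities by default (2.9.0+). */
function libxmlHasSafeDefaults(string $version): bool
{
    return version_compare($version, '2.9.0', '>=');
}

printf(
    "PHP %s, libxml2 %s: %s\n",
    PHP_VERSION,
    LIBXML_DOTTED_VERSION,
    libxmlHasSafeDefaults(LIBXML_DOTTED_VERSION)
        ? 'external entities are not loaded by default'
        : 'call libxml_disable_entity_loader(true) before parsing untrusted XML'
);
```

A safe default does not make the code safe by itself: a single call that passes LIBXML_NOENT reintroduces the problem, which is why the code audit is still required.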

Mitigation Strategy: Securing XML Parsers

The primary mitigation involves configuring the XML parsers to disable external entity resolution. This needs to be applied consistently across all PHP code that handles XML input, especially from untrusted sources like SOAP requests.

Securing DOMDocument

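For DOMDocument, the key is to make sure that external entities and DTDs (Document Type Definitions) are never loaded while parsing. That means blocking network access and, just as importantly, not passing any flag that switches entity substitution on.

<?php
$xml_string = $_POST['xml_data']; // Untrusted input

$dom = new DOMDocument();

// LIBXML_NONET only. Do not add LIBXML_NOENT, LIBXML_DTDLOAD, or LIBXML_DTDVALID.
$dom->loadXML($xml_string, LIBXML_NONET);

// Now it's safer to process $dom
?>

The flags matter, and one of them is routinely misread:

LIBXML_NONET: Disables network access while the document is loaded, so entities and DTDs cannot be fetched over HTTP or FTP.
LIBXML_NOENT: Despite its name, this flag enables entity substitution. Passing it tells libxml to replace &xxe; with the contents of the referenced resource, which is exactly the behavior an XXE attack relies on. Never use it on untrusted input. The same goes for LIBXML_DTDLOAD and LIBXML_DTDVALID, which load the external DTD subset.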
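The effect of LIBXML_NOENT is easy to confirm locally. The sketch below uses a temporary file as a stand-in for a sensitive file and parses the same document with and without the flag. The file contents and function name are made up for the demo, and the `file://` path construction assumes a Unix-like system.

```php
<?php
declare(strict_types=1);

// Stand-in for a sensitive file, so the demo never touches real system files.
$secretFile = tempnam(sys_get_temp_dir(), 'xxe');
file_put_contents($secretFile, 'TOP-SECRET');

$xml = '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file://' . $secretFile . '"> ]>'
     . '<userId>&xxe;</userId>';

/** Parse $xml with the given libxml flags and return the root element's text. */
function parseUserId(string $xml, int $flags): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $flags);
    return $dom->documentElement->textContent;
}

$withNoent    = parseUserId($xml, LIBXML_NOENT | LIBXML_NONET);
$withoutNoent = parseUserId($xml, LIBXML_NONET);
unlink($secretFile);

// Does the parsed value contain the file's contents?
var_dump(strpos($withNoent, 'TOP-SECRET') !== false);
var_dump(strpos($withoutNoent, 'TOP-SECRET') !== false);
```

On a libxml2 2.9.0 or later build, the first check should report that the file contents were substituted and the second that they were not. LIBXML_NONET does not help in the first case because `file://` is not a network resource.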

Securing SimpleXML

SimpleXML relies on libxml as well. simplexml_load_string() and the SimpleXMLElement constructor accept the same option flags as DOMDocument, so the same rules apply. You can also parse with DOMDocument first and convert the result.

<?php
$xml_string = $_POST['xml_data']; // Untrusted input

// Option 1: Parse with DOMDocument, then convert to SimpleXML
$dom = new DOMDocument();
$dom->loadXML($xml_string, LIBXML_NONET); // never add LIBXML_NOENT here
$simpleXml = simplexml_import_dom($dom);

// Option 2: On PHP < 8.0 built against libxml2 < 2.9.0, disable the entity loader
// before parsing. Use with caution: the setting is global and also breaks code that
// loads XML from files, such as DOMDocument::load() or simplexml_load_file().
// libxml_disable_entity_loader(true); // Deprecated as of PHP 8.0, where it is no longer needed

// Option 3: Call SimpleXML directly with explicit options. PHP 8.0+ requires
// libxml2 >= 2.9.0, which does not load external entities unless a flag such as
// LIBXML_NOENT asks for it.
$simpleXml = simplexml_load_string($xml_string, SimpleXMLElement::class, LIBXML_NONET);

?>

Given the enterprise context and the need for robust security, we mandated the use of the DOMDocument approach with explicit flags, even when the final processing was intended for SimpleXML. This provided a clear, auditable, and secure way to handle XML parsing.
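One way to make that mandate auditable is a single shared helper that every integration calls. The following is a sketch under our own naming assumptions, not the production code. Beyond using LIBXML_NONET, it rejects any document that carries a DOCTYPE, which is a reasonable policy for SOAP because the SOAP 1.1 specification forbids a Document Type Declaration in messages.

```php
<?php
declare(strict_types=1);

/** Parse untrusted XML into a DOMDocument; throws on malformed input or any DOCTYPE. */
function loadUntrustedXml(string $xml): DOMDocument
{
    if ($xml === '') {
        throw new InvalidArgumentException('Empty XML document');
    }

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        $dom = new DOMDocument();
        // LIBXML_NONET only: no LIBXML_NOENT, LIBXML_DTDLOAD, or LIBXML_DTDVALID.
        $ok = $dom->loadXML($xml, LIBXML_NONET);
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }

    if (!$ok) {
        throw new InvalidArgumentException('Malformed XML');
    }
    if ($dom->doctype !== null) {
        throw new InvalidArgumentException('DOCTYPE declarations are not accepted');
    }
    return $dom;
}

/** Same checks, returning a SimpleXMLElement for code that prefers SimpleXML. */
function loadUntrustedSimpleXml(string $xml): SimpleXMLElement
{
    $element = simplexml_import_dom(loadUntrustedXml($xml));
    if (!$element instanceof SimpleXMLElement) {
        throw new RuntimeException('Could not convert the document to SimpleXML');
    }
    return $element;
}

// Usage:
// $userId = (string) loadUntrustedSimpleXml('<getUserData><userId>42</userId></getUserData>')->userId;
```

With a helper like this in place, a code review only has to flag direct calls to loadXML(), simplexml_load_string(), or XMLReader on request data.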

Implementation and Verification

The mitigation involved a code audit of all SOAP integration points. We identified the specific PHP files and functions responsible for parsing incoming SOAP requests. For each instance, we refactored the code to incorporate the secure parsing methods described above.

Post-implementation, we re-ran our suite of XXE probing scripts. This time, instead of receiving sensitive file contents or internal network responses, the requests were either rejected with generic parsing errors or returned empty, indicating that the external entity resolution was successfully blocked.

Automated Testing with PHPUnit

To ensure ongoing protection and prevent regressions, we integrated security tests into the CI/CD pipeline using PHPUnit. These tests specifically target the SOAP endpoints with known XXE payloads.

<?php
// tests/SoapXXETest.php
namespace App\Tests;

use DOMDocument;
use DOMXPath;
use PHPUnit\Framework\TestCase;

class SoapXXETest extends TestCase
{
    public function testXXEProtectionOnUserDataEndpoint(): void
    {
        $malicious_xml = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><getUserData><userId>&xxe;</userId></getUserData></soap:Body></soap:Envelope>';

        // For simplicity, this test exercises the parsing logic directly.
        // An end-to-end variant would POST the payload to the real endpoint with an
        // HTTP client such as Guzzle and inspect the response body instead.
        $parsed_data = $this->parseXmlSecurely($malicious_xml);

        // Guard against a vacuous pass: the userId element must actually have been found.
        $this->assertArrayHasKey('userId', $parsed_data);

        // A secure parser leaves the entity unresolved, so the value is typically empty.
        // Whatever it is, it must not contain the contents of /etc/passwd.
        $this->assertStringNotContainsString('root:x:0', $parsed_data['userId']);
    }

    /**
     * Simulates secure XML parsing.
     * In a real app, this would be your actual SOAP request handler.
     */
    private function parseXmlSecurely(string $xml_string): array
    {
        $dom = new DOMDocument();
        // LIBXML_NONET only. Adding LIBXML_NOENT would enable entity substitution.
        $dom->loadXML($xml_string, LIBXML_NONET);

        $xpath = new DOMXPath($dom);
        $xpath->registerNamespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/');

        // getUserData and userId carry no namespace in the test payload,
        // so they are queried without a prefix.
        $userIdNode = $xpath->query('/soap:Envelope/soap:Body/getUserData/userId')->item(0);

        $result = [];
        if ($userIdNode) {
            $result['userId'] = $userIdNode->nodeValue;
        }
        return $result;
    }
}

This automated testing ensures that any future code changes or misconfigurations that reintroduce XXE vulnerabilities would be caught before deployment, maintaining the integrity of the system.

Conclusion and Ongoing Vigilance

Auditing and securing a high-traffic enterprise PHP stack on OVH against XXE injection in legacy SOAP integrations required a multi-faceted approach: thorough enumeration of the attack surface, precise payload crafting for vulnerability detection, deep understanding of PHP’s XML parsing mechanisms, and meticulous implementation of secure configurations. The key takeaway is that security is not a one-time fix but an ongoing process. Regular audits, automated testing, and staying abreast of security best practices for all components, including underlying libraries and language versions, are essential for protecting critical infrastructure.