How We Audited a High-Traffic PHP Enterprise Stack on Google Cloud and Mitigated XML External Entity (XXE) Injection in Legacy SOAP Integrations

Auditing a High-Traffic PHP Enterprise Stack on Google Cloud

Our recent engagement involved a critical audit of a high-traffic PHP enterprise application deployed on Google Cloud Platform (GCP). The primary objective was to identify and mitigate security vulnerabilities, with a specific focus on XML External Entity (XXE) injection risks within legacy SOAP integrations. This stack, serving millions of requests daily, presented unique challenges due to its scale, complexity, and the presence of older, less actively maintained components.

Initial Assessment and Tooling

The initial phase involved a comprehensive reconnaissance of the application’s architecture and its interaction points. We leveraged a combination of automated scanning tools and manual inspection. For infrastructure analysis on GCP, we focused on:

GCP Security Command Center (SCC): For an overview of existing findings, asset inventory, and compliance posture.
Cloud Audit Logs: To trace API calls, identify anomalous access patterns, and understand resource provisioning.
Network Topology Analysis: Examining VPC configurations, firewall rules (using gcloud compute firewall-rules list), and load balancer setups (e.g., Google Cloud Load Balancing).

For the PHP application layer, our toolkit included:

Static Application Security Testing (SAST): Tools like PHPStan with security extensions, and custom regex-based scripts to flag XML parsing calls and options that warrant manual review (e.g., simplexml_load_string, DOMDocument->loadXML, the LIBXML_NOENT and LIBXML_DTDLOAD flags, and any reliance on libxml_disable_entity_loader).
Dynamic Application Security Testing (DAST): OWASP ZAP and Burp Suite were configured to proxy traffic, fuzz inputs, and specifically target SOAP endpoints.
Dependency Scanning: Composer's built-in advisory check (composer audit) and Snyk to identify vulnerable third-party libraries.
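A regex-based scan of this kind can be sketched in a few lines of PHP. The pattern list, the function names (scanSourceForXmlSinks, scanDirectory), and the output format below are illustrative assumptions, not the scripts used in the engagement. The patterns are deliberately broad, so every hit needs manual review.

```php
<?php
// Hypothetical audit helper: flags XML-related calls and flags for manual review.
// The pattern list is illustrative and not exhaustive.
const RISKY_XML_PATTERNS = [
    'simplexml_load'               => '/\bsimplexml_load_(?:string|file)\s*\(/',
    'DOMDocument load/loadXML'     => '/->\s*load(?:XML)?\s*\(/',
    'LIBXML_NOENT'                 => '/\bLIBXML_NOENT\b/',
    'LIBXML_DTDLOAD'               => '/\bLIBXML_DTDLOAD\b/',
    'libxml_disable_entity_loader' => '/\blibxml_disable_entity_loader\s*\(/',
];

/** Returns one finding per (line, pattern) match in a source string. */
function scanSourceForXmlSinks(string $source): array
{
    $findings = [];
    foreach (explode("\n", $source) as $index => $line) {
        foreach (RISKY_XML_PATTERNS as $label => $pattern) {
            if (preg_match($pattern, $line) === 1) {
                $findings[] = [
                    'line'  => $index + 1,
                    'label' => $label,
                    'code'  => trim($line),
                ];
            }
        }
    }
    return $findings;
}

/** Walks a directory tree and scans every .php file it finds. */
function scanDirectory(string $root): array
{
    $results = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $hits = scanSourceForXmlSinks((string) file_get_contents($file->getPathname()));
            if ($hits !== []) {
                $results[$file->getPathname()] = $hits;
            }
        }
    }
    return $results;
}

// Small demonstration on an in-memory snippet.
$sample = "<?php\n\$xml = simplexml_load_string(\$body, 'SimpleXMLElement', LIBXML_NOENT);\n";
foreach (scanSourceForXmlSinks($sample) as $finding) {
    printf("line %d: %s\n", $finding['line'], $finding['label']);
}
```

A text scan like this only narrows down where to look; it cannot tell whether the parsed input is attacker-controlled.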

Identifying XXE Vulnerabilities in SOAP Integrations

The most significant findings revolved around the SOAP integrations, which were primarily responsible for inter-service communication and data exchange with external partners. Many of these integrations utilized older PHP libraries and custom XML parsing logic that did not adequately sanitize or disable external entity processing.

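A common pattern we observed was the use of simplexml_load_string or DOMDocument->loadXML with entity substitution enabled, either through the LIBXML_NOENT or LIBXML_DTDLOAD flags or because the code ran against libxml2 releases older than 2.9.0, where entity substitution was on by default. An attacker could craft a malicious XML payload that declares external entities, forcing the server to fetch and process arbitrary content from external or internal network resources. This could lead to: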

Information disclosure (e.g., reading sensitive files like /etc/passwd or internal metadata endpoints).
Server-Side Request Forgery (SSRF) by making the server initiate requests to internal or external systems.
Denial of Service (DoS) through recursive entity expansion (billion laughs attack).

Exploitation Scenario: Reading Local Files

Consider a hypothetical SOAP endpoint that processes user-submitted XML data for profile updates. A vulnerable implementation might look like this:

Vulnerable PHP Code Snippet

The following PHP code demonstrates a typical vulnerable pattern:

<?php
// Assume $xml_payload is received from a SOAP request
$xml_payload = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>
<user>
    <id>123</id>
    <name>&xxe;</name>
</user>';

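// Vulnerable parsing: LIBXML_NOENT turns entity substitution ON, external entities included.
// (libxml2 releases before 2.9.0 substituted entities by default, even without the flag.)
libxml_use_internal_errors(true); // needed for libxml_get_errors() to return anything
$xml = simplexml_load_string($xml_payload, 'SimpleXMLElement', LIBXML_NOENT);

if ($xml === false) {
    echo "Failed loading XML\n";
    foreach (libxml_get_errors() as $error) {
        echo "\t", $error->message, "\n";
    }
} else {
    // Process user data (e.g., $xml->name)
    echo "User name: " . $xml->name . "\n";
}
?>

In this snippet, the <!ENTITY xxe SYSTEM "file:///etc/passwd" > declaration within the DOCTYPE section defines an external entity named xxe that points to the local file /etc/passwd. Because entity substitution is enabled (through LIBXML_NOENT here, or by default on libxml2 older than 2.9.0), simplexml_load_string resolves the entity and embeds the file's content in $xml->name. The script then echoes that value, revealing sensitive system information.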

Mitigation Strategy: Disabling External Entity Loading

The most effective and direct mitigation for XXE vulnerabilities in PHP's XML processing is to keep external entities from being loaded. In practice this means never passing LIBXML_NOENT (which, despite its name, enables entity substitution) or LIBXML_DTDLOAD when parsing untrusted input, and, on older stacks, disabling the entity loader globally or per request.

PHP 8+ Recommended Approach (using DOMDocument)

For modern PHP versions, using DOMDocument with explicit security options is preferred. The key is to disable DTD loading and external entity processing.

<?php
// Assume $xml_payload is received from a SOAP request
$xml_payload = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>
<user>
    <id>123</id>
    <name>&xxe;</name>
</user>';

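$dom = new DOMDocument();
// Collect parse errors in libxml's buffer instead of emitting PHP warnings
libxml_use_internal_errors(true);
// Keep entity substitution and DTD loading off: never pass LIBXML_NOENT or LIBXML_DTDLOAD here.
// LIBXML_NONET additionally forbids network access during parsing.
$loaded = $dom->loadXML($xml_payload, LIBXML_NONET);

// Check for errors
if ($loaded === false) {
    foreach (libxml_get_errors() as $error) {
        // Log or handle errors appropriately
        error_log("XML Parsing Error: " . trim($error->message));
    }
    libxml_clear_errors();
    // Potentially throw an exception or return an error response
    die("Invalid XML structure.");
}

// If no errors, proceed to process the XML safely
$xpath = new DOMXPath($dom);
$userNameNode = $xpath->query('//user/name')->item(0);

if ($userNameNode) {
    echo "User name: " . $userNameNode->nodeValue . "\n";
} else {
    echo "User name not found.\n";
}
?>

The crucial lines here are:

$dom->loadXML($xml_payload, LIBXML_NONET);: The call passes neither LIBXML_NOENT nor LIBXML_DTDLOAD, so the &xxe; reference is left unresolved and no external DTD is fetched. LIBXML_NONET additionally forbids network access while parsing.
libxml_use_internal_errors(true); together with libxml_get_errors(): Parse errors are collected in libxml's error buffer, where they can be logged without leaking parser warnings to the client.
The resolveExternals and substituteEntities properties of DOMDocument both default to false and should be left that way for untrusted input.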
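A stricter variant is to refuse any payload that carries a DOCTYPE at all, which is reasonable for SOAP because the SOAP specification does not allow a document type declaration in a message. The helper below is a minimal sketch of that idea; the function name parseUntrustedXml and the choice of InvalidArgumentException are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: parses untrusted XML and rejects any DOCTYPE declaration.
 * Throws InvalidArgumentException for malformed input or a present DOCTYPE.
 */
function parseUntrustedXml(string $xml): DOMDocument
{
    // Remember the caller's error-handling mode so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    try {
        $dom = new DOMDocument();
        // LIBXML_NONET blocks network access; LIBXML_NOENT and LIBXML_DTDLOAD
        // are deliberately absent so entities are never substituted.
        if ($xml === '' || !$dom->loadXML($xml, LIBXML_NONET)) {
            throw new InvalidArgumentException('Malformed XML');
        }

        // A DOCTYPE appears as a direct child of the document node.
        foreach ($dom->childNodes as $node) {
            if ($node->nodeType === XML_DOCUMENT_TYPE_NODE) {
                throw new InvalidArgumentException('DOCTYPE declarations are not accepted');
            }
        }

        return $dom;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

// Usage: a benign document parses, a document with an entity declaration is refused.
$dom = parseUntrustedXml('<user><id>123</id><name>alice</name></user>');
echo $dom->documentElement->nodeName, "\n";

try {
    parseUntrustedXml('<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]><user><name>&xxe;</name></user>');
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}
```

Rejecting the DOCTYPE also sidesteps entity-expansion attacks, since internal entities cannot be declared without one.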

Global Mitigation (PHP Configuration)

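For a more systemic approach, especially when dealing with numerous legacy integrations that are difficult to refactor immediately, disabling the libxml external entity loader for every request is a strong defense-in-depth measure. Note that libxml_disable_entity_loader() is a PHP function, not a php.ini directive, so it cannot be set in php.ini directly. One way to apply it globally is to have php.ini prepend a small bootstrap script that makes the call:

[PHP]
; Run a bootstrap script before every request (the path is an example)
auto_prepend_file = /etc/php/xxe-guard.php

The prepended script then calls libxml_disable_entity_loader(true);. The auto_prepend_file directive should be placed in your main php.ini file or a dedicated configuration file loaded by PHP (e.g., in /etc/php/X.Y/cli/conf.d/ or /etc/php/X.Y/fpm/conf.d/). After modifying php.ini, the PHP-FPM or web server process must be restarted.

Important Note: libxml_disable_entity_loader() is a legacy approach and is deprecated as of PHP 8.0. With libxml2 2.9.0 or later, entity substitution is off unless LIBXML_NOENT is passed, so the recommended practice is to configure individual XML parsers as shown with DOMDocument. The global call can also interfere with legitimate file-based loads such as simplexml_load_file() or DOMDocument::load(), so it needs regression testing. However, for rapid mitigation across a large, complex system on older PHP versions, the global setting can be a lifesaver.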
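Because libxml_disable_entity_loader() is a function call rather than an ini setting, one way to run it on every request is a script loaded through the auto_prepend_file directive. The sketch below assumes such a script; the function name applyLegacyXxeGuard is hypothetical. It skips the call on PHP 8.0 and later, where the function is deprecated.

```php
<?php
/**
 * Hypothetical bootstrap guard, intended to be loaded via auto_prepend_file.
 * Returns true when the legacy entity-loader switch was applied.
 */
function applyLegacyXxeGuard(): bool
{
    // On PHP 8.0+ the function is deprecated; calling it would only add
    // deprecation notices, so the guard is limited to older runtimes.
    if (PHP_VERSION_ID >= 80000 || !function_exists('libxml_disable_entity_loader')) {
        return false;
    }

    libxml_disable_entity_loader(true);
    return true;
}

$applied = applyLegacyXxeGuard();
error_log('Legacy XXE guard ' . ($applied ? 'applied' : 'skipped') . ' on PHP ' . PHP_VERSION);
```

Logging whether the guard ran makes it easier to confirm during a phased rollout which hosts are actually covered.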

SSRF via XXE

XXE can also be leveraged for Server-Side Request Forgery (SSRF). An attacker could craft an XML payload that references an internal IP address or a metadata service endpoint on GCP.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token" > ]>
<data>
    <value>&xxe;</value>
</data>

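If the PHP application is running on a GCP Compute Engine instance, this payload would make the parser request the instance's service account token from the metadata server. The v1 metadata endpoints require a Metadata-Flavor: Google request header, which a plain entity fetch cannot set, so this particular request is normally refused. The same technique still reaches any internal HTTP service that lacks such a check, and a leaked token could be used to access other GCP services with the instance's permissions. The same mitigation techniques (disabling external entity loading) apply here.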
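On PHP versions where the legacy switch is deprecated, a deny-all resolver registered with libxml_set_external_entity_loader() offers a similar process-wide backstop against both file:// and http:// entity lookups. The following is a minimal sketch; the function name installDenyAllEntityLoader and the log message are assumptions, and a deny-all resolver may also block legitimate file-based XML loads, so it should be tested against each integration.

```php
<?php
/**
 * Hypothetical backstop: refuse every external entity lookup, whatever
 * flags a legacy parser happens to pass.
 */
function installDenyAllEntityLoader(): bool
{
    return libxml_set_external_entity_loader(
        // Parameters are left untyped because the public ID may be null.
        static function ($publicId, $systemId, $context) {
            error_log('Blocked external entity lookup: ' . (string) $systemId);
            return null; // null tells libxml that the entity cannot be loaded
        }
    );
}

installDenyAllEntityLoader();

// Demonstration: even with LIBXML_NOENT, the metadata URL is never fetched.
libxml_use_internal_errors(true);
$payload = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token" > ]>'
    . '<data><value>&xxe;</value></data>';

$dom = new DOMDocument();
$loaded = $dom->loadXML($payload, LIBXML_NOENT);
echo $loaded ? "Parsed without resolving the entity\n" : "Parser rejected the payload\n";
libxml_clear_errors();
```

This complements, rather than replaces, the per-parser configuration: it is a safety net for code paths that have not been refactored yet.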

GCP-Specific Hardening and Monitoring

Beyond application-level fixes, we reinforced the GCP environment:

VPC Service Controls: Implemented perimeter security to restrict access to sensitive GCP services (like Cloud Storage, BigQuery) from unauthorized networks or projects. This limits what an attacker can do with credentials obtained through SSRF, because requests from outside the perimeter are refused even if the XXE vulnerability is exploited.
Firewall Rules: Ensured that egress firewall rules on GCP only allowed necessary outbound connections. This limits the potential reach of an SSRF attack.
IAM Policies: Reviewed and tightened Identity and Access Management roles to follow the principle of least privilege. Service accounts used by the PHP application should have minimal permissions.
Cloud Audit Logs & Monitoring: Configured custom log-based metrics and alerts in Cloud Monitoring to detect suspicious patterns, such as unexpected outbound network requests from application instances or attempts to access the metadata service from unexpected sources.

Deployment and Verification

The mitigation steps were deployed incrementally. For each SOAP integration, we:

Applied the code-level fixes (e.g., using DOMDocument with security options).
Performed regression testing to ensure existing functionality remained intact.
Re-ran DAST scans specifically targeting the modified endpoints to confirm the XXE vulnerability was no longer exploitable.
Monitored application logs and GCP Security Command Center for any new findings or anomalies.
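Alongside DAST scans, the verification step can be automated with a small canary check: write a random marker to a temporary file, send a payload whose external entity points at that file, and fail if the marker appears in the result. The sketch below assumes each integration's parsing routine can be wrapped in a callable; xxeLeaksCanary and the $hardened closure are hypothetical names.

```php
<?php
/**
 * Hypothetical regression check: returns true if the given parser callable
 * leaks the content of a local canary file through an external entity.
 */
function xxeLeaksCanary(callable $parse): bool
{
    $marker = 'XXE-CANARY-' . bin2hex(random_bytes(8));
    $path = tempnam(sys_get_temp_dir(), 'xxe');
    file_put_contents($path, $marker);

    $payload = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file://' . $path . '" > ]>' . "\n"
        . '<user><id>123</id><name>&xxe;</name></user>';

    try {
        $output = (string) $parse($payload);
    } catch (Throwable $e) {
        // A rejection is acceptable; still check the message does not echo the file.
        $output = $e->getMessage();
    } finally {
        unlink($path);
    }

    return strpos($output, $marker) !== false;
}

// Stand-in for a hardened integration: no LIBXML_NOENT, no network access.
$hardened = static function (string $xml): string {
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $text = $dom->loadXML($xml, LIBXML_NONET) ? $dom->documentElement->textContent : '';
    libxml_clear_errors();
    return $text;
};

echo xxeLeaksCanary($hardened) ? "FAIL: canary leaked\n" : "OK: canary not leaked\n";
```

Running the same check against the pre-fix parsing routine is a useful sanity test that the canary payload actually triggers the original issue in that environment.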

For the global php.ini change, a phased rollout was executed across different environments (staging, then production) with extensive monitoring during each phase. Verification involved attempting the previously successful XXE exploit payloads against the updated application.

Conclusion

Auditing and securing a high-traffic enterprise PHP stack on GCP requires a multi-layered approach. XXE injection, particularly in legacy SOAP integrations, remains a critical threat. By combining granular code-level fixes with robust GCP infrastructure security and continuous monitoring, we were able to effectively mitigate these risks, significantly enhancing the overall security posture of the application.