How We Audited a High-Traffic Perl Enterprise Stack on Google Cloud and Mitigated XML External Entity (XXE) Injection in Old SOAP Integrations
Initial Assessment: Identifying the Attack Surface
Our engagement began with a deep dive into a high-traffic enterprise Perl stack hosted on Google Cloud Platform (GCP). The primary concern was a potential XML External Entity (XXE) injection vulnerability, specifically within legacy SOAP integrations. These integrations, often developed years prior and maintained by different teams, represented a significant blind spot. The initial phase involved cataloging all SOAP endpoints, understanding their data flow, and identifying the XML parsing libraries in use. Many of these integrations relied on older, less secure versions of Perl’s XML parsing modules, such as `XML::LibXML` or `XML::Parser`, which, by default, might not have had XXE protections enabled.
The first step was to enumerate all active SOAP services. This was achieved by inspecting Nginx access logs for requests matching common SOAP patterns (e.g., `SOAPAction` header, `Content-Type: text/xml` or `application/soap+xml`). We also reviewed the application codebase, looking for modules that handled incoming XML payloads.
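The log-based enumeration can be sketched roughly as follows. This is a minimal illustration, not the script used in the engagement. It assumes a custom Nginx `log_format` that appends `ct="$content_type" sa="$http_soapaction"` to each line (the default access log does not record these headers); the `ct=`/`sa=` field names and the `soap_endpoints` function are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count requests per path that look like SOAP calls.
# Assumed (hypothetical) log layout: ... "METHOD /path HTTP/x.y" ... ct="..." sa="..."
sub soap_endpoints {
    my (@lines) = @_;
    my %hits;
    for my $line (@lines) {
        # Request line as written by Nginx's $request variable
        my ($path) = $line =~ m{"[A-Z]+ (\S+) HTTP/[\d.]+"} or next;
        $path =~ s/\?.*//;    # Group by path, ignoring the query string
        my ($ct) = $line =~ /\bct="([^"]*)"/;
        my ($sa) = $line =~ /\bsa="([^"]*)"/;
        my $xml_type   = defined $ct && $ct =~ m{^(?:text/xml|application/soap\+xml)}i;
        my $has_action = defined $sa && $sa ne '' && $sa ne '-';
        $hits{$path}++ if $xml_type || $has_action;
    }
    return \%hits;
}

# Only runs when a log file path is passed on the command line
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my $hits = soap_endpoints(<$fh>);
    close($fh);
    printf "%6d  %s\n", $hits->{$_}, $_
        for sort { $hits->{$b} <=> $hits->{$a} } keys %$hits;
}
```

A `text/xml` content type alone does not prove an endpoint speaks SOAP, so the output is a candidate list to cross-check against the codebase rather than a definitive inventory.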
Deep Dive into XML Parsing Libraries and Configurations
The core of the XXE vulnerability lies in how XML parsers resolve external entities. By default, some parsers are configured to fetch external DTDs and entities, which can lead to information disclosure (reading local files), Server-Side Request Forgery (SSRF), or denial-of-service (DoS) attacks. We focused on identifying the specific parsing functions and their configurations within the Perl code.
For instance, a common pattern in older Perl SOAP services might look like this:
use XML::LibXML;

my $parser = XML::LibXML->new();
my $dom = $parser->parse_string($xml_payload);
# ... further processing of $dom ...
The critical aspect here is the default behavior of `XML::LibXML->new()`. Depending on the installed XML::LibXML and libxml2 versions, a parser created without explicit options may expand entities and load external DTDs, so it cannot be assumed safe. We needed to audit each instance of XML parsing to ensure it was configured securely. This involved searching the codebase for patterns like:
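# Potentially vulnerable:
my $parser = XML::LibXML->new();

# More secure configuration:
my $parser = XML::LibXML->new(
    expand_entities => 0, # Do not substitute entity references with their content
    load_ext_dtd    => 0, # Do not load external DTD subsets
    no_network      => 1, # Refuse network fetches; on its own this still permits file:// URIs
);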
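Similarly, for `XML::Parser`, the exposure comes from how external entity references are handled. Handlers are registered through the `Handlers` option, and when no `ExternEnt` handler is supplied the module installs a default one that fetches the referenced resource:
use XML::Parser;
my $parser = XML::Parser->new(
    # With no ExternEnt handler set, the default handler resolves
    # external entities (via LWP if available, otherwise from local files)
);
The secure approach is to register an `ExternEnt` handler that refuses external entity resolution:
use XML::Parser;
my $parser = XML::Parser->new(
    ErrorContext => 2,
    Handlers => {
        # Explicitly disallow external entities: returning undef makes
        # the parser report an error instead of fetching the resource
        ExternEnt => sub { return undef; },
        # Or, if you need to process some entities but not others,
        # implement custom logic here to validate and sanitize.
    },
);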
We developed a Perl script to statically analyze the codebase, searching for these parsing patterns and identifying potential weak points. The script recursively traverses directories, reads Perl files (`.pl` and `.pm`), and uses regular expressions to find `XML::LibXML->new` and `XML::Parser->new` calls, flagging them for manual review.
#!/usr/bin/perl
use strict;
use warnings;

# Recursively scan a directory tree for XML parser instantiations.
sub find_xml_parsers {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    while (my $file = readdir($dh)) {
        next if ($file eq '.' or $file eq '..');
        my $path = "$dir/$file";
        next if -l $path;    # Skip symlinks to avoid directory loops
        if (-d $path) {
            find_xml_parsers($path);
        } elsif ($file =~ /\.p[lm]$/) {
            my $fh;
            unless (open($fh, '<', $path)) {
                # Skip unreadable files instead of reading from a failed handle
                warn "Can't open $path: $!";
                next;
            }
            my $line_num = 0;
            while (my $line = <$fh>) {
                $line_num++;
                # Matches both `->new(...)` and the parenthesis-free `->new;`
                if ($line =~ /XML::(?:LibXML|Parser)->new\b/) {
                    print "Parser instantiation to review in $path at line $line_num: $line";
                }
            }
            close($fh);
        }
    }
    closedir($dh);
}

find_xml_parsers('/path/to/your/perl/app');
Exploitation and Proof-of-Concept Development
To validate our findings, we constructed proof-of-concept (PoC) XXE payloads. The goal was to demonstrate the impact of the vulnerability, ranging from simple file disclosure to more complex SSRF scenarios. We targeted a known vulnerable endpoint that accepted arbitrary XML. A common XXE payload for reading local files is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<root>
<data>&xxe;</data>
</root>
This payload defines an external entity `xxe` that points to `/etc/passwd`. When the vulnerable parser attempts to resolve this entity, it fetches the content of the file and injects it into the `&xxe;` placeholder within the XML document. The application then processes this modified XML, potentially returning the file content in its response.
For SSRF, we used a payload that attempts to make the server request an internal resource or an external attacker-controlled server:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://169.254.169.254/computeMetadata/v1/"> ]>
<root>
<data>&xxe;</data>
</root>
This payload attempts to fetch metadata from the GCP instance metadata service. Because that service rejects requests lacking the `Metadata-Flavor: Google` header, which an XXE payload cannot set, the entity usually does not resolve to metadata content. The finding that matters is that the server issues the outbound HTTP request at all: the same technique reaches other internal services, which makes it a critical security flaw. We used `curl` to send these payloads to the identified SOAP endpoints:
curl -X POST \
  http://your-enterprise-app.com/soap/service \
  -H 'Content-Type: text/xml; charset=utf-8' \
  -H 'SOAPAction: "http://example.com/SomeAction"' \
  -d '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]><root><data>&xxe;</data></root>'
The responses were carefully analyzed for leaked file contents or indications of successful external requests. This phase confirmed the severity and exploitability of the XXE vulnerabilities.
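The response analysis lends itself to a small helper. The sketch below is illustrative only: `xxe_leak_indicators` is a hypothetical name, and the two patterns it checks (a `root` entry in passwd format, and libxml2's "failed to load external entity" error text) are assumptions about what a leaking or partially hardened endpoint might return, not an exhaustive list.

```perl
use strict;
use warnings;

# Return a list of indicators found in a response body.
# In scalar context the result is the number of indicators.
sub xxe_leak_indicators {
    my ($body) = @_;
    $body = '' unless defined $body;
    my @found;

    # passwd(5) entry for root: name:password:UID:GID:...
    push @found, 'passwd-style root entry'
        if $body =~ /\broot:[^:\s]*:0:0:/;

    # libxml2 error text shown when the parser tried to fetch an entity and failed;
    # the attempt itself means entity resolution is still enabled
    push @found, 'parser attempted external entity load'
        if $body =~ /failed to load external entity/i;

    return @found;
}
```

For example, `print join(", ", xxe_leak_indicators($response_body)), "\n";` prints the matched indicators for a response captured from `curl`.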
Mitigation Strategies and Implementation
The primary mitigation strategy was to configure XML parsers to disallow external entity resolution. This is a standard security best practice. For each identified vulnerable integration, we implemented the following:
- Update XML Parsing Libraries: Ensure that the latest stable versions of XML parsing modules are used. Newer versions often have more secure defaults or better mechanisms for disabling features.
- Explicitly Disable External Entities: Modify the parser instantiation to explicitly disable network access and external entity resolution.
- Input Validation and Sanitization: While not a primary defense against XXE, robust validation of incoming XML structure and content can add layers of defense.
- Web Application Firewall (WAF): Deploying a WAF with specific rules to detect and block XXE patterns can provide an additional layer of protection, especially for legacy systems that cannot be immediately refactored.
The code modifications involved updating the parser instantiation as demonstrated earlier. For example, for `XML::LibXML`:
use XML::LibXML;
my $parser = XML::LibXML->new(
    expand_entities => 0, # Leave entity references unexpanded
    load_ext_dtd    => 0, # Do not load external DTD subsets
    no_network      => 1, # Disallow network access for entity resolution
);
# ... rest of the code ...
And for `XML::Parser`:
use XML::Parser;
my $parser = XML::Parser->new(
    ErrorContext => 2,
    Handlers => {
        # This handler prevents the parser from fetching external entities
        ExternEnt => sub {
            my ($expat, $base, $sysid, $pubid) = @_;
            warn "Attempted to resolve external entity: $sysid\n";
            return undef; # Deny resolution; the parser reports an error
        },
    },
);
# ... rest of the code ...
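Because the SOAP 1.1 and 1.2 specifications state that a SOAP message must not contain a document type declaration, a pre-parse check that rejects any payload carrying one is a reasonable extra layer in front of the hardened parsers. The following is a minimal sketch; `doctype_reject_reason` is a hypothetical helper, and it assumes payloads arrive as UTF-8 or ASCII bytes. It complements the parser configuration above and does not replace it.

```perl
use strict;
use warnings;

# Return a reason string if the payload should be rejected before parsing,
# or undef if it passes the pre-check.
sub doctype_reject_reason {
    my ($xml) = @_;
    return 'empty payload' unless defined $xml && length $xml;

    # UTF-16/UTF-32 payloads contain NUL bytes and would slip past
    # the byte-level DOCTYPE match below, so refuse them outright
    return 'unexpected encoding (NUL bytes present)' if $xml =~ /\x00/;

    # Deliberately conservative: also flags DOCTYPE text inside CDATA or comments
    return 'DOCTYPE declaration present' if $xml =~ /<!DOCTYPE/i;

    return undef;
}
```

A caller would check the result before handing the payload to the parser, for example `if (my $why = doctype_reject_reason($xml_payload)) { die "Rejected SOAP request: $why\n"; }`. Payloads that declare other unusual encodings could still evade a byte-level match, which is why the parser options remain the primary control.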
In GCP, we also leveraged Cloud Armor for WAF capabilities. We configured custom WAF rules to inspect incoming HTTP requests for common XXE patterns, such as `<!DOCTYPE` and `<!ENTITY` declarations:
# Example Cloud Armor WAF rule snippet (conceptual, not a deployable policy)
# This would be configured via gcloud or the GCP console
# Rule to block XML requests that carry a DOCTYPE declaration
# Note: "request.body" is a placeholder. Cloud Armor's custom rules language
# has no documented request-body attribute, so body inspection in practice
# depends on its preconfigured WAF rules.
{
  "priority": 100,
  "description": "Block XXE DOCTYPE declarations",
  "match": {
    "expr": "(request.headers['content-type'].matches('text/xml') || request.headers['content-type'].matches('application/soap[+]xml') || request.headers['content-type'].matches('application/xml')) && request.body.matches('<!DOCTYPE')"
  },
  "action": "deny(403)",
  "preview": false
}
The deployment process involved a phased rollout, starting with non-production environments, followed by a gradual rollout to production with close monitoring. Each change was accompanied by regression testing to ensure that legitimate SOAP requests were not affected.
Post-Mitigation Verification and Ongoing Monitoring
After applying the mitigations, a rigorous verification process was essential. We re-ran our PoC XXE payloads against the patched endpoints to confirm that they were no longer exploitable. The expected outcome was either a clean response indicating no vulnerability or an error message that did not reveal sensitive information.
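Re-running the probes across many endpoints is easier to repeat when scripted. The sketch below uses the core `HTTP::Tiny` module to post the file-disclosure payload shown earlier and classify each response. The function names, the `SOAPAction` value, and the pass criterion (no passwd-style `root` entry in the body) are illustrative assumptions rather than the exact harness used here.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $PROBE = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<root>
<data>&xxe;</data>
</root>
XML

# A response counts as clean when it does not echo passwd content.
sub response_is_clean {
    my ($res) = @_;
    my $body = defined $res->{content} ? $res->{content} : '';
    return $body =~ /\broot:[^:\s]*:0:0:/ ? 0 : 1;
}

# Post the probe to each URL and return { url => verdict }.
sub verify_endpoints {
    my (@urls) = @_;
    my $http = HTTP::Tiny->new(timeout => 10);
    my %result;
    for my $url (@urls) {
        my $res = $http->post($url, {
            headers => {
                'Content-Type' => 'text/xml; charset=utf-8',
                'SOAPAction'   => '"http://example.com/SomeAction"',
            },
            content => $PROBE,
        });
        # HTTP::Tiny reports connection-level failures as status 599
        $result{$url} = $res->{status} == 599    ? 'unreachable'
                      : response_is_clean($res)  ? 'clean'
                      :                            'STILL VULNERABLE';
    }
    return \%result;
}

# Only runs when endpoint URLs are passed on the command line
if (@ARGV) {
    my $result = verify_endpoints(@ARGV);
    print "$_: $result->{$_}\n" for sort keys %$result;
}
```

A "clean" verdict only covers this one payload, so blind or out-of-band variants still need the manual checks described above.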
Beyond immediate verification, establishing ongoing monitoring was critical. This included:
- Log Analysis: Configuring centralized logging (e.g., using Google Cloud Logging) to capture and analyze all incoming SOAP requests. We set up alerts for suspicious patterns, such as requests containing `<!DOCTYPE` or `<!ENTITY` declarations.
- WAF Monitoring: Regularly reviewing WAF logs (Cloud Armor) for blocked requests. This helps identify ongoing attack attempts and fine-tune WAF rules.
- Regular Audits: Scheduling periodic security audits of the SOAP integrations, especially when new features are added or code is modified.
- Dependency Scanning: Implementing automated tools to scan for vulnerable versions of Perl modules, including XML parsers, and integrating this into the CI/CD pipeline.
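As one small piece of the dependency scanning item, a CI step can compare installed module versions against a minimum-version policy. The sketch below is an assumption-laden illustration: `check_module_versions` is a hypothetical helper, and the minimum versions passed to it are placeholders to be replaced with values from your own advisories, not recommended thresholds.

```perl
use strict;
use warnings;
use version;

# Compare installed module versions against a caller-supplied minimum.
# Returns { module => status string }.
sub check_module_versions {
    my (%minimum) = @_;
    my %report;
    for my $module (sort keys %minimum) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        if (!eval { require $file; 1 }) {
            $report{$module} = 'not installed';
            next;
        }
        my $installed = $module->VERSION;
        if (!defined $installed) {
            $report{$module} = 'unknown version';
            next;
        }
        $report{$module} =
            version->parse($installed) >= version->parse($minimum{$module})
            ? "ok ($installed)"
            : "outdated ($installed < $minimum{$module})";
    }
    return \%report;
}

# Only runs when invoked with an argument, e.g. `perl check_versions.pl run`
if (@ARGV) {
    # Placeholder minimums, NOT vetted security baselines
    my $report = check_module_versions(
        'XML::LibXML' => '2.0',
        'XML::Parser' => '2.40',
    );
    print "$_: $report->{$_}\n" for sort keys %$report;
}
```

A pipeline could fail the build whenever any status is not prefixed with `ok`.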
This comprehensive approach, combining code-level fixes with network-level defenses and continuous monitoring, significantly reduced the risk of XXE injection and strengthened the overall security posture of the enterprise Perl stack on GCP.