In 2021, a penetration tester disclosed an XXE vulnerability in a healthcare SaaS app: a single crafted XML upload read /etc/passwd and internal AWS metadata from http://169.254.169.254/latest/meta-data/. The fix was two lines of parser config. The bug existed because the library's defaults were unsafe.
This is where most XXE vulnerabilities come from: not careless developers, but library defaults that haven't been hardened.
TL;DR
XXE attacks inject malicious entity declarations into XML to make the parser read local files, request internal URLs, or expand entities until the server runs out of memory. The most common source is not explicitly setting dtdload and noent to false in libxmljs2, or leaving external entity resolution enabled in Python's lxml. If neither your app nor its dependencies parse XML, XXE doesn't apply to you.
What Is XXE?
XML External Entity (XXE) is a vulnerability in applications that parse XML input. The XML spec allows "entities" that reference external resources. If the parser follows those references, an attacker controls what gets read.
The four attack classes:
- File read: Pull /etc/passwd, .env, or any readable path
- SSRF: Make server-side HTTP requests to internal IPs, cloud metadata endpoints, or internal APIs
- Denial of service: Nested entity expansion ("billion laughs") that consumes all available memory
- Port scan: Probe internal network services by watching which requests succeed or time out
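The "billion laughs" payload gets its size from nesting rather than recursion: each entity references the previous one several times, so the expanded text grows exponentially with depth. The Python sketch below (standard library only; the function names are illustrative, not from any library) builds the classic ten-level, ten-way payload, computes how much text it expands to, and then parses a tiny three-level instance to confirm the arithmetic.

```python
import xml.etree.ElementTree as ET


def billion_laughs(levels: int = 10, fanout: int = 10) -> str:
    """Build the classic nested-entity payload. Never feed the full-size
    version to a parser that has no entity-expansion limits."""
    decls = ['<!ENTITY lol0 "lol">']
    for i in range(1, levels):
        # Each entity references the previous one `fanout` times
        refs = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{refs}">')
    subset = "\n  ".join(decls)
    return f"<!DOCTYPE lolz [\n  {subset}\n]>\n<lolz>&lol{levels - 1};</lolz>"


def expanded_size(levels: int = 10, fanout: int = 10, base: str = "lol") -> int:
    """Characters the top-level entity expands to: fanout^(levels-1) copies of base."""
    return len(base) * fanout ** (levels - 1)


if __name__ == "__main__":
    payload = billion_laughs()
    print(len(payload), "bytes of input")
    print(expanded_size(), "bytes after expansion")  # 3000000000, about 3 GB
    # A tiny instance is safe to parse and confirms the formula
    small = ET.fromstring(billion_laughs(levels=3, fanout=2))
    print(len(small.text), "==", expanded_size(levels=3, fanout=2))
```

The full payload is under 1 KB of input yet expands to roughly 3 GB of text, which is why parsers need hard limits on entity expansion instead of relying on available memory.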
How the Attack Works
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <name>&xxe;</name>
</userInfo>
When a vulnerable parser processes this, it replaces &xxe; with the contents of /etc/passwd before your application code sees the data. The name field in your parsed document now contains the server's user list.
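To watch the substitution happen without touching /etc/passwd, the sketch below points the same payload at a temporary file and parses it with Python's standard-library SAX parser. That parser is an assumption on our part: the article doesn't cover xml.sax, and it's used here only because its external-entity switch (feature_external_ges, off by default since Python 3.7.1) is easy to flip on to imitate a vulnerable parser. The class and function names are hypothetical.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class NameCollector(ContentHandler):
    """Collects the text of every <name> element the parser reports."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self.in_name = True
            self.names.append("")

    def endElement(self, tag):
        if tag == "name":
            self.in_name = False

    def characters(self, content):
        # Entity replacement text arrives here like any other text,
        # so application code cannot tell it apart from user input
        if self.in_name:
            self.names[-1] += content


def parse_names(xml_bytes: bytes, resolve_external: bool) -> list:
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1; True imitates a vulnerable parser
    parser.setFeature(feature_external_ges, resolve_external)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


def build_payload(target: Path) -> bytes:
    # Same shape as the payload above, aimed at a harmless temp file
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{target.as_uri()}">]>\n'
        "<userInfo><name>&xxe;</name></userInfo>"
    ).encode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text("db_password=hunter2")
        payload = build_payload(secret)
        print("resolving:", parse_names(payload, resolve_external=True))
        print("default:  ", parse_names(payload, resolve_external=False))
```

With resolution enabled, the file's contents show up in the <name> text as if the client had sent them. With the default setting, the reference is silently dropped and <name> comes back empty.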
The SSRF variant swaps file:// for http://:
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<userInfo><name>&xxe;</name></userInfo>
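If you want to log or triage suspicious uploads, you can list the external entities a document declares without resolving any of them. This sketch uses Python's standard-library expat bindings (the function name is hypothetical). Because it installs no ExternalEntityRefHandler, expat skips the entity reference instead of opening the file or URL named in the SYSTEM identifier. Malformed input raises expat.ExpatError.

```python
from xml.parsers import expat


def declared_external_entities(xml_bytes: bytes) -> list:
    """Return (entity name, SYSTEM identifier) pairs a document declares,
    without fetching any of them."""
    found = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities carry a literal value; external ones a SYSTEM id
        if system_id is not None:
            found.append((name, system_id))

    scanner = expat.ParserCreate()
    scanner.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is installed, so expat skips &xxe;
    # instead of opening the file or URL behind it
    scanner.Parse(xml_bytes, True)
    return found


if __name__ == "__main__":
    ssrf = b"""<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<userInfo><name>&xxe;</name></userInfo>"""
    for name, target in declared_external_entities(ssrf):
        print(f"entity {name!r} points at {target}")
```

Logging these identifiers (and rejecting the request) shows which files or internal endpoints an attacker was probing for.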
Which Libraries Ship Vulnerable Defaults
libxmljs2 (Node.js)
libxmljs2 option defaults (as of v0.19+):
| Option | Default | Notes |
|---|---|---|
| dtdload | false | External DTDs are not loaded (safe default) |
| noent | false (parseXml) / true (some bindings) | Entity substitution; varies by call pattern |
| dtdvalid | false | DTD validation (safe default) |
The problem: noent behavior is not consistent across libxmljs2 versions and depends on which binding path your code takes, so relying on any default is risky. Pass all three flags explicitly:
const libxmljs = require('libxmljs2');
const doc = libxmljs.parseXml(xmlString, {
  noent: false,    // disable entity substitution
  dtdload: false,  // don't load external DTDs
  dtdvalid: false, // don't validate against DTD
});
If you're using libxmljs2 as a transitive dependency (pulled in by another package), check what options that package passes. Many wrappers omit all options, accepting whatever the underlying defaults happen to be.
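Running npm ls libxmljs2 shows which installed packages pull libxmljs2 in. If you'd rather inspect the lockfile offline (for example in CI), the Python sketch below reads a package-lock.json that uses lockfileVersion 2 or 3, where every installed package appears under the "packages" key with its own dependency maps. The helper names and the list of dependency fields checked are assumptions; adjust them to your setup.

```python
import json
from pathlib import Path

# Dependency maps that a lockfile v2/v3 package entry can carry
DEP_FIELDS = ("dependencies", "devDependencies", "optionalDependencies", "peerDependencies")


def dependents_of(lock: dict, target: str) -> list:
    """Return lockfile paths of packages that declare `target` as a dependency."""
    hits = []
    for path, meta in lock.get("packages", {}).items():
        if any(target in meta.get(field, {}) for field in DEP_FIELDS):
            # The empty key "" is the root project itself
            hits.append(path or "(your project)")
    return sorted(hits)


def dependents_in_file(lockfile: Path, target: str = "libxmljs2") -> list:
    """Convenience wrapper that reads package-lock.json from disk."""
    return dependents_of(json.loads(lockfile.read_text(encoding="utf-8")), target)
```

For example, dependents_in_file(Path("package-lock.json")) lists every package that depends on libxmljs2; those are the wrappers whose parseXml options you should review.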
A safer alternative for most use cases is fast-xml-parser, which is XXE-safe by design: as a pure-JavaScript parser, it never fetches external (SYSTEM) entities.
const { XMLParser } = require('fast-xml-parser');
const parser = new XMLParser({
  // Also skip entity processing entirely, since most apps don't need it
  processEntities: false,
});
const result = parser.parse(xmlString);
lxml (Python)
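Python's lxml is the other common source of XXE bugs. Before lxml 5.0, etree.fromstring and etree.parse resolved external entities by default whenever the document included a DOCTYPE; lxml 5.0 changed the default to resolve internal entities only. Set the options explicitly rather than depending on whichever version happens to be installed.
from lxml import etree
# UNSAFE on lxml < 5.0: resolves external entities such as file:///etc/passwd
tree = etree.fromstring(xml_bytes)
import defusedxml.lxml as safe_lxml
# SAFER: defusedxml's lxml wrapper disables entity resolution and rejects
# documents that declare entities (note: defusedxml has deprecated this module)
tree = safe_lxml.fromstring(xml_bytes)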
You can also configure lxml's parser directly:
from lxml import etree
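parser = etree.XMLParser(
    resolve_entities=False,  # keep entity references; never substitute their values
    no_network=True,         # never fetch DTDs or entities over the network
    load_dtd=False,          # don't load DTDs at all
)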
tree = etree.fromstring(xml_bytes, parser)
Java (DocumentBuilderFactory)
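Java's DocumentBuilderFactory also processes DTDs and external entities by default. This is the hardening OWASP's XXE Prevention Cheat Sheet recommends:
import javax.xml.parsers.DocumentBuilderFactory;
// setFeature throws ParserConfigurationException if the parser lacks a feature
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Primary defense: refuse any document that contains a DOCTYPE
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth, should a DOCTYPE ever need to be allowed
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);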
Is Your App at Risk?
Most apps built with Cursor, Bolt, Lovable, or Replit use JSON APIs and are not vulnerable. XXE only matters if your server parses XML.
You are at risk if your app:
- Accepts XML file uploads (exports, imports, configuration)
- Integrates with SOAP APIs (older enterprise services)
- Parses SVG images server-side (thumbnailing, conversion)
- Processes Office documents (DOCX, XLSX, PPTX are ZIP+XML)
- Uses XML-based configuration files
If you use a PDF generation library that internally parses HTML as XML, or an image processing library that handles SVG, check the library's XXE posture even if you never write XML yourself.
How to Find XXE in Your App
Three places to check:
1. Grep for XML parser imports:
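# Node.js: CommonJS require() and ES module imports, skipping node_modules
grep -rnE "(require\(|from )['\"](libxmljs2?|xml2js|sax)['\"]" --exclude-dir=node_modules .
# Python: lxml, the stdlib xml packages, and libxml2 bindings
grep -rnE "(from|import) (lxml|xml\.(etree|dom|sax)|libxml2)" --include='*.py' .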
2. Look for endpoints that accept XML request bodies (application/xml, text/xml) or multipart/form-data uploads of XML or SVG files.
3. Check any library that processes uploaded documents (DOCX parsers, spreadsheet parsers, diagram tools). Search the library's GitHub for "XXE" or "external entity" in open issues.
The Fastest Fix
If you only parse XML in one place and don't need DTD support (most apps don't), the fastest fix is to reject any document that contains a DOCTYPE declaration entirely:
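const libxmljs = require('libxmljs2');
function safeParseXml(xmlString) {
  // Reject DTDs outright; entity declarations can only appear inside one
  if (/<!DOCTYPE|<!ENTITY/i.test(xmlString)) {
    throw new Error('XML DOCTYPE/ENTITY declarations are not allowed');
  }
  // Still pass explicit options in case the check above is ever bypassed
  return libxmljs.parseXml(xmlString, { noent: false, dtdload: false, dtdvalid: false });
}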
This is a belt-and-suspenders approach: reject suspicious input before the parser even runs.
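For Python services, the same reject-first idea can be sketched with the standard library alone (the function and exception names are hypothetical). Instead of a substring search, it runs expat over the raw bytes and aborts the moment a DOCTYPE starts. Because expat honors the byte-order mark, this also catches payloads in encodings such as UTF-16 that a byte-level search for b"<!DOCTYPE" would miss. A document without a DOCTYPE cannot declare entities, so the second pass hands it to xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class ForbiddenDTD(ValueError):
    """Raised when a document contains a DOCTYPE (hypothetical name)."""


def _reject_doctype(*_args):
    # expat calls this when it reaches <!DOCTYPE ..., before any
    # entity declaration inside it is processed
    raise ForbiddenDTD("XML DOCTYPE declarations are not allowed")


def safe_parse_xml(xml_bytes: bytes) -> ET.Element:
    # Pass 1: scan with expat, which detects the encoding from the BOM or
    # XML declaration, so a UTF-16 payload is caught like a UTF-8 one
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_bytes, True)
    # Pass 2: no DOCTYPE means no entity declarations to abuse
    return ET.fromstring(xml_bytes)
```

Parsing twice costs some CPU, but it keeps the rejection logic independent of whichever parser does the real work.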
What are the default values of dtdload and noent in libxmljs2?
In libxmljs2 v0.19+, dtdload defaults to false and noent defaults to false in parseXml. The catch: noent behavior has varied across versions and call paths. Always pass both options explicitly: { noent: false, dtdload: false, dtdvalid: false }. Any code relying on defaults can break silently across a version bump.
What is an XXE attack?
XXE (XML External Entity) is an attack where a malicious XML payload tricks the parser into reading local files, making internal HTTP requests, or crashing the server via nested entity expansion. It abuses the XML spec's entity features, chiefly external entities, which most apps have no reason to use.
Is libxmljs2 vulnerable to XXE by default?
libxmljs2's dtdload option defaults to false, but noent (entity substitution) defaults to true in some call patterns. Always pass noent: false, dtdload: false, and dtdvalid: false explicitly when calling parseXml rather than relying on defaults.
Can JSON APIs have XXE vulnerabilities?
No. XXE is specific to XML parsing, and JSON parsers don't support external entities. If your API only accepts JSON, and rejects other Content-Types instead of routing them to an XML parser, you're not vulnerable to XXE.
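Some frameworks choose a body parser from the request's Content-Type, so an endpoint written for JSON may still hand an XML body to an XML parser if one is configured. A framework-agnostic sketch of a strict gate (hypothetical names; map the exception to an HTTP 415 response in your framework) checks the media type before any parsing happens:

```python
import json
from typing import Any, Optional

JSON_MEDIA_TYPE = "application/json"


class UnsupportedMediaType(ValueError):
    """Hypothetical error; map it to an HTTP 415 response in your framework."""


def parse_json_body(content_type: Optional[str], body: bytes) -> Any:
    """Parse a request body as JSON, refusing every other media type."""
    # Compare only the media type, ignoring parameters such as "; charset=utf-8"
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type != JSON_MEDIA_TYPE:
        # Never fall back to another body parser (XML or otherwise)
        raise UnsupportedMediaType(f"expected {JSON_MEDIA_TYPE}, got {media_type or 'none'}")
    return json.loads(body)
```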
Are SVG file uploads dangerous?
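Yes. SVGs are XML-based and can contain XXE payloads. If you parse SVG files server-side for resizing or conversion, make sure the parser or renderer has external entities disabled. Converting an SVG to PNG is itself server-side SVG parsing, so the safest option is to avoid processing user-uploaded SVGs on the server at all, or to rasterize them only with a renderer you have confirmed does not resolve external entities.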
What about DOCX or XLSX uploads?
DOCX, XLSX, and PPTX files are ZIP archives containing XML. Libraries that process these files should be configured to disable external entity processing. Check your library's documentation for an XXE-safe mode or use a library like python-docx that defaults to safe settings.
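Before handing an uploaded Office file to a parsing library, you can pre-screen its XML parts. This Python sketch (standard library only; names hypothetical) opens the ZIP container in memory and flags any .xml or .rels part that declares a DOCTYPE, which OOXML parts don't normally contain. Treat a flagged part, or a part that fails to parse (expat.ExpatError), as grounds to reject the upload.

```python
import io
import zipfile
from xml.parsers import expat


class _DoctypeFound(Exception):
    """Internal signal used to stop scanning at the first DOCTYPE."""


def _part_has_doctype(fh) -> bool:
    scanner = expat.ParserCreate()

    def on_doctype(*_args):
        raise _DoctypeFound

    scanner.StartDoctypeDeclHandler = on_doctype
    try:
        scanner.ParseFile(fh)  # malformed XML raises expat.ExpatError
    except _DoctypeFound:
        return True
    return False


def office_parts_with_doctype(data: bytes) -> list:
    """Names of XML parts inside a DOCX/XLSX/PPTX archive that declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            # OOXML keeps its markup in .xml parts plus .rels relationship files
            if not info.filename.endswith((".xml", ".rels")):
                continue
            with archive.open(info) as fh:
                if _part_has_doctype(fh):
                    flagged.append(info.filename)
    return flagged
```

The scan never resolves entities; it only detects that a DTD is present, so it is safe to run before any third-party library touches the file.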
Check Your XML Handling
Our scanner tests for XXE vulnerabilities in your file upload endpoints and flags libraries with unsafe defaults.