HarborGuard Database
CRITICAL | CVE-2026-44020 | CNA: GitHub_M

CVE-2026-44020: Docling: Unsafe XML Entity Expansion in USPTO Patent Backend

Docling simplifies document processing by parsing diverse formats and providing integrations with the generative AI ecosystem. From version 2.13.0 up to, but not including, 2.74.0, the USPTO patent XML parser used the standard xml.sax.parseString() without protection against XML External Entity (XXE) attacks. An attacker could craft a malicious USPTO patent XML file whose external entity references read arbitrary files from the server filesystem or perform Server-Side Request Forgery (SSRF), or whose recursively nested entities cause denial of service through entity expansion (the Billion Laughs attack). The vulnerability affects three USPTO patent format parsers: ICE (v4.x), Grant v2.5, and Application v1.x. It is fixed in 2.74.0.
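For code that still calls xml.sax.parseString() on untrusted input, the sketch below shows one stdlib-only way to close the external-entity half of the problem. It is not Docling's actual patch; `parse_string_no_external_entities` and `TextCollector` are hypothetical names. The parser is built with `xml.sax.make_parser()` and the `feature_external_ges` and `feature_external_pes` features are switched off before parsing, so file:// and http:// entity references are skipped instead of fetched. These features do not limit internal entity expansion, so Billion Laughs protection needs a separate control.

```python
import io
import xml.sax
from xml.sax import handler


class TextCollector(handler.ContentHandler):
    """Hypothetical handler that records character data for inspection."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)

    def text(self):
        return "".join(self.parts)


def parse_string_no_external_entities(data: bytes, content_handler) -> None:
    """Hypothetical stand-in for xml.sax.parseString(data, content_handler)
    that never resolves external general or parameter entities.

    Note: this does NOT cap internal entity expansion (Billion Laughs).
    """
    parser = xml.sax.make_parser()
    # Features must be set before parsing starts.
    parser.setFeature(handler.feature_external_ges, False)  # skip file:// / http:// entities
    parser.setFeature(handler.feature_external_pes, False)  # no external parameter entities
    parser.setContentHandler(content_handler)
    parser.parse(io.BytesIO(data))
```

Setting both features explicitly documents the intent at the call site and keeps the behavior fixed regardless of the interpreter's defaults.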

Metrics

CVSS v3.1: 9.4
Severity: CRITICAL
Fixed in: 2.74.0
Affected Products: 1

HarborGuard Analysis

Synopsis

This is an XML External Entity (XXE) injection vulnerability in the Docling document-processing library, specifically in its USPTO patent XML parsers (ICE v4.x, Grant v2.5, and Application v1.x). The flaw is network-reachable and requires no authentication: any system that accepts USPTO patent XML files from external sources can be fed a malicious document. Successful exploitation lets an attacker read arbitrary files from the server filesystem, issue server-side HTTP requests to internal services (SSRF), or exhaust memory and crash the service through recursive entity expansion (the Billion Laughs attack). Versions from 2.13.0 up to, but not including, 2.74.0 are affected; the fix ships in docling 2.74.0, and affected images should be rebuilt against that version or later.

HarborGuard Coverage

Detection

Detection is available across every HarborGuard environment. The CVE is ingested from upstream advisory feeds within minutes of publication and matched against all customer images, including custom-built images that bundle the Docling library, in both registry scans and CI pipeline checks.

Status: Available

Triage

HarborGuard scores this finding at CVSS 9.4 (Critical) and weights it against each environment's compliance policy to determine urgency and routing. Triage tickets are directed to the appropriate team inbox within each customer organization based on image ownership and policy configuration.

Status: Available

Patch

The upstream fix is available in docling 2.74.0, so patched-image rebuilds should pin that version or later. For customers with auto-remediation enabled, the rebuild, regression test run, and PR against affected workloads are triggered automatically once HarborGuard ingests the fix version.

Status: Upstream fix available (docling 2.74.0)

Exploit Conditions

  • Network reachability: Required

    The attacker must reach the service over the network by supplying a crafted USPTO patent XML file to any network-exposed endpoint that invokes the Docling parser.

  • Authentication: Not required

    No credentials are needed; any party able to submit a document to the parsing service can trigger the vulnerability.

  • Victim interaction: Not required

    No user action is required beyond the service processing the malicious XML file, which happens automatically upon upload or ingestion.

  • Attack complexity: Low

    The exploit is reliable and needs no special conditions: crafting a malicious XML file with entity declarations requires no precise timing, memory-layout knowledge, or environmental prerequisites.

Blast Radius

  • Reads arbitrary files from the server filesystem, including credentials, private keys, and application configuration files.
  • Issues server-side HTTP requests to internal network services or cloud metadata endpoints, exposing infrastructure that would otherwise be unreachable from the public internet.
  • Crashes the affected Docling parsing process through recursive entity expansion (Billion Laughs), disrupting document-processing availability.
  • Exposes internal network topology and service endpoints by using the SSRF vector to probe and enumerate adjacent systems.
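The denial-of-service item above rests on simple arithmetic: each level of nested entities multiplies the output by its fan-out. The sketch below (function names are hypothetical) builds the entity table of the classic nine-level, ten-way Billion Laughs payload and computes the fully expanded size with memoization instead of actually expanding it. About 533 characters of replacement text expand to 3 × 10^9 characters for a single entity reference, which is why refusing entity declarations or capping expansion is the usual defense.

```python
import re

# Matches a general entity reference such as "&lol3;" inside replacement text.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def billion_laughs_entities(levels: int = 9, fanout: int = 10) -> dict:
    """Entity table of the classic payload: lol -> "lol",
    lol1 -> `fanout` references to lol, lol2 -> `fanout` references to lol1, ..."""
    entities = {"lol": "lol"}
    previous = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        entities[name] = f"&{previous};" * fanout
        previous = name
    return entities


def expanded_length(entities: dict, name: str) -> int:
    """Length of the fully expanded entity, computed without expanding it."""
    memo = {}

    def length(current: str, active: tuple) -> int:
        if current in active:
            raise ValueError(f"recursive entity: {current}")
        if current not in memo:
            text = entities[current]
            literal_chars = len(ENTITY_REF.sub("", text))
            memo[current] = literal_chars + sum(
                length(ref, active + (current,)) for ref in ENTITY_REF.findall(text)
            )
        return memo[current]

    return length(name, ())


if __name__ == "__main__":
    table = billion_laughs_entities()
    declared = sum(len(text) for text in table.values())
    print(declared, expanded_length(table, "lol9"))  # 533 3000000000
```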

How HarborGuard Handles This

Available on HarborGuard: the upstream fix for CVE-2026-44020 ships in docling 2.74.0, and patched-image rebuilds should pin that version or later. For customers with auto-remediation enabled, ingesting the fix version triggers a rebuild, a regression test run, and a PR opened against every affected workload. Where an upgrade cannot be rolled out immediately, compensating controls worth considering include:

  • Network-policy isolation that prevents the Docling service from making outbound HTTP requests, blocking SSRF pivot paths.
  • Egress filtering on the parsing tier so it can only connect to known-good destinations.
  • Input validation at the document-ingestion boundary that rejects XML payloads containing entity declarations before they reach the SAX parser.
  • Feature-flag gating that disables USPTO patent format parsing until the patched version is deployed.
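One way to implement the input-validation control is a pre-flight pass with the standard library's expat bindings that refuses any document declaring an entity, since both the XXE and the Billion Laughs payloads depend on entity declarations. This is a sketch under assumptions: `reject_entity_declarations` and `UnsafeXMLError` are hypothetical names, and whether real USPTO files ever carry entity declarations in their internal DTD subset should be checked against a sample before deploying it. A DOCTYPE line that only names an external DTD is accepted, and that DTD is never fetched.

```python
import xml.parsers.expat


class UnsafeXMLError(ValueError):
    """Hypothetical error type for documents rejected by the pre-flight check."""


def reject_entity_declarations(data: bytes) -> None:
    """Hypothetical pre-flight check: raise UnsafeXMLError if the document
    declares any entity or references an external one.

    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # XXE (SYSTEM entities) and Billion Laughs (nested internal entities)
        # both need a declaration, so refuse the first one encountered.
        raise UnsafeXMLError(f"entity declaration not allowed: {name!r}")

    def on_external_entity_ref(context, base, system_id, public_id):
        # Defensive: never resolve an external reference.
        raise UnsafeXMLError(f"external entity reference: {system_id!r}")

    parser.EntityDeclHandler = on_entity_decl
    parser.ExternalEntityRefHandler = on_external_entity_ref
    # Do not load the external DTD subset or any parameter entities.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)
    parser.Parse(data, True)


if __name__ == "__main__":
    benign = (b'<?xml version="1.0"?>'
              b'<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant.dtd" [ ]>'
              b'<us-patent-grant><invention-title>Widget</invention-title></us-patent-grant>')
    reject_entity_declarations(benign)  # passes silently

    xxe = (b'<?xml version="1.0"?>'
           b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><r>&xxe;</r>')
    try:
        reject_entity_declarations(xxe)
    except UnsafeXMLError as exc:
        print("rejected:", exc)
```

Callers can treat both `UnsafeXMLError` and `ExpatError` as a rejection. The extra parse is the price of leaving the downstream parser untouched.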

Affected packages
  • docling-project / docling
    >= 2.13.0, < 2.74.0
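To check whether an environment falls inside the affected range listed above, a small version comparison is enough. This sketch reads the installed docling version with `importlib.metadata`; the helper names are hypothetical, and the parser only looks at the leading major.minor.patch numbers, so pre-release or local suffixes (for example `2.74.0rc1`) are treated as the final release. Use `packaging.version` for full PEP 440 semantics.

```python
import re
from importlib import metadata

AFFECTED_MIN = (2, 13, 0)  # first affected release (inclusive)
FIXED_IN = (2, 74, 0)      # first fixed release (exclusive upper bound)

# Leading "major.minor[.patch]"; any suffix after it is ignored.
_RELEASE = re.compile(r"^\s*(\d+)\.(\d+)(?:\.(\d+))?")


def parse_release(version: str) -> tuple:
    match = _RELEASE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return (int(major), int(minor), int(patch))


def is_affected(version: str) -> bool:
    """True if `version` lies in >= 2.13.0, < 2.74.0."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN


def installed_docling_status():
    """Return (version, affected) for the installed docling, or None if absent."""
    try:
        version = metadata.version("docling")
    except metadata.PackageNotFoundError:
        return None
    return version, is_affected(version)
```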
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:H
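The 9.4 score follows directly from this vector. The sketch below implements the CVSS v3.1 base-score equations with the metric weights from the FIRST specification; it covers base metrics only (no temporal or environmental scores), and the function names are hypothetical.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (Unchanged / Changed).
PR_WEIGHTS = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
              "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value,
    computed on integers to avoid floating-point artefacts."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Skip the "CVSS:3.1" prefix, then map metric -> value.
    parts = dict(p.split(":") for p in vector.split("/")[1:])
    scope = parts["S"]
    iss = 1 - ((1 - WEIGHTS["C"][parts["C"]])
               * (1 - WEIGHTS["I"][parts["I"]])
               * (1 - WEIGHTS["A"][parts["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][parts["AV"]] * WEIGHTS["AC"][parts["AC"]]
                      * PR_WEIGHTS[scope][parts["PR"]] * WEIGHTS["UI"][parts["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:H"))  # 9.4
```

For this vector the impact sub-score is about 5.45 and exploitability about 3.89; their sum, 9.34, rounds up to 9.4.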