HTML Sanitizer API

1. Introduction

This section is not normative.

Web applications often need to work with strings of HTML on the client side, perhaps as part of a client-side templating solution, perhaps as part of rendering user-generated content, etc. It is difficult to do so in a safe way. The naive approach of joining strings together and stuffing them into an Element’s innerHTML is fraught with risk, as it can cause JavaScript execution in a number of unexpected ways.

Libraries like [DOMPURIFY] attempt to manage this problem by carefully parsing and sanitizing strings before insertion, by constructing a DOM and filtering its members through an allow-list. This has proven to be a fragile approach, as the parsing APIs exposed to the web don’t always map in reasonable ways to the browser’s behavior when actually rendering a string as HTML in the "real" DOM. Moreover, the libraries need to keep on top of browsers' changing behavior over time; things that once were safe may turn into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation. This document outlines an API which aims to do just that.

1.1. Goals

Mitigate the risk of DOM-based cross-site scripting attacks by providing developers with mechanisms for handling user-controlled HTML which prevent direct script execution upon injection.
Make HTML output safe for use within the current user agent, taking into account its current understanding of HTML.
Allow developers to override the default set of elements and attributes. Adding certain elements and attributes can prevent script gadget attacks.

1.2. API Summary

The Sanitizer API offers functionality to parse a string containing HTML into a DOM tree, and to filter the resulting tree according to a user-supplied configuration. The methods come in two by two flavours:

Safe and unsafe: The "safe" methods will not generate any markup that executes script. That is, they should be safe from XSS. The "unsafe" methods will parse and filter whatever they’re supposed to. See also: § 4 Security Considerations.
Context: Methods are defined on Element and ShadowRoot and will replace these nodes’ children, and are largely analogous to innerHTML. There are also static methods on the Document, which parse an entire document and are largely analogous to DOMParser.parseFromString().

2. Framework

2.1. Sanitizer API

The Element interface defines two methods, setHTML() and setHTMLUnsafe(). Both of these take a DOMString with HTML markup, and an optional configuration.

partial interface Element {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};

Element’s setHTMLUnsafe(html, options) method steps are:

Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this’s relevant global object, html, "Element setHTMLUnsafe", and "script".
Let target be this’s template contents if this is a template element; otherwise this.
Set and filter HTML given target, this, compliantHTML, options, and false.

Element’s setHTML(html, options) method steps are:

Let target be this’s template contents if this is a template element; otherwise this.
Set and filter HTML given target, this, html, options, and true.
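
As a rough illustration of how calling code might use the "safe" method, the sketch below feature-detects setHTML() and falls back to plain text when it is missing. HtmlSink and renderUntrusted are hypothetical names, and the textContent fallback is an assumption about what a caller would want, not something this document prescribes.

```typescript
// Hypothetical structural type: only the parts of Element this sketch touches.
interface HtmlSink {
  setHTML?: (html: string, options?: { sanitizer?: unknown }) => void;
  textContent: string | null;
}

// Renders untrusted markup with the sanitizing setter when it is available,
// and as inert text otherwise. Returns which path was taken.
function renderUntrusted(target: HtmlSink, html: string): "sanitized" | "text" {
  if (typeof target.setHTML === "function") {
    target.setHTML(html); // default options, so the "default" preset applies
    return "sanitized";
  }
  target.textContent = html; // markup is shown literally, never parsed
  return "text";
}
```

In a browser, an Element could be passed as target; the structural type merely keeps the sketch independent of any particular DOM type declarations.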

These methods are mirrored on the ShadowRoot:

partial interface ShadowRoot {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};

ShadowRoot’s setHTMLUnsafe(html, options) method steps are:

Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this’s relevant global object, html, "ShadowRoot setHTMLUnsafe", and "script".
Set and filter HTML using this, this’s shadow host (as context element), compliantHTML, options, and false.

ShadowRoot’s setHTML(html, options) method steps are:

Set and filter HTML using this (as target), this’s shadow host (as context element), html, options, and true.

The Document interface gains two new methods which parse an entire Document:

partial interface Document {
  static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};

The parseHTMLUnsafe(html, options) method steps are:

Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, the current global object, html, "Document parseHTMLUnsafe", and "script".
Let document be a new Document, whose content type is "text/html".

Note: Since document does not have a browsing context, scripting is disabled.
Set document’s allow declarative shadow roots to true.
Parse HTML from a string given document and compliantHTML.
Let sanitizer be the result of calling get a sanitizer instance from options with options and false.
Call sanitize on document with sanitizer and false.
Return document.

The parseHTML(html, options) method steps are:

Let document be a new Document, whose content type is "text/html".

Note: Since document does not have a browsing context, scripting is disabled.
Set document’s allow declarative shadow roots to true.
Parse HTML from a string given document and html.
Let sanitizer be the result of calling get a sanitizer instance from options with options and true.
Call sanitize on document with sanitizer and true.
Return document.

2.2. SetHTML options and the configuration object

The family of setHTML()-like methods all accept an options dictionary. Right now, only one member of this dictionary is defined:

enum SanitizerPresets { "default" };
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = "default";
};
dictionary SetHTMLUnsafeOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = {};
};

The Sanitizer configuration object encapsulates a filter configuration. The same configuration can be used with both "safe" and "unsafe" methods, where the "safe" methods perform an implicit removeUnsafe operation on the passed-in configuration and have a default configuration when none is passed. The intent is that one (or a few) configurations will be built up early on in a page’s lifetime, and can then be used whenever needed. This allows implementations to pre-process configurations.

The configuration object can be queried to return a configuration dictionary. It can also be modified directly.

[Exposed=Window]
interface Sanitizer {
  constructor(optional (SanitizerConfig or SanitizerPresets) configuration = "default");

  // Query configuration:
  SanitizerConfig get();

  // Modify a Sanitizer’s lists and fields:
  undefined allowElement(SanitizerElementWithAttributes element);
  undefined removeElement(SanitizerElement element);
  undefined replaceElementWithChildren(SanitizerElement element);
  undefined allowAttribute(SanitizerAttribute attribute);
  undefined removeAttribute(SanitizerAttribute attribute);
  undefined setComments(boolean allow);
  undefined setDataAttributes(boolean allow);

  // Remove markup that executes script. May modify multiple lists:
  undefined removeUnsafe();
};

A Sanitizer has an associated configuration, a SanitizerConfig.

The constructor(configuration) method steps are:

If configuration is a SanitizerPresets string, then:
1. Assert: configuration is "default".
2. Set configuration to the built-in safe default configuration.
Let valid be the return value of set a configuration with configuration and true on this.
If valid is false, then throw a TypeError.

The get() method steps are to return the value of this’s configuration.

The allowElement(element) method steps are to allow an element with element and this’s configuration.

The removeElement(element) method steps are to remove an element with element and this’s configuration.

The replaceElementWithChildren(element) method steps are to replace an element with its children with element and this’s configuration.

The allowAttribute(attribute) method steps are to allow an attribute with attribute and this’s configuration.

The removeAttribute(attribute) method steps are to remove an attribute with attribute and this’s configuration.

The setComments(allow) method steps are to set comments with allow and this’s configuration.

The setDataAttributes(allow) method steps are to set data attributes with allow and this’s configuration.

The removeUnsafe() method steps are to update this’s configuration with the result of calling remove unsafe on this’s configuration.

2.3. The Configuration Dictionary

dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence<SanitizerElementWithAttributes> elements;
  sequence<SanitizerElement> removeElements;
  sequence<SanitizerElement> replaceWithChildrenElements;

  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;

  boolean comments;
  boolean dataAttributes;
};
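
To make the shape of these dictionaries concrete, here is a small sketch that mirrors them as TypeScript types and builds one sample configuration. The type names ending in "Sketch" and the sample values are illustrative assumptions. Note that in WebIDL the leading underscore in _namespace only escapes the identifier, so the member seen from script is named namespace.

```typescript
// TypeScript mirrors of the WebIDL dictionaries above. The "Sketch" suffix is
// hypothetical and only avoids clashing with any built-in type declarations.
interface NameSketch {
  name: string;
  namespace?: string | null; // written as "_namespace" in the WebIDL
}
type AttributeSketch = string | NameSketch;

interface ElementWithAttributesSketch extends NameSketch {
  attributes?: AttributeSketch[];
  removeAttributes?: AttributeSketch[];
}

interface SanitizerConfigSketch {
  elements?: Array<string | ElementWithAttributesSketch>;
  removeElements?: Array<string | NameSketch>;
  replaceWithChildrenElements?: Array<string | NameSketch>;
  attributes?: AttributeSketch[];
  removeAttributes?: AttributeSketch[];
  comments?: boolean;
  dataAttributes?: boolean;
}

// A sample allow-list style configuration (values chosen for illustration).
const commentConfig: SanitizerConfigSketch = {
  elements: ["p", "em", "strong", { name: "a", attributes: ["href"] }],
  replaceWithChildrenElements: ["span"],
  attributes: ["lang"],
  comments: false,
  dataAttributes: false,
};
```

Plain strings and dictionaries can be mixed freely in the same list; a string is shorthand for a name in the default namespace.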

3. Algorithms

To set and filter HTML, given an Element or DocumentFragment target, an Element contextElement, a string html, a dictionary options, and a boolean safe:

If safe and contextElement’s local name is "script" and contextElement’s namespace is the HTML namespace or the SVG namespace, then return.
Let sanitizer be the result of calling get a sanitizer instance from options with options and safe.
Let newChildren be the result of the HTML fragment parsing algorithm given contextElement, html, and true.
Let fragment be a new DocumentFragment whose node document is contextElement’s node document.
For each node in newChildren, append node to fragment.
Run sanitize on fragment using sanitizer and safe.
Replace all with fragment within target.

To get a sanitizer instance from options from a dictionary options with a boolean safe, do:

Note: This algorithm works for both SetHTMLOptions and SetHTMLUnsafeOptions. They only differ in the defaults.

Let sanitizerSpec be "default".
If options["sanitizer"] exists, then:
1. Set sanitizerSpec to options["sanitizer"].
Assert: sanitizerSpec is either a Sanitizer instance, a string which is a SanitizerPresets member, or a dictionary.
If sanitizerSpec is a string:
1. Assert: sanitizerSpec is "default".
2. Set sanitizerSpec to the built-in safe default configuration.
Assert: sanitizerSpec is either a Sanitizer instance, or a dictionary.
If sanitizerSpec is a dictionary:
1. Let sanitizer be a new Sanitizer instance.
2. Let setConfigurationResult be the result of set a configuration with sanitizerSpec and not safe on sanitizer.
3. If setConfigurationResult is false, throw a TypeError.
4. Set sanitizerSpec to sanitizer.
Assert: sanitizerSpec is a Sanitizer instance.
Return sanitizerSpec.

3.1. Sanitization Algorithms

For the main sanitize operation, using a ParentNode node, a Sanitizer sanitizer, and a boolean safe, run these steps:

Let configuration be the value of sanitizer’s configuration.
If safe is true, then set configuration to the result of calling remove unsafe on configuration.
Call sanitize core on node, configuration, and with handleJavascriptNavigationUrls set to safe.

The sanitize core operation, using a ParentNode node, a SanitizerConfig configuration, and a boolean handleJavascriptNavigationUrls, iterates over the DOM tree beginning with node, and may recurse to handle some special cases (e.g. template contents). It consists of these steps:

For each child of node’s children:
1. Assert: child implements Text, Comment, Element, or DocumentType.
  
  Note: Currently, this algorithm is only called on output of the HTML parser for which this assertion should hold. DocumentType should only occur for parseHTML and parseHTMLUnsafe. If in the future this algorithm is used in different contexts, this assumption needs to be re-examined.
2. If child implements DocumentType, then continue.
3. If child implements Text, then continue.
4. If child implements Comment:
  1. If configuration["comments"] is not true, then remove child.
5. Otherwise:
  1. Let elementName be a SanitizerElementNamespace with child’s local name and namespace.
  2. If configuration["replaceWithChildrenElements"] contains elementName:
    1. Call sanitize core on child with configuration and handleJavascriptNavigationUrls.
    2. Call replace all with child’s children within child.
    3. Continue.
  3. If configuration["removeElements"] contains elementName, or if configuration["elements"] is not empty and does not contain elementName:
    1. Remove child.
    2. Continue.
  4. If elementName equals «[ "name" → "template", "namespace" → HTML namespace ]», then call sanitize core on child’s template contents with configuration and handleJavascriptNavigationUrls.
  5. If child is a shadow host, then call sanitize core on child’s shadow root with configuration and handleJavascriptNavigationUrls.
  6. For each attribute in child’s attribute list:
    1. Let attrName be a SanitizerAttributeNamespace with attribute’s local name and namespace.
    2. If configuration["removeAttributes"] contains attrName, then remove attribute from child.
    3. If configuration["elements"]["removeAttributes"] contains attrName, then remove attribute from child.
    4. If all of the following are false, then remove attribute from child.
      - configuration["attributes"] exists and contains attrName
      - configuration["elements"]["attributes"] contains attrName
      - "data-" is a code unit prefix of local name and namespace is null and configuration["dataAttributes"] is true
    5. If handleJavascriptNavigationUrls:
      1. If «[elementName, attrName]» matches an entry in the built-in navigating URL attributes list, and if attribute contains a javascript: URL, then remove attribute from child.
      2. If child’s namespace is the MathML Namespace and attribute’s local name is "href" and attribute’s namespace is null or the XLink namespace and attribute contains a javascript: URL, then remove attribute from child.
      3. If the built-in animating URL attributes list contains «[elementName, attrName]» and attribute’s value is "href" or "xlink:href", then remove attribute from child.
  7. Call sanitize core on child with configuration and handleJavascriptNavigationUrls.
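
The overall shape of these steps can be sketched on a toy tree. This is a deliberately reduced model and not the algorithm itself: SketchNode, CoreConfigSketch and sanitizeChildren are hypothetical names, element and attribute names are plain strings without namespaces, and per-element attribute lists, data attributes, template contents, shadow roots and javascript: URL handling are all left out. It also assumes that replacing an element with its children means splicing the sanitized children into the parent, and it returns a new child list rather than mutating in place.

```typescript
// Minimal stand-ins for DOM nodes; all names here are hypothetical.
type SketchNode =
  | { kind: "text"; data: string }
  | { kind: "comment"; data: string }
  | { kind: "element"; name: string; attributes: { [attr: string]: string }; children: SketchNode[] };

interface CoreConfigSketch {
  elements: string[]; // empty means "no element allow list"
  removeElements: string[];
  replaceWithChildrenElements: string[];
  attributes: string[];
  removeAttributes: string[];
  comments: boolean;
}

const has = (list: string[], item: string): boolean => list.indexOf(item) !== -1;

// Returns the sanitized replacement for a list of child nodes.
function sanitizeChildren(children: SketchNode[], config: CoreConfigSketch): SketchNode[] {
  const kept: SketchNode[] = [];
  for (const child of children) {
    if (child.kind === "text") {
      kept.push(child);
      continue;
    }
    if (child.kind === "comment") {
      if (config.comments) kept.push(child);
      continue;
    }
    if (has(config.replaceWithChildrenElements, child.name)) {
      // Drop the element itself, keep its sanitized children in its place.
      kept.push(...sanitizeChildren(child.children, config));
      continue;
    }
    const blocked =
      has(config.removeElements, child.name) ||
      (config.elements.length > 0 && !has(config.elements, child.name));
    if (blocked) continue; // removed together with its subtree

    // Keep only attributes that are globally allowed and not globally removed.
    const attributes: { [attr: string]: string } = {};
    for (const attrName of Object.keys(child.attributes)) {
      const value = child.attributes[attrName];
      const allowed = has(config.attributes, attrName) && !has(config.removeAttributes, attrName);
      if (allowed && value !== undefined) attributes[attrName] = value;
    }
    kept.push({
      kind: "element",
      name: child.name,
      attributes,
      children: sanitizeChildren(child.children, config),
    });
  }
  return kept;
}
```

Building a fresh list sidesteps the question of mutating a child list while iterating over it, which an in-place implementation would need to handle with care.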

Note: Current browsers support javascript: URLs only when navigating. Since navigation itself is not an XSS threat, we handle navigation to javascript: URLs, but not navigations in general.

Declarative navigation falls into a handful of categories:

Anchor elements. (<a> in HTML and SVG namespaces)
Form elements that trigger navigation as part of the form action.
[MathML] allows any element to act as an anchor.
[SVG11] animation.

The first two are covered by the built-in navigating URL attributes list.

The MathML case is covered by a separate rule, because there is no formalism in this spec to cover a "per-namespace global" rule.

The SVG animation case is covered by the built-in animating URL attributes list. But since the interpretation of SVG animation elements depends on the animation target, and since during sanitization we cannot know what the final target will be, the sanitize algorithm blocks any animation of href attributes.

To determine whether an attribute contains a javascript: URL:

Let url be the result of running the basic URL parser on attribute’s value.
If url is failure, then return false.
Return whether url’s scheme is "javascript".
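
The three steps above can be approximated in script with the URL constructor, which throws where the URL parser would return failure when no base URL is given. This is a sketch under that assumption; containsJavascriptUrl is a hypothetical helper name.

```typescript
// Returns true when the attribute value parses as an absolute URL whose
// scheme is "javascript". Values that fail to parse are treated as harmless.
function containsJavascriptUrl(attributeValue: string): boolean {
  let url: URL;
  try {
    url = new URL(attributeValue); // no base URL: relative references throw
  } catch {
    return false;
  }
  // URL.protocol includes the trailing colon.
  return url.protocol === "javascript:";
}
```

Going through the URL parser instead of a string prefix test matters because the parser normalizes the input first, for example by lower-casing the scheme and ignoring leading whitespace.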

3.2. Configuration Processing

To allow an element element with a SanitizerConfig configuration, do:

Set element to the result of canonicalize a sanitizer element with attributes with element.
Remove element from configuration["elements"].
Append element to configuration["elements"].
Remove element from configuration["removeElements"].
Remove element from configuration["replaceWithChildrenElements"].

NOTE: Handling of allowElement is a little more complicated than the other methods, because the element allow list can have per-element allow- and remove-attribute lists. We first remove the given element from the list before then adding it, which has the effect of re-setting (rather than merging or otherwise modifying) the per-element list to whatever is passed in. In other words, the per-element allow- and remove-lists can only be set as a whole.

NOTE: Remove matches on name and namespace, so adding an element with attributes would still remove the matching element from the removeElements and replaceWithChildrenElements lists.

To remove an element element from a SanitizerConfig configuration, do:

Set element to the result of canonicalize a sanitizer element with element.
Add element to configuration["removeElements"].
Remove element from configuration["elements"] list.
Remove element from configuration["replaceWithChildrenElements"].

To replace an element with its children element from a SanitizerConfig configuration, do:

Set element to the result of canonicalize a sanitizer element with element.
Add element to configuration["replaceWithChildrenElements"].
Remove element from configuration["removeElements"].
Remove element from configuration["elements"] list.

To allow an attribute attribute on a SanitizerConfig configuration, do:

Set attribute to the result of canonicalize a sanitizer attribute with attribute.
Add attribute to configuration["attributes"].
Remove attribute from configuration["removeAttributes"].

To remove an attribute attribute from a SanitizerConfig configuration, do:

Set attribute to the result of canonicalize a sanitizer attribute with attribute.
Add attribute to configuration["removeAttributes"].
Remove attribute from configuration["attributes"].

To set comments with allow on a SanitizerConfig configuration, do:

Set configuration["comments"] to allow.

To set data attributes with allow on a SanitizerConfig configuration, do:

Set configuration["dataAttributes"] to allow.
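
The way the three element operations keep the lists mutually exclusive can be sketched as follows. This is a simplified model: canonicalization is skipped, names are plain strings assumed to be in the HTML namespace, and ElementEntrySketch, ElementListsSketch and the helper functions are hypothetical names.

```typescript
interface ElementEntrySketch {
  name: string;
  attributes?: string[];
  removeAttributes?: string[];
}
interface ElementListsSketch {
  elements: ElementEntrySketch[];
  removeElements: string[];
  replaceWithChildrenElements: string[];
}

const dropName = (list: string[], item: string): string[] =>
  list.filter((entry) => entry !== item);
const addName = (list: string[], item: string): string[] =>
  list.indexOf(item) === -1 ? list.concat([item]) : list;
const dropEntry = (list: ElementEntrySketch[], item: string): ElementEntrySketch[] =>
  list.filter((entry) => entry.name !== item);

function allowElement(config: ElementListsSketch, element: ElementEntrySketch): void {
  // Remove-then-append resets any per-element attribute lists as a whole.
  config.elements = dropEntry(config.elements, element.name).concat([element]);
  config.removeElements = dropName(config.removeElements, element.name);
  config.replaceWithChildrenElements = dropName(config.replaceWithChildrenElements, element.name);
}

function removeElement(config: ElementListsSketch, elementName: string): void {
  config.removeElements = addName(config.removeElements, elementName);
  config.elements = dropEntry(config.elements, elementName);
  config.replaceWithChildrenElements = dropName(config.replaceWithChildrenElements, elementName);
}

function replaceElementWithChildren(config: ElementListsSketch, elementName: string): void {
  config.replaceWithChildrenElements = addName(config.replaceWithChildrenElements, elementName);
  config.removeElements = dropName(config.removeElements, elementName);
  config.elements = dropEntry(config.elements, elementName);
}
```

After any sequence of calls, a given element name appears in at most one of the three lists, which is the invariant the later validity check relies on.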

Note: While this algorithm is called remove unsafe, we use the term "unsafe" strictly in the sense of this spec, to denote content that will execute JavaScript when inserted into the document. In other words, this method will remove opportunities for XSS.

To remove unsafe from a configuration, do this:

Assert: The built-in safe baseline configuration has removeElements and removeAttributes keys set, but not elements, replaceWithChildrenElements, or attributes.
Let result be a copy of configuration.
For each element in built-in safe baseline configuration["removeElements"]:
1. Call remove an element with element and result.
For each attribute in built-in safe baseline configuration["removeAttributes"]:
1. Call remove an attribute with attribute and result.
For each attribute listed in event handler content attributes:
1. Call remove an attribute with attribute and result.
Return result.
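
A minimal sketch of these steps, working on a copy so that the caller's configuration is left untouched. The two constant lists are placeholders chosen for illustration; they are not the real built-in safe baseline configuration or the full set of event handler content attributes. Names are plain strings, and UnsafeListsSketch and removeUnsafe are hypothetical names for this model.

```typescript
interface UnsafeListsSketch {
  elements: string[];
  removeElements: string[];
  replaceWithChildrenElements: string[];
  attributes: string[];
  removeAttributes: string[];
}

// Placeholder data, NOT the real baseline: a few well-known examples only.
const SKETCH_BASELINE_REMOVE_ELEMENTS: string[] = ["script", "iframe"];
const SKETCH_UNSAFE_ATTRIBUTES: string[] = ["onclick", "onerror", "onload"];

const without = (list: string[], item: string): string[] =>
  list.filter((entry) => entry !== item);
const withItem = (list: string[], item: string): string[] =>
  list.indexOf(item) === -1 ? list.concat([item]) : list;

function removeUnsafe(config: UnsafeListsSketch): UnsafeListsSketch {
  // Copy every list so the input configuration is not modified.
  const result: UnsafeListsSketch = {
    elements: config.elements.slice(),
    removeElements: config.removeElements.slice(),
    replaceWithChildrenElements: config.replaceWithChildrenElements.slice(),
    attributes: config.attributes.slice(),
    removeAttributes: config.removeAttributes.slice(),
  };
  for (const element of SKETCH_BASELINE_REMOVE_ELEMENTS) {
    // "remove an element": add to the remove list, drop from the other two.
    result.removeElements = withItem(result.removeElements, element);
    result.elements = without(result.elements, element);
    result.replaceWithChildrenElements = without(result.replaceWithChildrenElements, element);
  }
  for (const attribute of SKETCH_UNSAFE_ATTRIBUTES) {
    // "remove an attribute": add to the remove list, drop from the allow list.
    result.removeAttributes = withItem(result.removeAttributes, attribute);
    result.attributes = without(result.attributes, attribute);
  }
  return result;
}
```

Because the operation only ever moves names into the remove lists, applying it to its own output yields an equivalent configuration.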

To set a configuration, given a dictionary configuration, a boolean allowCommentsAndDataAttributes, and a Sanitizer sanitizer:

For each element of configuration["elements"] do:
1. Call allow an element with element and sanitizer’s configuration.
For each element of configuration["removeElements"] do:
1. Call remove an element with element and sanitizer’s configuration.
For each element of configuration["replaceWithChildrenElements"] do:
1. Call replace an element with its children with element and sanitizer’s configuration.
For each attribute of configuration["attributes"] do:
1. Call allow an attribute with attribute and sanitizer’s configuration.
For each attribute of configuration["removeAttributes"] do:
1. Call remove an attribute with attribute and sanitizer’s configuration.
If configuration["comments"] exists:
1. Then call set comments with configuration["comments"] and sanitizer’s configuration.
2. Otherwise call set comments with allowCommentsAndDataAttributes and sanitizer’s configuration.
If configuration["dataAttributes"] exists:
1. Then call set data attributes with configuration["dataAttributes"] and sanitizer’s configuration.
2. Otherwise call set data attributes with allowCommentsAndDataAttributes and sanitizer’s configuration.
Return whether all of the following are true:
- size of configuration["elements"] equals size of sanitizer’s configuration["elements"].
- size of configuration["removeElements"] equals size of sanitizer’s configuration["removeElements"].
- size of configuration["replaceWithChildrenElements"] equals size of sanitizer’s configuration["replaceWithChildrenElements"].
- size of configuration["attributes"] equals size of sanitizer’s configuration["attributes"].
- size of configuration["removeAttributes"] equals size of sanitizer’s configuration["removeAttributes"].
- Either configuration["elements"] or configuration["removeElements"] exist, or neither, but not both.
- Either configuration["attributes"] or configuration["removeAttributes"] exist, or neither, but not both.

Note: Previous versions of this spec had elaborate definitions of how to canonicalize a config. This has now effectively been moved into the method definitions.

Note: This operation is defined in terms of the manipulation methods on the Sanitizer. Those methods remove matching entries from other lists. The size equality checks in the last step would then catch this. For example: { elements: ["div", "div"] } would create a Sanitizer with one element in the elements list. The final test would then return false, which would cause the caller to throw an exception.

This is still missing error checks for the per-element attribute lists and syntax errors.
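
The size comparison is easier to see on a reduced example. The sketch below models only the elements and removeElements lists, with plain string names, and reports whether a configuration would pass the final check. isValidElementConfig is a hypothetical name, and the attribute lists, which follow the same pattern, are omitted.

```typescript
interface ElementConfigSketch {
  elements?: string[];
  removeElements?: string[];
}

function isValidElementConfig(config: ElementConfigSketch): boolean {
  let elements: string[] = [];
  let removeElements: string[] = [];

  for (const entry of config.elements || []) {
    // "allow an element": re-add at the end, drop from the remove list.
    elements = elements.filter((existing) => existing !== entry).concat([entry]);
    removeElements = removeElements.filter((existing) => existing !== entry);
  }
  for (const entry of config.removeElements || []) {
    // "remove an element": add once, drop from the allow list.
    if (removeElements.indexOf(entry) === -1) removeElements = removeElements.concat([entry]);
    elements = elements.filter((existing) => existing !== entry);
  }

  // Duplicates or cross-list conflicts shrink a list, so the sizes differ.
  const sizesMatch =
    (config.elements || []).length === elements.length &&
    (config.removeElements || []).length === removeElements.length;
  // An allow list and a remove list must not both be present.
  const notBoth = !(config.elements !== undefined && config.removeElements !== undefined);
  return sizesMatch && notBoth;
}
```

With this model, a configuration that lists "div" twice in elements is rejected because the processed list holds one entry while the input holds two.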

In order to canonicalize a sanitizer element with attributes a SanitizerElementWithAttributes element, do this:

Let result be the result of canonicalize a sanitizer element with element.
If element is a dictionary:
1. For each attribute in element["attributes"]:
  1. Add the result of canonicalize a sanitizer attribute with attribute to result["attributes"].
2. For each attribute in element["removeAttributes"]:
  1. Add the result of canonicalize a sanitizer attribute with attribute to result["removeAttributes"].
Return result.

In order to canonicalize a sanitizer element a SanitizerElement element, return the result of canonicalize a sanitizer name with element and the HTML namespace as the default namespace.

In order to canonicalize a sanitizer attribute a SanitizerAttribute attribute, return the result of canonicalize a sanitizer name with attribute and null as the default namespace.

In order to canonicalize a sanitizer name name, with a default namespace defaultNamespace, run the following steps:

Assert: name is either a DOMString or a dictionary.
If name is a DOMString, then return «[ "name" → name, "namespace" → defaultNamespace]».
Assert: name is a dictionary and name["name"] exists.
Let namespace be name["namespace"] if it exists, otherwise defaultNamespace.
If namespace is the empty string, then set it to null.
Return «[
"name" → name["name"],
"namespace" → namespace
]».
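
The name canonicalization steps translate almost directly into code. In this sketch a missing namespace member is modelled as undefined, and CanonicalNameSketch, NameInputSketch and the function names are hypothetical.

```typescript
type CanonicalNameSketch = { name: string; namespace: string | null };
type NameInputSketch = string | { name: string; namespace?: string | null };

const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

function canonicalizeName(input: NameInputSketch, defaultNamespace: string | null): CanonicalNameSketch {
  // A bare string is shorthand for a name in the default namespace.
  if (typeof input === "string") {
    return { name: input, namespace: defaultNamespace };
  }
  let ns: string | null = input.namespace !== undefined ? input.namespace : defaultNamespace;
  if (ns === "") ns = null; // the empty string means "no namespace"
  return { name: input.name, namespace: ns };
}

// Elements default to the HTML namespace, attributes to no namespace.
const canonicalizeElement = (element: NameInputSketch): CanonicalNameSketch =>
  canonicalizeName(element, HTML_NAMESPACE);
const canonicalizeAttribute = (attribute: NameInputSketch): CanonicalNameSketch =>
  canonicalizeName(attribute, null);
```

The differing defaults are why "a" as an element and "href" as an attribute can both be written as bare strings in a configuration.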

3.3. Supporting Algorithms

For the canonicalized element and attribute name lists used in this spec, list membership is based on matching both "name" and "namespace" entries:

A Sanitizer name list contains an item if there exists an entry of list that is an ordered map, and where item["name"] equals entry["name"] and item["namespace"] equals entry["namespace"].

To remove an item from a list that is an ordered map, remove all entries from list where item["name"] equals entry["name"] and item["namespace"] equals entry["namespace"].

To add a name to a list, where name is canonicalized and list is an ordered map:

If list contains name, then return.
Append name to list.
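
These list operations can be sketched as three small functions over canonicalized names. The function and type names are hypothetical, and the sketch returns new arrays instead of modifying the list in place.

```typescript
type ListedNameSketch = { name: string; namespace: string | null };

// Two entries match only if both the name and the namespace are equal.
const sameName = (a: ListedNameSketch, b: ListedNameSketch): boolean =>
  a.name === b.name && a.namespace === b.namespace;

function nameListContains(list: ListedNameSketch[], item: ListedNameSketch): boolean {
  return list.some((entry) => sameName(entry, item));
}

function nameListRemove(list: ListedNameSketch[], item: ListedNameSketch): ListedNameSketch[] {
  return list.filter((entry) => !sameName(entry, item));
}

function nameListAdd(list: ListedNameSketch[], item: ListedNameSketch): ListedNameSketch[] {
  // Adding an entry that is already present leaves the list unchanged.
  return nameListContains(list, item) ? list : list.concat([item]);
}
```

Because the namespace takes part in the comparison, an "a" element in the HTML namespace and an "a" element in the SVG namespace are distinct entries.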

Equality for ordered sets is equality of its members, but without regard to order: Ordered sets A and B are equal if both A is a superset of B and B is a superset of A.

To determine not of a boolean bool, return false if bool is true, and return true otherwise.

3.4. Builtins

There are four builtins:

The built-in safe default configuration,
the built-in safe baseline configuration,
the built-in navigating URL attributes list, and
the built-in animating URL attributes list.

The built-in safe default configuration is as follows:

{
  "elements": [
    {
      "name": "html",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "head",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "title",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "body",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "article",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "section",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "nav",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "aside",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h1",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h2",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h3",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h4",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h5",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h6",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "hgroup",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "header",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "footer",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "address",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "p",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "hr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "pre",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "blockquote",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "cite",
          "namespace": null
        }
      ]
    },
    {
      "name": "ol",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "reversed",
          "namespace": null
        },
        {
          "name": "start",
          "namespace": null
        },
        {
          "name": "type",
          "namespace": null
        }
      ]
    },
    {
      "name": "ul",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "menu",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "li",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "value",
          "namespace": null
        }
      ]
    },
    {
      "name": "dl",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "dt",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "dd",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "figure",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "figcaption",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "main",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "search",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "div",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "a",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "href",
          "namespace": null
        },
        {
          "name": "rel",
          "namespace": null
        },
        {
          "name": "hreflang",
          "namespace": null
        },
        {
          "name": "type",
          "namespace": null
        }
      ]
    },
    {
      "name": "em",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "strong",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "small",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "s",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "cite",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "q",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "dfn",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "abbr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "ruby",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "rt",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "rp",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "data",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "value",
          "namespace": null
        }
      ]
    },
    {
      "name": "time",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "datetime",
          "namespace": null
        }
      ]
    },
    {
      "name": "code",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "var",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "samp",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "kbd",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "sub",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "sup",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "i",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "b",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "u",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "mark",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "bdi",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "bdo",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "span",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "br",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "wbr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "ins",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "cite",
          "namespace": null
        },
        {
          "name": "datetime",
          "namespace": null
        }
      ]
    },
    {
      "name": "del",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "cite",
          "namespace": null
        },
        {
          "name": "datetime",
          "namespace": null
        }
      ]
    },
    {
      "name": "table",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "caption",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "colgroup",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "span",
          "namespace": null
        }
      ]
    },
    {
      "name": "col",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "span",
          "namespace": null
        }
      ]
    },
    {
      "name": "tbody",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "thead",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "tfoot",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "tr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "td",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "colspan",
          "namespace": null
        },
        {
          "name": "rowspan",
          "namespace": null
        },
        {
          "name": "headers",
          "namespace": null
        }
      ]
    },
    {
      "name": "th",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "colspan",
          "namespace": null
        },
        {
          "name": "rowspan",
          "namespace": null
        },
        {
          "name": "headers",
          "namespace": null
        },
        {
          "name": "scope",
          "namespace": null
        },
        {
          "name": "abbr",
          "namespace": null
        }
      ]
    },
    {
      "name": "math",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "merror",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mfrac",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mi",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mmultiscripts",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mn",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mo",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "form",
          "namespace": null
        },
        {
          "name": "fence",
          "namespace": null
        },
        {
          "name": "separator",
          "namespace": null
        },
        {
          "name": "lspace",
          "namespace": null
        },
        {
          "name": "rspace",
          "namespace": null
        },
        {
          "name": "stretchy",
          "namespace": null
        },
        {
          "name": "symmetric",
          "namespace": null
        },
        {
          "name": "maxsize",
          "namespace": null
        },
        {
          "name": "minsize",
          "namespace": null
        },
        {
          "name": "largeop",
          "namespace": null
        },
        {
          "name": "movablelimits",
          "namespace": null
        }
      ]
    },
    {
      "name": "mover",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "accent",
          "namespace": null
        }
      ]
    },
    {
      "name": "mpadded",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "width",
          "namespace": null
        },
        {
          "name": "height",
          "namespace": null
        },
        {
          "name": "depth",
          "namespace": null
        },
        {
          "name": "lspace",
          "namespace": null
        },
        {
          "name": "voffset",
          "namespace": null
        }
      ]
    },
    {
      "name": "mphantom",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mprescripts",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mroot",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mrow",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "ms",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mspace",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "width",
          "namespace": null
        },
        {
          "name": "height",
          "namespace": null
        },
        {
          "name": "depth",
          "namespace": null
        }
      ]
    },
    {
      "name": "msqrt",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mstyle",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "msub",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "msubsup",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "msup",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mtable",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mtd",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "columnspan",
          "namespace": null
        },
        {
          "name": "rowspan",
          "namespace": null
        }
      ]
    },
    {
      "name": "mtext",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mtr",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "munder",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "accentunder",
          "namespace": null
        }
      ]
    },
    {
      "name": "munderover",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "accent",
          "namespace": null
        },
        {
          "name": "accentunder",
          "namespace": null
        }
      ]
    },
    {
      "name": "semantics",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    }
  ],
  "attributes": [
    {
      "name": "dir",
      "namespace": null
    },
    {
      "name": "lang",
      "namespace": null
    },
    {
      "name": "title",
      "namespace": null
    },
    {
      "name": "displaystyle",
      "namespace": null
    },
    {
      "name": "mathbackground",
      "namespace": null
    },
    {
      "name": "mathcolor",
      "namespace": null
    },
    {
      "name": "mathsize",
      "namespace": null
    },
    {
      "name": "scriptlevel",
      "namespace": null
    }
  ],
  "comments": false,
  "dataAttributes": false
}

Note: Included [MathML] markup is based on [SafeMathML].
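
As an illustration of how an allow-list of this shape can be read, the following sketch looks up an element and then checks an attribute against both the global and the per-element lists. It is not the specification's algorithm: the names `excerpt`, `sameName`, `findElement` and `isAttributeAllowed` are hypothetical, only three elements of the default configuration are reproduced, and the "comments" and "dataAttributes" members are ignored.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical excerpt with the same shape as the built-in default
// configuration shown above (three elements and three global attributes).
const excerpt = {
  elements: [
    { name: "span", namespace: HTML_NS, attributes: [] },
    {
      name: "time",
      namespace: HTML_NS,
      attributes: [{ name: "datetime", namespace: null }],
    },
    {
      name: "td",
      namespace: HTML_NS,
      attributes: [
        { name: "colspan", namespace: null },
        { name: "rowspan", namespace: null },
        { name: "headers", namespace: null },
      ],
    },
  ],
  attributes: [
    { name: "dir", namespace: null },
    { name: "lang", namespace: null },
    { name: "title", namespace: null },
  ],
  comments: false,
  dataAttributes: false,
};

// Two names match when both the local name and the namespace are equal.
function sameName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// Returns the matching entry of the element allow-list, or undefined.
function findElement(config, element) {
  return config.elements.find((entry) => sameName(entry, element));
}

// In this sketch an attribute is kept when its element is allowed and the
// attribute appears in the global list or in that element's own list.
function isAttributeAllowed(config, element, attribute) {
  const entry = findElement(config, element);
  if (!entry) return false; // the element itself is not allowed
  return (
    config.attributes.some((a) => sameName(a, attribute)) ||
    entry.attributes.some((a) => sameName(a, attribute))
  );
}
```

Under these assumptions, "datetime" is accepted on "time" but not on "span", while "title" is accepted on both because it is listed globally.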

The built-in safe baseline configuration is meant to block only script-content. It is as follows:

{
  "removeElements": [
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "frame"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "iframe"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "object"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "embed"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "use"
    }
  ],
  "removeAttributes": []
}

Warning: The remove unsafe algorithm specifies to additionally remove any event handler content attributes, as defined in [HTML]. If a user agent defines extensions to the [HTML] spec with additional event handler content attributes, it is its responsibility to decide how to handle them. Using the current event handler content attributes list, the safe baseline configuration looks effectively like so:

{
  "removeElements": [
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "frame"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "iframe"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "object"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "embed"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "use"
    }
  ],
  "removeAttributes": [
    "onafterprint",
    "onauxclick",
    "onbeforeinput",
    "onbeforematch",
    "onbeforeprint",
    "onbeforeunload",
    "onbeforetoggle",
    "onblur",
    "oncancel",
    "oncanplay",
    "oncanplaythrough",
    "onchange",
    "onclick",
    "onclose",
    "oncontextlost",
    "oncontextmenu",
    "oncontextrestored",
    "oncopy",
    "oncuechange",
    "oncut",
    "ondblclick",
    "ondrag",
    "ondragend",
    "ondragenter",
    "ondragleave",
    "ondragover",
    "ondragstart",
    "ondrop",
    "ondurationchange",
    "onemptied",
    "onended",
    "onerror",
    "onfocus",
    "onformdata",
    "onhashchange",
    "oninput",
    "oninvalid",
    "onkeydown",
    "onkeypress",
    "onkeyup",
    "onlanguagechange",
    "onload",
    "onloadeddata",
    "onloadedmetadata",
    "onloadstart",
    "onmessage",
    "onmessageerror",
    "onmousedown",
    "onmouseenter",
    "onmouseleave",
    "onmousemove",
    "onmouseout",
    "onmouseover",
    "onmouseup",
    "onoffline",
    "ononline",
    "onpagehide",
    "onpagereveal",
    "onpageshow",
    "onpageswap",
    "onpaste",
    "onpause",
    "onplay",
    "onplaying",
    "onpopstate",
    "onprogress",
    "onratechange",
    "onreset",
    "onresize",
    "onrejectionhandled",
    "onscroll",
    "onscrollend",
    "onsecuritypolicyviolation",
    "onseeked",
    "onseeking",
    "onselect",
    "onslotchange",
    "onstalled",
    "onstorage",
    "onsubmit",
    "onsuspend",
    "ontimeupdate",
    "ontoggle",
    "onunhandledrejection",
    "onunload",
    "onvolumechange",
    "onwaiting",
    "onwheel"
  ]
}
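
The relationship between the two listings above can be sketched as a merge of the baseline with the event handler content attribute names. This is only an illustration: `baselineExcerpt`, `eventHandlerNamesExcerpt`, `effectiveBaseline` and `isRemovedAttribute` are hypothetical names, and both lists are abbreviated excerpts rather than the full built-ins.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Excerpt of the baseline: only two of the removed elements are reproduced.
const baselineExcerpt = {
  removeElements: [
    { namespace: HTML_NS, name: "script" },
    { namespace: HTML_NS, name: "iframe" },
  ],
  removeAttributes: [],
};

// Excerpt of the event handler content attribute names listed above.
const eventHandlerNamesExcerpt = ["onclick", "onerror", "onload"];

// Returns a new configuration; the baseline object itself is left untouched.
function effectiveBaseline(baseline, eventHandlerNames) {
  // A Set removes duplicates if a name appears in both lists.
  const merged = new Set([...baseline.removeAttributes, ...eventHandlerNames]);
  return {
    removeElements: [...baseline.removeElements],
    removeAttributes: [...merged],
  };
}

// True when an attribute name would be dropped under the given configuration.
function isRemovedAttribute(config, attributeName) {
  return config.removeAttributes.includes(attributeName);
}

const effective = effectiveBaseline(baselineExcerpt, eventHandlerNamesExcerpt);
```

Keeping the event handler names in a separate list mirrors the warning above: a user agent with additional event handler content attributes would extend that list, not the baseline itself.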

The built-in navigating URL attributes list, for which "javascript:" navigations are "unsafe", is as follows:

«[
[ { "name" → "a", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "area", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "base", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "button", "namespace" → HTML namespace }, { "name" → "formaction", "namespace" → null } ],
[ { "name" → "form", "namespace" → HTML namespace }, { "name" → "action", "namespace" → null } ],
[ { "name" → "iframe", "namespace" → HTML namespace }, { "name" → "src", "namespace" → null } ],
[ { "name" → "input", "namespace" → HTML namespace }, { "name" → "formaction", "namespace" → null } ],
[ { "name" → "a", "namespace" → SVG namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "a", "namespace" → SVG namespace }, { "name" → "href", "namespace" → XLink namespace } ],
]»
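
A minimal sketch of how this list might be consulted, assuming the check is "the element and attribute pair is listed and the value parses as a URL whose scheme is javascript". The function names are hypothetical, and the standard `URL` constructor is used here as a stand-in for the URL parser; it is not claimed to be what a user agent does internally.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";
const XLINK_NS = "http://www.w3.org/1999/xlink";

// The list above, written as [element, attribute] pairs.
const navigatingUrlAttributes = [
  [{ name: "a", namespace: HTML_NS }, { name: "href", namespace: null }],
  [{ name: "area", namespace: HTML_NS }, { name: "href", namespace: null }],
  [{ name: "base", namespace: HTML_NS }, { name: "href", namespace: null }],
  [{ name: "button", namespace: HTML_NS }, { name: "formaction", namespace: null }],
  [{ name: "form", namespace: HTML_NS }, { name: "action", namespace: null }],
  [{ name: "iframe", namespace: HTML_NS }, { name: "src", namespace: null }],
  [{ name: "input", namespace: HTML_NS }, { name: "formaction", namespace: null }],
  [{ name: "a", namespace: SVG_NS }, { name: "href", namespace: null }],
  [{ name: "a", namespace: SVG_NS }, { name: "href", namespace: XLINK_NS }],
];

function sameName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// True when the (element, attribute) pair appears in the list.
function isNavigatingUrlAttribute(element, attribute) {
  return navigatingUrlAttributes.some(
    ([el, attr]) => sameName(el, element) && sameName(attr, attribute)
  );
}

// Assumption: a value counts as a "javascript:" URL when it parses as an
// absolute URL with that scheme. Values that fail to parse (including
// relative URLs, since no base is given) are treated as not matching.
function isJavaScriptUrl(value) {
  try {
    return new URL(value).protocol === "javascript:";
  } catch {
    return false;
  }
}

// Hypothetical helper combining both checks.
function shouldRemoveAttribute(element, attribute, value) {
  return isNavigatingUrlAttribute(element, attribute) && isJavaScriptUrl(value);
}
```

Parsing the value instead of matching a string prefix means that variations in case or surrounding whitespace are normalized by the URL parser before the scheme is compared.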

The built-in animating URL attributes list, which can be used in [SVG11] to declaratively modify navigation elements to use "javascript:" URLs, is as follows:

«[
[ { "name" → "animate", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
[ { "name" → "animateMotion", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
[ { "name" → "animateTransform", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
[ { "name" → "set", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
]»
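
The following sketch shows one way this list could be applied. It assumes that an "attributeName" attribute on a listed element is of concern only when its value names a link target, taken here to be "href" or "xlink:href"; that pair of values and the function name `animatesNavigationTarget` are assumptions made for illustration.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Local names of the SVG elements in the list above.
const animatingElementNames = ["animate", "animateMotion", "animateTransform", "set"];

// Assumed set of attributeName values that would retarget a link.
const animatedLinkTargets = ["href", "xlink:href"];

// True when the attribute is a listed (element, attributeName) pair whose
// value points the animation at a link target.
function animatesNavigationTarget(element, attribute, value) {
  const listed =
    element.namespace === SVG_NS &&
    animatingElementNames.includes(element.name) &&
    attribute.name === "attributeName" &&
    attribute.namespace === null;
  return listed && animatedLinkTargets.includes(value);
}
```

An animation of an unrelated attribute, such as "opacity", is left alone by this check.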

4. Security Considerations

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The specified API must not support the construction of a Sanitizer object that leaves script-capable markup in place; any such construction would be a bug in the threat model.

That said, there are security issues that even correct usage of the Sanitizer API cannot protect against. These scenarios are laid out in the following sections.

4.1. Server-Side Reflected and Stored XSS

This section is not normative.

The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an existing DocumentFragment. The Sanitizer does not address server-side reflected or stored XSS.

4.2. DOM clobbering

This section is not normative.

DOM clobbering describes an attack in which malicious HTML confuses an application by naming elements through id or name attributes such that properties like children of an HTML element in the DOM are overshadowed by the malicious content.

The Sanitizer API does not protect against DOM clobbering attacks in its default state, but can be configured to remove id and name attributes.
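
A minimal sketch of such a configuration, assuming a user agent that implements Element.setHTML() with a "sanitizer" option. The names `noClobberConfig` and `setHTMLWithoutNamedAccess` are hypothetical. The configuration only lists the two attributes to remove and does not reproduce the default allow-list, so authors may want to combine it with their own element and attribute lists.

```javascript
// Hypothetical configuration: drop the attributes that DOM clobbering relies on.
const noClobberConfig = { removeAttributes: ["id", "name"] };

// `target` is assumed to be an Element in a user agent that supports setHTML().
function setHTMLWithoutNamedAccess(target, untrustedHTML) {
  if (typeof target.setHTML !== "function") {
    throw new TypeError("setHTML() is not available on this target");
  }
  // The configuration is passed through the options bag of setHTML().
  target.setHTML(untrustedHTML, { sanitizer: noClobberConfig });
}
```

The feature check makes the failure explicit in user agents without the API, so that callers do not silently fall back to an unsanitized insertion.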

4.3. XSS with Script gadgets

This section is not normative.

Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then performs the execution of JavaScript based on that input.

The Sanitizer API cannot prevent these attacks, but it requires page authors to explicitly allow unknown elements in general. Authors must additionally explicitly configure unknown attributes and elements, as well as markup that is known to be widely used for templating and framework-specific code, such as data- and slot attributes and elements like <slot> and <template>. We believe that these restrictions are not exhaustive and encourage page authors to examine their third party libraries for this behavior.

4.4. Mutated XSS

This section is not normative.

Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML snippet without the correct context. In particular, when a parsed HTML fragment has been serialized to a string, the string is not guaranteed to be parsed and interpreted exactly the same when inserted into a different parent element. One way to carry out such an attack is to rely on the change of parsing behavior for foreign content or mis-nested tags.

The Sanitizer API offers only functions that turn a string into a node tree. The context is supplied implicitly by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML() creates a new document. Therefore, the Sanitizer API is not directly affected by mutated XSS.

If a developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and then parse it again, mutated XSS may occur. We discourage this practice. If processing or passing of HTML as a string should be necessary after all, then any string should be considered untrusted and should be sanitized (again) when inserting it into the DOM. In other words, a sanitized and then serialized HTML tree can no longer be considered as sanitized.
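
The advice above can be sketched as follows, assuming a user agent that implements Element.setHTML(). The function name `insertSerialized` is hypothetical. The point is only that a string obtained from a previously sanitized tree goes through sanitization again on insertion and is not assigned to .innerHTML.

```javascript
// Discouraged pattern, shown only as a comment: the serialized string is
// parsed again in a new context without being sanitized.
//   target.innerHTML = source.innerHTML;

// Sketch: treat any HTML string as untrusted, even if it was serialized
// from a sanitized tree, and sanitize it again when inserting it.
function insertSerialized(target, serializedHTML) {
  if (typeof target.setHTML !== "function") {
    throw new TypeError("setHTML() is not available on this target");
  }
  // With no options, setHTML() is assumed to apply its default configuration.
  target.setHTML(serializedHTML);
}
```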

A more complete treatment of mXSS can be found in [MXSS].

5. Acknowledgements

This work is informed and inspired by [DOMPURIFY] from cure53, Internet Explorer’s window.toStaticHTML() as well as the original [HTMLSanitizer] from Ben Bucksch. We thank Anne van Kesteren, Krzysztof Kotowicz, Tom Schuster, Luke Warlow, Guillaume Weghsteen, and Mike West for their valuable feedback.
