HTML Sanitizer API

Draft Community Group Report,

This version:
https://wicg.github.io/sanitizer-api/
Issue Tracking:
GitHub
Inline In Spec
Editors:
Frederik Braun (Mozilla)
Mario Heiderich (Cure53)
Daniel Vogelheim (Google LLC)
Test Suite:
https://wpt.fyi/results/sanitizer-api/

Abstract

This document specifies a set of APIs which allow developers to take untrusted HTML input and sanitize it for safe insertion into a document’s DOM.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is not normative.

Web applications often need to work with strings of HTML on the client side, perhaps as part of a client-side templating solution, perhaps as part of rendering user generated content, etc. It is difficult to do so in a safe way. The naive approach of joining strings together and stuffing them into an Element's innerHTML is fraught with risk, as it can cause JavaScript execution in a number of unexpected ways.

Libraries like [DOMPURIFY] attempt to manage this problem by carefully parsing and sanitizing strings before insertion, by constructing a DOM and filtering its members through an allow-list. This has proven to be a fragile approach, as the parsing APIs exposed to the web don’t always map in reasonable ways to the browser’s behavior when actually rendering a string as HTML in the "real" DOM. Moreover, the libraries need to keep on top of browsers' changing behavior over time; things that once were safe may turn into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation. This document outlines an API which aims to do just that.

1.1. Goals

1.2. API Summary

let s = new Sanitizer();

// Case: The input data is available as a tree of DOM nodes.
let userControlledTree = ...;
element.replaceChildren(s.sanitize(userControlledTree));

// Case: The input is available as a string, and we know the element to insert
// it into:
let userControlledInput = "<img src=x onerror=alert(1)//>";
element.setHTML(userControlledInput, {sanitizer: s});

// Case: The input is available as a string, and we know which type of element
// we will eventually insert it to, but can’t or don’t want to perform the
// insertion now:
let forDiv = s.sanitizeFor("div", userControlledInput);
// Later:
document.querySelector(`${forDiv.localName}#target`).replaceChildren(...forDiv.childNodes);

1.3. The Trouble With Strings

Many HTML sanitizer libraries are based on string-to-string APIs, while this API does not offer such a method. This sub-section explains the reasons and implications for the Sanitizer API.

To convert a string into a tree of nodes (or a fragment), it needs to be parsed. The HTML parsing algorithm carefully specifies how parsing HTML works. This parsing algorithm is dependent on the current node as its parsing context. That is, the same string parsed in the context of different HTML nodes will yield different parse trees.

The string <em>bla in <div> and <textarea> context.
A table cell in <table> and non-table (<div>) context.

These differences can allow bugs to creep into a site’s sanitization strategy, which can be (and have been) exploited by a class of XSS-style attacks called mXSS. These attacks ultimately depend on confusion about the parsing context, for example when a developer sanitizes a string in one (parsing) context and then applies the resulting string in a different context, where it will be interpreted differently.

Two mXSS-style examples in real-world libraries can be found in [MXSS1] and [MXSS2]. We’d like to stress that we picked these reports for their ease of reading. There are similar reports for pretty much every other tool that deals with HTML parsing.

Since this attack class depends on a particular usage of the string after the sanitization has occurred, the API itself has only limited capability to protect its users. As a result, the Sanitizer API follows the following principle:

Whenever the Sanitizer API parses or unparses a DOM (sub-)tree to or from a string, it will either do so in a fashion where the correct parse context is implied by the operation; or it will require a parse context to be supplied by the developer and will retain the given context in the result. In other words, the Sanitizer API will never assume a parsing context, or discard a parsing context that has been supplied earlier.

1.3.1. Case 1: Sanitizing With Nodes, Only.

If the user data in question is already available as DOM nodes - for example a Document instance in a frame - then the Sanitizer can be easily used:

const sanitizer = new Sanitizer( ... );  // Our Sanitizer;

// There is an iframe with id "userFrame" whose content we are interested in.
const user_tree = document.getElementById("userFrame").contentWindow.document;
const sanitized = sanitizer.sanitize(user_tree);

Note: Parsing an HTML string can have various side-effects, like network requests or executing scripts. Naive parsing, e.g. by assigning a string to the .innerHTML of an unconnected element, will not reliably prevent these side-effects. Therefore, if the user data to be sanitized is originally in string form, we recommend using one of the following cases.

1.3.2. Case 2: Sanitizing a String with Implied Context.

If the user data is available in string form and we wish to directly insert the sanitized subtree into the DOM, we can do so as follows:

const user_string = "...";  // The user string.
const sanitizer = new Sanitizer( ... );  // Our Sanitizer;

// We want to insert the HTML in user_string into a target element with id
// target. That is, we want the equivalent of target.innerHTML = value, except
// without the XSS risks.
document.getElementById("target").setHTML(user_string, {sanitizer: sanitizer});

1.3.3. Case 3: Sanitizing a String with a Given Context.

If the user data is available in string form and the developer wishes to sanitize it now, but apply the result to the DOM later, then the Sanitizer must be informed about the context in which it will be used. To prevent context confusion, the result is wrapped in a container that holds both the result and the parse context. Conveniently, this container already exists, and it is the node itself!

// A certain piece of user input is meant to be used repeatedly, to insert
// it in multiple elements on the page. All these elements will be <div>
// elements.
const user_string = "...";  // The user string.
const sanitizer = new Sanitizer( ... );  // Our Sanitizer.

const sanitized = sanitizer.sanitizeFor("div", user_string);
sanitized instanceof HTMLDivElement  // true. The Sanitizer has given us a node.

// ... later, in the same program ...
for (let elem = ... of ...) {
  // All of our "elem" instances should be of the same type used in the
  // .sanitizeFor call above. With an assertion library, this could look as
  // follows:
  assert_true(elem instanceof sanitized.constructor);  // Assuming assert_true, like in WPT tests.
  elem.replaceChildren(...sanitized.childNodes);
}

// Instead of:
elem.replaceChildren(...sanitized.childNodes);
// one could write:
elem.innerHTML = sanitized.innerHTML;
// This should have the same effect, except be slower, since this will trigger
// un-parsing and then re-parsing the node tree which we already have
// available as a node tree. So we recommend to stick with the former version.

1.3.4. The Other Case

What if none of these cases works with a given application structure, and a string-to-string operation is required? The developer is then free to take the sanitization result and remove it from its context. In that case, the responsibility to prevent mXSS-class attacks that stem from mis-applying those strings in an inappropriate context remains with the developer.

const user_string = "...";  // The user string.
const sanitizer = new Sanitizer( ... );  // Our Sanitizer.

// The developer plans to insert this string into a <div> element, but has to
// keep this around as a string (instead of an element). It’s important that
// the developer remembers the parsing context and MUST NOT use this in a
// different parsing context in order to prevent mXSS attacks.
const sanitized_for_div = sanitizer.sanitizeFor("div", user_string).innerHTML;

2. Framework

2.1. Sanitizer API

The core API is the Sanitizer object and the sanitize method. Sanitizers can be instantiated using an optional SanitizerConfig dictionary for options. The most common use-case - preventing XSS - is handled by default, so that creating a Sanitizer with a custom config is necessary only to handle additional, application-specific use cases.

[
  Exposed=(Window),
  SecureContext
] interface Sanitizer {
  constructor(optional SanitizerConfig config = {});

  DocumentFragment sanitize((Document or DocumentFragment) input);
  Element? sanitizeFor(DOMString element, DOMString input);

  SanitizerConfig getConfiguration();
  static SanitizerConfig getDefaultConfiguration();
};

The Element interface gains an additional method, setHTML, which applies a string directly to an existing element node, using a Sanitizer.

dictionary SetHTMLOptions {
  Sanitizer sanitizer;
};
[SecureContext]
partial interface Element {
  undefined setHTML(DOMString input, optional SetHTMLOptions options = {});
};

Is this how we specify a method on an existing class "owned" by a different spec?

// To make our examples easy to follow, we’ll need a way to create DOM nodes.
// The following is a hacky way to accomplish this, for illustration only,
// that you shall pretty please not use in practice. This parsing method can
// cause side-effects based on the string being parsed, which is insecure.
// In fact, this very API exists for the sole purpose of preventing the
// problems that this approach has.
//
// But... for our examples we’ll need something that is quick and easy, since
// we cannot use our own Sanitizer API to explain our own Sanitizer API.
const to_node = str => document.createRange().createContextualFragment(str);

// The core API of the Sanitizer is the .sanitize method:
let untrusted_input = to_node("Hello!");
const sanitizer = new Sanitizer();
sanitizer.sanitize(untrusted_input);  // DocumentFragment w/ a text node, "Hello!"

// Probably we want to put this somewhere in our DOM:
element.replaceChildren(sanitizer.sanitize(untrusted_input));

// If our input contains markup it’ll be mostly preserved, except for
// script-y markup:
untrusted_input = to_node("<em onclick='alert(1);'>Hello!</em>");
sanitizer.sanitize(untrusted_input);  // <em>Hello!</em>
element.replaceChildren(sanitizer.sanitize(untrusted_input));  // No alert!

// The .sanitize method is the primary API, and returns a DocumentFragment.
// The .sanitizeFor method accepts and parses a string and returns an HTML
// element node.
const hello = to_node("hello");
(sanitizer.sanitize(hello)) instanceof DocumentFragment;  // true
(sanitizer.sanitizeFor("template", "hello")) instanceof HTMLTemplateElement;  // true

2.2. String Handling

Parsing (and unparsing) strings to (or from) HTML requires a context element. Thus, the sanitizeFor method requires us to pass in a context, which the implementation can then hand over to the HTML Parser.

Additionally, the Element interface gains a setHTML method, which always knows the correct context, because it is applied to a given Element instance. This Element is the correct context for both parsing and unparsing its own content.

One way to conceptualize this is to view string sanitization as a three step operation: 1, parsing the string; 2, sanitizing the resulting node tree; and 3, grafting the resulting subtree onto our live DOM. Sanitizer.sanitize is the middle step. Sanitizer.sanitizeFor performs the first and second steps, but leaves the third to the developer. Element.setHTML does all three. Which to use depends on the structure of your application, whether you can do all three steps simultaneously, or whether maybe the sanitization is removed (in either code structure or point in time) from the eventual modification of the DOM.

// If the markup to be sanitized is already available as a tree, for example
// from an embedded frame, one can use sanitize:
document.getElementById("target").replaceChildren(
  sanitizer.sanitize(
    document.querySelector("iframe#myframe").contentWindow.document));

// If the markup to be sanitized is present in string form, but we already
// have the element we want to insert in available:
const untrusted_input = "....";
document.getElementById("someelement").setHTML(
  untrusted_input, {sanitizer: sanitizer});

// Same as above, but using the default Sanitizer configuration:
document.getElementById("someelement").setHTML(untrusted_input);

// If the markup to be sanitized is present in string form, but we don’t want
// to do the DOM insertion now:
let no_xss = sanitizer.sanitizeFor("div", untrusted_input);
// ... much later ...
document.querySelector("div#targetdiv").replaceChildren(...no_xss.childNodes);

// Note that parsing HTML depends on the current context in many ways, some
// subtle, some not so much. Supplying a different context than what the
// result will eventually be used in has both security and functional risks.
// It’s up to the developer to handle this safely.
//
// Example: Many parsing contexts disallow table data (<td>) without
//          an enclosing table.
sanitizer.sanitizeFor("div", "<td>data</td>").innerHTML  // "data"
sanitizer.sanitizeFor("table", "<td>data</td>").innerHTML  // "<td>data</td>"
Note: Sanitizing a string will use the HTML Parser to parse the input, which will perform some degree of normalization. So even if no sanitization steps are taken on a particular input, it cannot be guaranteed that the output of .sanitizeFor will be character-for-character identical to the input.
sanitizer.sanitizeFor("div", "Stra&szlig;e")  // Straße
sanitizer.sanitizeFor("div", "<image>")  // <img>
Note: Sanitizer.sanitizeFor and Element.setHTML can replace the respective other. Both are provided since they support different use cases.
// sanitizeFor, based on setHTML.
function sanitizeFor(element, input) {
  const elem = document.createElement(element);
  elem.setHTML(input, {sanitizer: this});
  return elem;
}

// setHTML, based on sanitizeFor.
function setHTML(input, options) {
  const sanitizer = options?.sanitizer ?? new Sanitizer();
  this.replaceChildren(...sanitizer.sanitizeFor(this.localName, input).childNodes);
}

2.3. The Configuration Dictionary

The Sanitizer’s configuration dictionary is a dictionary which describes modifications to the sanitize operation. If a Sanitizer has not received an explicit configuration, for example when being constructed without any parameters, then the default configuration value is used as the configuration dictionary.

dictionary SanitizerConfig {
  sequence<DOMString> allowElements;
  sequence<DOMString> blockElements;
  sequence<DOMString> dropElements;
  AttributeMatchList allowAttributes;
  AttributeMatchList dropAttributes;
  boolean allowCustomElements;
  boolean allowUnknownMarkup;
  boolean allowComments;
};
allowElements

The element allow list is a sequence of strings with elements that the sanitizer should retain in the input.

blockElements

The element block list is a sequence of strings with elements where the sanitizer should remove the elements from the input, but retain their children.

dropElements

The element drop list is a sequence of strings with elements that the sanitizer should remove from the input, including their children.

allowAttributes

The attribute allow list is an attribute match list, which determines whether an attribute (on a given element) should be allowed.

dropAttributes

The attribute drop list is an attribute match list, which determines whether an attribute (on a given element) should be dropped.

allowCustomElements

The allow custom elements option determines whether custom elements are to be considered. The default is to drop them. If this option is true, custom elements will still be checked against all other built-in or configured checks.

allowUnknownMarkup

The allow unknown markup option determines whether unknown HTML elements are to be considered. The default is to drop them. If this option is true, unknown HTML elements will still be checked against all other built-in or configured checks.

allowComments

The allow comments option determines whether HTML comments are allowed.

Note: allowElements creates a sanitizer that defaults to dropping elements, while blockElements and dropElements default to keeping unknown elements. Using both types is possible, but is probably of little practical use. The same applies to allowAttributes and dropAttributes.

const sample = to_node("Some text <b><i>with</i></b> <blink>tags</blink>.");
const script_sample = to_node("abc <script>alert(1)</script> def");

// Some text <b>with</b> tags.
new Sanitizer({allowElements: [ "b" ]}).sanitize(sample);

// Some text <i>with</i> <blink>tags</blink>.
new Sanitizer({blockElements: [ "b" ]}).sanitize(sample);

// Some text <blink>tags</blink>.
new Sanitizer({dropElements: [ "b" ]}).sanitize(sample);

// Note: The default configuration handles XSS-relevant input:

// Non-scripting input will be passed through:
new Sanitizer().sanitize(sample);  // Will output sample unmodified.

// Scripts will be blocked: "abc alert(1) def"
new Sanitizer().sanitize(script_sample);

In addition to allow and block lists for elements and attributes, there are also options to configure some node or element types.

Examples:

// Comments will be dropped by default.
const comment = to_node("Hello <!-- comment --> World!");
new Sanitizer().sanitize(comment);  // "Hello  World!"
new Sanitizer({allowComments: true}).sanitize(comment);  // Same as comment.

A sanitizer’s configuration can be queried with the getConfiguration method, which runs the query the sanitizer config algorithm.

// Does the default config allow script elements?
Sanitizer.getDefaultConfiguration().allowElements.includes("script")  // false

// We found a Sanitizer instance. Does it have an allow-list configured?
const a_sanitizer = ...;
!!a_sanitizer.getConfiguration().allowElements // true, if an allowElements list is configured

// If it does have an allow elements list, does it include the <div> element?
a_sanitizer.getConfiguration().allowElements.includes("div")  // true, if "div" is in allowElements.

// Note that the getConfiguration method might do some normalization. E.g., it won’t
// contain key/value pairs that are not declared in the IDL.
Object.keys(new Sanitizer({madeUpDictionaryKey: "Hello"}).getConfiguration())  // []

// As a Sanitizer’s config describes its operation, a new sanitizer with
// another instance’s configuration should behave identically.
// (For illustration purposes only. It would make more sense to just use a directly.)
const a = /* ... a Sanitizer we found somewhere ... */;
const b = new Sanitizer(a.getConfiguration());  // b should behave the same as a.

// getDefaultConfiguration() and new Sanitizer().getConfiguration() should be the same.
// (For illustration purposes only. There are better ways of implementing
// object equality in JavaScript.)
JSON.stringify(Sanitizer.getDefaultConfiguration()) == JSON.stringify(new Sanitizer().getConfiguration());  // true

2.3.1. Attribute Match Lists

An attribute match list is a map of attributes to elements, where the special name "*" stands for all attributes or elements. A given attribute belonging to an element matches an attribute match list, if the attribute is a key in the match list, and element or "*" are found in the attribute’s value list.

Element names are interpreted as names in the HTML namespace, and attribute names as names of non-namespaced attributes - i.e., what one may think of as normal [HTML] elements and attributes. Both elements and attributes are named by their local name.

typedef record<DOMString, sequence<DOMString>> AttributeMatchList;
Examples for attributes and attribute match lists:
const sample = to_node("<span id='span1' class='theclass' style='font-weight: bold'>hello</span>");

// Allow only <span style>: <span style='font-weight: bold'>...</span>
new Sanitizer({allowAttributes: {"style": ["span"]}}).sanitize(sample);

// Allow style, but not on span: <span>...</span>
new Sanitizer({allowAttributes: {"style": ["div"]}}).sanitize(sample);

// Allow style on any elements: <span style='font-weight: bold'>...</span>
new Sanitizer({allowAttributes: {"style": ["*"]}}).sanitize(sample);

// Drop <span id>: <span class='theclass' style='font-weight: bold'>...</span>
new Sanitizer({dropAttributes: {"id": ["span"]}}).sanitize(sample);

// Drop id, everywhere: <span class='theclass' style='font-weight: bold'>...</span>
new Sanitizer({dropAttributes: {"id": ["*"]}}).sanitize(sample);
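The matching rule described at the start of this section can be sketched in plain JavaScript, without a DOM. This is an illustrative model only: the helper name matchesAttributeList is hypothetical and not part of the API, and the match list is assumed to be a plain object in the AttributeMatchList shape.

```javascript
// Illustrative model only; matchesAttributeList is a hypothetical helper,
// not part of the Sanitizer API.
// matchList has the AttributeMatchList shape: { attributeName: [elementNames] }.
function matchesAttributeList(attrName, elementName, matchList) {
  // Look under the attribute's own name and under the wildcard key "*".
  for (const key of [attrName, "*"]) {
    if (!Object.prototype.hasOwnProperty.call(matchList, key)) continue;
    const elements = matchList[key];
    // The value list matches if it names the element or contains "*".
    if (elements.includes(elementName) || elements.includes("*")) return true;
  }
  return false;
}

// Mirrors the examples above: style is listed for span, but not for div.
const styleOnSpan = matchesAttributeList("style", "span", { style: ["span"] });
const styleOnDivOnly = matchesAttributeList("style", "span", { style: ["div"] });
```

Here styleOnSpan is true and styleOnDivOnly is false, matching the first two Sanitizer examples in this section.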

3. Algorithms

3.1. API Implementation

To create a Sanitizer with an optional config parameter, run these steps:
  1. Let copy be a copy of config.

  2. Set this's configuration dictionary to copy.

This should explicitly state the config’s properties in which element names are found and modify the config with map operations. [Issue #148]

Note: The configuration object contains element names in the element allow list, element block list, and element drop list, and in the mapped values in the attribute allow list and attribute drop list.

To sanitize a given input of type Document or DocumentFragment run these steps:
  1. Let fragment be the result of running the create a document fragment algorithm on input.

  2. Run the sanitize a document fragment algorithm on fragment.

  3. Return fragment.

The sanitize algorithm does not need to run "create a document fragment". [Issue #149]

To sanitize for an element name of type DOMString and a given input of type DOMString run these steps:
  1. Let element be an HTML element created by running the steps of the creating an element algorithm with the current document, element name, the HTML namespace, and no optional parameters.

  2. If the element kind of element is regular and if the baseline element allow list does not contain element name, then return null.

  3. Let fragment be the result of invoking the html fragment parsing algorithm, with element as the context element and input as markup.

  4. Run the steps of the sanitize a document fragment algorithm on fragment.

  5. Replace all with fragment as the node and element as the parent.

  6. Return element.

Does the .sanitizeFor element name require namespace-related processing? [Issue #140]

To sanitize and set a value using a SetHTMLOptions options dictionary on an Element node this, run these steps:
  1. If the element kind of this is regular and this's local name does not match any name in the baseline element allow list, then throw a TypeError and return.

  2. If the sanitizer member exists in the options SetHTMLOptions dictionary,

    1. then let sanitizer be the value of the sanitizer member of the options SetHTMLOptions dictionary,

    2. otherwise let sanitizer be the result of the create a Sanitizer algorithm without a config parameter.

  3. Let fragment be the result of invoking the html fragment parsing algorithm with this as the context node and value as markup.

  4. Run the steps of the sanitize a document fragment algorithm on fragment, using sanitizer as the current Sanitizer instance.

  5. Replace all with fragment as the node and this as the parent.

To query the sanitizer config of a given sanitizer instance, run these steps:
  1. Let sanitizer be the current Sanitizer.

  2. Let config be sanitizer’s configuration dictionary, or the default configuration if no configuration dictionary was given.

  3. Let result be a newly constructed SanitizerConfig dictionary.

  4. For any non-empty member of config whose key is declared in SanitizerConfig, copy the value to result.

  5. Return result.
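As a rough sketch of the copying steps above, the following plain-JavaScript model keeps only members declared in the SanitizerConfig dictionary. The names SANITIZER_CONFIG_KEYS and queryConfig are hypothetical, "non-empty" is assumed to mean "present", and the fallback to the default configuration is left out because the default values are not reproduced in this section.

```javascript
// Keys declared in the SanitizerConfig dictionary in this document.
const SANITIZER_CONFIG_KEYS = [
  "allowElements", "blockElements", "dropElements",
  "allowAttributes", "dropAttributes",
  "allowCustomElements", "allowUnknownMarkup", "allowComments",
];

// Hypothetical model of "query the sanitizer config": copy only declared,
// present members into a newly constructed object.
function queryConfig(config) {
  const result = {};
  for (const key of SANITIZER_CONFIG_KEYS) {
    if (config[key] !== undefined) {
      // Deep-copy so that callers cannot mutate the stored configuration.
      // All SanitizerConfig members are JSON-compatible (lists, records, booleans).
      result[key] = JSON.parse(JSON.stringify(config[key]));
    }
  }
  return result;
}

// Undeclared keys do not survive, as in the madeUpDictionaryKey example in 2.3.
const queried = queryConfig({ madeUpDictionaryKey: "Hello", allowElements: ["b"] });
```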

IDL is taking care of most steps in "query the sanitizer config". Clean up. [Issue #150]

3.2. Helper Definitions

To create a document fragment named fragment from an input of type Document or DocumentFragment, run these steps:
  1. Let node be null.

  2. Switch based on input’s type:

    1. If input is of type DocumentFragment, then:

      1. Set node to input.

    2. If input is of type Document, then:

      1. Set node to input’s documentElement.

  3. Let clone be the result of running clone a node on node with the clone children flag set.

  4. Let fragment be a new DocumentFragment whose node document is node’s node document.

  5. Append the node clone to fragment.

  6. Return fragment.

3.3. Sanitization Algorithms

To sanitize a document fragment named fragment with a Sanitizer sanitizer run these steps:
  1. Let m be a map that maps nodes to a sanitize action.

  2. Let nodes be a list containing the inclusive descendants of fragment, in tree order.

  3. For each node in nodes:

    1. Let action be the result of running the sanitize a node algorithm on node with sanitizer.

    2. Set m[node] to action.

  4. For each node in nodes:

    1. If m[node] is drop, remove node.

    2. If m[node] is block, create a DocumentFragment fragment, append all of node’s children to fragment, and replace node within node’s parent with fragment.

    3. If m[node] is keep, do nothing.
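The two-pass structure of this algorithm (first decide, then modify) can be sketched on a plain-object tree. This is an illustration under stated assumptions, not the specified algorithm: sanitizeTree, serialize, and sampleTree are hypothetical names, plain objects stand in for DOM nodes, the root itself is not given an action, and the per-node decision is supplied by the caller instead of running sanitize a node.

```javascript
// Illustrative model only. Plain objects stand in for DOM nodes (assumption):
//   element: { name: "b", children: [...] }    text: { text: "..." }
// actionFor is a caller-supplied function returning "keep", "block" or "drop".
function sanitizeTree(root, actionFor) {
  // Pass 1: collect all descendants in tree order and record an action for each.
  const nodes = [];
  const parentOf = new Map();
  const collect = (node) => {
    for (const child of node.children ?? []) {
      parentOf.set(child, node);
      nodes.push(child);
      collect(child);
    }
  };
  collect(root);
  const actions = new Map(nodes.map((node) => [node, actionFor(node)]));

  // Pass 2: apply the recorded actions.
  for (const node of nodes) {
    const parent = parentOf.get(node);
    const index = parent.children.indexOf(node);
    const action = actions.get(node);
    if (action === "drop") {
      parent.children.splice(index, 1); // remove node together with its subtree
    } else if (action === "block") {
      // Replace node with its children, which now belong to node's parent.
      const children = node.children ?? [];
      for (const child of children) parentOf.set(child, parent);
      parent.children.splice(index, 1, ...children);
    }
  }
  return root;
}

// Serializes the model tree back to a string, for inspection only.
function serialize(node) {
  if (node.text !== undefined) return node.text;
  const inner = node.children.map(serialize).join("");
  return node.name ? `<${node.name}>${inner}</${node.name}>` : inner;
}

// Model of: Some text <b><i>with</i></b> <blink>tags</blink>.
const sampleTree = () => ({
  children: [
    { text: "Some text " },
    { name: "b", children: [{ name: "i", children: [{ text: "with" }] }] },
    { text: " " },
    { name: "blink", children: [{ text: "tags" }] },
    { text: "." },
  ],
});

// Keep text and <b>, block every other element (similar to allowElements: ["b"]).
const allowOnlyB = (node) =>
  node.text !== undefined || node.name === "b" ? "keep" : "block";
const allowed = serialize(sanitizeTree(sampleTree(), allowOnlyB));
```

Recording all actions before touching the tree keeps the decisions independent of the modifications, which is why the model can safely splice children into their grandparent during the second pass.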

The step above needs to explicitly iterate over the children and insert into parent. It could collect them in a variable or do things in place, but this is a bit too imprecise. [Issue #156]

To sanitize a node named node with sanitizer run these steps:
  1. Assert: node is not a Document or DocumentFragment or Attr or DocumentType node.

  2. If node is an element node:

    1. Let element be node.

    2. For each attr in element’s attribute list:

      1. Let attr action be the result of running the sanitize action for an attribute algorithm on attr and element.

      2. If attr action is different from keep, remove an attribute supplying attr.

    3. Run the steps to handle funky elements on element.

    4. Let action be the result of running the sanitize action for an element on element.

    5. Return action.

  3. If node is a Comment node:

    1. Let config be sanitizer’s configuration dictionary, or the default configuration if no configuration dictionary was given.

    2. If config’s allow comments option exists and config["allowComments"] is true: Return keep.

    3. Return drop.

  4. If node is a Text node: Return keep.

  5. Assert: node is a ProcessingInstruction.

  6. Return drop.

The sanitize action for an attribute algorithm parameters do not match. Issue(153): consider creating an effective sanitizer config. Also, IDL guarantees that a config is ALWAYS given. The question is really whether the members exists. [Issue #151]

Some HTML elements require special treatment in a way that can’t be easily expressed in terms of configuration options or other algorithms. The following algorithm collects these in one place.

To handle funky elements on a given element, run these steps:
  1. If element’s namespace is HTML and the local name is "template":

    1. Run the steps of the sanitize a document fragment algorithm on element’s template contents attribute.

    2. Drop all child nodes of element.

  2. If element’s namespace is HTML and the local name is one of "a" or "area", and if element’s protocol property is "javascript:":

    1. Remove the href attribute from element.

  3. If element’s namespace is HTML and the local name is "form" and if element’s action attribute is a [URL] with javascript: protocol:

    1. Remove the action attribute from element.

  4. If element’s namespace is HTML and the local name is "input" or "button", and if element’s formaction attribute is a [URL] with javascript: protocol:

    1. Remove the formaction attribute from element.
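Steps 2 to 4 above all follow one pattern: find a URL-valued attribute and remove it if it parses to a javascript: URL. A minimal sketch of that pattern, using the standard URL constructor, could look as follows. The names ASSUMED_BASE, URL_ATTRIBUTES, hasJavascriptProtocol, and stripJavascriptUrls are hypothetical, elements are modelled as plain objects, and the template handling of step 1 is not covered.

```javascript
// Hypothetical base URL used to resolve relative values; a browser would
// resolve against the document's base URL instead.
const ASSUMED_BASE = "https://example.test/";

// True if value parses as a URL whose scheme is javascript:.
function hasJavascriptProtocol(value) {
  try {
    return new URL(value, ASSUMED_BASE).protocol === "javascript:";
  } catch {
    return false; // values that do not parse are not javascript: URLs
  }
}

// Which attribute to inspect for each element named in steps 2 to 4.
const URL_ATTRIBUTES = new Map([
  ["a", "href"], ["area", "href"],
  ["form", "action"],
  ["input", "formaction"], ["button", "formaction"],
]);

// element is a plain-object model: { name, attributes: Map<string, string> }.
function stripJavascriptUrls(element) {
  const attr = URL_ATTRIBUTES.get(element.name);
  if (attr !== undefined && element.attributes.has(attr) &&
      hasJavascriptProtocol(element.attributes.get(attr))) {
    element.attributes.delete(attr);
  }
  return element;
}

const link = stripJavascriptUrls({
  name: "a",
  attributes: new Map([["href", "javascript:alert(1)"], ["title", "hi"]]),
});
```

After this call, link no longer has an href attribute but keeps title. Parsing the value as a URL, instead of comparing string prefixes, lets the URL parser's own normalization (for example, lowercasing the scheme) decide what counts as a javascript: URL.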

Export and refer funky element properties more precisely. [Issue #154]

3.4. Matching Against The Configuration

A sanitize action is keep, drop, or block.

To determine the sanitize action for an element, given a SanitizerConfig config, run these steps:
  1. Let kind be element’s element kind.

  2. If kind is regular and element does not match any name in the baseline element allow list: Return drop.

  3. If kind is custom and if config["allowCustomElements"] does not exist or if config["allowCustomElements"] is false: Return drop.

  4. If kind is unknown and if config["allowUnknownMarkup"] does not exist or if config["allowUnknownMarkup"] is false: Return drop.

  5. If element matches any name in config["dropElements"]: Return drop.

  6. If element matches any name in config["blockElements"]: Return block.

  7. Let allow list be null.

  8. If "allowElements" exists in config:

    1. Then: Set allow list to config["allowElements"].

    2. Otherwise: Set allow list to the default configuration's element allow list.

  9. If element does not match any name in allow list: Return block.

  10. Return keep.
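The decision order above can be sketched as a plain function. This is a model under stated assumptions: elementAction and ASSUMED_LISTS are hypothetical names, the list contents are placeholders (the real baseline and default lists are in Appendix A and are not reproduced here), and an element is modelled as an object carrying its local name and its already-determined element kind.

```javascript
// Hypothetical placeholder lists, for illustration only. The real baseline
// and default lists are defined in Appendix A.
const ASSUMED_LISTS = {
  baselineElements: ["b", "i", "em", "div", "span"],
  defaultAllowElements: ["b", "i", "em", "div", "span"],
};

// element is a plain-object model: { name, kind }, where kind is one of
// "regular", "custom" or "unknown". Returns "keep", "block" or "drop".
function elementAction(element, config, lists = ASSUMED_LISTS) {
  const { name, kind } = element;
  // Built-in checks come first and cannot be overridden by the allow list.
  if (kind === "regular" && !lists.baselineElements.includes(name)) return "drop";
  if (kind === "custom" && !config.allowCustomElements) return "drop";
  if (kind === "unknown" && !config.allowUnknownMarkup) return "drop";
  // Then the configured lists: drop wins over block, block wins over allow.
  if ((config.dropElements ?? []).includes(name)) return "drop";
  if ((config.blockElements ?? []).includes(name)) return "block";
  const allowList = config.allowElements ?? lists.defaultAllowElements;
  return allowList.includes(name) ? "keep" : "block";
}

// With allowElements: ["b"], <b> is kept while <i> is blocked (children kept).
const actionForB = elementAction({ name: "b", kind: "regular" }, { allowElements: ["b"] });
const actionForI = elementAction({ name: "i", kind: "regular" }, { allowElements: ["b"] });
```

Note that an element missing from the allow list is blocked, not dropped, so its children survive. This matches the allowElements example in section 2.3.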

To determine whether an element matches an element name, run these steps:
  1. If element is in the HTML namespace and if element’s local name is identical to name: Return true.

  2. Return false.

Whitespaces or colons? [Issue #146]

To determine whether an attribute matches an attribute match list list, run these steps:
  1. If attribute’s namespace is not null: Return false.

  2. If attribute’s local name does not match the attribute match list list’s key and if the key is not "*": Return false.

  3. Let element be the attribute’s Element.

  4. Let element name be element’s local name.

  5. If list’s value does not contain element name and value is not ["*"]: Return false.

  6. Return true.

To determine the sanitize action for an attribute given a Sanitizer configuration dictionary config, run these steps:
  1. Let kind be attribute’s attribute kind.

  2. If kind is unknown and if config["allowUnknownMarkup"] does not exist or if config["allowUnknownMarkup"] is false: Return drop.

  3. If kind is regular and attribute’s local name does not match any name in the baseline attribute allow list: Return drop.

  4. If attribute matches any attribute match list in config’s attribute drop list: Return drop.

  5. If attribute allow list exists in config:

    1. Then let allow list be config["allowAttributes"].

    2. Otherwise: Let allow list be the default configuration's attribute allow list.

  6. If attribute does not match any attribute match list in allow list: Return drop.

  7. Return keep.
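The attribute steps above can be sketched in the same style. Again this is only a model: attributeAction, matchesList, and ASSUMED_ATTRIBUTE_LISTS are hypothetical names, the list contents are placeholders for the values in Appendix A, namespaced attributes are not modelled, and an attribute is represented as a plain object with its local name, its element's local name, and its attribute kind.

```javascript
// Hypothetical placeholder lists, for illustration only. The real baseline
// and default lists are defined in Appendix A.
const ASSUMED_ATTRIBUTE_LISTS = {
  baselineAttributes: ["id", "class", "style", "title"],
  defaultAllowAttributes: { id: ["*"], class: ["*"], style: ["*"], title: ["*"] },
};

// True if matchList (AttributeMatchList shape) covers attr on its element.
function matchesList(attr, matchList) {
  for (const key of [attr.name, "*"]) {
    if (!Object.prototype.hasOwnProperty.call(matchList, key)) continue;
    const elements = matchList[key];
    if (elements.includes(attr.elementName) || elements.includes("*")) return true;
  }
  return false;
}

// attr is a plain-object model: { name, elementName, kind }, where kind is
// "regular" or "unknown". Returns "keep" or "drop".
function attributeAction(attr, config, lists = ASSUMED_ATTRIBUTE_LISTS) {
  if (attr.kind === "unknown" && !config.allowUnknownMarkup) return "drop";
  if (attr.kind === "regular" && !lists.baselineAttributes.includes(attr.name)) return "drop";
  if (matchesList(attr, config.dropAttributes ?? {})) return "drop";
  const allowList = config.allowAttributes ?? lists.defaultAllowAttributes;
  return matchesList(attr, allowList) ? "keep" : "drop";
}

// Mirrors the <span id class style> examples in section 2.3.1.
const styleAttr = { name: "style", elementName: "span", kind: "regular" };
const idAttr = { name: "id", elementName: "span", kind: "regular" };
const keepStyle = attributeAction(styleAttr, { allowAttributes: { style: ["span"] } });
const dropId = attributeAction(idAttr, { allowAttributes: { style: ["span"] } });
```

Unlike elements, attributes have no block action: an attribute is either kept or removed.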

The element kind of an element is one of regular, unknown, or custom. Let element kind be:

We do not want to use the interface (e.g., "applet" and "blink" are HTMLUnknownElement) [Issue #147]

  • regular, otherwise.

Similarly, the attribute kind of an attribute is one of regular or unknown. Let attribute kind be:
  • unknown, if the [HTML] specification does not assign any meaning to attribute’s name.

Again, this needs to be more specific. Historical, obsolete, conforming, non-conforming (e.g. bgcolor). It is desirable we make a sanitizer-specific list. [Issue #147]

  • regular, otherwise.

3.5. Baseline and Defaults

The sanitizer baseline and defaults need to be carefully vetted, and are still under discussion. The values below are for illustrative purposes only.

The sanitizer has a built-in default configuration, which is stricter than the baseline and aims to eliminate any script-injection possibility, as well as legacy or unusual constructs.

The defaults and baseline are defined by three JSON constants: the baseline element allow list, the baseline attribute allow list, and the default configuration. For better readability, these have been moved to Appendix A.

4. Security Considerations

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The specified API must not support the construction of a Sanitizer object that leaves script-capable markup in place; if such a construction is possible, it is a bug in the threat model.

That said, there are security issues that even correct usage of the Sanitizer API cannot protect against. These scenarios are laid out in the following sections.

4.1. Server-Side Reflected and Stored XSS

This section is not normative.

The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an existing DocumentFragment. The Sanitizer does not address server-side reflected or stored XSS.

4.2. DOM clobbering

This section is not normative.

DOM clobbering describes an attack in which malicious HTML confuses an application by naming elements through id or name attributes such that properties like children of an HTML element in the DOM are overshadowed by the malicious content.

The Sanitizer API does not protect against DOM clobbering attacks in its default state, but can be configured to remove id and name attributes.
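
A configuration along the following lines could be used for that purpose. This is a sketch only: the constant name antiClobberingConfig is hypothetical, and the local type declarations mirror only the dropAttributes member of the SanitizerConfig dictionary from the IDL Index.

```typescript
// Local mirror of the relevant part of the SanitizerConfig dictionary.
type AttributeMatchList = Record<string, string[]>;

interface SanitizerConfig {
  dropAttributes?: AttributeMatchList;
}

// Drop id and name on every element ("*"), so that sanitized markup
// cannot name elements in a way that overshadows DOM properties.
const antiClobberingConfig: SanitizerConfig = {
  dropAttributes: {
    id: ["*"],
    name: ["*"],
  },
};

// In a user agent that implements this draft, the dictionary would be
// passed to the constructor: new Sanitizer(antiClobberingConfig).
```

Note that dropping name from every element also removes it from form controls, so authors who sanitize forms may prefer to list specific elements instead of "*".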

4.3. XSS with Script gadgets

This section is not normative.

Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then executes JavaScript based on that input.

The Sanitizer API cannot prevent these attacks, but it requires page authors to explicitly allow unknown elements in general. Authors must additionally explicitly configure unknown attributes and elements, as well as markup that is known to be widely used for templating and framework-specific code, like data- and slot attributes and elements like <slot> and <template>. We believe that these restrictions are not exhaustive and encourage page authors to examine their third party libraries for this behavior.
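
As an aid for reviewing a configuration, an author might flag the opt-ins mentioned above before constructing a Sanitizer. The following sketch assumes a hypothetical helper named gadgetProneOptIns. It is not part of this API, and it checks only the names listed in this section, so an empty result does not mean that a configuration is free of script gadgets.

```typescript
// Local mirror of the relevant SanitizerConfig members from the IDL Index.
type AttributeMatchList = Record<string, string[]>;

interface SanitizerConfig {
  allowElements?: string[];
  allowAttributes?: AttributeMatchList;
  allowUnknownMarkup?: boolean;
}

// Templating-related element names taken from the paragraph above.
const TEMPLATING_ELEMENTS = ["slot", "template"];

// Hypothetical review helper: lists the explicit opt-ins in a config
// that are commonly consumed by templating or framework code.
function gadgetProneOptIns(config: SanitizerConfig): string[] {
  const findings: string[] = [];
  if (config.allowUnknownMarkup === true) {
    findings.push("allowUnknownMarkup");
  }
  for (const elementName of config.allowElements || []) {
    if (TEMPLATING_ELEMENTS.indexOf(elementName) !== -1) {
      findings.push("element:" + elementName);
    }
  }
  for (const attributeName of Object.keys(config.allowAttributes || {})) {
    // Flags the slot attribute and any attribute whose name starts with "data-".
    if (attributeName === "slot" || attributeName.indexOf("data-") === 0) {
      findings.push("attribute:" + attributeName);
    }
  }
  return findings;
}
```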

4.4. Mutated XSS

This section is not normative.

Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML snippet without the correct context. In particular, when a parsed HTML fragment has been serialized to a string, the string is not guaranteed to be parsed and interpreted exactly the same when inserted into a different parent element. An example of such an attack relies on the change of parsing behavior for foreign content or misnested tags.

The Sanitizer API offers help against Mutated XSS, but relies on some amount of cooperation by the developers. The sanitize() function does not handle strings and is therefore unaffected. The setHTML() function combines sanitization with DOM modification and can implicitly apply the correct context. The sanitizeFor() function combines parsing and sanitization, and relies on the developer to supply the correct context for the eventual application of its result.

If the data to be sanitized is available as a node tree, we encourage authors to use the sanitize() function of the API which returns a DocumentFragment and avoids risks that come with serialization and additional parsing. Directly operating on a fragment after sanitization also comes with a performance benefit, as the cost of additional serialization and parsing is avoided.

A more complete treatment of mXSS can be found in [MXSS].

5. Acknowledgements

Cure53’s [DOMPURIFY] is a clear inspiration for the API this document describes, as is Internet Explorer’s window.toStaticHTML().

Appendix A: Built-in Constants

This appendix is normative, except where explicitly noted otherwise.

These constants define core behaviour of the Sanitizer algorithm.

Built-ins Justification

This subsection is super duper non-normative.

Note: The normative values of these constants are found below. The derivation of these values is explained here, with an implementation in the [DEFAULTS] script. It is expected that these values will change before this specification is finalized. Also, we expect these to be updated to include additional HTML elements as they are introduced in user agents.

For the purpose of this Sanitizer API, [HTML] constructs fall into one of four classes. Excluding the first class yields the baseline; excluding the first, second, and third classes yields the default:

  1. Elements and attributes that (directly) execute script. In other words, elements and attributes that are unconditionally script-ish.

  2. Legacy and "difficult" elements and attributes. Examples are the <plaintext> and <xmp> elements, which have special parsing rules attached to them. These are not dangerous per se, but they have contributed to existing vulnerabilities.

  3. Elements and attributes that we feel rarely make sense in user-supplied content.

  4. All the rest.
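
The relationship between the four classes and the two built-in lists can be sketched as set subtraction. The arrays below are small illustrative samples, not the normative lists. The membership of slot and template in the third class is an assumption, inferred from their presence in the baseline element allow list and their absence from the default configuration.

```typescript
// Illustrative samples of each class; see the normative lists below.
const scriptish = ["script", "iframe", "object", "embed"];           // class 1
const legacyOrDifficult = ["plaintext", "xmp", "title", "textarea"]; // class 2
const rarelyUseful = ["slot", "template"];                           // class 3 (assumed)
const remaining = ["a", "p", "div", "span"];                         // class 4

const allElements = scriptish.concat(legacyOrDifficult, rarelyUseful, remaining);

// Return the names that do not appear in the excluded list, keeping order.
function without(names: string[], excluded: string[]): string[] {
  return names.filter((n) => excluded.indexOf(n) === -1);
}

// Baseline: everything except class 1.
const baselineElements = without(allElements, scriptish);

// Default: everything except classes 1, 2, and 3.
const defaultElements = without(
  baselineElements,
  legacyOrDifficult.concat(rarelyUseful)
);
```

With this construction the default list is always a subset of the baseline list, which matches the statement that the default configuration is stricter than the baseline.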

Specifically:

  1. Script-ish constructs:

    • The HTMLScriptElement, which proudly executes script as its sole purpose.

    • All event handler attributes, since these also execute script.

    • HTMLIFrameElement, which loads arbitrary HTML content and therefore also script.

    • The legacy HTMLObjectElement and HTMLEmbedElement, which load non-HTML active content. Also, <object>’s side-kick HTMLParamElement.

    • The no-longer conforming <frame>, <frameset>, and <applet> tags, which are outdated versions or companions of several elements listed above.

    • The <noscript>, <noframes>, <noembed>, and <nolayer> elements. These, by themselves, are arguably not script-ish, but they are companions to elements listed above, and make no sense on their own.

    • Also, the HTMLBaseElement, as this effectively modifies interpretation of other URLs.

  2. Legacy and "difficult" elements.

    • Special parsing behaviour. This is not dangerous in its own right, but has contributed to mXSS-style attacks. This includes:

      • <plaintext> (Which parses in PLAINTEXT state.)

      • <title> and <textarea> (Which parse in RCDATA state.)

      • The non-conforming [<xmp>](https://html.spec.whatwg.org/#xmp) element.

    • Legacy elements:

      • <image> ([which is parsed as <img>](https://html.spec.whatwg.org/#parsing-main-inbody)).

      • <basefont>

  3. Constructs unlikely to be beneficial in user-supplied content:

The Baseline Element Allow List

The built-in baseline element allow list has the following value:

[
  "a",
  "abbr",
  "acronym",
  "address",
  "area",
  "article",
  "aside",
  "audio",
  "b",
  "basefont",
  "bdi",
  "bdo",
  "bgsound",
  "big",
  "blockquote",
  "body",
  "br",
  "button",
  "canvas",
  "caption",
  "center",
  "cite",
  "code",
  "col",
  "colgroup",
  "command",
  "data",
  "datalist",
  "dd",
  "del",
  "details",
  "dfn",
  "dialog",
  "dir",
  "div",
  "dl",
  "dt",
  "em",
  "fieldset",
  "figcaption",
  "figure",
  "font",
  "footer",
  "form",
  "h1",
  "h2",
  "h3",
  "h4",
  "h5",
  "h6",
  "head",
  "header",
  "hgroup",
  "hr",
  "html",
  "i",
  "image",
  "img",
  "input",
  "ins",
  "kbd",
  "keygen",
  "label",
  "layer",
  "legend",
  "li",
  "link",
  "listing",
  "main",
  "map",
  "mark",
  "marquee",
  "menu",
  "meta",
  "meter",
  "nav",
  "nobr",
  "ol",
  "optgroup",
  "option",
  "output",
  "p",
  "picture",
  "plaintext",
  "popup",
  "portal",
  "pre",
  "progress",
  "q",
  "rb",
  "rp",
  "rt",
  "rtc",
  "ruby",
  "s",
  "samp",
  "section",
  "select",
  "selectmenu",
  "slot",
  "small",
  "source",
  "span",
  "strike",
  "strong",
  "style",
  "sub",
  "summary",
  "sup",
  "table",
  "tbody",
  "td",
  "template",
  "textarea",
  "tfoot",
  "th",
  "thead",
  "time",
  "title",
  "tr",
  "track",
  "tt",
  "u",
  "ul",
  "var",
  "video",
  "wbr",
  "xmp"
]

The Baseline Attribute Allow List

The baseline attribute allow list has the following value:

[
  "abbr",
  "accept",
  "accept-charset",
  "accesskey",
  "action",
  "align",
  "alink",
  "allow",
  "allowfullscreen",
  "allowpaymentrequest",
  "alt",
  "anchor",
  "archive",
  "as",
  "async",
  "autocapitalize",
  "autocomplete",
  "autocorrect",
  "autofocus",
  "autopictureinpicture",
  "autoplay",
  "axis",
  "background",
  "behavior",
  "bgcolor",
  "border",
  "bordercolor",
  "capture",
  "cellpadding",
  "cellspacing",
  "challenge",
  "char",
  "charoff",
  "charset",
  "checked",
  "cite",
  "class",
  "classid",
  "clear",
  "code",
  "codebase",
  "codetype",
  "color",
  "cols",
  "colspan",
  "compact",
  "content",
  "contenteditable",
  "controls",
  "controlslist",
  "conversiondestination",
  "coords",
  "crossorigin",
  "csp",
  "data",
  "datetime",
  "declare",
  "decoding",
  "default",
  "defer",
  "dir",
  "direction",
  "dirname",
  "disabled",
  "disablepictureinpicture",
  "disableremoteplayback",
  "disallowdocumentaccess",
  "download",
  "draggable",
  "elementtiming",
  "enctype",
  "end",
  "enterkeyhint",
  "event",
  "exportparts",
  "face",
  "for",
  "form",
  "formaction",
  "formenctype",
  "formmethod",
  "formnovalidate",
  "formtarget",
  "frame",
  "frameborder",
  "headers",
  "height",
  "hidden",
  "high",
  "href",
  "hreflang",
  "hreftranslate",
  "hspace",
  "http-equiv",
  "id",
  "imagesizes",
  "imagesrcset",
  "importance",
  "impressiondata",
  "impressionexpiry",
  "incremental",
  "inert",
  "inputmode",
  "integrity",
  "invisible",
  "is",
  "ismap",
  "keytype",
  "kind",
  "label",
  "lang",
  "language",
  "latencyhint",
  "leftmargin",
  "link",
  "list",
  "loading",
  "longdesc",
  "loop",
  "low",
  "lowsrc",
  "manifest",
  "marginheight",
  "marginwidth",
  "max",
  "maxlength",
  "mayscript",
  "media",
  "method",
  "min",
  "minlength",
  "multiple",
  "muted",
  "name",
  "nohref",
  "nomodule",
  "nonce",
  "noresize",
  "noshade",
  "novalidate",
  "nowrap",
  "object",
  "open",
  "optimum",
  "part",
  "pattern",
  "ping",
  "placeholder",
  "playsinline",
  "policy",
  "poster",
  "preload",
  "pseudo",
  "readonly",
  "referrerpolicy",
  "rel",
  "reportingorigin",
  "required",
  "resources",
  "rev",
  "reversed",
  "role",
  "rows",
  "rowspan",
  "rules",
  "sandbox",
  "scheme",
  "scope",
  "scopes",
  "scrollamount",
  "scrolldelay",
  "scrolling",
  "select",
  "selected",
  "shadowroot",
  "shadowrootdelegatesfocus",
  "shape",
  "size",
  "sizes",
  "slot",
  "span",
  "spellcheck",
  "src",
  "srcdoc",
  "srclang",
  "srcset",
  "standby",
  "start",
  "step",
  "style",
  "summary",
  "tabindex",
  "target",
  "text",
  "title",
  "topmargin",
  "translate",
  "truespeed",
  "trusttoken",
  "type",
  "usemap",
  "valign",
  "value",
  "valuetype",
  "version",
  "virtualkeyboardpolicy",
  "vlink",
  "vspace",
  "webkitdirectory",
  "width",
  "wrap"
]

The Default Configuration Dictionary

The built-in default configuration has the following value:

{
  "allowCustomElements": false,
  "allowUnknownMarkup": false,
  "allowElements": [
    "a",
    "abbr",
    "acronym",
    "address",
    "area",
    "article",
    "aside",
    "audio",
    "b",
    "bdi",
    "bdo",
    "bgsound",
    "big",
    "blockquote",
    "body",
    "br",
    "button",
    "canvas",
    "caption",
    "center",
    "cite",
    "code",
    "col",
    "colgroup",
    "datalist",
    "dd",
    "del",
    "details",
    "dfn",
    "dialog",
    "dir",
    "div",
    "dl",
    "dt",
    "em",
    "fieldset",
    "figcaption",
    "figure",
    "font",
    "footer",
    "form",
    "h1",
    "h2",
    "h3",
    "h4",
    "h5",
    "h6",
    "head",
    "header",
    "hgroup",
    "hr",
    "html",
    "i",
    "img",
    "input",
    "ins",
    "kbd",
    "keygen",
    "label",
    "layer",
    "legend",
    "li",
    "link",
    "listing",
    "main",
    "map",
    "mark",
    "marquee",
    "menu",
    "meta",
    "meter",
    "nav",
    "nobr",
    "ol",
    "optgroup",
    "option",
    "output",
    "p",
    "picture",
    "popup",
    "pre",
    "progress",
    "q",
    "rb",
    "rp",
    "rt",
    "rtc",
    "ruby",
    "s",
    "samp",
    "section",
    "select",
    "selectmenu",
    "small",
    "source",
    "span",
    "strike",
    "strong",
    "style",
    "sub",
    "summary",
    "sup",
    "table",
    "tbody",
    "td",
    "tfoot",
    "th",
    "thead",
    "time",
    "tr",
    "track",
    "tt",
    "u",
    "ul",
    "var",
    "video",
    "wbr"
  ],
  "allowAttributes": {
    "abbr": [
      "*"
    ],
    "accept": [
      "*"
    ],
    "accept-charset": [
      "*"
    ],
    "accesskey": [
      "*"
    ],
    "action": [
      "*"
    ],
    "align": [
      "*"
    ],
    "alink": [
      "*"
    ],
    "allow": [
      "*"
    ],
    "allowfullscreen": [
      "*"
    ],
    "alt": [
      "*"
    ],
    "anchor": [
      "*"
    ],
    "archive": [
      "*"
    ],
    "as": [
      "*"
    ],
    "async": [
      "*"
    ],
    "autocapitalize": [
      "*"
    ],
    "autocomplete": [
      "*"
    ],
    "autocorrect": [
      "*"
    ],
    "autofocus": [
      "*"
    ],
    "autopictureinpicture": [
      "*"
    ],
    "autoplay": [
      "*"
    ],
    "axis": [
      "*"
    ],
    "background": [
      "*"
    ],
    "behavior": [
      "*"
    ],
    "bgcolor": [
      "*"
    ],
    "border": [
      "*"
    ],
    "bordercolor": [
      "*"
    ],
    "capture": [
      "*"
    ],
    "cellpadding": [
      "*"
    ],
    "cellspacing": [
      "*"
    ],
    "challenge": [
      "*"
    ],
    "char": [
      "*"
    ],
    "charoff": [
      "*"
    ],
    "charset": [
      "*"
    ],
    "checked": [
      "*"
    ],
    "cite": [
      "*"
    ],
    "class": [
      "*"
    ],
    "classid": [
      "*"
    ],
    "clear": [
      "*"
    ],
    "code": [
      "*"
    ],
    "codebase": [
      "*"
    ],
    "codetype": [
      "*"
    ],
    "color": [
      "*"
    ],
    "cols": [
      "*"
    ],
    "colspan": [
      "*"
    ],
    "compact": [
      "*"
    ],
    "content": [
      "*"
    ],
    "contenteditable": [
      "*"
    ],
    "controls": [
      "*"
    ],
    "controlslist": [
      "*"
    ],
    "conversiondestination": [
      "*"
    ],
    "coords": [
      "*"
    ],
    "crossorigin": [
      "*"
    ],
    "csp": [
      "*"
    ],
    "data": [
      "*"
    ],
    "datetime": [
      "*"
    ],
    "declare": [
      "*"
    ],
    "decoding": [
      "*"
    ],
    "default": [
      "*"
    ],
    "defer": [
      "*"
    ],
    "dir": [
      "*"
    ],
    "direction": [
      "*"
    ],
    "dirname": [
      "*"
    ],
    "disabled": [
      "*"
    ],
    "disablepictureinpicture": [
      "*"
    ],
    "disableremoteplayback": [
      "*"
    ],
    "disallowdocumentaccess": [
      "*"
    ],
    "download": [
      "*"
    ],
    "draggable": [
      "*"
    ],
    "elementtiming": [
      "*"
    ],
    "enctype": [
      "*"
    ],
    "end": [
      "*"
    ],
    "enterkeyhint": [
      "*"
    ],
    "event": [
      "*"
    ],
    "exportparts": [
      "*"
    ],
    "face": [
      "*"
    ],
    "for": [
      "*"
    ],
    "form": [
      "*"
    ],
    "formaction": [
      "*"
    ],
    "formenctype": [
      "*"
    ],
    "formmethod": [
      "*"
    ],
    "formnovalidate": [
      "*"
    ],
    "formtarget": [
      "*"
    ],
    "frame": [
      "*"
    ],
    "frameborder": [
      "*"
    ],
    "headers": [
      "*"
    ],
    "height": [
      "*"
    ],
    "hidden": [
      "*"
    ],
    "high": [
      "*"
    ],
    "href": [
      "*"
    ],
    "hreflang": [
      "*"
    ],
    "hreftranslate": [
      "*"
    ],
    "hspace": [
      "*"
    ],
    "http-equiv": [
      "*"
    ],
    "id": [
      "*"
    ],
    "imagesizes": [
      "*"
    ],
    "imagesrcset": [
      "*"
    ],
    "importance": [
      "*"
    ],
    "impressiondata": [
      "*"
    ],
    "impressionexpiry": [
      "*"
    ],
    "incremental": [
      "*"
    ],
    "inert": [
      "*"
    ],
    "inputmode": [
      "*"
    ],
    "integrity": [
      "*"
    ],
    "invisible": [
      "*"
    ],
    "is": [
      "*"
    ],
    "ismap": [
      "*"
    ],
    "keytype": [
      "*"
    ],
    "kind": [
      "*"
    ],
    "label": [
      "*"
    ],
    "lang": [
      "*"
    ],
    "language": [
      "*"
    ],
    "latencyhint": [
      "*"
    ],
    "leftmargin": [
      "*"
    ],
    "link": [
      "*"
    ],
    "list": [
      "*"
    ],
    "loading": [
      "*"
    ],
    "longdesc": [
      "*"
    ],
    "loop": [
      "*"
    ],
    "low": [
      "*"
    ],
    "lowsrc": [
      "*"
    ],
    "manifest": [
      "*"
    ],
    "marginheight": [
      "*"
    ],
    "marginwidth": [
      "*"
    ],
    "max": [
      "*"
    ],
    "maxlength": [
      "*"
    ],
    "mayscript": [
      "*"
    ],
    "media": [
      "*"
    ],
    "method": [
      "*"
    ],
    "min": [
      "*"
    ],
    "minlength": [
      "*"
    ],
    "multiple": [
      "*"
    ],
    "muted": [
      "*"
    ],
    "name": [
      "*"
    ],
    "nohref": [
      "*"
    ],
    "nomodule": [
      "*"
    ],
    "nonce": [
      "*"
    ],
    "noresize": [
      "*"
    ],
    "noshade": [
      "*"
    ],
    "novalidate": [
      "*"
    ],
    "nowrap": [
      "*"
    ],
    "object": [
      "*"
    ],
    "open": [
      "*"
    ],
    "optimum": [
      "*"
    ],
    "part": [
      "*"
    ],
    "pattern": [
      "*"
    ],
    "ping": [
      "*"
    ],
    "placeholder": [
      "*"
    ],
    "playsinline": [
      "*"
    ],
    "policy": [
      "*"
    ],
    "poster": [
      "*"
    ],
    "preload": [
      "*"
    ],
    "pseudo": [
      "*"
    ],
    "readonly": [
      "*"
    ],
    "referrerpolicy": [
      "*"
    ],
    "rel": [
      "*"
    ],
    "reportingorigin": [
      "*"
    ],
    "required": [
      "*"
    ],
    "resources": [
      "*"
    ],
    "rev": [
      "*"
    ],
    "reversed": [
      "*"
    ],
    "role": [
      "*"
    ],
    "rows": [
      "*"
    ],
    "rowspan": [
      "*"
    ],
    "rules": [
      "*"
    ],
    "sandbox": [
      "*"
    ],
    "scheme": [
      "*"
    ],
    "scope": [
      "*"
    ],
    "scopes": [
      "*"
    ],
    "scrollamount": [
      "*"
    ],
    "scrolldelay": [
      "*"
    ],
    "scrolling": [
      "*"
    ],
    "select": [
      "*"
    ],
    "selected": [
      "*"
    ],
    "shadowroot": [
      "*"
    ],
    "shadowrootdelegatesfocus": [
      "*"
    ],
    "shape": [
      "*"
    ],
    "size": [
      "*"
    ],
    "sizes": [
      "*"
    ],
    "slot": [
      "*"
    ],
    "span": [
      "*"
    ],
    "spellcheck": [
      "*"
    ],
    "src": [
      "*"
    ],
    "srcdoc": [
      "*"
    ],
    "srclang": [
      "*"
    ],
    "srcset": [
      "*"
    ],
    "standby": [
      "*"
    ],
    "start": [
      "*"
    ],
    "step": [
      "*"
    ],
    "style": [
      "*"
    ],
    "summary": [
      "*"
    ],
    "tabindex": [
      "*"
    ],
    "target": [
      "*"
    ],
    "text": [
      "*"
    ],
    "title": [
      "*"
    ],
    "topmargin": [
      "*"
    ],
    "translate": [
      "*"
    ],
    "truespeed": [
      "*"
    ],
    "trusttoken": [
      "*"
    ],
    "type": [
      "*"
    ],
    "usemap": [
      "*"
    ],
    "valign": [
      "*"
    ],
    "value": [
      "*"
    ],
    "valuetype": [
      "*"
    ],
    "version": [
      "*"
    ],
    "virtualkeyboardpolicy": [
      "*"
    ],
    "vlink": [
      "*"
    ],
    "vspace": [
      "*"
    ],
    "webkitdirectory": [
      "*"
    ],
    "width": [
      "*"
    ],
    "wrap": [
      "*"
    ]
  }
}

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[DEFAULTS]
Sanitizer API Defaults. URL: https://github.com/WICG/sanitizer-api/blob/main/resources/defaults-derivation.html
[DOMPURIFY]
DOMPurify. URL: https://github.com/cure53/DOMPurify
[MXSS]
mXSS Attacks: Attacking well-secured Web-Applications by using innerHTML Mutations. URL: https://cure53.de/fp170.pdf
[MXSS1]
Mutation XSS via namespace confusion. URL: https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/
[MXSS2]
CVE-2020-6802 Write-up. URL: https://www.checkmarx.com/blog/technical-blog/vulnerabilities-discovered-in-mozilla-bleach/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

IDL Index

[
  Exposed=(Window),
  SecureContext
] interface Sanitizer {
  constructor(optional SanitizerConfig config = {});

  DocumentFragment sanitize((Document or DocumentFragment) input);
  Element? sanitizeFor(DOMString element, DOMString input);

  SanitizerConfig getConfiguration();
  static SanitizerConfig getDefaultConfiguration();
};

dictionary SetHTMLOptions {
  Sanitizer sanitizer;
};
[SecureContext]
partial interface Element {
  undefined setHTML(DOMString input, optional SetHTMLOptions options = {});
};

dictionary SanitizerConfig {
  sequence<DOMString> allowElements;
  sequence<DOMString> blockElements;
  sequence<DOMString> dropElements;
  AttributeMatchList allowAttributes;
  AttributeMatchList dropAttributes;
  boolean allowCustomElements;
  boolean allowUnknownMarkup;
  boolean allowComments;
};

typedef record<DOMString, sequence<DOMString>> AttributeMatchList;

Issues Index

Is this how we specify a method on existing class "owned" by a different spec?
This should explicitly state the config’s properties in which element names are found and modify the config with map operations. [Issue #148]
The sanitize algorithm does not need to run "create a document fragment". [Issue #149]
Does the .sanitizeFor element name require namespace-related processing? [Issue #140]
IDL is taking care of most steps in "query the sanitizer config". Clean up. [Issue #150]
The step above needs to explicitly iterate over the children and insert into parent. It could collect them in a variable or do things in place, but this is a bit too imprecise. [Issue #156]
The sanitize action for an attribute algorithm parameters do not match. Issue(153): consider creating an effective sanitizer config. Also, IDL guarantees that a config is ALWAYS given. The question is really whether the members exists. [Issue #151]
Export and refer funky element properties more precisely. [Issue #154]
Whitespaces or colons? [Issue #146]
We do not want to use the interface (e.g., "applet" and "blink" are HTMLUnknownElement) [Issue #147]
Again, this needs to be more specific. Historical, obsolete, conforming, non-conforming (e.g. bgcolor). It is desirable we make a sanitizer-specific list. [Issue #147]
The sanitizer baseline and defaults need to be carefully vetted, and are still under discussion. The values below are for illustrative purposes only.