HTML Sanitizer API

Draft Community Group Report

This version:
https://wicg.github.io/sanitizer-api/
Issue Tracking:
GitHub
Inline In Spec
Editors:
Frederik Braun (Mozilla)
Mario Heiderich (Cure53)
Daniel Vogelheim (Google LLC)

Abstract

This document specifies a set of APIs which allow developers to take untrusted strings of HTML, and sanitize them for safe insertion into a document’s DOM.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is not normative.

Web applications often need to work with strings of HTML on the client side, perhaps as part of a client-side templating solution, perhaps as part of rendering user generated content, etc. It is difficult to do so in a safe way, however; the naive approach of joining strings together and stuffing them into an Element's innerHTML is fraught with risk, as that can and will cause JavaScript execution in a number of unexpected ways.

Libraries like [DOMPURIFY] attempt to manage this problem by carefully parsing and sanitizing strings before insertion by constructing a DOM and walking its members through an allow-list. This has proven to be a fragile approach, as the parsing APIs exposed to the web don’t always map in reasonable ways to the browser’s behavior when actually rendering a string as HTML in the "real" DOM. Moreover, the libraries need to keep on top of browsers' changing behavior over time; things that once were safe may turn into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation. This document outlines an API which aims to do just that.

1.1. Goals

1.2. Examples

let userControlledInput = "<img src=x onerror=alert(1)//>";

// Create a DocumentFragment from unsanitized input:
let s = new Sanitizer();
let sanitizedFragment = s.sanitize(userControlledInput);

// Replace an element’s content from unsanitized input:
element.replaceChildren(s.sanitize(userControlledInput));

2. Framework

2.1. Sanitizer API

The core API is the Sanitizer object and the sanitize method. Sanitizers can be instantiated using an optional SanitizerConfig dictionary for options. The most common use-case - preventing XSS - is handled by default, so that creating a Sanitizer with a custom config is necessary only to handle additional, application-specific use cases.

[
  Exposed=(Window),
  SecureContext
] interface Sanitizer {
  constructor(optional SanitizerConfig config = {});

  DocumentFragment sanitize(SanitizerInput input);
  DOMString sanitizeToString(SanitizerInput input);

  SanitizerConfig config();
  static SanitizerConfig defaultConfig();
};

Example:

  // The core API of the Sanitizer is the .sanitize method:
  const sanitizer = new Sanitizer();
  let untrusted_input = "Hello!";

  // Returns a DocumentFragment with one text node, "Hello!".
  sanitizer.sanitize(untrusted_input);

  // Probably we want to put this somewhere in our DOM:
  element.replaceChildren(sanitizer.sanitize(untrusted_input));

  // If our input contains markup it’ll be mostly preserved, except for
  // script-y markup:
  untrusted_input = "<em onclick='alert(1);'>Hello!</em>";
  sanitizer.sanitize(untrusted_input);  // <em>Hello!</em>
  element.replaceChildren(sanitizer.sanitize(untrusted_input));  // No alert!

  // The .sanitize method is the primary API, and returns a DocumentFragment.
  // The .sanitizeToString method returns a DocumentFragment serialized as a
  // string.
  (sanitizer.sanitize("hello")) instanceof DocumentFragment;  // true
  typeof sanitizer.sanitize("hello");  // "object"
  typeof sanitizer.sanitizeToString("hello");  // "string"

  // In case our code expects the input in string form, we can use
  // .sanitizeToString. But .sanitize will commonly be the better choice.
  let scriptless_input_string = sanitizer.sanitizeToString(untrusted_input);

Note: Sanitizing a string will use the HTML Parser to parse the input, which will perform some degree of normalization. So even if no sanitization steps are taken on a particular input, it cannot be guaranteed that the output of .sanitizeToString will be character-for-character identical to the input. Examples would be character regularization ("&szlig;" to "ß"), or light processing for some elements ("<image>" to "<img>").

2.2. Input Types

The sanitization methods support three input types: DOMString, Document, and DocumentFragment. In all cases, the sanitization will work on a DocumentFragment internally, but the work-fragment will be created by parsing, cloning, or using the fragment as-is, respectively.

typedef (DOMString or DocumentFragment or Document) SanitizerInput;

2.3. The Configuration Dictionary

The Sanitizer’s configuration object is a dictionary which describes modifications to the sanitize operation. If a Sanitizer has not received an explicit configuration, for example when being constructed without any parameters, then the default configuration value is used as the configuration object.

dictionary SanitizerConfig {
  sequence<DOMString> allowElements;
  sequence<DOMString> blockElements;
  sequence<DOMString> dropElements;
  AttributeMatchList allowAttributes;
  AttributeMatchList dropAttributes;
  boolean allowCustomElements;
};
allowElements

The element allow list is a sequence of strings with elements that the sanitizer should retain in the input.

blockElements

The element block list is a sequence of strings with elements where the sanitizer should remove the elements from the input, but retain their children.

dropElements

The element drop list is a sequence of strings with elements that the sanitizer should remove from the input, including its children.

allowAttributes

The attribute allow list is an attribute match list, which determines whether an attribute (on a given element) should be allowed.

dropAttributes

The attribute drop list is an attribute match list, which determines whether an attribute (on a given element) should be dropped.

allowCustomElements

The allow custom elements option determines whether custom elements are to be considered. The default is to drop them. If this option is true, custom elements will still be checked against all other built-in or configured checks.

Note: allowElements creates a sanitizer that defaults to dropping elements, while blockElements and dropElements default to keeping unknown elements. Using both types is possible, but is probably of little practical use. The same applies to allowAttributes and dropAttributes.

Examples:

  const sample = "Some text <b><i>with</i></b> <blink>tags</blink>.";

  // Some text <b>with</b> tags.
  new Sanitizer({allowElements: [ "b" ]}).sanitize(sample);

  // Some text <i>with</i> <blink>tags</blink>.
  new Sanitizer({blockElements: [ "b" ]}).sanitize(sample);

  // Some text <blink>tags</blink>.
  new Sanitizer({dropElements: [ "b" ]}).sanitize(sample);

  // Note: The default configuration handles XSS-relevant input:

  // Non-scripting input will be passed through:
  new Sanitizer().sanitize(sample);  // Will output sample unmodified.

  // Scripts will be blocked: "abc alert(1) def"
  new Sanitizer().sanitize("abc <script>alert(1)</script> def");

A sanitizer’s configuration can be queried using the config method, which runs the query the sanitizer config algorithm.

Examples:

  // Does the default config allow script elements?
  Sanitizer.defaultConfig().allowElements.includes("script")  // false

  // We found a Sanitizer instance. Does it have an allow-list configured?
  const a_sanitizer = ...;
  !!a_sanitizer.config().allowElements // true, if an allowElements list is configured

  // If it does have an allow elements list, does it include the <div> element?
  a_sanitizer.config().allowElements.includes("div")  // true, if "div" is in allowElements.

  // Note that the config method might do some normalization. E.g., it won’t
  // contain key/value pairs that are not declared in the IDL.
  Object.keys(new Sanitizer({madeUpDictionaryKey: "Hello"}).config())  // []

  // As a Sanitizer’s config describes its operation, a new sanitizer with
  // another instance’s configuration should behave identically.
  // (For illustration purposes only. It would make more sense to just use a directly.)
  const a = /* ... a Sanitizer we found somewhere ... */;
  const b = new Sanitizer(a.config());  // b should behave the same as a.

  // defaultConfig() and new Sanitizer().config() should be the same.
  // (For illustration purposes only. There are better ways of implementing
  // object equality in JavaScript.)
  JSON.stringify(Sanitizer.defaultConfig()) == JSON.stringify(new Sanitizer().config());  // true

2.3.1. Attribute Match Lists

An attribute match list is a map of attribute names to element names, where the special name "*" stands for all elements. A given attribute belonging to an element matches an attribute match list if the attribute’s local name is a key in the match list, and the element’s local name or "*" is found in the value list associated with that key.

typedef record<DOMString, sequence<DOMString>> AttributeMatchList;

Examples for attributes and attribute match lists:

  const sample = "<span id='span1' class='theclass' style='font-weight: bold'>hello</span>";

  // Allow only <span style>: <span style='font-weight: bold'>...</span>
  new Sanitizer({allowAttributes: {"style": ["span"]}}).sanitize(sample);

  // Allow style, but not on span: <span>...</span>
  new Sanitizer({allowAttributes: {"style": ["div"]}}).sanitize(sample);

  // Allow style on any elements: <span style='font-weight: bold'>...</span>
  new Sanitizer({allowAttributes: {"style": ["*"]}}).sanitize(sample);

  // Drop <span id>: <span class='theclass' style='font-weight: bold'>...</span>
  new Sanitizer({dropAttributes: {"id": ["span"]}}).sanitize(sample);

  // Drop id, everywhere: <span class='theclass' style='font-weight: bold'>...</span>
  new Sanitizer({dropAttributes: {"id": ["*"]}}).sanitize(sample);
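
The matching rule for attribute match lists can be sketched in plain JavaScript. This is a minimal illustration only: the helper name matchesAttributeMatchList is hypothetical and not part of the API, and element and attribute names are passed as plain strings.

```javascript
// Hypothetical helper (not part of the Sanitizer API): does the attribute
// `attrName` on an element named `elementName` match `matchList`?
function matchesAttributeMatchList(matchList, attrName, elementName) {
  // Only own keys count, so inherited names such as "constructor" never match.
  if (!Object.prototype.hasOwnProperty.call(matchList, attrName)) return false;
  const elementNames = matchList[attrName];
  // Match on the element's name, or on the wildcard "*".
  return elementNames.includes(elementName) || elementNames.includes("*");
}

const styleOnSpan = { style: ["span"] };
console.log(matchesAttributeMatchList(styleOnSpan, "style", "span")); // true
console.log(matchesAttributeMatchList(styleOnSpan, "style", "div"));  // false
console.log(matchesAttributeMatchList(styleOnSpan, "id", "span"));    // false
console.log(matchesAttributeMatchList({ id: ["*"] }, "id", "span"));  // true
```

Whether a match means "keep" or "drop" depends on which list is consulted: a match in the attribute allow list keeps the attribute, a match in the attribute drop list removes it.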

2.4. Algorithms

To sanitize a given input of type SanitizerInput, run these steps:
  1. Let fragment be the result of running the create a document fragment algorithm on input.

  2. Run the sanitize a document fragment algorithm on fragment.

  3. Return fragment.

To sanitizeToString a given input of type SanitizerInput, run these steps:
  1. Let fragment be the result of the create a document fragment algorithm on input.

  2. Let sanitized be the result of running the sanitize algorithm on fragment.

  3. Let result be the result of running the HTML Fragment Serialization Algorithm with sanitized as the node argument.

  4. Return result.

To create a document fragment named fragment from an input of type SanitizerInput, run these steps:
  1. Switch based on input’s type:

    1. If input is of type DocumentFragment, then:

      1. Let node refer to input.

    2. If input is of type Document, then:

      1. Let node refer to input’s documentElement.

    3. If input is of type DOMString, then:

      1. Let node be the result of running the parseFromString algorithm with input as first parameter (string), and "text/html" as second parameter (type).

  2. Let clone be the result of running clone a node on node with the clone children flag set to true.

  3. Let fragment be the result of createDocumentFragment.

  4. Append the node clone to the parent fragment.

  5. Return fragment.

It’s unclear whether we can assume a generic context for parseFromString, or if we need to re-work the API to take the insertion context of the created fragment into account. <https://github.com/WICG/sanitizer-api/issues/42>

To sanitize a document fragment named fragment run these steps:
  1. Let m be a map that maps nodes to a sanitize action.

  2. Let nodes be a list containing the inclusive descendants of fragment, in tree order.

  3. For each node in nodes:

    1. Let action be the result of running the sanitize a node algorithm on node.

    2. Insert node and action into m.

  4. For each node in nodes:

    1. If m[node] is drop, remove the node and all children from fragment.

    2. If m[node] is block, replace the node with all of its element and text node children from fragment.

    3. If m[node] is keep, do nothing.
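
The two passes above (first record an action per node, then apply the actions) can be sketched on a toy tree. Everything here is an assumption made for illustration: the { name, children } and { text } node model, the el, text, actionFor, sanitizeFragment and serialize helpers, and the simplified per-node decision, which only looks at the three element lists and ignores attributes, the baseline and the default configuration.

```javascript
// Toy tree model (an assumption for this sketch): elements are
// { name, children }, text nodes are { text }.
const el = (name, ...children) => ({ name, children });
const text = (value) => ({ text: value });

// Simplified, hypothetical per-node decision based on the element lists only.
function actionFor(node, config) {
  if (node.text !== undefined || node.name === "#fragment") return "keep";
  if ((config.dropElements || []).includes(node.name)) return "drop";
  if ((config.blockElements || []).includes(node.name)) return "block";
  if (config.allowElements && !config.allowElements.includes(node.name)) return "block";
  return "keep";
}

function sanitizeFragment(fragment, config) {
  // Steps 1-3: record an action for every inclusive descendant, in tree order.
  const actions = new Map();
  const parents = new Map();
  const nodes = [];
  (function walk(node, parent) {
    nodes.push(node);
    parents.set(node, parent);
    actions.set(node, actionFor(node, config));
    for (const child of node.children || []) walk(child, node);
  })(fragment, null);

  // Step 4: apply the recorded actions.
  for (const node of nodes) {
    const parent = parents.get(node);
    if (!parent) continue; // the fragment itself has no parent
    const index = parent.children.indexOf(node);
    if (index === -1) continue; // defensive guard: node is no longer attached
    const action = actions.get(node);
    if (action === "drop") {
      parent.children.splice(index, 1); // removes the node and its subtree
    } else if (action === "block") {
      parent.children.splice(index, 1, ...node.children); // keep the children
      for (const child of node.children) parents.set(child, parent);
    }
  }
  return fragment;
}

function serialize(node) {
  if (node.text !== undefined) return node.text;
  const inner = node.children.map(serialize).join("");
  return node.name === "#fragment" ? inner : `<${node.name}>${inner}</${node.name}>`;
}

// "Some text <b><i>with</i></b> <blink>tags</blink>."
const sampleTree = () => el("#fragment",
  text("Some text "), el("b", el("i", text("with"))), text(" "),
  el("blink", text("tags")), text("."));

console.log(serialize(sanitizeFragment(sampleTree(), { allowElements: ["b"] })));
// Some text <b>with</b> tags.
console.log(serialize(sanitizeFragment(sampleTree(), { blockElements: ["b"] })));
// Some text <i>with</i> <blink>tags</blink>.
```

Recording all actions before mutating the tree keeps the decision for each node independent of edits made to its ancestors or siblings.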

To sanitize a node named node run these steps:
  1. Let sanitizer be the current Sanitizer.

  2. If node is an element node:

    1. Let element be node’s element.

    2. For each attr in element’s attribute list:

      1. Let attr action be the result of running the effective attribute configuration algorithm on sanitizer, attr, and element.

      2. If attr action is different from keep, remove attr from element.

    3. Run the steps to handle funky elements on element.

    4. Let action be the result of running the effective element configuration algorithm on sanitizer and element.

    5. Return action.

  3. Return keep.

What about comment nodes, CDATA, etc.?

Some HTML elements require special treatment in a way that can’t be easily expressed in terms of configuration options or other algorithms. The following algorithm collects these in one place.

To handle funky elements on a given element, run these steps:
  1. If element’s element interface is HTMLTemplateElement:

    1. Run the steps of the sanitize a document fragment algorithm on element’s content attribute, and replace element’s content attribute with the result.

    2. Drop all child nodes of element.

  2. If element’s element interface has a HTMLHyperlinkElementUtils mixin, and if element’s protocol property is "javascript:":

    1. Remove the href attribute from element.

  3. If element’s element interface is HTMLFormElement, and if element’s action attribute is a [URL] with javascript: protocol:

    1. Remove the action attribute from element.

  4. If element’s element interface is HTMLInputElement or HTMLButtonElement, and if element’s formaction attribute is a [URL] with javascript: protocol:

    1. Remove the formaction attribute from element.
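
The javascript: checks in steps 2 to 4 can be approximated with the URL constructor from the URL Standard. This is a sketch under stated assumptions: the helper names, the toy { name, attributes } element model, the element-to-attribute table and the base URL are all hypothetical, and the template handling of step 1 is not covered.

```javascript
// Hypothetical helper: true if `value` parses as a URL whose protocol is
// "javascript:". The base URL is an assumption, used only so that relative
// references can be parsed.
function hasJavascriptProtocol(value, base = "https://example.com/") {
  try {
    return new URL(value, base).protocol === "javascript:";
  } catch {
    return false; // not parseable as a URL
  }
}

// Which URL-carrying attribute to inspect for which element (toy table).
const urlAttributeByElement = new Map([
  ["a", "href"], ["area", "href"], ["form", "action"],
  ["input", "formaction"], ["button", "formaction"],
]);

// Toy element model (an assumption): { name, attributes: { [name]: value } }.
function removeScriptUrls(element) {
  const attr = urlAttributeByElement.get(element.name);
  const hasAttr = attr !== undefined &&
    Object.prototype.hasOwnProperty.call(element.attributes, attr);
  if (hasAttr && hasJavascriptProtocol(element.attributes[attr])) {
    delete element.attributes[attr];
  }
  return element;
}

const link = { name: "a", attributes: { href: "javascript:alert(1)", title: "x" } };
console.log(removeScriptUrls(link).attributes); // { title: 'x' }
```

Parsing the value as a URL, rather than comparing string prefixes, means leading whitespace and mixed-case schemes such as "  JavaScript:" are normalized before the comparison.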

To query the sanitizer config of a given sanitizer instance, run these steps:
  1. Let sanitizer be the current Sanitizer.

  2. Let config be sanitizer’s configuration object, or the default configuration if no configuration object was given.

  3. Let result be a newly constructed SanitizerConfig dictionary.

  4. For any non-empty member of config whose key is declared in SanitizerConfig, copy the value to result.

  5. Return result.
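
A minimal sketch of these steps on plain objects follows. The function name queryConfig is hypothetical, the default configuration is passed in explicitly, "non-empty" is read here as "present", and the values are deep-copied through JSON as a simplification.

```javascript
// Keys declared in the SanitizerConfig dictionary.
const declaredKeys = [
  "allowElements", "blockElements", "dropElements",
  "allowAttributes", "dropAttributes", "allowCustomElements",
];

// Hypothetical stand-in for "query the sanitizer config".
function queryConfig(config, defaultConfig) {
  // Step 2: fall back to the default configuration if none was given.
  const source = config === undefined ? defaultConfig : config;
  // Steps 3 and 4: copy only declared members that are present.
  const result = {};
  for (const key of declaredKeys) {
    if (source[key] !== undefined) {
      result[key] = JSON.parse(JSON.stringify(source[key]));
    }
  }
  return result; // Step 5
}

const toyDefault = { allowElements: ["b", "i"], allowCustomElements: false };
console.log(Object.keys(queryConfig({ madeUpDictionaryKey: "Hello" }, toyDefault))); // []
console.log(queryConfig(undefined, toyDefault)); // a copy of toyDefault
```

Returning a copy rather than the stored object means that callers cannot change a sanitizer's behaviour by mutating the returned dictionary.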

2.4.1. The Effective Configuration

A Sanitizer is potentially complex, so we will define a helper construct, the effective configuration. This is mostly a specification convenience and allows us to explain a Sanitizer’s operation in two steps: one, how to derive the effective configuration, and two, how the Sanitizer’s operation is defined based on it.

An effective configuration maps a given element or a given pair of element and attribute to a sanitize action.

A sanitize action can have the values keep, drop, or block. To determine the stricter action of two sanitize actions, pick the 'larger' of the two actions assuming a transitively defined order with drop > block, and block > keep.

To determine a Sanitizer sanitizer’s effective element configuration for an element element, run these steps:
  1. Let config be sanitizer’s configuration object.

  2. Let baseline action be the result of running the steps of the determine the baseline configuration for an element algorithm for the element element.

  3. Let config action be the result of running the steps of the determine the effective configuration for an element algorithm for the element element and the config config.

  4. Return the stricter action of baseline action and config action.

Note: The definition of stricter actions ensures that the built-in baseline configuration cannot be overridden, and therefore forms a hard guarantee for all Sanitizer instances. (Likewise for attributes.)
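
The ordering of sanitize actions and its use in the last step of the algorithm can be sketched as follows. The names strictness, stricterAction and effectiveAction are hypothetical, and actions are modelled as plain strings.

```javascript
// Rank of each sanitize action: drop > block > keep.
const strictness = { keep: 0, block: 1, drop: 2 };

// Pick the 'larger' of two sanitize actions.
function stricterAction(a, b) {
  return strictness[a] >= strictness[b] ? a : b;
}

// Hypothetical composition: the effective action is the stricter of the
// baseline action and the action derived from the configuration.
function effectiveAction(baselineAction, configAction) {
  return stricterAction(baselineAction, configAction);
}

console.log(stricterAction("keep", "block"));  // block
console.log(effectiveAction("drop", "keep"));  // drop
console.log(effectiveAction("keep", "block")); // block
```

Because the result is never less strict than the baseline action, a configuration that would keep an element cannot turn a baseline drop into anything else.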

To determine a Sanitizer sanitizer’s effective attribute configuration for an attribute attr attached to an element element, run these steps:
  1. Let config be sanitizer’s configuration object.

  2. Let baseline action be the result of running the steps of the determine the baseline configuration for an attribute algorithm on the attribute attr.

  3. Let config action be the result of running the steps of the determine the effective configuration for an attribute algorithm on the attribute attr, with the element element and the config config.

  4. Return the stricter action of baseline action and config action.

Before describing how an effective configuration is derived, we need a helper definition:

The element kind of an element is one of regular, unknown, or custom. Let element kind be:
Similarly, the attribute kind of an attribute is one of regular or unknown. Let attribute kind be:

The spec currently treats MathML and SVG as unknown content and therefore blocked by default. This needs to be fixed. <https://github.com/WICG/sanitizer-api/issues/72>

To determine the effective configuration for an element element, given a configuration object config, run these steps:
  1. If element’s element kind is custom and if config’s allow custom elements option is unset or set to anything other than true: Return drop.

  2. Let name be element’s tag name.

  3. If name is in config’s element drop list: Return drop.

  4. If name is in config’s element block list: Return block.

  5. If config has a non-empty element allow list and name is not in config’s element allow list: Return block.

  6. If config does not have a non-empty element allow list and name is not in the default configuration's element allow list: Return block.

  7. Return keep.
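
These steps translate fairly directly into code. The sketch below is illustrative only: the function name, the { name, kind } element model and the short defaultElementExcerpt list (a few names taken from the default configuration) are assumptions.

```javascript
// A short excerpt of the default element allow list, for illustration only.
const defaultElementExcerpt = { allowElements: ["b", "i", "em", "span", "div"] };

// Toy element model (an assumption): { name, kind }, where kind is one of
// "regular", "unknown" or "custom".
function configActionForElement(element, config, defaultConfig) {
  // Step 1: custom elements are dropped unless explicitly allowed.
  if (element.kind === "custom" && config.allowCustomElements !== true) return "drop";
  const name = element.name;                                       // Step 2
  if ((config.dropElements || []).includes(name)) return "drop";   // Step 3
  if ((config.blockElements || []).includes(name)) return "block"; // Step 4
  // Steps 5 and 6: use the configured allow list if it is non-empty,
  // otherwise the default configuration's allow list.
  const hasAllowList =
    Array.isArray(config.allowElements) && config.allowElements.length > 0;
  const allowList = hasAllowList ? config.allowElements : defaultConfig.allowElements;
  if (!allowList.includes(name)) return "block";
  return "keep";                                                   // Step 7
}

const bold = { name: "b", kind: "regular" };
console.log(configActionForElement(bold, {}, defaultElementExcerpt)); // keep
console.log(configActionForElement(bold, { dropElements: ["b"] }, defaultElementExcerpt)); // drop
console.log(configActionForElement({ name: "blink", kind: "unknown" }, {}, defaultElementExcerpt)); // block
```

The drop and block lists are checked before the allow list, so an element named in both a drop list and an allow list is dropped.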

To determine the effective configuration for an attribute attr, attached to an element element, and given a configuration object config, run these steps:
  1. If config’s attribute drop list contains attr’s local name as key, and the associated value contains either element’s tag name or the string "*": Return drop.

  2. If config has a non-empty attribute allow list, and that list either does not contain attr’s local name as key or the associated value contains neither element’s tag name nor the string "*": Return drop.

  3. If config does not have a non-empty attribute allow list, and the default configuration's attribute allow list either does not contain attr’s local name as key or the associated value contains neither element’s tag name nor the string "*": Return drop.

  4. Return keep.
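
The attribute variant can be sketched in the same style. Again the names are hypothetical, attribute and element names are passed as plain strings, and defaultAttributeExcerpt contains only two entries from the default configuration.

```javascript
// A short excerpt of the default attribute allow list, for illustration only.
const defaultAttributeExcerpt = { allowAttributes: { class: ["*"], align: ["*"] } };

// Hypothetical helper: is (attrName, elementName) covered by `list`?
function inMatchList(list, attrName, elementName) {
  if (!list || !Object.prototype.hasOwnProperty.call(list, attrName)) return false;
  return list[attrName].includes(elementName) || list[attrName].includes("*");
}

function configActionForAttribute(attrName, elementName, config, defaultConfig) {
  // Step 1: an explicit drop always wins.
  if (inMatchList(config.dropAttributes, attrName, elementName)) return "drop";
  // Steps 2 and 3: use the configured allow list if it is non-empty,
  // otherwise the default configuration's allow list.
  const hasAllowList =
    config.allowAttributes !== undefined &&
    Object.keys(config.allowAttributes).length > 0;
  const allowList = hasAllowList ? config.allowAttributes : defaultConfig.allowAttributes;
  if (!inMatchList(allowList, attrName, elementName)) return "drop";
  return "keep"; // Step 4
}

const styleOnDiv = { allowAttributes: { style: ["div"] } };
console.log(configActionForAttribute("style", "div", styleOnDiv, defaultAttributeExcerpt));  // keep
console.log(configActionForAttribute("style", "span", styleOnDiv, defaultAttributeExcerpt)); // drop
console.log(configActionForAttribute("class", "span", {}, defaultAttributeExcerpt));         // keep
```

Unlike elements, attributes only ever resolve to keep or drop here; there is no block action for attributes in these steps.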

2.4.2. Baseline and Defaults

The sanitizer baseline and defaults need to be carefully vetted, and are still under discussion. The values below are for illustrative purposes only.

To determine the baseline configuration for an element element, run these steps:
  1. If element’s element kind is regular and if element’s tag name is not in the baseline element allow list: Return drop.

  2. Return keep.

To determine the baseline configuration for an attribute attr, run these steps:
  1. If attr’s attribute kind is regular and if attr’s name is not in the baseline attribute allow list: Return drop.

  2. Return keep.
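
The two baseline checks can be sketched as below. The function names and the { name, kind } model are assumptions, the kind is supplied by the caller because this sketch does not attempt to classify names, and the two lists are short excerpts of the baseline allow lists rather than the full values.

```javascript
// Short illustrative excerpts; the full lists are given in Appendix A.
const baselineElementExcerpt = ["a", "b", "div", "span", "template"];
const baselineAttributeExcerpt = ["class", "href", "id", "style"];

// Toy models (assumptions): elements and attributes are { name, kind }.
function baselineActionForElement(element) {
  // Step 1: only regular elements are checked against the baseline list.
  if (element.kind === "regular" && !baselineElementExcerpt.includes(element.name)) {
    return "drop";
  }
  return "keep"; // Step 2
}

function baselineActionForAttribute(attr) {
  // Step 1: only regular attributes are checked against the baseline list.
  if (attr.kind === "regular" && !baselineAttributeExcerpt.includes(attr.name)) {
    return "drop";
  }
  return "keep"; // Step 2
}

console.log(baselineActionForElement({ name: "script", kind: "regular" }));    // drop
console.log(baselineActionForElement({ name: "div", kind: "regular" }));       // keep
console.log(baselineActionForAttribute({ name: "onclick", kind: "regular" })); // drop
```

Note that unknown and custom names pass the baseline unchanged in this sketch; they are left to the effective configuration steps, which apply the configured or default lists.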

The sanitizer has a built-in default configuration, which is stricter than the baseline and aims to eliminate any script-injection possibility, as well as legacy or unusual constructs.

The defaults and baseline are defined by three JSON constants: the baseline element allow list, the baseline attribute allow list, and the default configuration. For better readability, these have been moved to Appendix A.

3. Security Considerations

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The specified API must not support the construction of a Sanitizer object that leaves script-capable markup in place; any way of doing so would be a bug in the threat model.

That being said, there are security issues which the correct usage of the Sanitizer API will not be able to protect against and the scenarios will be laid out in the following sections.

3.1. Server-Side Reflected and Stored XSS

This section is not normative.

The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an existing DocumentFragment. The Sanitizer does not address server-side reflected or stored XSS.

3.2. DOM clobbering

This section is not normative.

DOM clobbering describes an attack in which malicious HTML confuses an application by naming elements through id or name attributes such that properties like children of an HTML element in the DOM are overshadowed by the malicious content.

The Sanitizer API does not protect against DOM clobbering attacks in its default state, but can be configured to remove id and name attributes.

3.3. XSS with Script gadgets

This section is not normative.

Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then executes JavaScript based on that input.

The Sanitizer API cannot prevent these attacks, but requires page authors to explicitly allow attributes and elements that are unknown to HTML, as well as markup that is known to be widely used for templating and framework-specific code, like data- and slot attributes and elements like <slot> and <template>. We believe that these restrictions are not exhaustive and encourage page authors to examine their third party libraries for this behavior.

3.4. Mutated XSS

This section is not normative.

Mutated XSS or mXSS describes an attack based on parser mismatches when parsing an HTML snippet without the correct context. In particular, when a parsed HTML fragment has been serialized to a string, the string is not guaranteed to be parsed and interpreted in exactly the same way when inserted into a different parent element. Such an attack can be carried out, for example, by relying on the change of parsing behavior for foreign content or misnested tags.

The Sanitizer API does not protect against mutated XSS; however, we encourage authors to use the sanitize() function of the API, which returns a DocumentFragment and avoids the risks that come with serialization and additional parsing. Directly operating on a fragment after sanitization also comes with a performance benefit, as the cost of additional serialization and parsing is avoided.

4. Acknowledgements

Cure53’s [DOMPURIFY] is a clear inspiration for the API this document describes, as is Internet Explorer’s window.toStaticHTML().

Appendix A: Built-in Constants

This appendix is normative, except where explicitly noted otherwise.

These constants define core behaviour of the Sanitizer algorithm.

Built-ins Justification

This subsection is super duper non-normative.

Note: The normative values of these constants are found below. The derivation of these values is explained here, with an implementation in the [DEFAULTS] script. It is expected that these values will change before this specification is finalized. Also, we expect these to be updated to include additional HTML elements as they are introduced in user agents.

For the purpose of this Sanitizer API, [HTML] constructs fall into one of four classes, where the first defines the baseline, and the first, second, plus the third define the default:

  1. Elements and attributes that (directly) execute script. In other words, elements and attributes that are unconditionally script-ish.

  2. Legacy and "difficult" elements and attributes. Examples are the <plaintext> and <xmp> elements, which have special parsing rules attached to them. These are not dangerous _per se_, but they have contributed to existing vulnerabilities.

  3. Elements and attributes that we feel rarely make sense in user-supplied content.

  4. All the rest.

Specifically:

  1. Script-ish constructs:

    • The HTMLScriptElement, which proudly executes script as its sole purpose.

    • All event handler attributes, since these also execute script.

    • HTMLIFrameElement, which loads arbitrary HTML content and therefore also script.

    • The legacy HTMLObjectElement and HTMLEmbedElement, which load non-HTML active content. Also, <object>'s side-kick HTMLParamElement.

    • The no-longer conforming <frame>, <frameset>, and <applet> tags, which are outdated versions or companions of several elements listed above.

    • The <noscript>, <noframes>, <noembed>, and <nolayer> elements. These, by themselves, are arguably not script-ish, but they are companions to elements listed above, and make no sense on their own.

    • Also, the HTMLBaseElement, as this effectively modifies interpretation of other URLs.

  2. Legacy and "difficult" elements.

    • Special parsing behaviour. This is not dangerous in its own right, but has contributed to mXSS-style attacks. This includes:

      • <plaintext> (Which parses in PLAINTEXT state.)

      • <title> and <textarea> (Which parse in RCDATA state.)

      • The non-conforming [<xmp>](https://html.spec.whatwg.org/#xmp) element.

    • Legacy elements:

      • <image> ([which is parsed as <img>](https://html.spec.whatwg.org/#parsing-main-inbody)).

      • <basefont>

  3. Constructs unlikely to be beneficial in user-supplied content:

The Baseline Element Allow List

The built-in baseline element allow list has the following value:

[
  "a",
  "abbr",
  "acronym",
  "address",
  "area",
  "article",
  "aside",
  "audio",
  "b",
  "basefont",
  "bdi",
  "bdo",
  "bgsound",
  "big",
  "blockquote",
  "body",
  "br",
  "button",
  "canvas",
  "caption",
  "center",
  "cite",
  "code",
  "col",
  "colgroup",
  "command",
  "data",
  "datalist",
  "dd",
  "del",
  "details",
  "dfn",
  "dialog",
  "dir",
  "div",
  "dl",
  "dt",
  "em",
  "fieldset",
  "figcaption",
  "figure",
  "font",
  "footer",
  "form",
  "h1",
  "h2",
  "h3",
  "h4",
  "h5",
  "h6",
  "head",
  "header",
  "hgroup",
  "hr",
  "html",
  "i",
  "image",
  "img",
  "input",
  "ins",
  "kbd",
  "keygen",
  "label",
  "layer",
  "legend",
  "li",
  "link",
  "listing",
  "main",
  "map",
  "mark",
  "marquee",
  "menu",
  "meta",
  "meter",
  "nav",
  "nobr",
  "noscript",
  "ol",
  "optgroup",
  "option",
  "output",
  "p",
  "picture",
  "plaintext",
  "popup",
  "portal",
  "pre",
  "progress",
  "q",
  "rb",
  "rp",
  "rt",
  "rtc",
  "ruby",
  "s",
  "samp",
  "section",
  "select",
  "selectmenu",
  "slot",
  "small",
  "source",
  "span",
  "strike",
  "strong",
  "style",
  "sub",
  "summary",
  "sup",
  "table",
  "tbody",
  "td",
  "template",
  "textarea",
  "tfoot",
  "th",
  "thead",
  "time",
  "title",
  "tr",
  "track",
  "tt",
  "u",
  "ul",
  "var",
  "video",
  "wbr",
  "xmp"
]

The Baseline Attribute Allow List

The baseline attribute allow list has the following value:

[
  "abbr",
  "accept",
  "accept-charset",
  "accesskey",
  "action",
  "align",
  "alink",
  "allow",
  "allowfullscreen",
  "allowpaymentrequest",
  "alt",
  "anchor",
  "archive",
  "as",
  "async",
  "autocapitalize",
  "autocomplete",
  "autocorrect",
  "autofocus",
  "autopictureinpicture",
  "autoplay",
  "axis",
  "background",
  "behavior",
  "bgcolor",
  "border",
  "bordercolor",
  "capture",
  "cellpadding",
  "cellspacing",
  "challenge",
  "char",
  "charoff",
  "charset",
  "checked",
  "cite",
  "class",
  "classid",
  "clear",
  "code",
  "codebase",
  "codetype",
  "color",
  "cols",
  "colspan",
  "compact",
  "content",
  "contenteditable",
  "controls",
  "controlslist",
  "conversiondestination",
  "coords",
  "crossorigin",
  "csp",
  "data",
  "datetime",
  "declare",
  "decoding",
  "default",
  "defer",
  "dir",
  "direction",
  "dirname",
  "disabled",
  "disablepictureinpicture",
  "disableremoteplayback",
  "disallowdocumentaccess",
  "download",
  "draggable",
  "elementtiming",
  "enctype",
  "end",
  "enterkeyhint",
  "event",
  "exportparts",
  "face",
  "for",
  "form",
  "formaction",
  "formenctype",
  "formmethod",
  "formnovalidate",
  "formtarget",
  "frame",
  "frameborder",
  "headers",
  "height",
  "hidden",
  "high",
  "href",
  "hreflang",
  "hreftranslate",
  "hspace",
  "http-equiv",
  "id",
  "imagesizes",
  "imagesrcset",
  "importance",
  "impressiondata",
  "impressionexpiry",
  "incremental",
  "inert",
  "inputmode",
  "integrity",
  "invisible",
  "is",
  "ismap",
  "keytype",
  "kind",
  "label",
  "lang",
  "language",
  "latencyhint",
  "leftmargin",
  "link",
  "list",
  "loading",
  "longdesc",
  "loop",
  "low",
  "lowsrc",
  "manifest",
  "marginheight",
  "marginwidth",
  "max",
  "maxlength",
  "mayscript",
  "media",
  "method",
  "min",
  "minlength",
  "multiple",
  "muted",
  "name",
  "nohref",
  "nomodule",
  "nonce",
  "noresize",
  "noshade",
  "novalidate",
  "nowrap",
  "object",
  "open",
  "optimum",
  "part",
  "pattern",
  "ping",
  "placeholder",
  "playsinline",
  "policy",
  "poster",
  "preload",
  "pseudo",
  "readonly",
  "referrerpolicy",
  "rel",
  "reportingorigin",
  "required",
  "resources",
  "rev",
  "reversed",
  "role",
  "rows",
  "rowspan",
  "rules",
  "sandbox",
  "scheme",
  "scope",
  "scopes",
  "scrollamount",
  "scrolldelay",
  "scrolling",
  "select",
  "selected",
  "shadowroot",
  "shadowrootdelegatesfocus",
  "shape",
  "size",
  "sizes",
  "slot",
  "span",
  "spellcheck",
  "src",
  "srcdoc",
  "srclang",
  "srcset",
  "standby",
  "start",
  "step",
  "style",
  "summary",
  "tabindex",
  "target",
  "text",
  "title",
  "topmargin",
  "translate",
  "truespeed",
  "trusttoken",
  "type",
  "usemap",
  "valign",
  "value",
  "valuetype",
  "version",
  "virtualkeyboardpolicy",
  "vlink",
  "vspace",
  "webkitdirectory",
  "width",
  "wrap"
]

The Default Configuration Object

The built-in default configuration has the following value:

{
  "allowCustomElements": false,
  "allowElements": [
    "a",
    "abbr",
    "acronym",
    "address",
    "area",
    "article",
    "aside",
    "audio",
    "b",
    "bdi",
    "bdo",
    "bgsound",
    "big",
    "blockquote",
    "body",
    "br",
    "button",
    "canvas",
    "caption",
    "center",
    "cite",
    "code",
    "col",
    "colgroup",
    "datalist",
    "dd",
    "del",
    "details",
    "dfn",
    "dialog",
    "dir",
    "div",
    "dl",
    "dt",
    "em",
    "fieldset",
    "figcaption",
    "figure",
    "font",
    "footer",
    "form",
    "h1",
    "h2",
    "h3",
    "h4",
    "h5",
    "h6",
    "head",
    "header",
    "hgroup",
    "hr",
    "html",
    "i",
    "img",
    "input",
    "ins",
    "kbd",
    "keygen",
    "label",
    "layer",
    "legend",
    "li",
    "link",
    "listing",
    "main",
    "map",
    "mark",
    "marquee",
    "menu",
    "meta",
    "meter",
    "nav",
    "nobr",
    "noscript",
    "ol",
    "optgroup",
    "option",
    "output",
    "p",
    "picture",
    "popup",
    "pre",
    "progress",
    "q",
    "rb",
    "rp",
    "rt",
    "rtc",
    "ruby",
    "s",
    "samp",
    "section",
    "select",
    "selectmenu",
    "small",
    "source",
    "span",
    "strike",
    "strong",
    "style",
    "sub",
    "summary",
    "sup",
    "table",
    "tbody",
    "td",
    "tfoot",
    "th",
    "thead",
    "time",
    "tr",
    "track",
    "tt",
    "u",
    "ul",
    "var",
    "video",
    "wbr"
  ],
  "allowAttributes": {
    "abbr": [
      "*"
    ],
    "accept": [
      "*"
    ],
    "accept-charset": [
      "*"
    ],
    "accesskey": [
      "*"
    ],
    "action": [
      "*"
    ],
    "align": [
      "*"
    ],
    "alink": [
      "*"
    ],
    "allow": [
      "*"
    ],
    "allowfullscreen": [
      "*"
    ],
    "alt": [
      "*"
    ],
    "anchor": [
      "*"
    ],
    "archive": [
      "*"
    ],
    "as": [
      "*"
    ],
    "async": [
      "*"
    ],
    "autocapitalize": [
      "*"
    ],
    "autocomplete": [
      "*"
    ],
    "autocorrect": [
      "*"
    ],
    "autofocus": [
      "*"
    ],
    "autopictureinpicture": [
      "*"
    ],
    "autoplay": [
      "*"
    ],
    "axis": [
      "*"
    ],
    "background": [
      "*"
    ],
    "behavior": [
      "*"
    ],
    "bgcolor": [
      "*"
    ],
    "border": [
      "*"
    ],
    "bordercolor": [
      "*"
    ],
    "capture": [
      "*"
    ],
    "cellpadding": [
      "*"
    ],
    "cellspacing": [
      "*"
    ],
    "challenge": [
      "*"
    ],
    "char": [
      "*"
    ],
    "charoff": [
      "*"
    ],
    "charset": [
      "*"
    ],
    "checked": [
      "*"
    ],
    "cite": [
      "*"
    ],
    "class": [
      "*"
    ],
    "classid": [
      "*"
    ],
    "clear": [
      "*"
    ],
    "code": [
      "*"
    ],
    "codebase": [
      "*"
    ],
    "codetype": [
      "*"
    ],
    "color": [
      "*"
    ],
    "cols": [
      "*"
    ],
    "colspan": [
      "*"
    ],
    "compact": [
      "*"
    ],
    "content": [
      "*"
    ],
    "contenteditable": [
      "*"
    ],
    "controls": [
      "*"
    ],
    "controlslist": [
      "*"
    ],
    "conversiondestination": [
      "*"
    ],
    "coords": [
      "*"
    ],
    "crossorigin": [
      "*"
    ],
    "csp": [
      "*"
    ],
    "data": [
      "*"
    ],
    "datetime": [
      "*"
    ],
    "declare": [
      "*"
    ],
    "decoding": [
      "*"
    ],
    "default": [
      "*"
    ],
    "defer": [
      "*"
    ],
    "dir": [
      "*"
    ],
    "direction": [
      "*"
    ],
    "dirname": [
      "*"
    ],
    "disabled": [
      "*"
    ],
    "disablepictureinpicture": [
      "*"
    ],
    "disableremoteplayback": [
      "*"
    ],
    "disallowdocumentaccess": [
      "*"
    ],
    "download": [
      "*"
    ],
    "draggable": [
      "*"
    ],
    "elementtiming": [
      "*"
    ],
    "enctype": [
      "*"
    ],
    "end": [
      "*"
    ],
    "enterkeyhint": [
      "*"
    ],
    "event": [
      "*"
    ],
    "exportparts": [
      "*"
    ],
    "face": [
      "*"
    ],
    "for": [
      "*"
    ],
    "form": [
      "*"
    ],
    "formaction": [
      "*"
    ],
    "formenctype": [
      "*"
    ],
    "formmethod": [
      "*"
    ],
    "formnovalidate": [
      "*"
    ],
    "formtarget": [
      "*"
    ],
    "frame": [
      "*"
    ],
    "frameborder": [
      "*"
    ],
    "headers": [
      "*"
    ],
    "height": [
      "*"
    ],
    "hidden": [
      "*"
    ],
    "high": [
      "*"
    ],
    "href": [
      "*"
    ],
    "hreflang": [
      "*"
    ],
    "hreftranslate": [
      "*"
    ],
    "hspace": [
      "*"
    ],
    "http-equiv": [
      "*"
    ],
    "id": [
      "*"
    ],
    "imagesizes": [
      "*"
    ],
    "imagesrcset": [
      "*"
    ],
    "importance": [
      "*"
    ],
    "impressiondata": [
      "*"
    ],
    "impressionexpiry": [
      "*"
    ],
    "incremental": [
      "*"
    ],
    "inert": [
      "*"
    ],
    "inputmode": [
      "*"
    ],
    "integrity": [
      "*"
    ],
    "invisible": [
      "*"
    ],
    "is": [
      "*"
    ],
    "ismap": [
      "*"
    ],
    "keytype": [
      "*"
    ],
    "kind": [
      "*"
    ],
    "label": [
      "*"
    ],
    "lang": [
      "*"
    ],
    "language": [
      "*"
    ],
    "latencyhint": [
      "*"
    ],
    "leftmargin": [
      "*"
    ],
    "link": [
      "*"
    ],
    "list": [
      "*"
    ],
    "loading": [
      "*"
    ],
    "longdesc": [
      "*"
    ],
    "loop": [
      "*"
    ],
    "low": [
      "*"
    ],
    "lowsrc": [
      "*"
    ],
    "manifest": [
      "*"
    ],
    "marginheight": [
      "*"
    ],
    "marginwidth": [
      "*"
    ],
    "max": [
      "*"
    ],
    "maxlength": [
      "*"
    ],
    "mayscript": [
      "*"
    ],
    "media": [
      "*"
    ],
    "method": [
      "*"
    ],
    "min": [
      "*"
    ],
    "minlength": [
      "*"
    ],
    "multiple": [
      "*"
    ],
    "muted": [
      "*"
    ],
    "name": [
      "*"
    ],
    "nohref": [
      "*"
    ],
    "nomodule": [
      "*"
    ],
    "nonce": [
      "*"
    ],
    "noresize": [
      "*"
    ],
    "noshade": [
      "*"
    ],
    "novalidate": [
      "*"
    ],
    "nowrap": [
      "*"
    ],
    "object": [
      "*"
    ],
    "open": [
      "*"
    ],
    "optimum": [
      "*"
    ],
    "part": [
      "*"
    ],
    "pattern": [
      "*"
    ],
    "ping": [
      "*"
    ],
    "placeholder": [
      "*"
    ],
    "playsinline": [
      "*"
    ],
    "policy": [
      "*"
    ],
    "poster": [
      "*"
    ],
    "preload": [
      "*"
    ],
    "pseudo": [
      "*"
    ],
    "readonly": [
      "*"
    ],
    "referrerpolicy": [
      "*"
    ],
    "rel": [
      "*"
    ],
    "reportingorigin": [
      "*"
    ],
    "required": [
      "*"
    ],
    "resources": [
      "*"
    ],
    "rev": [
      "*"
    ],
    "reversed": [
      "*"
    ],
    "role": [
      "*"
    ],
    "rows": [
      "*"
    ],
    "rowspan": [
      "*"
    ],
    "rules": [
      "*"
    ],
    "sandbox": [
      "*"
    ],
    "scheme": [
      "*"
    ],
    "scope": [
      "*"
    ],
    "scopes": [
      "*"
    ],
    "scrollamount": [
      "*"
    ],
    "scrolldelay": [
      "*"
    ],
    "scrolling": [
      "*"
    ],
    "select": [
      "*"
    ],
    "selected": [
      "*"
    ],
    "shadowroot": [
      "*"
    ],
    "shadowrootdelegatesfocus": [
      "*"
    ],
    "shape": [
      "*"
    ],
    "size": [
      "*"
    ],
    "sizes": [
      "*"
    ],
    "slot": [
      "*"
    ],
    "span": [
      "*"
    ],
    "spellcheck": [
      "*"
    ],
    "src": [
      "*"
    ],
    "srcdoc": [
      "*"
    ],
    "srclang": [
      "*"
    ],
    "srcset": [
      "*"
    ],
    "standby": [
      "*"
    ],
    "start": [
      "*"
    ],
    "step": [
      "*"
    ],
    "style": [
      "*"
    ],
    "summary": [
      "*"
    ],
    "tabindex": [
      "*"
    ],
    "target": [
      "*"
    ],
    "text": [
      "*"
    ],
    "title": [
      "*"
    ],
    "topmargin": [
      "*"
    ],
    "translate": [
      "*"
    ],
    "truespeed": [
      "*"
    ],
    "trusttoken": [
      "*"
    ],
    "type": [
      "*"
    ],
    "usemap": [
      "*"
    ],
    "valign": [
      "*"
    ],
    "value": [
      "*"
    ],
    "valuetype": [
      "*"
    ],
    "version": [
      "*"
    ],
    "virtualkeyboardpolicy": [
      "*"
    ],
    "vlink": [
      "*"
    ],
    "vspace": [
      "*"
    ],
    "webkitdirectory": [
      "*"
    ],
    "width": [
      "*"
    ],
    "wrap": [
      "*"
    ]
  }
}
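
The following is a minimal sketch of how a match list shaped like "allowAttributes" above could be consulted. It is not a normative algorithm. It assumes that each key is an attribute name, that each value lists the element names the attribute applies to, and that "*" stands for every element. The helper name matchesAttribute and the "colspan" entry are hypothetical and chosen for illustration only. Names are assumed to be lowercase already.

```javascript
// Hypothetical helper, not part of the specification: reports whether a
// match list (shaped like "allowAttributes") covers an attribute on an element.
// Assumes attribute and element names have already been lowercased.
function matchesAttribute(matchList, attributeName, elementName) {
  // Own-property check, so names such as "constructor" never resolve
  // to something inherited from Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(matchList, attributeName)) {
    return false;
  }
  const elements = matchList[attributeName];
  // "*" is treated as a wildcard that matches every element name.
  return elements.includes("*") || elements.includes(elementName);
}

// A small excerpt in the same shape as the default configuration.
const excerpt = {
  href: ["*"],
  // Assumed, non-default entry that restricts an attribute to two elements.
  colspan: ["td", "th"],
};

console.log(matchesAttribute(excerpt, "href", "a"));      // true
console.log(matchesAttribute(excerpt, "colspan", "td"));  // true
console.log(matchesAttribute(excerpt, "colspan", "div")); // false
console.log(matchesAttribute(excerpt, "onclick", "a"));   // false
```

Because every entry in the default configuration uses "*", a lookup against it reduces to checking whether the attribute name is present at all.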

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[DOM-Parsing]
Travis Leithead. DOM Parsing and Serialization. 17 May 2016. WD. URL: https://www.w3.org/TR/DOM-Parsing/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/

Informative References

[DEFAULTS]
Sanitizer API Defaults. URL: https://github.com/WICG/sanitizer-api/blob/main/resources/defaults-derivation.html
[DOMPURIFY]
DOMPurify. URL: https://github.com/cure53/DOMPurify
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

IDL Index

[
  Exposed=(Window),
  SecureContext
] interface Sanitizer {
  constructor(optional SanitizerConfig config = {});

  DocumentFragment sanitize(SanitizerInput input);
  DOMString sanitizeToString(SanitizerInput input);

  SanitizerConfig config();
  static SanitizerConfig defaultConfig();
};

typedef (DOMString or DocumentFragment or Document) SanitizerInput;

dictionary SanitizerConfig {
  sequence<DOMString> allowElements;
  sequence<DOMString> blockElements;
  sequence<DOMString> dropElements;
  AttributeMatchList allowAttributes;
  AttributeMatchList dropAttributes;
  boolean allowCustomElements;
};

typedef record<DOMString, sequence<DOMString>> AttributeMatchList;
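
As a rough illustration of how the interface above might be called, the sketch below constructs a Sanitizer with a restrictive configuration and sanitizes a string. It is written under assumptions: the helper name sanitizeComment and the particular element and attribute choices are hypothetical, and the code feature-detects the constructor because an environment may not expose Sanitizer at all.

```javascript
// Sketch only. `SanitizerCtor` defaults to the global Sanitizer when the
// environment exposes one; `sanitizeComment` is a hypothetical helper name.
function sanitizeComment(untrustedHtml, SanitizerCtor = globalThis.Sanitizer) {
  if (typeof SanitizerCtor !== "function") {
    // The API is unavailable; the caller has to pick a fallback strategy.
    return null;
  }
  // Illustrative configuration: a handful of inline elements, and
  // "href" permitted on "a" elements only.
  const sanitizer = new SanitizerCtor({
    allowElements: ["b", "i", "em", "strong", "a"],
    allowAttributes: { href: ["a"] },
  });
  // Per the IDL, sanitizeToString accepts a SanitizerInput and returns a DOMString.
  return sanitizer.sanitizeToString(untrustedHtml);
}

const result = sanitizeComment("<b>hello</b><script>alert(1)</script>");
console.log(result === null ? "Sanitizer is not available here" : result);
```

Passing the constructor as a parameter keeps the helper testable outside a browser; a caller that only targets supporting browsers could drop the parameter and use the global directly.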

Issues Index

It’s unclear whether we can assume a generic context for parseFromString, or if we need to re-work the API to take the insertion context of the created fragment into account. <https://github.com/WICG/sanitizer-api/issues/42>
What about comment nodes, CDATA, etc.?
The spec currently treats MathML and SVG as unknown content and therefore blocks them by default. This needs to be fixed. <https://github.com/WICG/sanitizer-api/issues/72>
The sanitizer baseline and defaults need to be carefully vetted, and are still under discussion. The values below are for illustrative purposes only.