1. Introduction
This section is not normative.
Web applications often need to work with strings of HTML on the client side,
perhaps as part of a client-side templating solution, perhaps as part of
rendering user-generated content, etc. It is difficult to do so in a safe way.
The naive approach of joining strings together and stuffing them into an
Element's innerHTML is fraught with risk, as it can cause JavaScript execution
in a number of unexpected ways.
Libraries like [DOMPURIFY] attempt to manage this problem by carefully parsing and sanitizing strings before insertion, by constructing a DOM and filtering its members through an allow-list. This has proven to be a fragile approach, as the parsing APIs exposed to the web don’t always map in reasonable ways to the browser’s behavior when actually rendering a string as HTML in the "real" DOM. Moreover, the libraries need to keep on top of browsers' changing behavior over time; things that once were safe may turn into time-bombs based on new platform-level features.
The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation. This document outlines an API which aims to do just that.
1.1. Goals
-
Mitigate the risk of DOM-based cross-site scripting attacks by providing developers with mechanisms for handling user-controlled HTML which prevent direct script execution upon injection.
-
Make HTML output safe for use within the current user agent, taking into account its current understanding of HTML.
-
Allow developers to override the default set of elements and attributes. Adding certain elements and attributes to the set of removed markup can help prevent script gadget attacks.
1.2. API Summary
The Sanitizer API offers functionality to parse a string containing HTML into a DOM tree, and to filter the resulting tree according to a user-supplied configuration. The methods come in two-by-two flavours:
-
Safe and unsafe: The "safe" methods will not generate any markup that executes script. That is, they should be safe from XSS. The "unsafe" methods will parse and filter according to the supplied configuration, without additionally removing script-executing markup. See also: § 4 Security Considerations.
-
Context: Methods are defined on Element and ShadowRoot. They replace these Nodes' children and are largely analogous to innerHTML. There are also static methods on Document, which parse an entire document and are largely analogous to DOMParser.parseFromString().
2. Framework
2.1. Sanitizer API
The Element interface defines two methods, setHTML() and setHTMLUnsafe(). Both of these take HTML markup (a DOMString, or for setHTMLUnsafe() also a TrustedHTML) and an optional configuration.
partial interface Element {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
Element's setHTMLUnsafe(html, options) method steps are:
-
Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this's relevant global object, html, "Element setHTMLUnsafe", and "script".
-
Let target be this's template contents if this is a template element; otherwise this.
-
Set and filter HTML given target, this, compliantHTML, options, and false.
Element's setHTML(html, options) method steps are:
-
Let target be this's template contents if this is a template element; otherwise this.
-
Set and filter HTML given target, this, html, options, and true.
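The Element method steps decide only two things: where the parsed children end up, and whether the safe flag is passed on. The following is a minimal sketch of that dispatch in plain JavaScript. It is not a DOM implementation. Elements are plain objects, and the `content` field is a hypothetical stand-in for a template's template contents. `recordCall` stands in for set and filter HTML. The Trusted Types step of setHTMLUnsafe() is left out.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Stand-in for "set and filter HTML" that only reports the arguments it got.
function recordCall(target, contextElement, html, options, safe) {
  return { target, contextElement, html, options, safe };
}

// Shared tail of the Element.setHTML() / setHTMLUnsafe() method steps.
// `element` is a plain object { localName, namespace, content? }; the
// hypothetical `content` field plays the role of a template's contents.
function elementSetHTMLSteps(element, html, options, safe) {
  const isTemplate = element.localName === "template" && element.namespace === HTML_NS;
  // Parsed children go into the template contents for <template>, else into the element.
  const target = isTemplate ? element.content : element;
  // The element itself is always the parsing context.
  return recordCall(target, element, html, options, safe);
}

const setHTML = (element, html, options = {}) => elementSetHTMLSteps(element, html, options, true);
const setHTMLUnsafe = (element, html, options = {}) => elementSetHTMLSteps(element, html, options, false);

const div = { localName: "div", namespace: HTML_NS };
const template = { localName: "template", namespace: HTML_NS, content: { nodeName: "#document-fragment" } };

const viaDiv = setHTML(div, "<b>hi</b>");
const viaTemplate = setHTMLUnsafe(template, "<b>hi</b>");
```

For a template element, the context stays the template itself, even though the children are placed in its contents.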
These methods are mirrored on the ShadowRoot:
partial interface ShadowRoot {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
ShadowRoot's setHTMLUnsafe(html, options) method steps are:
-
Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this's relevant global object, html, "ShadowRoot setHTMLUnsafe", and "script".
-
Set and filter HTML using this (as target), this's shadow host (as context element), compliantHTML, options, and false.
ShadowRoot's setHTML(html, options) method steps are:
-
Set and filter HTML using this (as target), this's shadow host (as context element), html, options, and true.
The Document interface gains two new methods which parse an entire Document:
partial interface Document {
  static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};
The static parseHTMLUnsafe(html, options) method steps are:
-
Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this's relevant global object, html, "Document parseHTMLUnsafe", and "script".
-
Let document be a new Document, whose content type is "text/html".
Note: Since document does not have a browsing context, scripting is disabled.
-
Set document’s allow declarative shadow roots to true.
-
Parse HTML from a string given document and compliantHTML.
-
Let sanitizer be the result of calling get a sanitizer instance from options with options.
-
Call sanitize on document’s root node with sanitizer and false.
-
Return document.
The static parseHTML(html, options) method steps are:
-
Let document be a new Document, whose content type is "text/html".
Note: Since document does not have a browsing context, scripting is disabled.
-
Set document’s allow declarative shadow roots to true.
-
Parse HTML from a string given document and html.
-
Let sanitizer be the result of calling get a sanitizer instance from options with options.
-
Call sanitize on document’s root node with sanitizer and true.
-
Return document.
2.2. SetHTML options and the configuration object.
The family of setHTML()-like methods all accept an options dictionary. Right now, only one member of this dictionary is defined:
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig) sanitizer = {};
};
The Sanitizer configuration object encapsulates a filter configuration.
The same configuration can be used with both "safe" and "unsafe" methods.
The "safe" methods perform an implicit removeUnsafe() operation on the
passed-in configuration, and they have a default configuration when none is
passed. The intent is that one (or a few) configurations will be built up
early on in a page’s lifetime, and can then be used whenever needed. This
allows implementations to pre-process configurations.
The configuration object can be queried to return a configuration dictionary. It can also be modified directly.
[Exposed=(Window,Worker)]
interface Sanitizer {
  constructor(optional SanitizerConfig configuration = {});

  // Query configuration:
  SanitizerConfig get();

  // Modify a Sanitizer’s lists and fields:
  undefined allowElement(SanitizerElementWithAttributes element);
  undefined removeElement(SanitizerElement element);
  undefined replaceElementWithChildren(SanitizerElement element);
  undefined allowAttribute(SanitizerAttribute attribute);
  undefined removeAttribute(SanitizerAttribute attribute);
  undefined setComments(boolean allow);
  undefined setDataAttributes(boolean allow);

  // Remove markup that executes script. May modify multiple lists:
  undefined removeUnsafe();
};
Note: Sanitizer will likely get an additional method: [NewObject] static Sanitizer getDefault();
A Sanitizer has an associated configuration, a SanitizerConfig.
2.3. The Configuration Dictionary
dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence<SanitizerElementWithAttributes> elements;
  sequence<SanitizerElement> removeElements;
  sequence<SanitizerElement> replaceWithChildrenElements;
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
  boolean comments;
  boolean dataAttributes;
};
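The following illustrative configuration uses each member of SanitizerConfig. It is a sketch, not normative, and the example values are arbitrary. `checkConfigShape` is a hypothetical helper that only mirrors the dictionary shapes above: each entry is a string or a dictionary with a required name, and comments and dataAttributes are booleans. It does not perform WebIDL conversion.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Strings are shorthand for a dictionary whose namespace takes the default
// (HTML for elements, null for attributes).
const exampleConfig = {
  elements: [
    "p",                                            // { name: "p", namespace: HTML_NS }
    { name: "a", attributes: ["href"] },            // per-element allow list
    { name: "img", removeAttributes: ["srcset"] },  // per-element remove list
    { name: "circle", namespace: SVG_NS },
  ],
  replaceWithChildrenElements: ["b", "i"],
  attributes: ["title", { name: "lang", namespace: null }],
  comments: false,
  dataAttributes: true,
};

// A list entry is a string, or an object with a string "name" (required in the IDL).
function isNameEntry(entry) {
  return typeof entry === "string" ||
    (typeof entry === "object" && entry !== null && typeof entry.name === "string");
}

function checkConfigShape(config) {
  const listKeys = ["elements", "removeElements", "replaceWithChildrenElements", "attributes", "removeAttributes"];
  for (const key of listKeys) {
    if (config[key] === undefined) continue;
    if (!Array.isArray(config[key]) || !config[key].every(isNameEntry)) return false;
  }
  // Entries of "elements" may carry their own attribute lists.
  for (const entry of config.elements ?? []) {
    if (typeof entry === "string") continue;
    for (const key of ["attributes", "removeAttributes"]) {
      if (entry[key] !== undefined && !entry[key].every(isNameEntry)) return false;
    }
  }
  return ["comments", "dataAttributes"].every((k) => config[k] === undefined || typeof config[k] === "boolean");
}
```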
3. Algorithms
To set and filter HTML, given an Element or DocumentFragment target, an Element contextElement, a string html, a dictionary options, and a boolean safe:
-
If safe and contextElement’s local name is "script" and contextElement’s namespace is the HTML namespace or the SVG namespace, then return.
-
Let sanitizer be the result of calling get a sanitizer instance from options with options.
-
Let newChildren be the result of the HTML fragment parsing algorithm steps given contextElement, html, and true.
-
Let fragment be a new DocumentFragment whose node document is contextElement’s node document.
-
Append each node in newChildren to fragment.
-
Run sanitize on fragment using sanitizer and safe.
-
Replace all with fragment within target.
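The set and filter HTML steps can be sketched as follows. Nodes are plain objects with a `children` array. `getSanitizer`, `parseFragment` and `sanitize` are hypothetical hooks standing in for get a sanitizer instance from options, the HTML fragment parsing algorithm and sanitize, so that the sketch runs without a browser. The fragment's node document is not modelled. The parsed nodes are appended to the fragment before sanitizing, since sanitize can only filter what the fragment contains.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

function setAndFilterHTML(target, contextElement, html, options, safe, hooks) {
  // Safe methods refuse to populate HTML or SVG <script> elements at all.
  const isScript = contextElement.localName === "script" &&
    (contextElement.namespace === HTML_NS || contextElement.namespace === SVG_NS);
  if (safe && isScript) return;

  const sanitizer = hooks.getSanitizer(options);
  const newChildren = hooks.parseFragment(contextElement, html);

  // A fresh fragment holds the parsed nodes so they are filtered
  // before anything reaches the target.
  const fragment = { nodeName: "#document-fragment", children: [] };
  for (const node of newChildren) fragment.children.push(node);
  hooks.sanitize(fragment, sanitizer, safe);

  // "Replace all": target's old children are dropped and the fragment's
  // children move into target, leaving the fragment empty.
  target.children = fragment.children;
  fragment.children = [];
}

// Toy hooks: the "parser" wraps the input in one text node, and the
// "sanitizer" only records the safe flag it was given.
const calls = [];
const hooks = {
  getSanitizer: (options) => ({ options }),
  parseFragment: (_context, html) => [{ nodeName: "#text", data: html }],
  sanitize: (fragment, _sanitizer, safe) => calls.push({ size: fragment.children.length, safe }),
};

const div = { localName: "div", namespace: HTML_NS, children: [{ nodeName: "#text", data: "old" }] };
setAndFilterHTML(div, div, "new", {}, true, hooks);

const script = { localName: "script", namespace: SVG_NS, children: [] };
setAndFilterHTML(script, script, "alert(1)", {}, true, hooks);  // safe: returns early
setAndFilterHTML(script, script, "alert(1)", {}, false, hooks); // unsafe: populated
```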
To get a sanitizer instance from options, given a dictionary options:
-
Assert: options is a dictionary.
-
If options["sanitizer"] doesn’t exist, then:
  -
  Let result be a new Sanitizer instance.
  -
  Let setConfigurationResult be the result of set a configuration with an empty dictionary on result.
  -
  Assert: setConfigurationResult is true.
  -
  Return result.
-
Assert: options["sanitizer"] is either a Sanitizer instance or a dictionary.
-
If options["sanitizer"] is a Sanitizer instance, then return options["sanitizer"].
-
Assert: options["sanitizer"] is a dictionary.
-
Let result be a new Sanitizer instance.
-
Call set a configuration with options["sanitizer"] on result.
-
If set a configuration returned false, throw a TypeError.
-
Otherwise, return result.
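The following sketch of get a sanitizer instance from options rests on two assumptions. First, `Sanitizer` is a hypothetical class that merely holds a configuration. Second, `setAConfiguration` is a simplified stand-in that only performs the "not both an allow list and its remove list" check of the full set a configuration algorithm. A Sanitizer instance passed in is reused as-is, while a dictionary always produces a fresh instance.

```javascript
// Hypothetical stand-in for the Sanitizer interface: it only stores a configuration.
class Sanitizer {
  constructor() {
    this.configuration = {};
  }
}

function specAssert(condition, message) {
  if (!condition) throw new Error("Assertion failed: " + message);
}

// Simplified "set a configuration": copies the dictionary and fails only when
// an allow list and its matching remove list are both present.
function setAConfiguration(configuration, sanitizer) {
  const both = (a, b) => configuration[a] !== undefined && configuration[b] !== undefined;
  if (both("elements", "removeElements") || both("attributes", "removeAttributes")) return false;
  sanitizer.configuration = JSON.parse(JSON.stringify(configuration));
  return true;
}

function getSanitizerInstanceFromOptions(options) {
  specAssert(typeof options === "object" && options !== null, "options is a dictionary");
  if (options.sanitizer === undefined) {
    const result = new Sanitizer();
    const setConfigurationResult = setAConfiguration({}, result);
    specAssert(setConfigurationResult === true, "an empty configuration is valid");
    return result;
  }
  specAssert(typeof options.sanitizer === "object" && options.sanitizer !== null,
    "options.sanitizer is a Sanitizer or a dictionary");
  if (options.sanitizer instanceof Sanitizer) return options.sanitizer;
  const result = new Sanitizer();
  if (!setAConfiguration(options.sanitizer, result)) {
    throw new TypeError("Invalid sanitizer configuration");
  }
  return result;
}
```

Reusing the instance is what lets a page build a configuration once and pass the same Sanitizer to many calls.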
3.1. Sanitization Algorithms
To sanitize a ParentNode node, given a Sanitizer sanitizer and a boolean safe, run these steps:
-
Let configuration be the value of sanitizer’s configuration.
-
If safe is true, then set configuration to the result of calling remove unsafe on configuration.
-
Call sanitize core on node, configuration, and with handleJavascriptNavigationUrls set to safe.
The sanitize core operation, given a ParentNode node, a SanitizerConfig configuration, and a boolean handleJavascriptNavigationUrls, iterates over the DOM tree beginning with node, and may recurse to handle some special cases (e.g. template contents). It consists of these steps:
-
Let current be node.
-
For each child in current’s children:
  -
  Assert: child implements Text, Comment, or Element.
  Note: Currently, this algorithm is only called on output of the HTML parser, for which this assertion should hold. If in the future this algorithm is used in different contexts, this assumption needs to be re-examined.
  -
  If child implements Text: continue.
  -
  Else if child implements Comment: if configuration["comments"] is not true, then remove child.
  -
  Else:
    -
    Let elementName be a SanitizerElementNamespace with child’s local name and namespace.
    -
    If configuration["removeElements"] contains elementName, or if configuration["elements"] is not empty and does not contain elementName:
      -
      Remove child.
    -
    If configuration["replaceWithChildrenElements"] contains elementName:
      -
      Call sanitize core on child with configuration and handleJavascriptNavigationUrls.
      -
      Replace child with child’s children within current.
    -
    If elementName equals «[ "name" → "template", "namespace" → HTML namespace ]», then call sanitize core on child’s template contents with configuration and handleJavascriptNavigationUrls.
    -
    If child is a shadow host, then call sanitize core on child’s shadow root with configuration and handleJavascriptNavigationUrls.
    -
    For each attribute in child’s attribute list:
      -
      Let attrName be a SanitizerAttributeNamespace with attribute’s local name and namespace.
      -
      If configuration["removeAttributes"] contains attrName, then remove attribute from child.
      -
      If configuration["elements"] has an entry matching elementName, and that entry’s "removeAttributes" contains attrName, then remove attribute from child.
      -
      If all of the following are false, then remove attribute from child:
        -
        configuration["attributes"] exists and contains attrName.
        -
        configuration["elements"] has an entry matching elementName, and that entry’s "attributes" contains attrName.
        -
        "data-" is a code unit prefix of attribute’s local name, attribute’s namespace is null, and configuration["dataAttributes"] is true.
      -
      If handleJavascriptNavigationUrls is true, «[elementName, attrName]» matches an entry in the navigating URL attributes list, and the protocol of the URL obtained by parsing attribute’s value is "javascript:", then remove attribute from child.
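The following sketch of sanitize core runs over a toy tree and follows the steps above. Several assumptions are specific to this sketch. Nodes are plain objects: elements carry `attributes` and `children`, and optionally `templateContent` and `shadowRoot`. Text is kept, and comments survive only when configuration["comments"] is true. URLs are parsed with the WHATWG URL parser without a base. Kept elements are also processed recursively, which the "iterates over the DOM tree" description implies but the steps do not spell out.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Canonical names are { name, namespace }; membership compares both fields.
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
const contains = (list, item) => (list ?? []).some((entry) => same(entry, item));

// [element, attribute] pairs of the navigating URL attributes list:
// elements in the HTML namespace, attributes in no namespace.
const NAVIGATING_URL_ATTRIBUTES = [
  ["a", "href"], ["area", "href"], ["form", "action"],
  ["input", "formaction"], ["button", "formaction"],
];

function isJavascriptUrl(value) {
  try {
    return new URL(value).protocol === "javascript:";
  } catch {
    return false; // not an absolute URL, so not a javascript: URL
  }
}

function removeNode(parent, child) {
  parent.children.splice(parent.children.indexOf(child), 1);
}

// Puts child's children where child was, in order.
function replaceWithChildren(parent, child) {
  parent.children.splice(parent.children.indexOf(child), 1, ...child.children);
  child.children = [];
}

function sanitizeCore(node, configuration, handleJavascriptNavigationUrls) {
  const recurse = (n) => sanitizeCore(n, configuration, handleJavascriptNavigationUrls);
  for (const child of [...node.children]) { // iterate over a copy: children get removed
    if (child.kind === "Text") continue;
    if (child.kind === "Comment") {
      if (configuration.comments !== true) removeNode(node, child);
      continue;
    }
    const elementName = { name: child.localName, namespace: child.namespace };
    const elements = configuration.elements ?? [];
    if (contains(configuration.removeElements, elementName) ||
        (elements.length > 0 && !contains(elements, elementName))) {
      removeNode(node, child);
      continue;
    }
    if (contains(configuration.replaceWithChildrenElements, elementName)) {
      recurse(child);
      replaceWithChildren(node, child);
      continue;
    }
    if (same(elementName, { name: "template", namespace: HTML_NS }) && child.templateContent) {
      recurse(child.templateContent);
    }
    if (child.shadowRoot) recurse(child.shadowRoot);

    const entry = elements.find((e) => same(e, elementName)); // per-element lists, if any
    for (const attribute of [...child.attributes]) {
      const attrName = { name: attribute.localName, namespace: attribute.namespace };
      const isData = attribute.localName.startsWith("data-") && attribute.namespace === null;
      const allowed = contains(configuration.attributes, attrName) ||
        contains(entry?.attributes, attrName) ||
        (isData && configuration.dataAttributes === true);
      const isNavigating = child.namespace === HTML_NS && attribute.namespace === null &&
        NAVIGATING_URL_ATTRIBUTES.some(([e, a]) => e === child.localName && a === attribute.localName);
      if (contains(configuration.removeAttributes, attrName) ||
          contains(entry?.removeAttributes, attrName) || !allowed ||
          (handleJavascriptNavigationUrls && isNavigating && isJavascriptUrl(attribute.value))) {
        child.attributes = child.attributes.filter((a) => a !== attribute);
      }
    }
    recurse(child); // assumption: kept elements are sanitized recursively too
  }
}

// Usage on a small tree.
const text = (data) => ({ kind: "Text", data });
const comment = (data) => ({ kind: "Comment", data });
const el = (localName, attrs = {}, children = []) => ({
  kind: "Element", localName, namespace: HTML_NS, children,
  attributes: Object.entries(attrs).map(([name, value]) => ({ localName: name, namespace: null, value })),
});

const root = { children: [
  el("p", { title: "t", onclick: "evil()", "data-id": "1" }, [text("hi"), el("script", {}, [text("x")])]),
  comment("c"),
  el("b", {}, [text("bold")]),
  el("a", { href: "javascript:alert(1)" }, [text("link")]),
] };

sanitizeCore(root, {
  removeElements: [{ name: "script", namespace: HTML_NS }],
  replaceWithChildrenElements: [{ name: "b", namespace: HTML_NS }],
  attributes: [{ name: "title", namespace: null }, { name: "href", namespace: null }],
  dataAttributes: true,
}, true);
// root now holds: <p title data-id>hi</p>, the text "bold", and <a>link</a>.
```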
3.2. Configuration Processing
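To allow an element, given a SanitizerElementWithAttributes element and a SanitizerConfig configuration, do:
-
Set element to the result of canonicalize a sanitizer element with attributes with element.
-
Remove element from configuration["elements"].
-
Add element to configuration["elements"].
-
Remove element from configuration["removeElements"].
-
Remove element from configuration["replaceWithChildrenElements"].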
NOTE: Handling of allowElement is a little more complicated than the other methods, because the element allow list can have per-element allow- and remove-attribute lists. We first remove the given element from the list before then adding it, which has the effect of resetting (rather than merging or otherwise modifying) the per-element lists to whatever is passed in. In other words, the per-element allow- and remove-lists can only be set as a whole.
NOTE: Remove matches on name and namespace, so adding an element with attributes would still remove the matching element from the removeElements and replaceWithChildrenElements lists.
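The manipulation algorithms in this section can be modelled on plain objects. The sketch below is not the Sanitizer implementation. Names are canonicalized as described later in § 3.2. "Add" is assumed to create a missing list and skip duplicates, and "remove" is assumed to drop every matching entry and ignore a missing list. allowElement includes the remove-then-add on configuration["elements"] described in the NOTE above. That is why a second call replaces the per-element lists instead of merging them.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// canonicalize a sanitizer name: a string takes the default namespace.
function canonName(n, defaultNamespace) {
  if (typeof n === "string") return { name: n, namespace: defaultNamespace };
  return { name: n.name, namespace: n.namespace !== undefined ? n.namespace : defaultNamespace };
}
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;

// Assumed list semantics (see the lead-in).
function addTo(config, key, item) {
  if (!config[key]) config[key] = [];
  if (!config[key].some((e) => same(e, item))) config[key].push(item);
}
function removeFrom(config, key, item) {
  if (config[key]) config[key] = config[key].filter((e) => !same(e, item));
}

function canonElementWithAttributes(element) {
  const result = canonName(element, HTML_NS);
  if (typeof element !== "string") {
    for (const key of ["attributes", "removeAttributes"]) {
      for (const attr of element[key] ?? []) addTo(result, key, canonName(attr, null));
    }
  }
  return result;
}

function allowElement(config, element) {
  const e = canonElementWithAttributes(element);
  removeFrom(config, "elements", e); // drop any earlier per-element lists...
  addTo(config, "elements", e);      // ...so the new entry replaces them whole
  removeFrom(config, "removeElements", e);
  removeFrom(config, "replaceWithChildrenElements", e);
}
function removeElement(config, element) {
  const e = canonName(element, HTML_NS);
  addTo(config, "removeElements", e);
  removeFrom(config, "replaceWithChildrenElements", e);
}
function replaceElementWithChildren(config, element) {
  const e = canonName(element, HTML_NS);
  addTo(config, "replaceWithChildrenElements", e);
  removeFrom(config, "removeElements", e);
}
function allowAttribute(config, attribute) {
  const a = canonName(attribute, null);
  addTo(config, "attributes", a);
  removeFrom(config, "removeAttributes", a);
}
function removeAttribute(config, attribute) {
  const a = canonName(attribute, null);
  addTo(config, "removeAttributes", a);
  removeFrom(config, "attributes", a);
}
const setComments = (config, allow) => { config.comments = allow; };
const setDataAttributes = (config, allow) => { config.dataAttributes = allow; };

// Usage: later calls win, and allowElement resets per-element lists.
const cfg = {};
removeElement(cfg, "div");
allowElement(cfg, { name: "div", attributes: ["title"] });
const divWithTitle = cfg.elements[0];
allowElement(cfg, "div");
removeAttribute(cfg, "onclick");
allowAttribute(cfg, "onclick");
replaceElementWithChildren(cfg, "b");
removeElement(cfg, "b");
setComments(cfg, true);
setDataAttributes(cfg, false);
```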
To remove an element, given a SanitizerElement element and a SanitizerConfig configuration, do:
-
Set element to the result of canonicalize a sanitizer element with element.
-
Add element to configuration["removeElements"].
-
Remove element from configuration["replaceWithChildrenElements"].
To replace an element with its children, given a SanitizerElement element and a SanitizerConfig configuration, do:
-
Set element to the result of canonicalize a sanitizer element with element.
-
Add element to configuration["replaceWithChildrenElements"].
-
Remove element from configuration["removeElements"].
To allow an attribute, given a SanitizerAttribute attribute and a SanitizerConfig configuration, do:
-
Set attribute to the result of canonicalize a sanitizer attribute with attribute.
-
Add attribute to configuration["attributes"].
-
Remove attribute from configuration["removeAttributes"].
To remove an attribute, given a SanitizerAttribute attribute and a SanitizerConfig configuration, do:
-
Set attribute to the result of canonicalize a sanitizer attribute with attribute.
-
Add attribute to configuration["removeAttributes"].
-
Remove attribute from configuration["attributes"].
To set comments, given a boolean allow and a SanitizerConfig configuration, do:
-
Set configuration["comments"] to allow.
To set data attributes, given a boolean allow and a SanitizerConfig configuration, do:
-
Set configuration["dataAttributes"] to allow.
Note: While this algorithm is called remove unsafe, we use the term "unsafe" strictly in the sense of this spec, to denote content that will execute JavaScript when inserted into the document. In other words, this method will remove opportunities for XSS.
To remove unsafe from a configuration, do this:
-
Assert: The built-in safe baseline configuration has removeElements and removeAttributes keys set, but not elements, replaceWithChildrenElements, or attributes.
-
Let result be a copy of configuration.
-
For each element in built-in safe baseline configuration["removeElements"]:
  -
  Call remove an element with element and result.
-
For each attribute in built-in safe baseline configuration["removeAttributes"]:
  -
  Call remove an attribute with attribute and result.
-
Return result.
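The following sketches remove unsafe on plain objects. The two script entries come from the built-in safe baseline configuration in § 3.4. That section elides the baseline's removeAttributes list ("[....]"), so the single onclick entry below is a hypothetical placeholder, not the real list. The copy leaves the caller's configuration unchanged. A script entry in replaceWithChildrenElements disappears, because remove an element takes it off that list.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
function addTo(config, key, item) {
  if (!config[key]) config[key] = [];
  if (!config[key].some((e) => same(e, item))) config[key].push(item);
}
function removeFrom(config, key, item) {
  if (config[key]) config[key] = config[key].filter((e) => !same(e, item));
}

// remove an element / remove an attribute, on already-canonical names.
function removeElement(config, element) {
  addTo(config, "removeElements", element);
  removeFrom(config, "replaceWithChildrenElements", element);
}
function removeAttribute(config, attribute) {
  addTo(config, "removeAttributes", attribute);
  removeFrom(config, "attributes", attribute);
}

function removeUnsafe(configuration, baseline) {
  if (!baseline.removeElements || !baseline.removeAttributes ||
      baseline.elements || baseline.replaceWithChildrenElements || baseline.attributes) {
    throw new Error("Assertion failed: unexpected baseline shape");
  }
  // Work on a deep copy so the caller's configuration stays untouched.
  const result = JSON.parse(JSON.stringify(configuration));
  for (const element of baseline.removeElements) removeElement(result, element);
  for (const attribute of baseline.removeAttributes) removeAttribute(result, attribute);
  return result;
}

const baseline = {
  removeElements: [
    { name: "script", namespace: HTML_NS },
    { name: "script", namespace: SVG_NS },
  ],
  removeAttributes: [{ name: "onclick", namespace: null }], // hypothetical placeholder
};

const config = {
  replaceWithChildrenElements: [{ name: "script", namespace: HTML_NS }],
  attributes: [{ name: "onclick", namespace: null }, { name: "title", namespace: null }],
};
const safeConfig = removeUnsafe(config, baseline);
```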
To set a configuration, given a dictionary configuration and a Sanitizer sanitizer:
-
For each element of configuration["elements"] do:
  -
  Call allow an element with element and sanitizer’s configuration.
-
For each element of configuration["removeElements"] do:
  -
  Call remove an element with element and sanitizer’s configuration.
-
For each element of configuration["replaceWithChildrenElements"] do:
  -
  Call replace an element with its children with element and sanitizer’s configuration.
-
For each attribute of configuration["attributes"] do:
  -
  Call allow an attribute with attribute and sanitizer’s configuration.
-
For each attribute of configuration["removeAttributes"] do:
  -
  Call remove an attribute with attribute and sanitizer’s configuration.
-
Call set comments with configuration["comments"] and sanitizer’s configuration.
-
Call set data attributes with configuration["dataAttributes"] and sanitizer’s configuration.
-
Return whether all of the following are true:
  -
  size of configuration["elements"] equals size of sanitizer’s configuration["elements"].
  -
  size of configuration["removeElements"] equals size of sanitizer’s configuration["removeElements"].
  -
  size of configuration["replaceWithChildrenElements"] equals size of sanitizer’s configuration["replaceWithChildrenElements"].
  -
  size of configuration["attributes"] equals size of sanitizer’s configuration["attributes"].
  -
  size of configuration["removeAttributes"] equals size of sanitizer’s configuration["removeAttributes"].
  -
  Either configuration["elements"] or configuration["removeElements"] exist, or neither, but not both.
  -
  Either configuration["attributes"] or configuration["removeAttributes"] exist, or neither, but not both.
Note: Previous versions of this spec had elaborate definitions of how to canonicalize a config. This has now effectively been moved into the method definitions.
Note: This operation is defined in terms of the manipulation methods on the Sanitizer. Those methods do not add duplicate entries, and they remove matching entries from other lists, so a configuration with duplicate or contradictory entries produces lists of a different size than the input. The size equality checks in the last step then catch this. For example, { elements: ["div", "div"] } would create a Sanitizer with one element in the elements list. The final test would then return false, which would cause the caller to throw an exception.
This is still missing error checks for the per-element attribute lists and syntax errors.
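The effect of the size-equality checks can be seen in a small model of set a configuration. Several assumptions apply. The sanitizer starts from an empty configuration, and per-element attribute lists are ignored. comments and dataAttributes are only copied when present. The manipulation algorithms behave as in § 3.2 with a de-duplicating "add". The helper names are hypothetical.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

const canon = (n, ns) =>
  typeof n === "string" ? { name: n, namespace: ns }
    : { name: n.name, namespace: n.namespace !== undefined ? n.namespace : ns };
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
function addTo(cfg, key, item) {
  if (!cfg[key]) cfg[key] = [];
  if (!cfg[key].some((e) => same(e, item))) cfg[key].push(item);
}
function removeFrom(cfg, key, item) {
  if (cfg[key]) cfg[key] = cfg[key].filter((e) => !same(e, item));
}

// The § 3.2 manipulation algorithms, keyed by the input list they consume.
const OPS = {
  elements(cfg, e) {
    e = canon(e, HTML_NS);
    removeFrom(cfg, "elements", e);
    addTo(cfg, "elements", e);
    removeFrom(cfg, "removeElements", e);
    removeFrom(cfg, "replaceWithChildrenElements", e);
  },
  removeElements(cfg, e) {
    e = canon(e, HTML_NS);
    addTo(cfg, "removeElements", e);
    removeFrom(cfg, "replaceWithChildrenElements", e);
  },
  replaceWithChildrenElements(cfg, e) {
    e = canon(e, HTML_NS);
    addTo(cfg, "replaceWithChildrenElements", e);
    removeFrom(cfg, "removeElements", e);
  },
  attributes(cfg, a) {
    a = canon(a, null);
    addTo(cfg, "attributes", a);
    removeFrom(cfg, "removeAttributes", a);
  },
  removeAttributes(cfg, a) {
    a = canon(a, null);
    addTo(cfg, "removeAttributes", a);
    removeFrom(cfg, "attributes", a);
  },
};
const LIST_KEYS = Object.keys(OPS); // same order as the spec steps

function setAConfiguration(configuration, sanitizer) {
  const target = (sanitizer.configuration = {}); // assumed: start from empty
  for (const key of LIST_KEYS) {
    for (const item of configuration[key] ?? []) OPS[key](target, item);
  }
  if (configuration.comments !== undefined) target.comments = configuration.comments;
  if (configuration.dataAttributes !== undefined) target.dataAttributes = configuration.dataAttributes;

  // Duplicates and contradictions shrink some list, which the size checks detect.
  const size = (cfg, key) => (cfg[key] ?? []).length;
  const sizesMatch = LIST_KEYS.every((key) => size(configuration, key) === size(target, key));
  const notBoth = (a, b) => configuration[a] === undefined || configuration[b] === undefined;
  return sizesMatch && notBoth("elements", "removeElements") && notBoth("attributes", "removeAttributes");
}
```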
To canonicalize a sanitizer element with attributes, given a SanitizerElementWithAttributes element, do this:
-
Let result be the result of canonicalize a sanitizer element with element.
-
If element is a dictionary:
  -
  For each attribute in element["attributes"]:
    -
    Add the result of canonicalize a sanitizer attribute with attribute to result["attributes"].
  -
  For each attribute in element["removeAttributes"]:
    -
    Add the result of canonicalize a sanitizer attribute with attribute to result["removeAttributes"].
-
Return result.
To canonicalize a sanitizer element, given a SanitizerElement element, return the result of canonicalize a sanitizer name with element and the HTML namespace as the default namespace.
To canonicalize a sanitizer attribute, given a SanitizerAttribute attribute, return the result of canonicalize a sanitizer name with attribute and null as the default namespace.
To canonicalize a sanitizer name, given a name and a defaultNamespace, run these steps:
-
Assert: name is either a DOMString or a dictionary.
-
If name is a DOMString, then return «[ "name" → name, "namespace" → defaultNamespace ]».
-
Assert: name is a dictionary and name["name"] exists.
-
Return «[ "name" → name["name"], "namespace" → ( name["namespace"] if it exists, otherwise defaultNamespace ) ]».
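The following sketches the canonicalization algorithms, with plain objects standing in for the WebIDL dictionaries. Where the spec asserts, the sketch throws a TypeError instead, and that choice is illustrative only. An explicit null namespace is kept. Only a missing namespace takes the default.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

function canonicalizeSanitizerName(name, defaultNamespace) {
  if (typeof name === "string") {
    return { name, namespace: defaultNamespace };
  }
  if (typeof name !== "object" || name === null || typeof name.name !== "string") {
    throw new TypeError("expected a string or a dictionary with a name");
  }
  return {
    name: name.name,
    // Only a missing namespace is defaulted; an explicit null is kept.
    namespace: name.namespace !== undefined ? name.namespace : defaultNamespace,
  };
}

const canonicalizeSanitizerElement = (element) => canonicalizeSanitizerName(element, HTML_NS);
const canonicalizeSanitizerAttribute = (attribute) => canonicalizeSanitizerName(attribute, null);

function canonicalizeSanitizerElementWithAttributes(element) {
  const result = canonicalizeSanitizerElement(element);
  if (typeof element === "object") {
    for (const key of ["attributes", "removeAttributes"]) {
      if (element[key] === undefined) continue;
      result[key] = [];
      for (const attribute of element[key]) {
        const canonical = canonicalizeSanitizerAttribute(attribute);
        // "Add" skips entries already present (same name and namespace).
        if (!result[key].some((e) => e.name === canonical.name && e.namespace === canonical.namespace)) {
          result[key].push(canonical);
        }
      }
    }
  }
  return result;
}
```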
3.3. Supporting Algorithms
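For the canonicalized element and attribute name lists used in this spec, list membership is based on matching both "name" and "namespace" entries: a list contains an item if some entry in the list has the same "name" and the same "namespace" as item. Removing an item removes every such matching entry, and adding an item appends it only if the list does not already contain it.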
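The following sketches these list operations, assuming canonical names are plain { name, namespace } objects. Extra members such as per-element attribute lists do not take part in the comparison. This is what lets removing a plain element name also drop an entry that carries attribute lists.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Entries match when "name" and "namespace" are both equal; other members
// (such as per-element attribute lists) are not compared.
const matches = (a, b) => a.name === b.name && a.namespace === b.namespace;

function listContains(list, item) {
  return list.some((entry) => matches(entry, item));
}

// Appends item only if no matching entry is present.
function listAdd(list, item) {
  if (!listContains(list, item)) list.push(item);
}

// Removes every matching entry, walking backwards so splicing is safe.
function listRemove(list, item) {
  for (let i = list.length - 1; i >= 0; i--) {
    if (matches(list[i], item)) list.splice(i, 1);
  }
}

const removeElements = [{ name: "script", namespace: HTML_NS }];
listAdd(removeElements, { name: "script", namespace: HTML_NS }); // already present
listAdd(removeElements, { name: "script", namespace: SVG_NS });  // other namespace

const elements = [{ name: "a", namespace: HTML_NS, attributes: [{ name: "href", namespace: null }] }];
listRemove(elements, { name: "a", namespace: HTML_NS }); // matches despite the attribute list
```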
3.4. Defaults
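There are four builtins:
-
the built-in safe default configuration,
-
the built-in unsafe default configuration,
-
the built-in safe baseline configuration, and
-
the navigating URL attributes list.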
The built-in safe default configuration is the same as the built-in safe baseline configuration.
Determine if this actually holds. [Issue #233]
The built-in unsafe default configuration is meant to allow anything. It is as follows:
{ elements: [], removeElements: [], attributes: [], removeAttributes: [], }
The built-in safe baseline configuration is meant to block only script content, and nothing else. It is as follows:
{ removeElements: [ { name: "script", namespace: "http://www.w3.org/1999/xhtml" }, { name: "script", namespace: "http://www.w3.org/2000/svg" } ], removeAttributes: [....], }
The navigating URL attributes list, for which "javascript:" navigations are "unsafe", is as follows:
«[
  [ { "name" → "a", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
  [ { "name" → "area", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
  [ { "name" → "form", "namespace" → HTML namespace }, { "name" → "action", "namespace" → null } ],
  [ { "name" → "input", "namespace" → HTML namespace }, { "name" → "formaction", "namespace" → null } ],
  [ { "name" → "button", "namespace" → HTML namespace }, { "name" → "formaction", "namespace" → null } ],
]»
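The following sketches the javascript: check applied with this list. It assumes the attribute value is parsed with the WHATWG URL parser (the `URL` class in browsers and Node.js) rather than compared as a string. Parsing matters because the parser trims leading and trailing spaces and control characters, drops tabs and newlines anywhere, and lowercases the scheme. A naive prefix check misses several spellings. The function names are hypothetical.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// The list above as [elementName, attrName] pairs of canonical names.
const NAVIGATING_URL_ATTRIBUTES = [
  ["a", "href"], ["area", "href"], ["form", "action"],
  ["input", "formaction"], ["button", "formaction"],
].map(([element, attribute]) => [
  { name: element, namespace: HTML_NS },
  { name: attribute, namespace: null },
]);

const same = (a, b) => a.name === b.name && a.namespace === b.namespace;

function isNavigatingUrlAttribute(elementName, attrName) {
  return NAVIGATING_URL_ATTRIBUTES.some(([e, a]) => same(e, elementName) && same(a, attrName));
}

function hasJavascriptProtocol(value) {
  try {
    return new URL(value).protocol === "javascript:";
  } catch {
    return false; // not an absolute URL
  }
}

function isUnsafeNavigation(elementName, attrName, value) {
  return isNavigatingUrlAttribute(elementName, attrName) && hasJavascriptProtocol(value);
}
```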
4. Security Considerations
The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The specified API must not support the construction of a Sanitizer object that leaves script-capable markup in; doing so would be a bug in the threat model.
That being said, there are security issues that even correct usage of the Sanitizer API cannot protect against. These scenarios are laid out in the following sections.
4.1. Server-Side Reflected and Stored XSS
This section is not normative.
The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an existing DocumentFragment. The Sanitizer does not address server-side reflected or stored XSS.
4.2. DOM clobbering
This section is not normative.
DOM clobbering describes an attack in which malicious HTML confuses an
application by naming elements through id or name attributes such that
properties like children of an HTML element in the DOM are overshadowed by
the malicious content.
The Sanitizer API does not protect against DOM clobbering attacks in its
default state, but can be configured to remove id and name attributes.
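For example, here is a configuration in the shape of SanitizerConfig that removes id and name. The helper is hypothetical and mirrors only the removeAttributes check of sanitize core. The other attribute checks still apply to whatever it lets through.

```javascript
// A SanitizerConfig-shaped dictionary that strips the attributes used for clobbering.
const antiClobberingConfig = {
  removeAttributes: [
    { name: "id", namespace: null },
    { name: "name", namespace: null },
  ],
};

// Hypothetical helper: returns the attribute names that pass the
// removeAttributes check (for attributes in no namespace).
function afterRemoveAttributesCheck(config, attributeNames) {
  return attributeNames.filter((name) =>
    !(config.removeAttributes ?? []).some((a) => a.name === name && a.namespace === null));
}

const survivors = afterRemoveAttributesCheck(antiClobberingConfig, ["id", "name", "class"]);
```

A user agent implementing this draft would receive the same dictionary as element.setHTML(untrustedHTML, { sanitizer: antiClobberingConfig }).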
4.3. XSS with Script gadgets
This section is not normative.
Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then executes JavaScript based on that input.
The Sanitizer API cannot prevent these attacks, but it requires page authors to
explicitly allow unknown elements in general. Authors must additionally
explicitly configure unknown attributes and elements, as well as markup that is known
to be widely used for templating and framework-specific code,
like data- and slot attributes and elements like <slot> and <template>.
We believe that these restrictions are not exhaustive and encourage page
authors to examine their third-party libraries for this behavior.
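As a sketch of the explicit configuration this section asks for, here is a hypothetical hardening configuration. It drops data- attributes, the slot attribute, and <slot> and <template> elements. The helper functions are hypothetical and mirror only the relevant checks of sanitize core. Which markup a given framework treats as a gadget is application-specific.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical hardening configuration for framework-heavy pages.
const gadgetHardeningConfig = {
  removeElements: [
    { name: "slot", namespace: HTML_NS },
    { name: "template", namespace: HTML_NS },
  ],
  removeAttributes: [{ name: "slot", namespace: null }],
  dataAttributes: false,
};

// Mirrors the removeElements check of sanitize core.
function isRemovedElement(config, name, namespace = HTML_NS) {
  return (config.removeElements ?? []).some((e) => e.name === name && e.namespace === namespace);
}

// Mirrors the "data-" clause of sanitize core: a data-* attribute passes
// this clause only when dataAttributes is true.
function dataClauseAllows(config, localName, namespace) {
  return localName.startsWith("data-") && namespace === null && config.dataAttributes === true;
}
```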
4.4. Mutated XSS
This section is not normative.
Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML snippet without the correct context. In particular, when a parsed HTML fragment has been serialized to a string, the string is not guaranteed to be parsed and interpreted exactly the same when inserted into a different parent element. Such an attack can be carried out, for example, by relying on the change of parsing behavior for foreign content or mis-nested tags.
The Sanitizer API offers only functions that turn a string into a node tree.
The context is supplied implicitly by all sanitizer functions: Element.setHTML()
uses the current element; Document.parseHTML() creates a new document.
Therefore the Sanitizer API is not directly affected by mutated XSS.
If a developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML,
and then parse it again, mutated XSS may occur.
We discourage this practice. If processing or passing of HTML as a
string should be necessary after all, then any string should be considered
untrusted and should be sanitized (again) when inserting it into the DOM. In
other words, a sanitized and then serialized HTML tree can no
longer be considered as sanitized.
A more complete treatment of mXSS can be found in [MXSS].
5. Acknowledgements
Cure53’s [DOMPURIFY] is a clear inspiration for the API this document
describes, as is Internet Explorer’s window.toStaticHTML().