1. Introduction
This section is not normative.
Web applications often need to work with strings of HTML on the client side,
perhaps as part of a client-side templating solution, perhaps as part of
rendering user generated content, etc. It is difficult to do so in a safe way.
The naive approach of joining strings together and stuffing them into an Element's innerHTML is fraught with risk, as it can cause JavaScript execution in a number of unexpected ways.
Libraries like [DOMPURIFY] attempt to manage this problem by carefully parsing and sanitizing strings before insertion, by constructing a DOM and filtering its members through an allow-list. This has proven to be a fragile approach, as the parsing APIs exposed to the web don’t always map in reasonable ways to the browser’s behavior when actually rendering a string as HTML in the "real" DOM. Moreover, the libraries need to keep on top of browsers' changing behavior over time; things that once were safe may turn into time-bombs based on new platform-level features.
The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation. This document outlines an API which aims to do just that.
1.1. Goals
-
Mitigate the risk of DOM-based cross-site scripting attacks by providing developers with mechanisms for handling user-controlled HTML which prevent direct script execution upon injection.
-
Make HTML output safe for use within the current user agent, taking into account its current understanding of HTML.
-
Allow developers to override the default set of elements and attributes. Adding certain elements and attributes can prevent script gadget attacks.
1.2. API Summary
let s = new Sanitizer();

// Case: The input data is available as a tree of DOM nodes.
let userControlledTree = ...;
element.replaceChildren(s.sanitize(userControlledTree));

// Case: The input is available as a string, and we know the element to insert
// it into:
let userControlledInput = "<img src=x onerror=alert(1)//>";
element.setHTML(userControlledInput, {sanitizer: s});

// Case: The input is available as a string, and we know which type of element
// we will eventually insert it to, but can’t or don’t want to perform the
// insertion now:
let forDiv = s.sanitizeFor("div", userControlledInput);
// Later:
document.querySelector(`${forDiv.localName}#target`).replaceChildren(...forDiv.childNodes);
1.3. The Trouble With Strings
Many HTML sanitizer libraries are based on string-to-string APIs, while this API does not offer such a method. This sub-section explains the reasons and implications for the Sanitizer API.
To convert a string into a tree of nodes (or a fragment), it needs to be parsed. The HTML parsing algorithm carefully specifies how parsing HTML works. This parsing algorithm is dependent on the current node as its parsing context. That is, the same string parsed in the context of different HTML nodes will yield different parse trees.
<em>bla in <div> and <textarea> context.
-
<div><em>bla</div> ⇨ <div><em>bla</em></div>
-
<textarea><em>bla</textarea> ⇨ <textarea><em>bla</textarea>
<td>text in <table> and non-table (<div>) context.
-
<table><td>text</table> ⇨ <table><td>text</table>
-
<div><td>text</div> ⇨ <div>text</div>
These differences can allow bugs to creep into a site’s sanitization strategy, which can be (and have been) exploited by a class of XSS-style attacks called mXSS. These attacks ultimately depend on confusion about the parsing context, for example when a developer sanitizes a string in one (parsing) context and then applies the resulting string in a different context, where it will be interpreted differently.
Since this attack class depends on a particular usage of the string after the sanitization has occurred, the API itself has only limited capability to protect its users. As a result, the Sanitizer API follows the following principle:
Whenever the Sanitizer API parses or unparses a DOM (sub-)tree to or from a string, it will either do so in a fashion where the correct parse context is implied by the operation; or it will require a parse context to be supplied by the developer and will retain the given context in the result. In other words, the Sanitizer API will never assume a parsing context, or discard a parsing context that has been supplied earlier.
1.3.1. Case 1: Sanitizing With Nodes, Only.
If the user data in question is already available as DOM nodes - for example a Document instance in a frame - then the Sanitizer can be easily used:
const sanitizer = new Sanitizer( ... );  // Our Sanitizer;

// There is an iframe with id "userFrame" whose content we are interested in.
const user_tree = document.getElementById("userFrame").contentWindow.document;
const sanitized = sanitizer.sanitize(user_tree);
Note: Parsing an HTML string can have various side-effects, like network requests or executing scripts. Naively parsing these, e.g. by assigning a string to .innerHTML of an unconnected element, will not reliably prevent these. Therefore, if the user data to be sanitized is originally in string form, we recommend using one of the following cases.
1.3.2. Case 2: Sanitizing a String with Implied Context.
If the user data is available in string form and we wish to directly insert the sanitized subtree into the DOM, we can do so as follows:
const user_string = "...";  // The user string.
const sanitizer = new Sanitizer( ... );  // Our Sanitizer;

// We want to insert the HTML in user_string into a target element with id
// target. That is, we want the equivalent of target.innerHTML = value, except
// without the XSS risks.
document.getElementById("target").setHTML(user_string, {sanitizer: sanitizer});
1.3.3. Case 3: Sanitizing a String with a Given Context.
If the user data is available in string form and the developer wishes to sanitize it now, but apply the result to the DOM later, then the Sanitizer must be informed about the context in which it will be used. To prevent context confusion, the result is wrapped in a container that holds both the result and the parse context. Conveniently, this container already exists, and it is the node itself!
// A certain piece of user input is meant to be used repeatedly, to insert
// it in multiple elements on the page. All these elements will be <div>
// elements.
const user_string = "...";  // The user string.
const sanitizer = new Sanitizer( ... );  // Our Sanitizer.
const sanitized = sanitizer.sanitizeFor("div", user_string);
sanitized instanceof HTMLDivElement  // true. The Sanitizer has given us a node.

// ... later, in the same program ...
for (let elem of ...) {
  // All of our "elem" instances should be of the same type used in the
  // .sanitizeFor call above. With an assertion library, this could look as
  // follows:
  assert_true(elem instanceof sanitized.constructor);  // Assuming assert_true, like in WPT tests.
  // Insert a deep copy: replaceChildren moves nodes, so passing
  // sanitized.childNodes directly would leave nothing for the next iteration.
  elem.replaceChildren(...sanitized.cloneNode(true).childNodes);
}

// Instead of:
elem.replaceChildren(...sanitized.cloneNode(true).childNodes);
// one could write:
elem.innerHTML = sanitized.innerHTML;
// This should have the same effect, except be slower, since this will trigger
// un-parsing and then re-parsing the node tree which we already have
// available as a node tree. So we recommend to stick with the former version.
1.3.4. The Other Case
What if none of these cases works with a given application structure, and a string-to-string operation is required? In this case, the developer is free to take the sanitization result and remove it from its context. The responsibility to prevent mXSS-class attacks that stem from mis-applying those strings in an inappropriate context then remains with the developer.
const user_string = "...";  // The user string.
const sanitizer = new Sanitizer( ... );  // Our Sanitizer.

// The developer plans to insert this string into a <div> element, but has to
// keep this around as a string (instead of an element). It’s important that
// the developer remembers the parsing context and MUST NOT use this in a
// different parsing context in order to prevent mXSS attacks.
const sanitized_for_div = sanitizer.sanitizeFor("div", user_string).innerHTML;
2. Framework
2.1. Sanitizer API
The core API is the Sanitizer object and the sanitize method. Sanitizers can be instantiated using an optional SanitizerConfig dictionary for options. The most common use-case - preventing XSS - is handled by default, so that creating a Sanitizer with a custom config is necessary only to handle additional, application-specific use cases.
[
  Exposed=(Window),
  SecureContext
] interface Sanitizer {
  constructor(optional SanitizerConfig config = {});
  DocumentFragment sanitize((Document or DocumentFragment) input);
  Element? sanitizeFor(DOMString element, DOMString input);

  SanitizerConfig getConfiguration();
  static SanitizerConfig getDefaultConfiguration();
};
-
The new Sanitizer(config) constructor steps are to run the create a sanitizer algorithm steps on this with config as parameter.
-
The sanitize(input) method steps are to return the result of running the sanitize algorithm on input.
-
The sanitizeFor(element, input) method steps are to return the result of running the sanitizeFor algorithm on element and input.
-
The getConfiguration() method steps are to return the result of running the query the sanitizer config algorithm. It essentially returns a copy of the Sanitizer’s configuration dictionary, with some degree of normalization.
-
The static getDefaultConfiguration() method steps are to return the value of the default configuration object.
The Element interface gains an additional method, setHTML, which applies a string using a Sanitizer directly to an existing element node.
dictionary SetHTMLOptions {
  Sanitizer sanitizer;
};

[SecureContext]
partial interface Element {
  undefined setHTML(DOMString input, optional SetHTMLOptions options = {});
};
-
The setHTML(input, options) method steps are to run the sanitizeAndSet algorithm on this, input, and options.
Is this how we specify a method on an existing class "owned" by a different spec?
// To make our examples easy to follow, we’ll need a way to create DOM nodes.
// The following is a hacky way to accomplish this, for illustration only,
// that you shall pretty please not use in practice. This parsing method can
// cause side-effects based on the string being parsed, which is insecure.
// In fact, this very API exists for the sole purpose of preventing the
// problems that this approach has.
//
// But... for our examples we’ll need something that is quick and easy, since
// we cannot use our own Sanitizer API to explain our own Sanitizer API.
const to_node = str => document.createRange().createContextualFragment(str);

// The core API of the Sanitizer is the .sanitize method:
let untrusted_input = to_node("Hello!");
const sanitizer = new Sanitizer();
sanitizer.sanitize(untrusted_input);  // DocumentFragment w/ a text node, "Hello!"

// Probably we want to put this somewhere in our DOM:
element.replaceChildren(sanitizer.sanitize(untrusted_input));

// If our input contains markup it’ll be mostly preserved, except for
// script-y markup:
untrusted_input = to_node("<em onclick='alert(1);'>Hello!</em>");
sanitizer.sanitize(untrusted_input);  // <em>Hello!</em>
element.replaceChildren(sanitizer.sanitize(untrusted_input));  // No alert!

// The .sanitize method is the primary API, and returns a DocumentFragment.
// The .sanitizeFor method accepts and parses a string and returns an HTML
// element node.
const hello = to_node("hello");
(sanitizer.sanitize(hello)) instanceof DocumentFragment;  // true
(sanitizer.sanitizeFor("template", "hello")) instanceof HTMLTemplateElement;  // true
2.2. String Handling
Parsing (and unparsing) strings to (or from) HTML requires a context element. Thus, the sanitizeFor method requires us to pass in a context, which the implementation can then hand over to the HTML Parser.
Additionally, the Element interface gains a setHTML method, which always knows the correct context, because it is applied to a given Element instance. This Element is the correct context for both parsing and unparsing its own content.
One way to conceptualize this is to view string sanitization as a three-step operation: 1, parsing the string; 2, sanitizing the resulting node tree; and 3, grafting the resulting subtree onto our live DOM. Sanitizer.sanitize is the middle step. Sanitizer.sanitizeFor performs the first and second steps, but leaves the third to the developer. Element.setHTML does all three. Which to use depends on the structure of your application, whether you can do all three steps simultaneously, or whether maybe the sanitization is removed (in either code structure or point in time) from the eventual modification of the DOM.
// If the markup to be sanitized is already available as a tree, for example
// from an embedded frame, one can use sanitize:
document.getElementById("target").replaceChildren(
    sanitizer.sanitize(document.querySelector("iframe#myframe").contentWindow.document));

// If the markup to be sanitized is present in string form, but we already
// have the element we want to insert in available:
const untrusted_input = "....";
document.getElementById("someelement").setHTML(untrusted_input, {sanitizer: sanitizer});

// Same as above, but using the default Sanitizer configuration:
document.getElementById("someelement").setHTML(untrusted_input);

// If the markup to be sanitized is present in string form, but we don’t want
// to do the DOM insertion now:
let no_xss = sanitizer.sanitizeFor("div", untrusted_input);
// ... much later ...
document.querySelector("div#targetdiv").replaceChildren(...no_xss.childNodes);

// Note that parsing HTML depends on the current context in many ways, some
// subtle, some not so much. Supplying a different context than what the
// result will eventually be used in has both security and functional risks.
// It’s up to the developer to handle this safely.
//
// Example: Many parsing contexts disallow table data (<td>) without
// an enclosing table.
sanitizer.sanitizeFor("div", "<td>data</td>").innerHTML  // "data"
sanitizer.sanitizeFor("table", "<td>data</td>").innerHTML  // "<td>data</td>"
.sanitizeFor
will be character-for-character
identical to the input.
Sanitizer.sanitizeFor and Element.setHTML can each be expressed in terms of the other. Both are provided since they support different use cases.
// sanitizeFor, based on setHTML.
function sanitizeFor(element, input) {
  const elem = document.createElement(element);
  elem.setHTML(input, {sanitizer: this});
  return elem;
}

// setHTML, based on sanitizeFor.
function setHTML(input, options) {
  const sanitizer = options?.sanitizer ?? new Sanitizer();
  this.replaceChildren(...sanitizer.sanitizeFor(this.localName, input).childNodes);
}
2.3. The Configuration Dictionary
The Sanitizer’s configuration dictionary is a dictionary which describes modifications to the sanitize operation. If a Sanitizer has not received an explicit configuration, for example when being constructed without any parameters, then the default configuration value is used as the configuration dictionary.
dictionary SanitizerConfig {
  sequence<DOMString> allowElements;
  sequence<DOMString> blockElements;
  sequence<DOMString> dropElements;
  AttributeMatchList allowAttributes;
  AttributeMatchList dropAttributes;
  boolean allowCustomElements;
  boolean allowUnknownMarkup;
  boolean allowComments;
};
- allowElements
-
The element allow list is a sequence of strings with elements that the sanitizer should retain in the input.
- blockElements
-
The element block list is a sequence of strings with elements where the sanitizer should remove the elements from the input, but retain their children.
- dropElements
-
The element drop list is a sequence of strings with elements that the sanitizer should remove from the input, including their children.
- allowAttributes
-
The attribute allow list is an attribute match list, which determines whether an attribute (on a given element) should be allowed.
- dropAttributes
-
The attribute drop list is an attribute match list, which determines whether an attribute (on a given element) should be dropped.
- allowCustomElements
-
The allow custom elements option determines whether custom elements are to be considered. The default is to drop them. If this option is true, custom elements will still be checked against all other built-in or configured checks.
- allowUnknownMarkup
-
The allow unknown markup option determines whether unknown HTML elements are to be considered. The default is to drop them. If this option is true, unknown HTML elements will still be checked against all other built-in or configured checks.
- allowComments
-
The allow comments option determines whether HTML comments are allowed.
Note: allowElements creates a sanitizer that defaults to dropping elements, while blockElements and dropElements default to keeping unknown elements. Using both types is possible, but is probably of little practical use. The same applies to allowAttributes and dropAttributes.
const sample = to_node("Some text <b><i>with</i></b> <blink>tags</blink>.");
const script_sample = to_node("abc <script>alert(1)</script> def");

// Some text <b>with</b> text tags.
new Sanitizer({allowElements: ["b"]}).sanitize(sample);

// Some text <i>with</i> <blink>tags</blink>.
new Sanitizer({blockElements: ["b"]}).sanitize(sample);

// Some text <blink>tags</blink>.
new Sanitizer({dropElements: ["b"]}).sanitize(sample);

// Note: The default configuration handles XSS-relevant input:

// Non-scripting input will be passed through:
new Sanitizer().sanitize(sample);  // Will output sample unmodified.

// Scripts will be blocked: "abc alert(1) def"
new Sanitizer().sanitize(script_sample);
In addition to allow and block lists for elements and attributes, there are also options to configure some node or element types.
Examples:
// Comments will be dropped by default.
const comment = to_node("Hello <!-- comment --> World!");
new Sanitizer().sanitize(comment);  // "Hello World!"
new Sanitizer({allowComments: true}).sanitize(comment);  // Same as comment.
A sanitizer’s configuration can be queried using the getConfiguration() method, which runs the query the sanitizer config algorithm.
// Does the default config allow script elements?
Sanitizer.getDefaultConfiguration().allowElements.includes("script")  // false

// We found a Sanitizer instance. Does it have an allow-list configured?
const a_sanitizer = ...;
!!a_sanitizer.getConfiguration().allowElements  // true, if an allowElements list is configured

// If it does have an allow elements list, does it include the <div> element?
a_sanitizer.getConfiguration().allowElements.includes("div")  // true, if "div" is in allowElements.

// Note that the getConfiguration method might do some normalization. E.g., it won’t
// contain key/value pairs that are not declared in the IDL.
Object.keys(new Sanitizer({madeUpDictionaryKey: "Hello"}).getConfiguration())  // []

// As a Sanitizer’s config describes its operation, a new sanitizer with
// another instance’s configuration should behave identically.
// (For illustration purposes only. It would make more sense to just use a directly.)
const a = /* ... a Sanitizer we found somewhere ... */;
const b = new Sanitizer(a.getConfiguration());  // b should behave the same as a.

// getDefaultConfiguration() and new Sanitizer().getConfiguration() should be the same.
// (For illustration purposes only. There are better ways of implementing
// object equality in JavaScript.)
JSON.stringify(Sanitizer.getDefaultConfiguration()) == JSON.stringify(new Sanitizer().getConfiguration());  // true
2.3.1. Attribute Match Lists
An attribute match list is a map of attributes to elements, where the special name "*" stands for all attributes or elements. A given attribute belonging to an element matches an attribute match list if the attribute is a key in the match list, and the element or "*" is found in the attribute’s value list.
Element names are interpreted as names in the HTML namespace, and attribute names as non-namespaced attributes - i.e., what one may think of as normal [HTML] elements and attributes. Both elements and attributes are named by their local name.
typedef record<DOMString, sequence<DOMString>> AttributeMatchList;
const sample = to_node("<span id='span1' class='theclass' style='font-weight: bold'>hello</span>");

// Allow only <span style>: <span style='font-weight: bold'>...</span>
new Sanitizer({allowAttributes: {"style": ["span"]}}).sanitize(sample);

// Allow style, but not on span: <span>...</span>
new Sanitizer({allowAttributes: {"style": ["div"]}}).sanitize(sample);

// Allow style on any elements: <span style='font-weight: bold'>...</span>
new Sanitizer({allowAttributes: {"style": ["*"]}}).sanitize(sample);

// Drop <span id>: <span class='theclass' style='font-weight: bold'>...</span>
new Sanitizer({dropAttributes: {"id": ["span"]}}).sanitize(sample);

// Drop id, everywhere: <span class='theclass' style='font-weight: bold'>...</span>
new Sanitizer({dropAttributes: {"id": ["*"]}}).sanitize(sample);
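The matching rule described in this section can be sketched in plain JavaScript, outside of any browser. This is only an illustration: the helper name matchesAttributeMatchList is hypothetical and not part of the API, and the sketch ignores namespaces, assuming HTML elements and non-namespaced attributes throughout.

```javascript
// Hypothetical helper, not defined by this specification.
// `list` mirrors an AttributeMatchList: attribute name -> element names,
// where "*" stands for all attributes (as a key) or all elements (as a value).
function matchesAttributeMatchList(attributeName, elementName, list) {
  // Consider the entry for this attribute and the wildcard entry "*".
  for (const key of [attributeName, "*"]) {
    if (!Object.prototype.hasOwnProperty.call(list, key)) continue;
    const elementNames = list[key];
    if (elementNames.includes(elementName) || elementNames.includes("*")) {
      return true;
    }
  }
  return false;
}

const allowStyleOnSpan = { style: ["span"] };
console.log(matchesAttributeMatchList("style", "span", allowStyleOnSpan)); // true
console.log(matchesAttributeMatchList("style", "div", allowStyleOnSpan));  // false
console.log(matchesAttributeMatchList("id", "p", { id: ["*"] }));          // true
console.log(matchesAttributeMatchList("class", "p", { "*": ["p"] }));      // true
```

Treating a "*" key as matching every attribute is one reading of "stands for all attributes or elements"; the normative matching steps are given in the Algorithms section.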
3. Algorithms
3.1. API Implementation
To create a Sanitizer with an optional config parameter, run these steps:
-
Create a copy of config.
-
Set config as this's configuration dictionary.
This should explicitly state the config’s properties in which element names are found and modify the config with map operations. [Issue #148]
Note: The configuration object contains element names in the element allow list, element block list, and element drop list, and in the mapped values in the attribute allow list and attribute drop list.
To sanitize a given input of type Document or DocumentFragment, run these steps:
-
Let fragment be the result of running the create a document fragment algorithm on input.
-
Run the sanitize a document fragment algorithm on fragment.
-
Return fragment.
The sanitize algorithm does not need to run "create a document fragment". [Issue #149]
To run the sanitizeFor algorithm on a given element name of type DOMString and a given input of type DOMString, run these steps:
-
Let element be an HTML element created by running the steps of the creating an element algorithm with the current document, element name, the HTML namespace, and no optional parameters.
-
If the element kind of element is regular and if the baseline element allow list does not contain element name, then return null.
-
Let fragment be the result of invoking the html fragment parsing algorithm, with element as the context element and input as markup.
-
Run the steps of the sanitize a document fragment algorithm on fragment.
-
Replace all with fragment as the node and element as the parent.
-
Return element.
Does the .sanitizeFor element name require namespace-related processing? [Issue #140]
To run the sanitizeAndSet algorithm with a value of type DOMString and a SetHTMLOptions options dictionary on an Element node this, run these steps:
-
If the element kind of this is regular and this' local name does not match any name in the baseline element allow list, then throw a TypeError and return.
-
If the sanitizer member exists in the options SetHTMLOptions dictionary, then let sanitizer be the value of the sanitizer member of the options SetHTMLOptions dictionary; otherwise, let sanitizer be the result of the create a Sanitizer algorithm without a config parameter.
-
Let fragment be the result of invoking the html fragment parsing algorithm with this as the context node and value as markup.
-
Run the steps of the sanitize a document fragment algorithm on fragment, using sanitizer as the current Sanitizer instance.
-
Replace all with fragment as the node and this as the parent.
To query the sanitizer config of a Sanitizer, run these steps:
-
Let sanitizer be the current Sanitizer.
-
Let config be sanitizer’s configuration dictionary, or the default configuration if no configuration dictionary was given.
-
Let result be a newly constructed SanitizerConfig dictionary.
-
For any non-empty member of config whose key is declared in SanitizerConfig, copy the value to result.
-
Return result.
IDL is taking care of most steps in "query the sanitizer config". Clean up. [Issue #150]
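The copy-and-filter behaviour of the query steps can be sketched on plain objects. The names queryConfig and copyMember are hypothetical, and the sketch reads "non-empty member" as "member that is present", which is an assumption rather than something the steps spell out.

```javascript
// Member names declared in the SanitizerConfig dictionary.
const SANITIZER_CONFIG_KEYS = [
  "allowElements", "blockElements", "dropElements",
  "allowAttributes", "dropAttributes",
  "allowCustomElements", "allowUnknownMarkup", "allowComments",
];

// Hypothetical helper: copy a sequence, an attribute match list, or a boolean.
function copyMember(value) {
  if (Array.isArray(value)) return [...value];
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([name, elements]) => [name, [...elements]])
    );
  }
  return value;
}

// Hypothetical stand-in for "query the sanitizer config".
function queryConfig(config) {
  const result = {};
  for (const key of SANITIZER_CONFIG_KEYS) {
    // Assumption: a member counts as "non-empty" when it is present at all.
    if (config[key] !== undefined) result[key] = copyMember(config[key]);
  }
  return result;
}

const original = { allowElements: ["b"], madeUpDictionaryKey: "Hello" };
const copy = queryConfig(original);
console.log(JSON.stringify(Object.keys(copy)));     // ["allowElements"]
copy.allowElements.push("i");
console.log(JSON.stringify(original.allowElements)); // ["b"]
```

Because the result is a copy, changing it afterwards does not affect the configuration the sketch was given.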
3.2. Helper Definitions
To create a document fragment named fragment from an input of type Document or DocumentFragment, run these steps:
-
Let node be null.
-
Switch based on input’s type: if input is of type DocumentFragment, then set node to input; if input is of type Document, then set node to input’s documentElement.
-
Let clone be the result of running clone a node on node with the clone children flag set.
-
Let fragment be a new DocumentFragment whose node document is node’s node document.
-
Append the node clone to fragment.
-
Return fragment.
3.3. Sanitization Algorithms
To sanitize a document fragment named fragment using a Sanitizer sanitizer, run these steps:
-
Let m be a map that maps nodes to a sanitize action.
-
Let nodes be a list containing the inclusive descendants of fragment, in tree order.
-
For each node in nodes:
-
Let action be the result of running the sanitize a node algorithm on node with sanitizer.
-
Set m[node] to action.
-
-
For each node in nodes:
The step above needs to explicitly iterate over the children and insert into parent. It could collect them in a variable or do things in place, but this is a bit too imprecise. [Issue #156]
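The two-pass shape of this algorithm (first record an action per node, then apply the actions) can be sketched on a plain-object tree. Everything here is illustrative: the node shapes, sanitizeFragment, serialize and the actionFor callback are hypothetical, and the second pass simply follows the meaning of drop (remove the node with its children) and block (remove the node, keep its children) given in the configuration section.

```javascript
// Plain-object stand-ins for DOM nodes (an assumption for illustration):
// text nodes are { text }, elements are { name, children },
// and the fragment is { children }.
function inclusiveDescendants(node, out = []) {
  out.push(node);
  for (const child of node.children ?? []) inclusiveDescendants(child, out);
  return out;
}

function sanitizeFragment(fragment, actionFor) {
  // First pass: record a sanitize action for every node, in tree order.
  const actions = new Map();
  for (const node of inclusiveDescendants(fragment)) {
    if (node !== fragment) actions.set(node, actionFor(node));
  }
  // Second pass: apply the recorded actions.
  const apply = (parent) => {
    const result = [];
    for (const child of parent.children) {
      const action = actions.get(child);
      if (action === "drop") continue;        // remove node and its subtree
      if (child.children) apply(child);       // sanitize the subtree first
      if (action === "block") result.push(...(child.children ?? [])); // keep children only
      else result.push(child);
    }
    parent.children = result;
  };
  apply(fragment);
  return fragment;
}

// Minimal serializer so the result can be inspected.
function serialize(node) {
  if (node.text !== undefined) return node.text;
  const inner = node.children.map(serialize).join("");
  return node.name ? `<${node.name}>${inner}</${node.name}>` : inner;
}

const makeSample = () => ({ children: [
  { text: "Some text " },
  { name: "b", children: [{ name: "i", children: [{ text: "with" }] }] },
  { text: " " },
  { name: "blink", children: [{ text: "tags" }] },
  { text: "." },
]});

const blockB = (node) => (node.name === "b" ? "block" : "keep");
console.log(serialize(sanitizeFragment(makeSample(), blockB)));
// Some text <i>with</i> <blink>tags</blink>.
```

Recording all actions before mutating the tree means the decision for a node never depends on whether its ancestors have already been removed or unwrapped.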
To sanitize a node named node with sanitizer, run these steps:
-
Assert: node is not a Document or DocumentFragment or Attr or DocumentType node.
-
If node is an element node:
  -
  Let element be node.
  -
  For each attr in element’s attribute list:
    -
    Let attr action be the result of running the sanitize action for an attribute algorithm on attr and element.
    -
    If attr action is different from keep, remove an attribute supplying attr.
  -
  Run the steps to handle funky elements on element.
  -
  Let action be the result of running the sanitize action for an element on element.
  -
  Return action.
-
If node is a Comment node:
  -
  Let config be sanitizer’s configuration dictionary, or the default configuration if no configuration dictionary was given.
  -
  If config’s allow comments option exists and config["allowComments"] is true: Return keep.
  -
  Return drop.
-
If node is a Text node: Return keep.
-
Assert: node is a ProcessingInstruction.
-
Return drop.
The sanitize action for an attribute algorithm parameters do not match. Issue(153): consider creating an effective sanitizer config. Also, IDL guarantees that a config is ALWAYS given. The question is really whether the members exists. [Issue #151]
Some HTML elements require special treatment in a way that can’t be easily expressed in terms of configuration options or other algorithms. The following algorithm collects these in one place.
To handle funky elements on a given element, run these steps:
-
If element’s namespace is HTML and the local name is "template": run the steps of the sanitize a document fragment algorithm on element’s template contents attribute, then drop all child nodes of element.
-
If element’s namespace is HTML and the local name is one of "a" or "area", and if element’s protocol property is "javascript:": remove the href attribute from element.
-
If element’s namespace is HTML and the local name is "form" and if element’s action attribute is a [URL] with javascript: protocol: remove the action attribute from element.
-
If element’s namespace is HTML and the local name is "input" or "button", and if element’s formaction attribute is a [URL] with javascript: protocol: remove the formaction attribute from element.
Export and refer funky element properties more precisely. [Issue #154]
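The javascript: URL checks above can be sketched with the standard URL class on plain objects. The helper names, the element shape { localName, attributes } and the placeholder base URL are all assumptions made for illustration; the "template" step is left out, and every element is assumed to be in the HTML namespace.

```javascript
// Hypothetical helper: does `value` parse to a URL with the javascript: scheme?
// The base URL is an arbitrary placeholder used to resolve relative values.
function isJavaScriptURL(value, base = "https://example.test/") {
  try {
    return new URL(value, base).protocol === "javascript:";
  } catch {
    return false; // not parseable as a URL
  }
}

// Which URL-carrying attribute is checked on which element.
const URL_ATTRIBUTE_BY_ELEMENT = new Map([
  ["a", "href"], ["area", "href"],
  ["form", "action"],
  ["input", "formaction"], ["button", "formaction"],
]);

// `element` is a plain object { localName, attributes }, standing in for
// an HTML element.
function handleFunkyElement(element) {
  const name = URL_ATTRIBUTE_BY_ELEMENT.get(element.localName);
  if (name !== undefined &&
      name in element.attributes &&
      isJavaScriptURL(element.attributes[name])) {
    delete element.attributes[name];
  }
  return element;
}

const link = { localName: "a", attributes: { href: "javascript:alert(1)", title: "x" } };
console.log(JSON.stringify(handleFunkyElement(link).attributes)); // {"title":"x"}
```

Parsing the value with a URL parser, rather than testing the string prefix, means leading whitespace and scheme capitalization are handled the same way the parser handles them.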
3.4. Matching Against The Configuration
A sanitize action is keep, drop, or block.
To determine the sanitize action for an element element, given a SanitizerConfig config, run these steps:
-
Let kind be element’s element kind.
-
If kind is regular and element does not match any name in the baseline element allow list: Return drop.
-
If kind is custom and if config["allowCustomElements"] does not exist or if config["allowCustomElements"] is false: Return drop.
-
If kind is unknown and if config["allowUnknownMarkup"] does not exist or if config["allowUnknownMarkup"] is false: Return drop.
-
If element matches any name in config["dropElements"]: Return drop.
-
If element matches any name in config["blockElements"]: Return block.
-
Let allow list be null.
-
If "allowElements" exists in config, then set allow list to config["allowElements"]; otherwise, set allow list to the default configuration's element allow list.
-
If element does not match any name in allow list: Return block.
-
Return keep.
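The order of these checks can be sketched as a plain function. The element shape { localName, kind }, the function name elementAction, and above all the tiny BASELINE_ELEMENTS and DEFAULT_CONFIG values are made up for illustration; they are not the lists this specification defines.

```javascript
// Toy stand-ins (assumptions): NOT the real baseline or default lists.
const BASELINE_ELEMENTS = ["p", "b", "i", "em"];
const DEFAULT_CONFIG = { allowElements: ["p", "b", "i", "em"] };

// `element` is { localName, kind }, with kind one of
// "regular", "custom" or "unknown".
function elementAction(element, config) {
  const { localName, kind } = element;
  if (kind === "regular" && !BASELINE_ELEMENTS.includes(localName)) return "drop";
  if (kind === "custom" && config.allowCustomElements !== true) return "drop";
  if (kind === "unknown" && config.allowUnknownMarkup !== true) return "drop";
  if ((config.dropElements ?? []).includes(localName)) return "drop";
  if ((config.blockElements ?? []).includes(localName)) return "block";
  // Fall back to the default allow list when none is configured.
  const allowList = config.allowElements ?? DEFAULT_CONFIG.allowElements;
  return allowList.includes(localName) ? "keep" : "block";
}

const b = { localName: "b", kind: "regular" };
console.log(elementAction(b, {}));                        // keep
console.log(elementAction(b, { blockElements: ["b"] }));  // block
console.log(elementAction(b, { dropElements: ["b"] }));   // drop
console.log(elementAction(b, { allowElements: ["i"] }));  // block
```

Note that an element missing from the allow list is blocked rather than dropped, so its children survive, while the baseline and kind checks at the top drop the element outright.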
To determine whether an element element matches an element name name, run these steps:
-
If element is in the HTML namespace and if element’s local name is identical to name: Return true.
-
Return false.
Whitespaces or colons? [Issue #146]
To determine whether an attribute attribute matches an attribute match list list, run these steps:
-
If attribute’s namespace is not null: Return false.
-
If attribute’s local name does not match the attribute match list list’s key and if the key is not "*": Return false.
-
Let element be the attribute’s Element.
-
Let element name be element’s local name.
-
If list’s value does not contain element name and value is not ["*"]: Return false.
-
Return true.
To determine the sanitize action for an attribute attribute, run these steps:
-
Let kind be attribute’s attribute kind.
-
If kind is unknown and if config["allowUnknownMarkup"] does not exist or if config["allowUnknownMarkup"] is false: Return drop.
-
If kind is regular and attribute’s local name does not match any name in the baseline attribute allow list: Return drop.
-
If attribute matches any attribute match list in config’s attribute drop list: Return drop.
-
If attribute allow list exists in config, then let allow list be config["allowAttributes"]; otherwise, let allow list be the default configuration's attribute allow list.
-
If attribute does not match any attribute match list in allow list: Return drop.
-
Return keep.
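A plain-object sketch of the attribute decision follows. As before, the names (matches, attributeAction), the attribute shape { localName, elementName, kind } and the small BASELINE_ATTRIBUTES and DEFAULT_CONFIG values are assumptions for illustration only; attributes are assumed to be non-namespaced.

```javascript
// Compact attribute-match-list check; "*" is the wildcard for both
// attribute names (keys) and element names (values).
const matches = (attributeName, elementName, list) =>
  [attributeName, "*"].some((key) => {
    const names = Object.prototype.hasOwnProperty.call(list, key) ? list[key] : [];
    return names.includes(elementName) || names.includes("*");
  });

// Toy stand-ins (assumptions): NOT the real baseline or default lists.
const BASELINE_ATTRIBUTES = ["id", "class", "style"];
const DEFAULT_CONFIG = {
  allowAttributes: { id: ["*"], class: ["*"], style: ["*"] },
};

// `attribute` is { localName, elementName, kind }, kind being
// "regular" or "unknown".
function attributeAction(attribute, config) {
  const { localName, elementName, kind } = attribute;
  if (kind === "unknown" && config.allowUnknownMarkup !== true) return "drop";
  if (kind === "regular" && !BASELINE_ATTRIBUTES.includes(localName)) return "drop";
  if (matches(localName, elementName, config.dropAttributes ?? {})) return "drop";
  const allowList = config.allowAttributes ?? DEFAULT_CONFIG.allowAttributes;
  return matches(localName, elementName, allowList) ? "keep" : "drop";
}

const styleOnSpan = { localName: "style", elementName: "span", kind: "regular" };
console.log(attributeAction(styleOnSpan, { allowAttributes: { style: ["span"] } })); // keep
console.log(attributeAction(styleOnSpan, { allowAttributes: { style: ["div"] } }));  // drop
console.log(attributeAction(styleOnSpan, { dropAttributes: { style: ["*"] } }));     // drop
```

Unlike elements, attributes have no block outcome: an attribute is either kept or dropped.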
The element kind of an element is one of regular, unknown, or custom. Let element kind be:
-
custom, if element’s local name is a valid custom element name,
-
unknown, if element is not in the [HTML] namespace or if element’s local name denotes an unknown element, that is, if the element interface the [HTML] specification assigns to it would be HTMLUnknownElement,
We do not want to use the interface (e.g., "applet" and "blink" are HTMLUnknownElement) [Issue #147]
-
regular, otherwise.
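The classification order above can be sketched as follows. This is a deliberate simplification: the regular expression only covers ASCII custom element names (the real definition also permits many non-ASCII characters), the set of known elements is a made-up three-entry list supplied by the caller, and the function name elementKind is hypothetical.

```javascript
// Hyphenated names that are not valid custom element names.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Simplified, ASCII-only approximation of a valid custom element name:
// starts with a lowercase letter and contains at least one hyphen.
const ASCII_CUSTOM_NAME = /^[a-z][a-z0-9._-]*-[a-z0-9._-]*$/;

// `knownHTMLElements` is a caller-supplied Set of element names that
// should count as "regular" (an assumption for illustration).
function elementKind(localName, knownHTMLElements, isHTMLNamespace = true) {
  if (ASCII_CUSTOM_NAME.test(localName) && !RESERVED_NAMES.has(localName)) {
    return "custom";
  }
  if (!isHTMLNamespace || !knownHTMLElements.has(localName)) return "unknown";
  return "regular";
}

const known = new Set(["div", "span", "em"]);
console.log(elementKind("my-widget", known)); // custom
console.log(elementKind("blink", known));     // unknown
console.log(elementKind("div", known));       // regular
```

Passing the set of known elements in explicitly mirrors the open question in this section about which list should decide what counts as unknown.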
The attribute kind of an attribute is one of regular or unknown. Let attribute kind be:
-
unknown, if the [HTML] specification does not assign any meaning to attribute’s name.
Again, this needs to be more specific. Historical, obsolete, conforming, non-conforming (e.g. bgcolor). It is desirable we make a sanitizer-specific list. [Issue #147]
-
regular, otherwise.
3.5. Baseline and Defaults
The sanitizer baseline and defaults need to be carefully vetted, and are still under discussion. The values below are for illustrative purposes only.
The sanitizer has a built-in default configuration, which is stricter than the baseline and aims to eliminate any script-injection possibility, as well as legacy or unusual constructs.
The defaults and baseline are defined by three JSON constants: baseline element allow list, baseline attribute allow list, and default configuration. For better readability, these have been moved to Appendix A.
4. Security Considerations
The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The specified API must not support the construction of a Sanitizer object that leaves script-capable markup in; doing so would be a bug in the threat model.
That being said, there are security issues which the correct usage of the Sanitizer API will not be able to protect against and the scenarios will be laid out in the following sections.
4.1. Server-Side Reflected and Stored XSS
This section is not normative.
The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an existing DocumentFragment. The Sanitizer does not address server-side reflected or stored XSS.
4.2. DOM clobbering
This section is not normative.
DOM clobbering describes an attack in which malicious HTML confuses an
application by naming elements through id
or name
attributes such that
properties like children
of an HTML element in the DOM are overshadowed by
the malicious content.
The Sanitizer API does not protect DOM clobbering attacks in its
default state, but can be configured to remove id
and name
attributes.
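A minimal sketch of such a configuration follows. It assumes an attribute allow list in the shape of the default configuration in Appendix A and removes id and name from it. The helper name withoutClobberingAttributes and the five-entry excerpt are hypothetical, and passing the result to the Sanitizer constructor is an assumption about how a configuration is supplied.

```javascript
// Excerpt of an attribute allow list in the shape used by the default
// configuration: attribute name -> list of element names ("*" = any).
const allowAttributesExcerpt = {
  class: ["*"],
  href: ["*"],
  id: ["*"],
  name: ["*"],
  title: ["*"],
};

// Hypothetical helper: return a copy of the allow list without the
// attributes that DOM clobbering relies on.
function withoutClobberingAttributes(allowAttributes) {
  const clobberingAttributes = new Set(["id", "name"]);
  return Object.fromEntries(
    Object.entries(allowAttributes).filter(
      ([attributeName]) => !clobberingAttributes.has(attributeName)
    )
  );
}

const hardenedConfig = {
  allowAttributes: withoutClobberingAttributes(allowAttributesExcerpt),
};

console.log(Object.keys(hardenedConfig.allowAttributes));
// [ 'class', 'href', 'title' ]

// In a user agent implementing this draft, the configuration would then be
// handed to the sanitizer (assumed usage):
//   const s = new Sanitizer(hardenedConfig);
```

Removing these attributes can break in-page anchors and form submission that depend on id or name, so this is a trade-off the page author has to weigh.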
4.3. XSS with Script gadgets
This section is not normative.
Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then executes JavaScript based on that input.
The Sanitizer API cannot prevent these attacks, but requires page authors to
explicitly allow unknown elements in general, and authors must additionally
explicitly configure unknown attributes and elements and markup that is known
to be widely used for templating and framework-specific code,
like data-
and slot
attributes and elements like <slot>
and <template>
.
We believe that these restrictions are not exhaustive and encourage page
authors to examine their third party libraries for this behavior.
4.4. Mutated XSS
This section is not normative.
Mutated XSS, or mXSS, describes an attack based on parser context mismatches when parsing an HTML snippet without the correct context. In particular, when a parsed HTML fragment has been serialized to a string, the string is not guaranteed to be parsed and interpreted exactly the same way when inserted into a different parent element. One way to carry out such an attack is to rely on the change of parsing behavior for foreign content or misnested tags.
The Sanitizer API offers help against Mutated XSS, but relies on some amount of
cooperation by the developers. The sanitize()
function does not handle strings
and is therefore unaffected. The setHTML()
function combines sanitization
with DOM modification and can implicitly apply the correct context. The sanitizeFor()
function combines parsing and sanitization, and relies on the
developer to supply the correct context for the eventual application of its
result.
If the data to be sanitized is available as a node tree, we encourage authors
to use the sanitize()
function of the API which returns a
DocumentFragment and avoids risks that come with serialization and additional
parsing. Directly operating on a fragment after sanitization also comes with a
performance benefit, as the cost of additional serialization and parsing is
avoided.
A more complete treatment of mXSS can be found in [MXSS].
5. Acknowledgements
Cure53’s [DOMPURIFY] is a clear inspiration for the API this document
describes, as is Internet Explorer’s window.toStaticHTML()
.
Appendix A: Built-in Constants
This appendix is normative, except where explicitly noted otherwise.
These constants define core behaviour of the Sanitizer algorithm.
Built-ins Justification
This subsection is super duper non-normative.
Note: The normative values of these constants are found below. Their derivation is explained here, with an implementation in the [DEFAULTS] script. It is expected that these values will change before this specification is finalized. We also expect them to be updated to include additional HTML elements as they are introduced in user agents.
For the purpose of this Sanitizer API, [HTML] constructs fall into one of four classes, where the first defines the baseline, and the first, second, plus the third define the default:
-
Elements and attributes that (directly) execute script. In other words, elements and attributes that are unconditionally script-ish.
-
Legacy and "difficult" elements and attributes. Examples are the <plaintext> and <xmp> elements, which have special parsing rules attached to them. These are not dangerous _per se_, but they have contributed to existing vulnerabilities.
-
Elements and attributes that we feel rarely make sense in user-supplied content.
-
All the rest.
Specifically:
-
Script-ish constructs:
  -
  The HTMLScriptElement, which proudly executes script as its sole purpose.
  -
  All event handler attributes, since these also execute script.
  -
  HTMLIFrameElement, which loads arbitrary HTML content and therefore also script.
  -
  The legacy HTMLObjectElement and HTMLEmbedElement, which load non-HTML active content. Also, <object>'s side-kick HTMLParamElement.
  -
  The no-longer conforming <frame>, <frameset>, and <applet> tags, which are outdated versions or companions of several elements listed above.
  -
  The <noscript>, <noframes>, <noembed>, and <nolayer> elements. These, by themselves, are arguably not script-ish, but they are companions to elements listed above, and make no sense on their own.
  -
  Also, the HTMLBaseElement, as this effectively modifies interpretation of other URLs.
-
Legacy and "difficult" elements.
  -
  Special parsing behaviour. This is not dangerous in its own right, but has contributed to mXSS-style attacks. This includes:
    -
    <plaintext> (Which parses in PLAINTEXT state.)
    -
    <title> and <textarea> (Which parse in RCDATA state.)
    -
    The non-conforming [<xmp>](https://html.spec.whatwg.org/#xmp) element.
  -
  Legacy elements:
    -
    <image> ([which is parsed as <img>](https://html.spec.whatwg.org/#parsing-main-inbody)).
    -
    <basefont>
-
Constructs unlikely to be beneficial in user-supplied content:
  -
  The HTMLTemplateElement, which introduces a new template to be used by JavaScript, and its HTMLSlotElement accomplice.
  -
  The frame-like HTMLPortalElement.
  -
  The (deprecated) allowpaymentrequest attribute.
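The relationship between the classes above and the two element lists can be sketched as a simple filter. This is an illustrative sketch only, not the [DEFAULTS] script: it works on a short excerpt of the baseline, the function name deriveDefault is hypothetical, and the normative lists below remain authoritative and may differ in further entries.

```javascript
// Short excerpt of the baseline element allow list. Script-ish elements
// (class 1) are already absent from the baseline.
const baselineExcerpt = [
  "a", "basefont", "div", "image", "img", "plaintext", "portal",
  "slot", "span", "template", "textarea", "title", "xmp",
];

// Class 2: legacy and "difficult" elements named in the text above.
const legacyOrDifficult = [
  "plaintext", "title", "textarea", "xmp", "image", "basefont",
];

// Class 3: elements unlikely to be beneficial in user-supplied content.
const rarelyUseful = ["template", "slot", "portal"];

// Hypothetical helper: the default keeps every baseline element that does
// not fall into one of the excluded classes.
function deriveDefault(baseline, ...excludedClasses) {
  const excluded = new Set(excludedClasses.flat());
  return baseline.filter((elementName) => !excluded.has(elementName));
}

const defaultExcerpt = deriveDefault(
  baselineExcerpt,
  legacyOrDifficult,
  rarelyUseful
);

console.log(defaultExcerpt); // [ 'a', 'div', 'img', 'span' ]
```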
The Baseline Element Allow List
The built-in baseline element allow list has the following value:
[ "a" , "abbr" , "acronym" , "address" , "area" , "article" , "aside" , "audio" , "b" , "basefont" , "bdi" , "bdo" , "bgsound" , "big" , "blockquote" , "body" , "br" , "button" , "canvas" , "caption" , "center" , "cite" , "code" , "col" , "colgroup" , "command" , "data" , "datalist" , "dd" , "del" , "details" , "dfn" , "dialog" , "dir" , "div" , "dl" , "dt" , "em" , "fieldset" , "figcaption" , "figure" , "font" , "footer" , "form" , "h1" , "h2" , "h3" , "h4" , "h5" , "h6" , "head" , "header" , "hgroup" , "hr" , "html" , "i" , "image" , "img" , "input" , "ins" , "kbd" , "keygen" , "label" , "layer" , "legend" , "li" , "link" , "listing" , "main" , "map" , "mark" , "marquee" , "menu" , "meta" , "meter" , "nav" , "nobr" , "ol" , "optgroup" , "option" , "output" , "p" , "picture" , "plaintext" , "popup" , "portal" , "pre" , "progress" , "q" , "rb" , "rp" , "rt" , "rtc" , "ruby" , "s" , "samp" , "section" , "select" , "selectmenu" , "slot" , "small" , "source" , "span" , "strike" , "strong" , "style" , "sub" , "summary" , "sup" , "table" , "tbody" , "td" , "template" , "textarea" , "tfoot" , "th" , "thead" , "time" , "title" , "tr" , "track" , "tt" , "u" , "ul" , "var" , "video" , "wbr" , "xmp" ]
The Baseline Attribute Allow List
The baseline attribute allow list has the following value:
[ "abbr" , "accept" , "accept-charset" , "accesskey" , "action" , "align" , "alink" , "allow" , "allowfullscreen" , "allowpaymentrequest" , "alt" , "anchor" , "archive" , "as" , "async" , "autocapitalize" , "autocomplete" , "autocorrect" , "autofocus" , "autopictureinpicture" , "autoplay" , "axis" , "background" , "behavior" , "bgcolor" , "border" , "bordercolor" , "capture" , "cellpadding" , "cellspacing" , "challenge" , "char" , "charoff" , "charset" , "checked" , "cite" , "class" , "classid" , "clear" , "code" , "codebase" , "codetype" , "color" , "cols" , "colspan" , "compact" , "content" , "contenteditable" , "controls" , "controlslist" , "conversiondestination" , "coords" , "crossorigin" , "csp" , "data" , "datetime" , "declare" , "decoding" , "default" , "defer" , "dir" , "direction" , "dirname" , "disabled" , "disablepictureinpicture" , "disableremoteplayback" , "disallowdocumentaccess" , "download" , "draggable" , "elementtiming" , "enctype" , "end" , "enterkeyhint" , "event" , "exportparts" , "face" , "for" , "form" , "formaction" , "formenctype" , "formmethod" , "formnovalidate" , "formtarget" , "frame" , "frameborder" , "headers" , "height" , "hidden" , "high" , "href" , "hreflang" , "hreftranslate" , "hspace" , "http-equiv" , "id" , "imagesizes" , "imagesrcset" , "importance" , "impressiondata" , "impressionexpiry" , "incremental" , "inert" , "inputmode" , "integrity" , "invisible" , "is" , "ismap" , "keytype" , "kind" , "label" , "lang" , "language" , "latencyhint" , "leftmargin" , "link" , "list" , "loading" , "longdesc" , "loop" , "low" , "lowsrc" , "manifest" , "marginheight" , "marginwidth" , "max" , "maxlength" , "mayscript" , "media" , "method" , "min" , "minlength" , "multiple" , "muted" , "name" , "nohref" , "nomodule" , "nonce" , "noresize" , "noshade" , "novalidate" , "nowrap" , "object" , "open" , "optimum" , "part" , "pattern" , "ping" , "placeholder" , "playsinline" , "policy" , "poster" , "preload" , "pseudo" , "readonly" , 
"referrerpolicy" , "rel" , "reportingorigin" , "required" , "resources" , "rev" , "reversed" , "role" , "rows" , "rowspan" , "rules" , "sandbox" , "scheme" , "scope" , "scopes" , "scrollamount" , "scrolldelay" , "scrolling" , "select" , "selected" , "shadowroot" , "shadowrootdelegatesfocus" , "shape" , "size" , "sizes" , "slot" , "span" , "spellcheck" , "src" , "srcdoc" , "srclang" , "srcset" , "standby" , "start" , "step" , "style" , "summary" , "tabindex" , "target" , "text" , "title" , "topmargin" , "translate" , "truespeed" , "trusttoken" , "type" , "usemap" , "valign" , "value" , "valuetype" , "version" , "virtualkeyboardpolicy" , "vlink" , "vspace" , "webkitdirectory" , "width" , "wrap" ]
The Default Configuration Dictionary
The built-in default configuration has the following value:
{ "allowCustomElements" : false , "allowUnknownMarkup" : false , "allowElements" : [ "a" , "abbr" , "acronym" , "address" , "area" , "article" , "aside" , "audio" , "b" , "bdi" , "bdo" , "bgsound" , "big" , "blockquote" , "body" , "br" , "button" , "canvas" , "caption" , "center" , "cite" , "code" , "col" , "colgroup" , "datalist" , "dd" , "del" , "details" , "dfn" , "dialog" , "dir" , "div" , "dl" , "dt" , "em" , "fieldset" , "figcaption" , "figure" , "font" , "footer" , "form" , "h1" , "h2" , "h3" , "h4" , "h5" , "h6" , "head" , "header" , "hgroup" , "hr" , "html" , "i" , "img" , "input" , "ins" , "kbd" , "keygen" , "label" , "layer" , "legend" , "li" , "link" , "listing" , "main" , "map" , "mark" , "marquee" , "menu" , "meta" , "meter" , "nav" , "nobr" , "ol" , "optgroup" , "option" , "output" , "p" , "picture" , "popup" , "pre" , "progress" , "q" , "rb" , "rp" , "rt" , "rtc" , "ruby" , "s" , "samp" , "section" , "select" , "selectmenu" , "small" , "source" , "span" , "strike" , "strong" , "style" , "sub" , "summary" , "sup" , "table" , "tbody" , "td" , "tfoot" , "th" , "thead" , "time" , "tr" , "track" , "tt" , "u" , "ul" , "var" , "video" , "wbr" ], "allowAttributes" : { "abbr" : [ "*" ], "accept" : [ "*" ], "accept-charset" : [ "*" ], "accesskey" : [ "*" ], "action" : [ "*" ], "align" : [ "*" ], "alink" : [ "*" ], "allow" : [ "*" ], "allowfullscreen" : [ "*" ], "alt" : [ "*" ], "anchor" : [ "*" ], "archive" : [ "*" ], "as" : [ "*" ], "async" : [ "*" ], "autocapitalize" : [ "*" ], "autocomplete" : [ "*" ], "autocorrect" : [ "*" ], "autofocus" : [ "*" ], "autopictureinpicture" : [ "*" ], "autoplay" : [ "*" ], "axis" : [ "*" ], "background" : [ "*" ], "behavior" : [ "*" ], "bgcolor" : [ "*" ], "border" : [ "*" ], "bordercolor" : [ "*" ], "capture" : [ "*" ], "cellpadding" : [ "*" ], "cellspacing" : [ "*" ], "challenge" : [ "*" ], "char" : [ "*" ], "charoff" : [ "*" ], "charset" : [ "*" ], "checked" : [ "*" ], "cite" : [ "*" ], "class" : [ "*" ], "classid" : [ 
"*" ], "clear" : [ "*" ], "code" : [ "*" ], "codebase" : [ "*" ], "codetype" : [ "*" ], "color" : [ "*" ], "cols" : [ "*" ], "colspan" : [ "*" ], "compact" : [ "*" ], "content" : [ "*" ], "contenteditable" : [ "*" ], "controls" : [ "*" ], "controlslist" : [ "*" ], "conversiondestination" : [ "*" ], "coords" : [ "*" ], "crossorigin" : [ "*" ], "csp" : [ "*" ], "data" : [ "*" ], "datetime" : [ "*" ], "declare" : [ "*" ], "decoding" : [ "*" ], "default" : [ "*" ], "defer" : [ "*" ], "dir" : [ "*" ], "direction" : [ "*" ], "dirname" : [ "*" ], "disabled" : [ "*" ], "disablepictureinpicture" : [ "*" ], "disableremoteplayback" : [ "*" ], "disallowdocumentaccess" : [ "*" ], "download" : [ "*" ], "draggable" : [ "*" ], "elementtiming" : [ "*" ], "enctype" : [ "*" ], "end" : [ "*" ], "enterkeyhint" : [ "*" ], "event" : [ "*" ], "exportparts" : [ "*" ], "face" : [ "*" ], "for" : [ "*" ], "form" : [ "*" ], "formaction" : [ "*" ], "formenctype" : [ "*" ], "formmethod" : [ "*" ], "formnovalidate" : [ "*" ], "formtarget" : [ "*" ], "frame" : [ "*" ], "frameborder" : [ "*" ], "headers" : [ "*" ], "height" : [ "*" ], "hidden" : [ "*" ], "high" : [ "*" ], "href" : [ "*" ], "hreflang" : [ "*" ], "hreftranslate" : [ "*" ], "hspace" : [ "*" ], "http-equiv" : [ "*" ], "id" : [ "*" ], "imagesizes" : [ "*" ], "imagesrcset" : [ "*" ], "importance" : [ "*" ], "impressiondata" : [ "*" ], "impressionexpiry" : [ "*" ], "incremental" : [ "*" ], "inert" : [ "*" ], "inputmode" : [ "*" ], "integrity" : [ "*" ], "invisible" : [ "*" ], "is" : [ "*" ], "ismap" : [ "*" ], "keytype" : [ "*" ], "kind" : [ "*" ], "label" : [ "*" ], "lang" : [ "*" ], "language" : [ "*" ], "latencyhint" : [ "*" ], "leftmargin" : [ "*" ], "link" : [ "*" ], "list" : [ "*" ], "loading" : [ "*" ], "longdesc" : [ "*" ], "loop" : [ "*" ], "low" : [ "*" ], "lowsrc" : [ "*" ], "manifest" : [ "*" ], "marginheight" : [ "*" ], "marginwidth" : [ "*" ], "max" : [ "*" ], "maxlength" : [ "*" ], "mayscript" : [ "*" ], "media" : [ "*" ], 
"method" : [ "*" ], "min" : [ "*" ], "minlength" : [ "*" ], "multiple" : [ "*" ], "muted" : [ "*" ], "name" : [ "*" ], "nohref" : [ "*" ], "nomodule" : [ "*" ], "nonce" : [ "*" ], "noresize" : [ "*" ], "noshade" : [ "*" ], "novalidate" : [ "*" ], "nowrap" : [ "*" ], "object" : [ "*" ], "open" : [ "*" ], "optimum" : [ "*" ], "part" : [ "*" ], "pattern" : [ "*" ], "ping" : [ "*" ], "placeholder" : [ "*" ], "playsinline" : [ "*" ], "policy" : [ "*" ], "poster" : [ "*" ], "preload" : [ "*" ], "pseudo" : [ "*" ], "readonly" : [ "*" ], "referrerpolicy" : [ "*" ], "rel" : [ "*" ], "reportingorigin" : [ "*" ], "required" : [ "*" ], "resources" : [ "*" ], "rev" : [ "*" ], "reversed" : [ "*" ], "role" : [ "*" ], "rows" : [ "*" ], "rowspan" : [ "*" ], "rules" : [ "*" ], "sandbox" : [ "*" ], "scheme" : [ "*" ], "scope" : [ "*" ], "scopes" : [ "*" ], "scrollamount" : [ "*" ], "scrolldelay" : [ "*" ], "scrolling" : [ "*" ], "select" : [ "*" ], "selected" : [ "*" ], "shadowroot" : [ "*" ], "shadowrootdelegatesfocus" : [ "*" ], "shape" : [ "*" ], "size" : [ "*" ], "sizes" : [ "*" ], "slot" : [ "*" ], "span" : [ "*" ], "spellcheck" : [ "*" ], "src" : [ "*" ], "srcdoc" : [ "*" ], "srclang" : [ "*" ], "srcset" : [ "*" ], "standby" : [ "*" ], "start" : [ "*" ], "step" : [ "*" ], "style" : [ "*" ], "summary" : [ "*" ], "tabindex" : [ "*" ], "target" : [ "*" ], "text" : [ "*" ], "title" : [ "*" ], "topmargin" : [ "*" ], "translate" : [ "*" ], "truespeed" : [ "*" ], "trusttoken" : [ "*" ], "type" : [ "*" ], "usemap" : [ "*" ], "valign" : [ "*" ], "value" : [ "*" ], "valuetype" : [ "*" ], "version" : [ "*" ], "virtualkeyboardpolicy" : [ "*" ], "vlink" : [ "*" ], "vspace" : [ "*" ], "webkitdirectory" : [ "*" ], "width" : [ "*" ], "wrap" : [ "*" ] } }