CSS Parser API

Unofficial Proposal Draft

This version:
https://drafts.css-houdini.org/css-parser-api/
Feedback:
public-houdini@w3.org with subject line “[css-parser-api] … message topic …” (archives)
Issue Tracking:
GitHub
Inline In Spec
Editors:
Tab Atkins-Bittner
Greg Whitworth

Abstract

An API exposing the CSS parser more directly, for parsing arbitrary CSS-like languages into a mildly typed representation.

Status of this document

1. Introduction

Common data-interchange / parsing formats are very valuable for reducing the learning curve of new languages, as users get to lean on their existing knowledge of the format when authoring and only have to newly learn the specifics of the language. This is why generic parsing formats like XML or JSON have become so popular.

The CSS language could benefit from this same treatment. A number of languages and tools use CSS-like syntax to express themselves, but they usually rely on ad-hoc parsing (often regex-based), which can be fragile and can diverge from CSS behavior in syntax corner cases. Similarly, CSS syntax is increasingly used in places like attribute values (such as the sizes attribute, or most of the SVG presentation attributes), and custom elements that want to do the same thing currently have to rely on ad-hoc parsing as well.
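
For example, the following sketch shows one way ad-hoc parsing can go wrong. The function name naiveSplitDeclarations and the input string are illustrative assumptions, not part of this proposal. Splitting a declaration list on every semicolon mishandles a semicolon that appears inside a string:

```typescript
// Naive approach (hypothetical helper): treat every ";" as a declaration separator.
function naiveSplitDeclarations(css: string): string[] {
  return css
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Two declarations; the first one has a ";" inside a quoted string.
const input = 'content: "a;b"; color: red';
const naive = naiveSplitDeclarations(input);

// The ";" inside the string is treated as a separator, so the naive split
// yields three fragments instead of two declarations.
console.log(naive.length); // 3
console.log(naive);
```

A parser that follows the [css-syntax-3] tokenization rules treats the quoted string as a single token, so it would not split at that semicolon.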

To help with these sorts of cases, this spec exposes the [css-syntax-3] parsing algorithms and returns their results in a mildly-typed representation that is simpler and more abstract than what [css-typed-om-1] provides for CSS properties.

2. Parsing API

typedef (DOMString or ReadableStream) CSSStringSource;

partial interface CSS {
  Promise<sequence<CSSParserRule>> parseStylesheet(CSSStringSource css, optional CSSParserOptions options);
  Promise<sequence<CSSParserRule>> parseRuleList(CSSStringSource css, optional CSSParserOptions options);
  Promise<CSSParserRule> parseRule(CSSStringSource css, optional CSSParserOptions options);
  Promise<sequence<CSSParserRule>> parseDeclarationList(CSSStringSource css, optional CSSParserOptions options);
  CSSParserDeclaration parseDeclaration(DOMString css, optional CSSParserOptions options);
  CSSParserValue parseValue(DOMString css);
  sequence<CSSParserValue> parseValueList(DOMString css);
  sequence<sequence<CSSParserValue>> parseCommaValueList(DOMString css);
};

dictionary CSSParserOptions {
  object atRules;
  /* dict of at-rule name => at-rule type
     (contains decls or contains qualified rules) */
};

parseCommaValueList() is in Syntax, and thus here, because it’s actually a very common operation. It’s trivial to do yourself (just call parseValueList() and then split into an array on top-level commas), but comma-separated lists are so common that it was worthwhile to improve spec ergonomics by providing a shortcut for that functionality. Is it worth it to provide this to JS as well?
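
For example, the "split on top-level commas" step described above could be sketched as follows. The ValueLike shape and the function name splitOnTopLevelCommas are assumptions made for illustration; the sketch assumes a comma arrives as a CSSParserChar-like value whose value is ",". Commas inside functions or blocks are nested inside those values, so they never appear in the top-level list.

```typescript
// Hypothetical stand-in for CSSParserValue; only the fields needed here.
interface ValueLike {
  type: string; // e.g. "ident", "number", "char", "function"
  value?: string;
}

// Split a flat list of values into groups separated by top-level commas.
// An empty input yields a single empty group, as does the Syntax algorithm.
function splitOnTopLevelCommas<T extends ValueLike>(values: readonly T[]): T[][] {
  let current: T[] = [];
  const groups: T[][] = [current];
  for (const v of values) {
    if (v.type === "char" && v.value === ",") {
      current = [];
      groups.push(current);
    } else {
      current.push(v);
    }
  }
  return groups;
}

// Values for "a b, c" (whitespace values omitted for brevity).
const values: ValueLike[] = [
  { type: "ident", value: "a" },
  { type: "ident", value: "b" },
  { type: "char", value: "," },
  { type: "ident", value: "c" },
];
const groups = splitOnTopLevelCommas(values);
console.log(groups.map((g) => g.map((v) => v.value).join(" "))); // [ 'a b', 'c' ]
```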

Do we handle comments? Currently I don’t; Syntax by default just drops comments, but allows an impl to preserve information about them if they want. Maybe add an option to preserve comments? If so, they can appear *anywhere*, in any API that returns a sequence.

What do we do if an unknown at-rule (not appearing in the atRules option) shows up in the results? Default to decls or rules? Or treat it more simply as just a token sequence?
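
For example, one possible answer to the question above is to let the caller pick the fallback. The sketch below is not part of this proposal: the strings "declarations", "rules", and "tokens", the function name resolveAtRuleKind, and the assumption that atRules keys are lowercase are all hypothetical, since the draft does not yet define the shape of the atRules values.

```typescript
// Hypothetical encodings of "contains decls" and "contains qualified rules".
type AtRuleBodyKind = "declarations" | "rules";
// "tokens" stands for the "just a token sequence" option raised in the issue.
type AtRuleFallback = AtRuleBodyKind | "tokens";

// Look up how the body of an at-rule should be parsed, using the fallback
// when the name is not listed. At-rule names are matched case-insensitively;
// toLowerCase() is used here as an approximation of ASCII lowercasing.
function resolveAtRuleKind(
  name: string,
  atRules: Readonly<Record<string, AtRuleBodyKind>>,
  fallback: AtRuleFallback,
): AtRuleFallback {
  const key = name.toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(atRules, key)
    ? atRules[key]
    : undefined;
  return known ?? fallback;
}

const atRules: Record<string, AtRuleBodyKind> = {
  media: "rules",
  "font-face": "declarations",
};
console.log(resolveAtRuleKind("MEDIA", atRules, "tokens")); // rules
console.log(resolveAtRuleKind("unknown", atRules, "tokens")); // tokens
```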

Parsing stylesheets/rule lists should definitely be async, because stylesheets can be quite large. Parsing individual properties/value lists should definitely be sync, because they’re small and an async API for them would be really annoying to use. Parsing a single rule, though, is unclear: is it large enough to be worth making async, or would that be too annoying to be worth it?

3. Parser Values

interface CSSParserRule {
  /* Just a superclass. */
};

[Constructor(DOMString name, sequence<CSSParserValue> prelude, optional sequence<CSSParserRule>? body)]
interface CSSParserAtRule : CSSParserRule {
  readonly attribute DOMString name;
  readonly attribute FrozenArray<CSSParserValue> prelude;
  readonly attribute FrozenArray<CSSParserRule>? body;
  /* nullable to handle at-statements */
  stringifier;
};

[Constructor(sequence<CSSParserValue> prelude, optional sequence<CSSParserRule>? body)]
interface CSSParserQualifiedRule : CSSParserRule {
  readonly attribute FrozenArray<CSSParserValue> prelude;
  readonly attribute FrozenArray<CSSParserRule> body;
  stringifier;
};

[Constructor(DOMString name, optional sequence<CSSParserValue> body)]
interface CSSParserDeclaration : CSSParserRule {
  readonly attribute DOMString name;
  readonly attribute FrozenArray<CSSParserValue> body;
  stringifier;
};

interface CSSParserValue {
  /* Just a superclass. */
};

[Constructor(DOMString name, sequence<CSSParserValue> body)]
interface CSSParserBlock : CSSParserValue {
  readonly attribute DOMString name; /* "[]", "{}", or "()" */
  readonly attribute FrozenArray<CSSParserValue> body;
  stringifier;
};

[Constructor(DOMString name, sequence<sequence<CSSParserValue>> args)]
interface CSSParserFunction : CSSParserValue {
  readonly attribute DOMString name;
  readonly attribute FrozenArray<FrozenArray<CSSParserValue>> args;
  stringifier;
};

[Constructor(DOMString value)]
interface CSSParserIdent : CSSParserValue {
  readonly attribute DOMString value;
  stringifier;
};

[Constructor(double value),
 Constructor(DOMString css)]
interface CSSParserNumber : CSSParserValue {
  readonly attribute double value;
  stringifier;
};

[Constructor(double value),
 Constructor(DOMString css)]
interface CSSParserPercentage : CSSParserValue {
  readonly attribute double value;
  stringifier;
};

[Constructor(double value, DOMString type),
 Constructor(DOMString css)]
interface CSSParserDimension : CSSParserValue {
  readonly attribute double value;
  readonly attribute DOMString type;
  stringifier;
};

[Constructor(DOMString value)]
interface CSSParserAtKeyword : CSSParserValue {
  readonly attribute DOMString value;
  stringifier;
};

[Constructor(DOMString value)]
interface CSSParserHash : CSSParserValue {
  readonly attribute DOMString value;
  /* expose an "is ident" boolean? */
  stringifier;
};

[Constructor(DOMString value)]
interface CSSParserString : CSSParserValue {
  readonly attribute DOMString value;
  stringifier;
};

[Constructor(DOMString value)]
interface CSSParserChar : CSSParserValue {
  readonly attribute DOMString value;
  /* for all delims, whitespace, and the
     weird Selectors-based tokens
     (split up into the individual chars) */
  stringifier;
};

Some of the CSSParserValue subtypes correspond closely to Typed OM types; in particular, CSSParserIdent and CSSKeywordValue are basically the same thing, as are CSSParserNumber and CSSNumberValue. Is it worthwhile to transplant this hierarchy underneath the CSSStyleValue superclass, and reuse the ones that are reasonable? <https://github.com/WICG/CSS-Parser-API/issues/9>

The goal is to be as useful as possible without exposing so many details that we’re unable to change tokenization in the future. In particular, whitespace, delims, and the weird Selectors tokens all get serialized as individual CSSParserChar "tokens", which should allow us to change the set of Selectors tokens in the future safely. Am I succeeding at this goal?
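
For example, the following sketch models a few of the value types and their stringifiers in plain TypeScript. The "Sketch" class names, the toChars helper, and the serialization rules are assumptions made for illustration; this draft does not define serialization. The sketch shows a multi-character Selectors token such as |= carried as two single-character values, so consumers do not depend on it being one token.

```typescript
// Hypothetical stand-ins for a few CSSParserValue subtypes.
interface SketchValue {
  toString(): string;
}

class SketchIdent implements SketchValue {
  readonly value: string;
  constructor(value: string) { this.value = value; }
  toString(): string { return this.value; }
}

class SketchDimension implements SketchValue {
  readonly value: number;
  readonly type: string;
  constructor(value: number, type: string) { this.value = value; this.type = type; }
  toString(): string { return `${this.value}${this.type}`; }
}

class SketchChar implements SketchValue {
  readonly value: string;
  constructor(value: string) { this.value = value; }
  toString(): string { return this.value; }
}

class SketchBlock implements SketchValue {
  readonly name: "[]" | "{}" | "()";
  readonly body: readonly SketchValue[];
  constructor(name: "[]" | "{}" | "()", body: readonly SketchValue[]) {
    this.name = name;
    this.body = body;
  }
  // Assumed serialization: opening bracket, body values, closing bracket.
  toString(): string {
    return this.name.charAt(0) + this.body.join("") + this.name.charAt(1);
  }
}

class SketchFunction implements SketchValue {
  readonly name: string;
  readonly args: readonly (readonly SketchValue[])[];
  constructor(name: string, args: readonly (readonly SketchValue[])[]) {
    this.name = name;
    this.args = args;
  }
  // Assumed serialization: arguments are joined with commas.
  toString(): string {
    return `${this.name}(${this.args.map((arg) => arg.join("")).join(",")})`;
  }
}

// Split delims, whitespace, or a multi-character token into single chars.
function toChars(text: string): SketchChar[] {
  return Array.from(text, (ch) => new SketchChar(ch));
}

const attr = new SketchBlock("[]", [
  new SketchIdent("lang"), ...toChars("|="), new SketchIdent("en"),
]);
const sum = new SketchFunction("calc", [
  [new SketchDimension(1, "px"), ...toChars(" + "), new SketchDimension(2, "em")],
]);
console.log(String(attr)); // [lang|=en]
console.log(String(sum)); // calc(1px + 2em)
```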

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Advisements are normative sections styled to evoke special attention and are set apart from other normative text with <strong class="advisement">, like this: UAs MUST provide an accessible alternative.

Conformance classes

Conformance to this specification is defined for three conformance classes:

style sheet
A CSS style sheet.
renderer
A UA that interprets the semantics of a style sheet and renders documents that use them.
authoring tool
A UA that writes a style sheet.

A style sheet is conformant to this specification if all of its statements that use syntax defined in this module are valid according to the generic CSS grammar and the individual grammars of each feature defined in this module.

A renderer is conformant to this specification if, in addition to interpreting the style sheet as defined by the appropriate specifications, it supports all the features defined by this specification by parsing them correctly and rendering the document accordingly. However, the inability of a UA to correctly render a document due to limitations of the device does not make the UA non-conformant. (For example, a UA is not required to render color on a monochrome monitor.)

An authoring tool is conformant to this specification if it writes style sheets that are syntactically correct according to the generic CSS grammar and the individual grammars of each feature in this module, and meet all other conformance requirements of style sheets as described in this module.

Partial implementations

So that authors can exploit the forward-compatible parsing rules to assign fallback values, CSS renderers must treat as invalid (and ignore as appropriate) any at-rules, properties, property values, keywords, and other syntactic constructs for which they have no usable level of support. In particular, user agents must not selectively ignore unsupported component values and honor supported values in a single multi-value property declaration: if any value is considered invalid (as unsupported values must be), CSS requires that the entire declaration be ignored.

Experimental implementations

To avoid clashes with future CSS features, the CSS2.1 specification reserves a prefixed syntax for proprietary and experimental extensions to CSS.

Prior to a specification reaching the Candidate Recommendation stage in the W3C process, all implementations of a CSS feature are considered experimental. The CSS Working Group recommends that implementations use a vendor-prefixed syntax for such features, including those in W3C Working Drafts. This avoids incompatibilities with future changes in the draft.

Non-experimental implementations

Once a specification reaches the Candidate Recommendation stage, non-experimental implementations are possible, and implementors should release an unprefixed implementation of any CR-level feature they can demonstrate to be correctly implemented according to spec.

To establish and maintain the interoperability of CSS across implementations, the CSS Working Group requests that non-experimental CSS renderers submit an implementation report (and, if necessary, the testcases used for that implementation report) to the W3C before releasing an unprefixed implementation of any CSS features. Testcases submitted to W3C are subject to review and correction by the CSS Working Group.

Further information on submitting testcases and implementation reports can be found on the CSS Working Group’s website at http://www.w3.org/Style/CSS/Test/. Questions should be directed to the public-css-testsuite@w3.org mailing list.

References

Normative References

[CSS-SYNTAX-3]
Tab Atkins Jr.; Simon Sapin. CSS Syntax Module Level 3. 20 February 2014. CR. URL: http://dev.w3.org/csswg/css-syntax/
[CSS-TYPED-OM-1]
Shane Stephens. CSS Typed OM Level 1. 7 June 2016. WD. URL: https://drafts.css-houdini.org/css-typed-om-1/
[CSSOM-1]
Simon Pieters; Glenn Adams. CSS Object Model (CSSOM). 17 March 2016. WD. URL: https://drafts.csswg.org/cssom/
[HTML]
Ian Hickson. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
