Connection Allowlists

Unofficial Proposal Draft,

This version:
https://wicg.github.io/connection-allowlists/
Issue Tracking:
GitHub
Inline In Spec
Editor:
Mike West (Google)

Abstract

The Connection-Allowlist mechanism provides a concise policy language and delivery mechanism for a set of constraints on a context’s ability to communicate with other servers. The goal is to provide developers with the ability to holistically mitigate explicit exfiltration channels in a way that’s narrowly tailored to suit the problem.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

Developers wish to have control over the resources loaded into their pages' contexts and the endpoints to which their pages can make requests. This control is necessary for several purposes, including limiting the ways in which users' data can flow through the user agent (mitigating exfiltration attacks) and ensuring control over a site’s architecture and dependencies.

Content Security Policy addresses some of this need, but does so in a way that is more granular than necessary for the most critical use cases, and with a syntax and grammar that’s complicated by the other protections CSP is used to deploy. [CSP]

`Connection-Allowlist` steps back from CSP, and focuses on the single use case of controlling the explicit requests a page may initiate through Fetch and other web platform APIs (WebRTC, Web Transport, FedCM, Web Payments, DNS Prefetch, etc) in a way that aims to be straightforward and comprehensive.

NOTE: '\' line wrapping per RFC 8792

Connection-Allowlist: (response-origin "https://cdn.example" "https://*.example.:tld" \
                       "https://api.example:*"); report-to=ReportingAPIEndpoint

This header, delivered along with a document to which a user has navigated, would allow that document to make requests and connections only to endpoints which match the URL patterns [URLPATTERN] specified in the list: the origin from which the document was delivered, https://cdn.example, any subdomain of any host whose penultimate DNS label is example, https://api.example on any port, and so on.

Attempts to connect to endpoints that don’t match the allowlist will be blocked, and reported via a Reporting API [REPORTING] endpoint specified in the report-to parameter (and defined through a separate Reporting-Endpoints header).
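
As a non-normative illustration, a server might assemble the two headers along the following lines. This is a minimal Python sketch rather than part of the proposal: the function name serialize_allowlist and the https://reports.example/ collector URL are assumptions invented for the example, and the serializer covers only the string, token, and parameter forms used in the header above.

```python
def serialize_allowlist(patterns, response_origin=True, report_to=None):
    """Serialize URL pattern strings as a single structured-field inner list."""
    items = []
    if response_origin:
        # `response-origin` is a bare token, so it is not quoted.
        items.append("response-origin")
    for pattern in patterns:
        # Structured-field strings are limited to printable ASCII.
        if not all(0x20 <= ord(ch) <= 0x7E for ch in pattern):
            raise ValueError(f"not representable as a string item: {pattern!r}")
        # Backslash and double quote must be escaped inside a string item.
        escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
        items.append(f'"{escaped}"')
    value = "(" + " ".join(items) + ")"
    if report_to is not None:
        # The endpoint name is a token that refers to a Reporting-Endpoints entry.
        value += f";report-to={report_to}"
    return value


headers = {
    "Connection-Allowlist": serialize_allowlist(
        ["https://cdn.example", "https://api.example:*"],
        report_to="ReportingAPIEndpoint",
    ),
    # Hypothetical collector URL, keyed by the same name as the report-to token.
    "Reporting-Endpoints": 'ReportingAPIEndpoint="https://reports.example/connection-allowlist"',
}
```

The sketch emits the parameter without a space after the semicolon. The example header above includes one, which structured-field parsers are expected to tolerate.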

1.1. Threat Model

This proposal is intentionally small, targeting a specific but useful niche of client-side attacks and/or misconfigurations:

1.2. Overlap with Content Security Policy

This proposal has a lot in common with Content Security Policy’s approach to restrictions upon resource usage within a given context, fetch directives in particular. Still, it seems reasonable to explore for a few reasons:

  1. CSP’s model is too granular: Developers who wish to mitigate the risk that data flows out of a sensitive context require a protection that exhaustively covers the possible ways in which requests can be made or connections established. CSP’s categorization of requests into types which can be controlled in isolation is the wrong way to approach this problem, as data leaking through a request for a web font is just as bad as data leaking through a request for an image or a script. Distinguishing these request types complicates the process of designing a reasonable defense with questions that are simply irrelevant.

  2. CSP’s syntax is not granular enough: The host-source grammar CSP supports leads to truly verbose headers being delivered with responses. A distinct policy provides the opportunity to shift to the URLPattern syntax which will resolve some complaints folks have raised about CSP’s approach by providing a more modern, malleable, and standardized matching syntax.

  3. CSP’s coverage is incomplete: While CSP does a good job covering HTTP requests which run through Fetch, it does not exhaustively cover the myriad ways in which web platform APIs allow connections to be established. DNS prefetch and WebRTC are good examples to start with, but there are many others which have struggled with exactly how they fit into CSP’s threat model. By creating a new policy with a narrow focus and explicit promise to developers, these discussions will have a defensible answer and a clear mandate to specification authors.

2. Connection Allowlists

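A Connection Allowlist represents the set of URL patterns to which a given context is allowed to connect. It is a struct with the following items, as used by the algorithms in this document:

allowlist

A list of URL patterns.

disposition

Either enforce or report.

reporting endpoint

A token naming a Reporting API endpoint, or null.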

3. Connection Allowlist Headers

The Connection-Allowlist response header contains a list of serialized URL pattern strings that define the set of endpoints to which a context is allowed to connect. This allowlist is enforced for a given context, blocking outgoing connections that don’t match the asserted patterns. The Connection-Allowlist-Report-Only response header is a report-only variant, parsed in the same way, but only sending violation reports without blocking outgoing connections.

These Connection Allowlist headers are structured headers whose values are a list of inner lists. Servers may deliver a list with an arbitrary number of items, but only the first will be used. Any additional items in the list will be ignored.

The inner list can contain either URL Patterns serialized as strings, or the token response-origin which represents a pattern matching the response’s URL’s origin. Unexpected values will be ignored.

The inner list may have arbitrary parameters. The report-to parameter’s value will be parsed as a token representing a Reporting API endpoint [REPORTING]. All other parameters will be ignored.

3.1. Parsing

To parse a response’s Connection Allowlists given a response (response):
  1. Let allowlists be an empty list.

  2. Let header be the result of getting a structured field value named `Connection-Allowlist` as a list from response’s header list.

  3. Parse a Connection Allowlist header given header, response’s URL, and enforce. If the result is not null, insert it into allowlists.

  4. Let header be the result of getting a structured field value named `Connection-Allowlist-Report-Only` as a list from response’s header list.

  5. Parse a Connection Allowlist header given header, response’s URL, and report. If the result is not null, insert it into allowlists.

  6. Return allowlists.

To parse a Connection Allowlist header given a structured header list (list), a URL (response-url), and a disposition (disposition):
  1. If list’s size is 0, return null.

  2. If list[0] is not an inner list, return null.

  3. Let allowlist be a Connection Allowlist whose disposition is disposition.

  4. For each item in list[0]:

    1. Let serialized pattern be null.

    2. If item is the token response-origin:

      1. Set serialized pattern to the ASCII serialization of response-url’s origin.

    3. If item is a string, set serialized pattern to item.

    4. If serialized pattern is null, continue.

    5. Let URL pattern be the result of executing build a URL pattern from an HTTP structured field value given serialized pattern with null as the base URL.

      If this step throws an error, continue.

    6. Append URL pattern to allowlist’s allowlist.

  5. For each key → value in list[0]'s parameters:

    1. If key is report-to and value is a token, set allowlist’s reporting endpoint to value.

  6. Return allowlist.

Note: We’re skipping over any invalid input in the parsing algorithm. We could plausibly be more draconian in our parsing, but that would likely limit our future flexibility.
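
The two parsing algorithms above can be sketched roughly as follows. This is a non-normative Python illustration under several assumptions: the header has already been run through a structured-field parser, whose output is modeled here by the hypothetical Token and InnerList classes; origin serialization is simplified to scheme, host, and non-default port; and build_pattern is a stand-in for "build a URL pattern from an HTTP structured field value" that by default simply keeps the string.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass(frozen=True)
class Token:
    """Stand-in for a structured-field token (as distinct from a string)."""
    value: str


@dataclass
class InnerList:
    """Stand-in for a structured-field inner list plus its parameters."""
    items: list
    params: dict = field(default_factory=dict)


@dataclass
class ConnectionAllowlist:
    disposition: str  # "enforce" or "report"
    allowlist: list = field(default_factory=list)
    reporting_endpoint: Optional[str] = None


def ascii_origin(url):
    """Simplified origin serialization: scheme://host[:non-default-port]."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        origin += f":{parts.port}"
    return origin


def parse_header(sf_list, response_url, disposition, build_pattern=lambda s: s):
    # Steps 1-2: only the first member is used, and it must be an inner list.
    if not sf_list or not isinstance(sf_list[0], InnerList):
        return None
    inner = sf_list[0]
    result = ConnectionAllowlist(disposition=disposition)
    for item in inner.items:
        serialized = None
        if item == Token("response-origin"):
            serialized = ascii_origin(response_url)
        elif isinstance(item, str):
            serialized = item
        if serialized is None:
            continue  # unexpected item types are skipped, not fatal
        try:
            pattern = build_pattern(serialized)
        except ValueError:
            continue  # an invalid pattern is skipped as well
        result.allowlist.append(pattern)
    for key, value in inner.params.items():
        if key == "report-to" and isinstance(value, Token):
            result.reporting_endpoint = value.value
    return result


def parse_response_allowlists(headers, response_url):
    """`headers` maps a header name to its already-parsed structured-field list."""
    allowlists = []
    for name, disposition in (
        ("Connection-Allowlist", "enforce"),
        ("Connection-Allowlist-Report-Only", "report"),
    ):
        parsed = parse_header(headers.get(name), response_url, disposition)
        if parsed is not None:
            allowlists.append(parsed)
    return allowlists
```

Note that the enforced allowlist, when present, always precedes the report-only allowlist in the returned list, which affects the order in which the matching algorithms consult them.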

3.2. Matching

Depending on the type of connection being established, we may have a request to work with, we may only have a URL, or we may have even less. dns-prefetch, for example, can only match against a host. The algorithms below spell out how connection allowlist checks work in these scenarios:

To match a URL to a Connection Allowlist given a URL (url) and a connection allowlist (connection allowlist), execute the following steps, which return success or failure.
  1. For each pattern in connection allowlist’s allowlist:

    1. If URL pattern matching given pattern and url does not return null, return success.

  2. Return failure.

To match a host to a Connection Allowlist given a host (host) and a connection allowlist (connection allowlist), execute the following steps, which return success or failure.
  1. For each pattern in connection allowlist’s allowlist:

    1. Let input be a new URLPatternInit dictionary whose hostname is set to pattern’s hostname component.

    2. Let host-only pattern be the result of creating a URL pattern given input, null as the base URL, and an empty map as the options.

    3. Let synthetic url be the result of parsing the concatenation of "https://" and host as a URL.

    4. If URL pattern matching given host-only pattern and synthetic url does not return null, return success.

  2. Return failure.

Note: By creating a new pattern with only the hostname component and synthesizing a URL for host, we’re able to return a match if _any_ pattern in the allowlist could allow a request to that host using any protocol, on any port, with any path, and so on.
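
The difference between URL matching and host-only matching can be illustrated with a small, non-normative Python sketch. It does not implement URL Pattern: the hypothetical SimplePattern class below is an assumption that supports only `*` wildcards on the protocol, hostname, and port, and treats every other component as a wildcard. It is meant only to show why a host can match even when no URL on the default port would.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": "80", "https": "443"}


def glob_match(glob, value):
    """Match `value` against `glob`, where `*` stands for any run of characters."""
    regex = re.escape(glob).replace(r"\*", ".*")
    return re.fullmatch(regex, value) is not None


@dataclass(frozen=True)
class SimplePattern:
    """Simplified stand-in for a URL pattern (not the URL Pattern Standard)."""
    protocol: str = "*"
    hostname: str = "*"
    port: str = "*"

    def matches_url(self, url):
        parts = urlsplit(url)
        if parts.port is not None:
            port = str(parts.port)
        else:
            port = DEFAULT_PORTS.get(parts.scheme, "")
        return (
            glob_match(self.protocol, parts.scheme)
            and glob_match(self.hostname, parts.hostname or "")
            and glob_match(self.port, port)
        )


def match_url(url, allowlist):
    """Success as soon as any pattern in the allowlist matches the URL."""
    return any(pattern.matches_url(url) for pattern in allowlist)


def match_host(host, allowlist):
    """Keep only each pattern's hostname and test a synthetic https URL."""
    for pattern in allowlist:
        host_only = SimplePattern(hostname=pattern.hostname)
        if host_only.matches_url("https://" + host):
            return True
    return False


# A pattern restricted to one port: URL matching honors the port, while
# host matching only asks whether some connection to the host could match.
allowlist = [SimplePattern("https", "api.example", "8443")]
on_default_port = match_url("https://api.example/data", allowlist)
on_listed_port = match_url("https://api.example:8443/data", allowlist)
host_only_result = match_host("api.example", allowlist)
```

In this sketch, on_default_port is False while on_listed_port and host_only_result are True, which mirrors the intent described in the note above.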

The should url be blocked by Connection Allowlist algorithm takes a URL (url), an environment (environment), and a list of connection allowlists (connection allowlists). It returns either allowed or blocked:
  1. For each connection allowlist in connection allowlists:

    1. If url matches connection allowlist, continue.

    2. Report a violation given url, environment, and connection allowlist.

    3. If connection allowlist’s disposition is enforce, return blocked.

  2. Return allowed.

The should request be blocked by Connection Allowlists algorithm takes a request (request), and returns either allowed or blocked:
  1. If request’s URL list’s size is greater than 1, return blocked.

    See the open question below in § 5.3 Redirects.

  2. Return the result of executing should url be blocked by Connection Allowlist given request’s url, request’s client, and request’s policy container’s connection allowlists.

The should host be blocked by Connection Allowlists algorithm takes a host (host), an environment (environment), and a list of connection allowlists (connection allowlists). It returns either allowed or blocked:
  1. For each connection allowlist in connection allowlists:

    1. If host host-matches connection allowlist, continue.

    2. Report a violation given host, environment, and connection allowlist.

    3. If connection allowlist’s disposition is enforce, return blocked.

  2. Return allowed.
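
The blocking decisions above share one shape, which the following non-normative Python sketch tries to capture. The Allowlist class, its matches predicate, and the report callback are assumptions made for the example; they stand in for the matching algorithms and for "report a violation". The host variant would follow the same loop with a host in place of the URL.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Allowlist:
    matches: Callable[[str], bool]  # stand-in for the matching algorithms
    disposition: str                # "enforce" or "report"


def should_url_be_blocked(url, allowlists, report):
    for allowlist in allowlists:
        if allowlist.matches(url):
            continue
        # Every violated allowlist that is reached produces a report...
        report(url, allowlist)
        # ...but only an enforced one blocks the connection.
        if allowlist.disposition == "enforce":
            return "blocked"
    return "allowed"


def should_request_be_blocked(url_list, allowlists, report):
    # Any redirected request is blocked outright in the current draft.
    if len(url_list) > 1:
        return "blocked"
    return should_url_be_blocked(url_list[0], allowlists, report)


# Usage: an enforced allowlist plus a stricter report-only candidate.
reports = []


def collect(url, allowlist):
    reports.append((url, allowlist.disposition))


enforced = Allowlist(lambda u: u.startswith("https://cdn.example/"), "enforce")
candidate = Allowlist(lambda u: False, "report")

first = should_url_be_blocked("https://cdn.example/a.js", [enforced, candidate], collect)
second = should_url_be_blocked("https://evil.example/", [enforced, candidate], collect)
```

Here first is "allowed" with one report attributed to the report-only allowlist, and second is "blocked". Because the loop returns as soon as an enforced allowlist is violated, the report-only allowlist that follows it is never consulted for the second URL. That is a consequence of the algorithm as written in this sketch, and might or might not be the intended reporting behavior.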

3.3. Reporting

Like other policy mechanisms, Connection Allowlists will report each violation to a Reporting API endpoint specified in the allowlist headers. Violations are represented by the following dictionary type:

enum ConnectionAllowlistDisposition { "enforce", "report" };

dictionary ConnectionAllowlistViolationReport : ReportBody {
  USVString url;
  USVString connection;
  sequence<DOMString> allowlist;
  ConnectionAllowlistDisposition disposition;
};

ConnectionAllowlistViolationReport’s connection is the serialized URL of the connection which violated the allowlist.

ConnectionAllowlistViolationReport’s allowlist is the allowlist which was violated.

ConnectionAllowlistViolationReport’s disposition is the allowlist’s disposition.

To report a violation given a URL (resource URL), an environment (environment), and a connection allowlist (allowlist):
  1. If allowlist’s reporting endpoint is null, return.

  2. Let violation be a new ConnectionAllowlistViolationReport, initialized as follows:

    connection
      resource URL, stripped for use in reports.

      Note: Because we block redirects, we don’t need to worry about url vs current url. When we go back on that decision (see § 5.3 Redirects), we’ll want to ensure we use url to avoid leaking more information than necessary about redirect targets.

    allowlist
      allowlist’s allowlist.

    disposition
      allowlist’s disposition.

  3. Generate and queue a report given environment as the context, "connection-allowlist" as the type, allowlist’s reporting endpoint as the destination, and violation as the data.
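
The reporting steps above might look roughly like this in Python. The sketch is non-normative and rests on assumptions: URL stripping is taken to behave like Content Security Policy's algorithm of the same name (non-HTTP(S) URLs reduce to their scheme; credentials and fragments are dropped), allowlist patterns are represented as plain strings, and the returned dictionary is an illustrative shape rather than the Reporting API's wire format.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_url_for_reports(url):
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        # Non-HTTP(S) URLs are reduced to their scheme.
        return parts.scheme
    # Drop any "user:password@" prefix but keep the host and port.
    host_and_port = parts.netloc.rpartition("@")[2]
    # An empty final component removes the fragment.
    return urlunsplit((parts.scheme, host_and_port, parts.path, parts.query, ""))


def build_violation_report(resource_url, patterns, disposition, reporting_endpoint):
    # Step 1: without a reporting endpoint there is nothing to send.
    if reporting_endpoint is None:
        return None
    # Step 2: the report body mirrors ConnectionAllowlistViolationReport.
    body = {
        "connection": strip_url_for_reports(resource_url),
        "allowlist": list(patterns),
        "disposition": disposition,
    }
    # Step 3: what would be handed to "generate and queue a report".
    return {
        "type": "connection-allowlist",
        "destination": reporting_endpoint,
        "body": body,
    }


report = build_violation_report(
    "https://user:secret@evil.example/steal?d=1#frag",
    ["https://cdn.example"],
    "enforce",
    "ReportingAPIEndpoint",
)
```

With these inputs the connection member carries neither the credentials nor the fragment of the offending URL.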

4. Monkey-Patches

4.1. Integration with Fetch

We’ll handle requests by adding a blocking check in Fetch § 4.1 Main fetch alongside other checks that serve the same purpose:

In Main Fetch, we’ll adjust step 7 as follows:
  1. If should request be blocked due to a bad port, should fetching request be blocked as mixed content, should request be blocked by Content Security Policy, should request be blocked by Connection Allowlists, or should request be blocked by Integrity Policy returns blocked, then set response to a network error.

Fetch also defines algorithms at a lower level which are used to establish connections for APIs which aren’t based on requests. We’ll hook into resolve an origin and obtain a connection to handle things like DNS prefetch, Web Transport, etc:

In resolve an origin, we’ll call out to the host-only matching algorithm above to determine whether any pattern could potentially allow a connection to a given host. If not, we’ll fail resolution.
  1. If should host be blocked by Connection Allowlists returns blocked when executed upon origin’s host, environment, and allowlists, then return failure.

In obtain a connection, we’ll add a check before the current step 2:
  1. If should url be blocked by Connection Allowlist returns blocked when executed upon url, environment, and allowlists, then return failure.

The changes to Fetch will require us to pass additional information into low-level algorithms' callsites to identify the allowlist which ought to be used and the context to be used for reporting. It might be better to instead ask those callsites to perform the checks themselves. My feeling is that we’ll be more successful by centralizing the logic, but it might be simpler to take a piecemeal approach.

4.2. Integration with HTML

To integrate the above into HTML, we’ll add a new connection allowlists item to the policy container struct, containing a list of connection allowlists. This will be populated by adding a step to the create a policy container from a fetch response algorithm:

  1. Parse Integrity-Policy headers with response and result.

  2. Set result’s connection allowlists to the result of parsing a response’s Connection Allowlists given response.
  3. Return result.

4.3. Integration with WebRTC

I need to read more, as I have no idea how any of this works from a spec perspective. :)

5. Security Considerations

5.1. Same-Origin Contexts

The threat model described in § 1.1 Threat Model is intentionally narrow, and developers will need to carefully consider how to layer the allowlisting mechanism described here into their defenses. Most saliently, the mechanism is context-specific, not origin-wide. This leaves broad opportunity for an attacker with scripting access to bypass a context’s allowlist by finding a same-origin context with lower restrictions. Integration with HTML’s policy container addresses some of those possibilities, but it’s likely that others will exist: allowlisting the document’s origin (via response-origin or explicitly), reaching up through the frame tree, etc.

There are scenarios in which developers can avoid this risk by sandboxing the allowlisted context away from its normal origin via sandbox attributes or Content Security Policy’s sandbox directive. In those cases, no document will be same-origin, and the boundaries will be easier to hold.

It would also be ideal to give developers control over their dependencies' allowlists to some extent. An opt-in mechanism rooted in something like required document policy or [CSP-EMBEDDED-ENFORCEMENT] might be helpful to explore. [WICG/connection-allowlists Issue #1]

5.2. postMessage(...)

This proposal concerns itself entirely with network connections, which may surprise developers who would expect communication via explicit communication channels like postMessage(message, options), MessageChannel, BroadcastChannel, and so on to be covered. It could make sense to extend the model to include those as well, as they all fit into an origin-based model which could be meaningfully compared against the allowlist.

5.3. Redirects

Currently, we specify that any redirected URL fails. This simplifies the initial proposal for discussion and ensures we don’t leak data, but seems unlikely to satisfy developers with real-world deployment needs. I think we have a few realistic options:

  1. Apply the allowlist to every hop of a redirect chain. This has the advantage of matching CSP’s behavior that developers are already familiar with. It _is_ a cross-origin data leak insofar as it provides insight about another origin’s decisions, which is unfortunate but perhaps unavoidable (and non-unique).

  2. Allow _a specific rule_’s redirect chain to arbitrarily redirect. This narrows the concerns above by forcing developers to annotate the allowlist with their expectations. It might be perfectly acceptable for https://trusted.example/ to redirect users to arbitrary locations, while other endpoints are expected to remain put. Annotating list items should make this kind of distinction possible if necessary (e.g. ("https://trusted.example/";redirection-allowed "https://less-so.example/")).

  3. Narrow the above by allowing _a specific rule_ to redirect so long as the targets match the allowlist. This creates less opportunity for unexpected connections than options 1 or 2: developers would still annotate the specific rules which can redirect, but the annotation would grant something less broad than arbitrary redirection (e.g. ("https://semi-trusted.example/";redirection-allowed=within-allowlist ...)).

We could add more options as well. CSP’s earlier navigate-to proposal distinguished between intermediate redirects and the final, non-redirect response. You could imagine adding those kinds of options either to the entire allowlist or individual rules. Feedback here as well would be much appreciated.
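
To make the annotation-based options concrete, here is a speculative, non-normative Python sketch of how options 2 and 3 might combine with today's block-all-redirects default. Everything in it is an assumption for discussion: rules are modeled as (pattern, mode) pairs, the mode values "any" and "within-allowlist" loosely mirror the redirection-allowed annotations in the examples above, the first matching rule is taken to govern the chain, and matches is a caller-supplied stand-in for URL pattern matching.

```python
def first_matching_rule(url, rules, matches):
    """Return the first (pattern, mode) rule whose pattern matches url, or None."""
    for pattern, mode in rules:
        if matches(pattern, url):
            return (pattern, mode)
    return None


def allow_chain(chain, rules, matches):
    """Decide whether a request's URL chain (initial URL first) is allowed."""
    rule = first_matching_rule(chain[0], rules, matches)
    if rule is None:
        return False  # the initial request is not allowlisted at all
    if len(chain) == 1:
        return True   # no redirect took place
    mode = rule[1]
    if mode == "any":
        # Option 2: this rule's redirect chain may go anywhere.
        return True
    if mode == "within-allowlist":
        # Option 3: every redirect target must itself match some rule.
        return all(
            first_matching_rule(hop, rules, matches) is not None
            for hop in chain[1:]
        )
    # Unannotated rule: the current draft's behavior, where any redirect fails.
    return False


def prefix_match(pattern, url):
    """Toy matcher for the example only; not URL Pattern semantics."""
    return url.startswith(pattern)


rules = [
    ("https://trusted.example/", "any"),
    ("https://semi-trusted.example/", "within-allowlist"),
    ("https://less-so.example/", None),
]
```

Option 1 would correspond to dropping the annotations entirely and requiring that every URL in the chain match some rule.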

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[CSP]
Mike West; Antonio Sartori. Content Security Policy Level 3. URL: https://w3c.github.io/webappsec-csp/
[DOCUMENT-POLICY]
Document Policy. Draft Community Group Report. URL: https://wicg.github.io/document-policy/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[REPORTING]
Douglas Creager; Ian Clelland; Mike West. Reporting API. URL: https://w3c.github.io/reporting/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[RFC9651]
M. Nottingham; P-H. Kamp. Structured Field Values for HTTP. September 2024. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc9651
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[URLPATTERN]
Ben Kelly; Jeremy Roman; 宍戸俊哉 (Shunya Shishido). URL Pattern Standard. Living Standard. URL: https://urlpattern.spec.whatwg.org/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/
[XHR]
Anne van Kesteren. XMLHttpRequest Standard. Living Standard. URL: https://xhr.spec.whatwg.org/

Informative References

[CSP-EMBEDDED-ENFORCEMENT]
Mike West. Content Security Policy: Embedded Enforcement. URL: https://w3c.github.io/webappsec-cspee/
