User Agent Interaction with Related Website Sets

Draft Community Group Report

This version:
https://wicg.github.io/first-party-sets/
Issue Tracking:
GitHub
Inline In Spec
Editors:
(Google)
(Google)
(Google)

Abstract

This document describes how user agents should integrate with Related Website Sets, a mechanism that lets developers declare a collection of related domains as a single set.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

Related Website Sets (“RWS”) provides a framework for developers to declare relationships among sites, to enable limited cross-site cookie access for specific, user-facing purposes. This is facilitated through the use of the Storage Access API.

This document defines how user agents should integrate with the Related Website Sets list. For a canonical reference of the structure of the RWS list and technical validations that are run at time of submission, please see the Related Website Sets Submission Guidelines.

2. Infrastructure

This specification depends on the Infra standard. [INFRA]

3. List consumption

User agents should consume the canonical Related Website Sets list on a regular basis (e.g., every 2 weeks) and ship it to individual clients (e.g. a browser application) as an updateable component.

Can we make a recommendation for an update interval here? [Issue #wicg/first-party-sets#122]

Individual clients must build the list of related website sets on restart, or on start-up, if newly downloaded. Clients must not re-build the list at any other point in time.

The RWS list is a UTF-8 encoded file containing contents parseable as a JSON object, conforming to the [JSON-SCHEMA] described in the Related Website Sets Submission Guidelines.

Note: Conformance to the schema is validated at submission time. Hence, it is not required for the user agent to validate conformance again on the client. The algorithms in this specification describe how user agents should parse the RWS list, and when a particular set should be considered valid from the client’s perspective.

Client-side validation may be needed in cases where the Public Suffix List version differs between the server and client. [Issue #wicg/first-party-sets#125]

To build the list of related website sets from a JSON byte sequence bytes, the user agent should run the following steps:

  1. Let json be the result of parsing JSON bytes to an infra value with bytes.

  2. If json is a parsing exception, or if json is not an ordered map, or if json[“sets”] does not exist, return and optionally retry fetching the list, or perform other error recovery tasks.

  3. For each entry of json[“sets”]:

    1. Let set be a related website set.

    2. If entry[“primary”] does not exist, continue.

The specification currently suggests skipping invalid sets (missing primary entries) instead of rejecting the entire list. However, there may be benefit in a full rejection given that the server is expected to hold valid information at all times. [Issue #wicg/first-party-sets#126]

    3. Set set’s primary to entry[“primary”].

    4. Let ccTLDs be the result of parsing an equivalence map from entry[“ccTLDs”]. If ccTLDs is failure, continue.

    5. Set set’s ccTLDs to ccTLDs.

    6. Let serviceSites be the result of parsing a subset from entry[“serviceSites”]. If the result is failure, continue.

    7. Set set’s serviceSites to serviceSites.

    8. Let associatedSites be the result of parsing a subset from entry[“associatedSites”]. If the result is failure, continue.

    9. Set set’s associatedSites to associatedSites.

    10. Add set to the user agent’s list of related website sets.
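As an illustration only, the list-building steps might be sketched in Python as follows. This is a minimal sketch under several assumptions that are not part of this specification: sites are represented as plain strings, `urllib.parse` stands in for the URL Standard parser, the host is used directly as the site (no Public Suffix List lookup), missing optional fields are treated as empty, and the primary is validated like any other site. The names `RelatedWebsiteSet`, `parse_site`, `parse_subset`, `parse_equivalence_map`, `build_list`, and `SAMPLE` are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import urlsplit


@dataclass
class RelatedWebsiteSet:
    primary: str
    cc_tlds: Dict[str, List[str]] = field(default_factory=dict)
    service_sites: List[str] = field(default_factory=list)
    associated_sites: List[str] = field(default_factory=list)


def parse_site(value: object) -> Optional[str]:
    """Simplified site parsing: https only; the host stands in for the site."""
    if not isinstance(value, str):
        return None
    try:
        url = urlsplit(value)
    except ValueError:
        return None
    if url.scheme != "https" or not url.hostname:
        return None
    return f"https://{url.hostname}"


def parse_subset(items: object) -> Optional[List[str]]:
    if not isinstance(items, list):
        return None
    sites = [parse_site(item) for item in items]
    return None if any(site is None for site in sites) else sites


def parse_equivalence_map(raw: object) -> Optional[Dict[str, List[str]]]:
    if not isinstance(raw, dict):
        return None
    result: Dict[str, List[str]] = {}
    for key, value in raw.items():
        key_site, equivalents = parse_site(key), parse_subset(value)
        if key_site is None or equivalents is None:
            return None
        result[key_site] = equivalents
    return result


def build_list(data: bytes) -> Optional[List[RelatedWebsiteSet]]:
    """Return the parsed sets, or None when the whole list is unusable."""
    try:
        parsed = json.loads(data.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        return None
    # Assumption: "sets" must be a JSON array for the list to be usable.
    if not isinstance(parsed, dict) or not isinstance(parsed.get("sets"), list):
        return None
    result: List[RelatedWebsiteSet] = []
    for entry in parsed["sets"]:
        if not isinstance(entry, dict) or "primary" not in entry:
            continue  # skip sets without a primary
        primary = parse_site(entry["primary"])  # assumption: primary is validated too
        cc_tlds = parse_equivalence_map(entry.get("ccTLDs", {}))
        service = parse_subset(entry.get("serviceSites", []))
        associated = parse_subset(entry.get("associatedSites", []))
        if any(part is None for part in (primary, cc_tlds, service, associated)):
            continue  # a parse failure drops this set only, not the whole list
        result.append(RelatedWebsiteSet(primary, cc_tlds, service, associated))
    return result


# Made-up sample list: one valid set, one without a primary, one with an http site.
SAMPLE = b"""{"sets": [
  {"primary": "https://example.com",
   "associatedSites": ["https://example-blog.com"],
   "serviceSites": ["https://example-cdn.com"],
   "ccTLDs": {"https://example.com": ["https://example.co.uk"]}},
  {"associatedSites": ["https://orphan.example"]},
  {"primary": "https://other.example", "serviceSites": ["http://insecure.example"]}
]}"""
sets = build_list(SAMPLE)
```

In this sketch only the first entry of `SAMPLE` survives: the second has no primary and the third lists a non-https service site, so both are skipped rather than causing the whole list to be rejected.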

User agents may opt to pre-process the list into a different format before delivery to the client, e.g. for optimization reasons, as long as they ensure that the client will eventually hold a valid list of related website sets as defined in this specification.

4. Data Structures

The user agent maintains a global list of related website sets, which is a list of related website sets.

A related website set is a struct with the following items:

primary: A site that represents the set’s primary domain.

ccTLDs: An equivalence map, representing the set’s equivalent country-code top level domains that were specified by the submitter.

associatedSites: A list of sites in the associated subset.

serviceSites: A list of sites in the service subset.

Note: For additional context on the meaning of these fields please refer to the Related Website Sets Submission Guidelines.

An equivalence map is an ordered map from sites to lists of sites.
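One possible in-memory representation of these structures is sketched below in Python. It assumes, purely for illustration, that a site is stored as a string such as `https://example.com`; the type aliases, field names, and sample values are hypothetical and not defined by this specification. Python dictionaries preserve insertion order, which matches the ordered map used for the equivalence map.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: a site is represented by its serialized form.
Site = str
# An equivalence map: ordered map from a site to a list of sites.
EquivalenceMap = Dict[Site, List[Site]]


@dataclass
class RelatedWebsiteSet:
    primary: Site
    cc_tlds: EquivalenceMap = field(default_factory=dict)
    associated_sites: List[Site] = field(default_factory=list)
    service_sites: List[Site] = field(default_factory=list)


# The user agent's global list of related website sets.
related_website_sets: List[RelatedWebsiteSet] = []

# Made-up example set.
related_website_sets.append(
    RelatedWebsiteSet(
        primary="https://example.com",
        cc_tlds={"https://example.com": ["https://example.co.uk"]},
        associated_sites=["https://example-blog.com"],
        service_sites=["https://example-cdn.com"],
    )
)
```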

To parse and validate a site from a string input, run the following steps:

  1. Let url be the result of basic URL parsing input. If the result is failure, return failure.

  2. If url’s scheme is not "https", return failure.

  3. Let site be the result of obtaining a site from url’s origin.

  4. Return site.
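The steps above could be approximated in Python as shown here. This is a sketch, not a conforming implementation: `urllib.parse.urlsplit` is not the URL Standard's basic URL parser, and `registrable_domain` is a hypothetical placeholder for a Public Suffix List lookup that simply returns its input. `None` plays the role of failure.

```python
from typing import Optional
from urllib.parse import urlsplit


def registrable_domain(host: str) -> str:
    """Hypothetical stand-in for a Public Suffix List lookup.

    Assumption: the host is already a registrable domain, as entries in the
    RWS list are expected to be, so it is returned unchanged.
    """
    return host


def parse_and_validate_site(value: object) -> Optional[str]:
    """Return a site such as 'https://example.com', or None on failure."""
    if not isinstance(value, str):
        return None
    try:
        url = urlsplit(value)  # step 1 (approximation of basic URL parsing)
    except ValueError:
        return None
    if url.scheme != "https":  # step 2: only https is accepted
        return None
    if not url.hostname:
        return None
    # Step 3: the site is the scheme plus the registrable domain of the host.
    return f"https://{registrable_domain(url.hostname)}"
```

For instance, `parse_and_validate_site("https://Example.com/path")` yields `"https://example.com"` in this sketch, while an `http` URL or a string without a scheme yields `None`.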

To parse a subset from a list input, run the following steps:

  1. Let list be an empty list.

  2. For each item of input:

    1. Let site be the result of parsing and validating a site from item.

    2. If site is failure, return failure.

    3. Add site to list.

  3. Return list.

To parse an equivalence map from an ordered map input, run the following steps:

  1. Let map be an empty equivalence map.

  2. For each keyvalue of input:

    1. Let keySite be the result of parsing and validating a site from key. If the result is failure, return failure.

    2. Let equivalents be an empty list.

    3. For each equivalent in value:

      1. Let equivalentSite be the result of parsing and validating a site from equivalent. If the result is failure, return failure.

      2. Add equivalentSite to equivalents.

    4. Set map[keySite] to equivalents.

  3. Return map.
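The two parsing algorithms above can be sketched in Python as follows. The helper `parse_site` is a simplified, hypothetical stand-in for parsing and validating a site (https only, host used as the site, no Public Suffix List lookup), and `None` represents failure. A single invalid entry makes the whole subset or map fail, mirroring the early returns in the steps.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def parse_site(value: object) -> Optional[str]:
    """Simplified stand-in for 'parse and validate a site'."""
    if not isinstance(value, str):
        return None
    try:
        url = urlsplit(value)
    except ValueError:
        return None
    if url.scheme != "https" or not url.hostname:
        return None
    return f"https://{url.hostname}"


def parse_subset(items: List[object]) -> Optional[List[str]]:
    sites: List[str] = []
    for item in items:
        site = parse_site(item)
        if site is None:
            return None  # one bad item fails the whole subset
        sites.append(site)
    return sites


def parse_equivalence_map(raw: Dict[str, List[object]]) -> Optional[Dict[str, List[str]]]:
    result: Dict[str, List[str]] = {}
    for key, value in raw.items():
        key_site = parse_site(key)
        if key_site is None:
            return None
        equivalents: List[str] = []
        for equivalent in value:
            equivalent_site = parse_site(equivalent)
            if equivalent_site is None:
                return None
            equivalents.append(equivalent_site)
        result[key_site] = equivalents
    return result
```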

5. Validating related website set inclusion

Under RWS, a site site1 is considered equivalent to another site site2 given an equivalence map equivalents, if equivalents[site1] contains site2 or equivalents[site2] contains site1.

Should this be renamed to avoid being confused with a mathematical equivalence relation? [Issue #wicg/first-party-sets#123]

To determine the member type of a given site site in a given related website set set, run the following steps:

  1. If site is equivalent to set’s primary given set’s ccTLDs, return “primary”.

  2. For each associatedSite of set’s associatedSites:

    1. If site is equivalent to associatedSite given set’s ccTLDs, return “associated”.

  3. For each serviceSite of set’s serviceSites:

    1. If site is equivalent to serviceSite given set’s ccTLDs, return “service”.

  4. Return “none”.
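A minimal Python sketch of the equivalence check and the member type steps is given below. Sites are assumed to be plain strings and the class and function names are hypothetical. Note one labeled assumption: the sketch also treats a site as matching a set member when the two are identical, since the equivalence definition above only consults the ccTLDs map.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RelatedWebsiteSet:
    primary: str
    cc_tlds: Dict[str, List[str]] = field(default_factory=dict)
    associated_sites: List[str] = field(default_factory=list)
    service_sites: List[str] = field(default_factory=list)


def is_equivalent(site1: str, site2: str, equivalents: Dict[str, List[str]]) -> bool:
    """True if either site is listed as an equivalent of the other."""
    return site2 in equivalents.get(site1, []) or site1 in equivalents.get(site2, [])


def member_type(site: str, rws: RelatedWebsiteSet) -> str:
    def matches(member: str) -> bool:
        # Assumption: identical sites match, in addition to ccTLD equivalents.
        return site == member or is_equivalent(site, member, rws.cc_tlds)

    if matches(rws.primary):
        return "primary"
    if any(matches(member) for member in rws.associated_sites):
        return "associated"
    if any(matches(member) for member in rws.service_sites):
        return "service"
    return "none"


# Made-up example set.
example_set = RelatedWebsiteSet(
    primary="https://example.com",
    cc_tlds={"https://example.com": ["https://example.co.uk"]},
    associated_sites=["https://example-blog.com"],
    service_sites=["https://example-cdn.com"],
)
```

With `example_set`, the ccTLD variant `https://example.co.uk` is classified as "primary" because it is equivalent to the primary under the set's ccTLDs map.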

To find a related website set for a given site site, run the following steps:

  1. For each set of the user agent’s list of related website sets:

    1. Let type be the member type of site in set.

    2. If type is not “none”, return set.

  2. Return null.

Note: The Related Website Sets Submission Guidelines require that each site can only appear in at most one Related Website Set, which is validated at submission time. For this reason, user agents do not need to be concerned with the order of the list of related website sets when performing these steps.

Define the limit for associated sites within a single related website set to be an implementation-defined value, which is recommended to be 3.

Note: This limit is used when determining eligibility for an associated site to only consider the sites listed at the top of the associated subset. It is meant to discourage abuse and help users and user agents understand why a particular related website set needs to exist. User agents may choose a different number based on this goal.

A site embeddedSite is eligible for same-party membership when embedded within a site topLevelSite, if the following steps return true:

  1. Let set be the result of finding a related website set for topLevelSite.

  2. If set is null, return false.

  3. Let topLevelType be the member type of topLevelSite in set.

  4. If topLevelType is “associated” and the result of determining eligibility for an associated site given topLevelSite and set is false, return false.

  5. If topLevelType is “service”, return false.

  6. Let type be the member type of embeddedSite in set.

  7. If type is “none”, return false.

  8. If type is “associated”, return the result of determining eligibility for an associated site given embeddedSite and set.

  9. Return true.

To determine eligibility for an associated site given a site site and a related website set set, run the following steps:

  1. If set’s associatedSites does not contain site, return false.

  2. Let index be the index of site in set’s associatedSites.

  3. If index is greater than or equal to the limit for associated sites, return false.

  4. Return true.
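The lookup and eligibility steps in this section might be combined as in the following Python sketch. It assumes sites are plain strings, uses the recommended limit of 3, and, as a labeled assumption, lets identical sites match in `member_type`. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ASSOCIATED_SITE_LIMIT = 3  # implementation-defined; 3 is the recommended value


@dataclass
class RelatedWebsiteSet:
    primary: str
    cc_tlds: Dict[str, List[str]] = field(default_factory=dict)
    associated_sites: List[str] = field(default_factory=list)
    service_sites: List[str] = field(default_factory=list)


def member_type(site: str, rws: RelatedWebsiteSet) -> str:
    def matches(member: str) -> bool:
        # Assumption: identical sites match, in addition to ccTLD equivalents.
        return (site == member
                or member in rws.cc_tlds.get(site, [])
                or site in rws.cc_tlds.get(member, []))

    if matches(rws.primary):
        return "primary"
    if any(matches(m) for m in rws.associated_sites):
        return "associated"
    if any(matches(m) for m in rws.service_sites):
        return "service"
    return "none"


def find_set(site: str, all_sets: List[RelatedWebsiteSet]) -> Optional[RelatedWebsiteSet]:
    for rws in all_sets:
        if member_type(site, rws) != "none":
            return rws
    return None


def associated_site_is_eligible(site: str, rws: RelatedWebsiteSet) -> bool:
    if site not in rws.associated_sites:
        return False
    # Only sites listed before the limit are eligible.
    return rws.associated_sites.index(site) < ASSOCIATED_SITE_LIMIT


def is_eligible_for_same_party(embedded: str, top_level: str,
                               all_sets: List[RelatedWebsiteSet]) -> bool:
    rws = find_set(top_level, all_sets)
    if rws is None:
        return False
    top_level_type = member_type(top_level, rws)
    if top_level_type == "associated" and not associated_site_is_eligible(top_level, rws):
        return False
    if top_level_type == "service":
        return False  # service sites never act as the top-level site
    embedded_type = member_type(embedded, rws)
    if embedded_type == "none":
        return False
    if embedded_type == "associated":
        return associated_site_is_eligible(embedded, rws)
    return True


# Made-up list with four associated sites; the fourth exceeds the limit.
all_sets = [RelatedWebsiteSet(
    primary="https://example.com",
    associated_sites=["https://a1.example", "https://a2.example",
                      "https://a3.example", "https://a4.example"],
    service_sites=["https://cdn.example"],
)]
```

In this sketch, `https://a4.example` is a member of the set but falls outside the limit, so it is not eligible either as the embedded site or as the top-level site.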

A given environment settings object settings is same-party with its top-level embedder, if the following steps return true:

  1. Let topLevelSite be the result of obtaining a site from settings’s top-level origin.

  2. Let embeddedSite be the result of obtaining a site from settings’s origin.

  3. Return whether embeddedSite is eligible for same-party membership when embedded within topLevelSite.

A given environment settings object settings and origin origin are same-party in an embedding context, if the following steps return true:

  1. Let topLevelSite be the result of obtaining a site from settings’s top-level origin.

  2. Let embeddedSite be the result of obtaining a site from origin.

  3. Return whether embeddedSite is eligible for same-party membership when embedded within topLevelSite.

6. Integration with the Storage Access API

Modify requestStorageAccess() to insert the following steps before step 13.5 (i.e. before requesting permission to use):

  1. Let settings be doc’s relevant settings object.

  2. If settings is same-party with its top-level embedder, the user agent may run process permission state with granted and abort the remaining steps.

Modify requestStorageAccessFor(requestedOrigin) to insert the following steps before step 13.8 (i.e. before requesting permission to use):

  1. Let settings be doc’s relevant settings object.

  2. If settings and requestedOrigin are same-party in an embedding context, the user agent may queue a global task on the permissions task source given global to resolve p and abort the remaining steps.

7. Handling related website set changes

When a site site leaves a related website set as the result of building a new list of related website sets, user agents must ensure that it does not retain any access to data or shared identifiers held by other sites in the related website set by running the following steps:

  1. Assert that site is not an opaque origin.

  2. Let domain be site’s host.

  3. For each origin known to the user agent whose host’s registrable domain is domain:

    1. Clear cache for origin.

    2. Clear cookies for origin.

    3. Clear DOM-accessible storage for origin.

    4. Let descriptor be a newly-created PermissionDescriptor with name initialized to “storage-access”.

    5. Remove all permission store entries for descriptor, where key[0] is site or key[1] is origin.

    6. Run additional implementation-defined steps to ensure that any web-accessible storage is removed from origin.
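This specification does not yet define how a user agent detects that a site has left a set. One hypothetical approach, sketched below in Python, is to compare the previously built list with the newly built one and treat a site as having left when it no longer belongs to a set with the same primary. Identifying a set by its primary is an assumption of this sketch, as are the function names and the simplified representation of a list as a mapping from each primary to its other members.

```python
from typing import Dict, List, Set


def membership(sets: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every site, primaries included, to the primary of its set."""
    owner: Dict[str, str] = {}
    for primary, members in sets.items():
        owner[primary] = primary
        for member in members:
            owner[member] = primary
    return owner


def sites_that_left(old: Dict[str, List[str]], new: Dict[str, List[str]]) -> Set[str]:
    """Sites whose set in the old list differs from their set in the new list."""
    old_owner, new_owner = membership(old), membership(new)
    return {
        site
        for site, primary in old_owner.items()
        if new_owner.get(site) != primary  # removed, or moved to another set
    }


# Made-up lists: c.example moves from a.example's set to d.example's set.
old_list = {"https://a.example": ["https://b.example", "https://c.example"]}
new_list = {
    "https://a.example": ["https://b.example"],
    "https://d.example": ["https://c.example"],
}
left = sites_that_left(old_list, new_list)
```

Each site in `left` would then be passed through the clearing steps above before any fetch that relies on its site data or storage-access permissions.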

This section should provide more details on how user agents can figure out when a site leaves an RWS. [Issue #wicg/first-party-sets#124]

8. Privacy Considerations

8.1. Provide user transparency and control

A user agent that uses RWS to infer the relationship between two sites should ensure that its users are informed about this user agent choice and give users the opportunity to view and control choices made by the user agent.

8.2. Ensure compatibility with non-RWS environments

Some user agents may choose not to support RWS in specific environments (such as Private Browsing Modes), or at all. All user agents and specifications should be mindful of this in their own API integrations and aim to gracefully fall back to a working solution for users and developers.

For providing access to cross-site cookies, this specification aims to ensure compatibility with non-RWS environments through usage of the Storage Access API, which provides developers an interface to handle rejections to the request and gives user agents flexibility to employ mechanisms such as prompts or heuristics as an alternative to RWS.

8.3. Prevent privacy leaks from list changes

Developers may submit changes to their sets to add or remove sites. Since membership in a set could provide access to cross-site cookies via automatic grants of the Storage Access API, we need to pay attention to these transitions so that they don’t link user identities across all the RWSs they’ve historically been in. In particular, we must ensure that a domain cannot transfer a user identifier from one Related Website Set to another when it changes its set membership. While a set member may not always request and be granted access to cross-site cookies, for the sake of simplicity of handling set transitions, we propose to treat such access as always granted.

For this reason, this specification requires user agents to clear any site data and storage-access permissions of a given site when a site is removed from a set, before starting any fetches that rely on those permissions or site data.

Note: Most fetches do not depend on data that needs to be cleared, so user agents are advised to optimize for request latency.

9. Security Considerations

9.1. Avoid weakening new and existing security boundaries

Changes to the web platform that tighten boundaries for increased privacy often have positive effects on security as well. For example, cache partitioning restricts Cache Probing attacks and third-party cookie blocking makes it much harder to perform Cross Site Request Forgery (CSRF) by default. Where user agents intend to use Related Website Sets to replace or extend existing boundaries based on site or origin on the web, it is important to consider not only the effects on privacy, but also on security.

Sites in a common RWS may have greatly varying security requirements, for example, a set could contain a site storing user credentials and another hosting untrusted user data. Even within the same set, sites still rely on cross-site and cross-origin restrictions to stay in control of data exposure. Within reason, it should not be possible for a compromised site in an RWS to affect the integrity of other sites in the set.

This consideration will always involve a necessary trade-off between gains like performance or interoperability and risks for users and sites. User agents should facilitate additional mechanisms such as a per-origin opt-in or opt-out to manage this trade-off.

Acknowledgements

Other members of the W3C Privacy Community Group had previously suggested the use of the Storage Access API, or an equivalent API, in place of SameParty cookies. Thanks to @jdcauley (1), @arthuredelstein (2), and @johnwilander (3).

Browser vendors, web developers, and members of the web community provided valuable feedback during this proposal’s incubation in the W3C Privacy Community Group.

This proposal includes significant contributions from previous co-editors David Benjamin and Harneet Sidhana.

We are also grateful for contributions from Chris Fredrickson and Shuran Huang.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[CLEAR-SITE-DATA]
Mike West. Clear Site Data. URL: https://w3c.github.io/webappsec-clear-site-data/
[ENCODING]
Anne van Kesteren. Encoding Standard. Living Standard. URL: https://encoding.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[PERMISSIONS]
Marcos Caceres; Mike Taylor. Permissions. URL: https://w3c.github.io/permissions/
[PSL]
Public Suffix List. Mozilla Foundation.
[REQUESTSTORAGEACCESSFOR]
requestStorageAccessFor API. Editor's Draft. URL: https://privacycg.github.io/requestStorageAccessFor/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[STORAGE-ACCESS]
Storage Access API. CG Draft. URL: https://privacycg.github.io/storage-access/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[CACHE-PROBING]
Cache Probing. URL: https://xsleaks.dev/docs/attacks/cache-probing/
[CSRF]
Cross Site Request Forgery (CSRF). URL: https://owasp.org/www-community/attacks/csrf
[JSON-SCHEMA]
Austin Wright; et al. JSON Schema: A Media Type for Describing JSON Documents. 8 December 2020. Internet-Draft. URL: https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema
[RWS-LIST]
Related Website Sets list. URL: https://github.com/GoogleChrome/first-party-sets/blob/main/first_party_sets.JSON
[SUBMISSION-GUIDELINES]
Related Website Sets Submission Guidelines. URL: https://github.com/GoogleChrome/related-website-sets/blob/main/RWS-Submission_Guidelines.md
