No-Vary-Search

Draft Community Group Report,

This version:
https://wicg.github.io/nav-speculation/no-vary-search.html
Issue Tracking:
GitHub
Editor:
(Google)

Abstract

A proposed HTTP header field for changing how URL search parameters impact caching.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Status and venue note

This document is being written as a web-style specification in the WICG for now, because that’s the tooling and venue the author is familiar with. Its purpose is to nail down some details of the processing model in order to make writing and testing prototypes easier.

In the longer term, we envision this header being specified in an HTTPWG RFC, alongside whatever portion of the processing model can be shared among its various consumers. (That is, between both web platform specifications such as [FETCH], and HTTP specifications such as future modifications to [RFC9111].) It’s just incubating in WICG for now.

2. HTTP header field definition

The `No-Vary-Search` HTTP header field is a structured header whose value must be a dictionary.

TODO: probably give some more introductory non-normative text. Look at what other HTTP field definitions do.

It has the following authoring conformance requirements:

As always, the authoring conformance requirements are not binding on implementations. Implementations instead need to implement the processing model given by the obtain a URL search variance algorithm.

3. Data model

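A URL search variance is a struct whose items are the following:

  * no-vary params, either the special value wildcard or a list of strings
  * vary params, either the special value wildcard or a list of strings
  * vary on key order, a boolean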

The default URL search variance is a URL search variance whose no-vary params is an empty list, vary params is wildcard, and vary on key order is true.
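As an illustration only, the struct and its default value could be modelled in Python as follows. The names `URLSearchVariance`, `WILDCARD`, and the snake_case field names are assumptions made for this sketch; they do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumption: a unique sentinel object stands in for the special value "wildcard".
WILDCARD = object()


@dataclass
class URLSearchVariance:
    """Illustrative model of a URL search variance (names are hypothetical)."""

    # Either WILDCARD or a list of strings.
    no_vary_params: Any = field(default_factory=list)
    # Either WILDCARD or a list of strings.
    vary_params: Any = WILDCARD
    # Whether the order of query keys is significant when comparing URLs.
    vary_on_key_order: bool = True


# Constructing with no arguments yields the default URL search variance:
# empty no-vary params, wildcard vary params, vary on key order true.
default_variance = URLSearchVariance()
```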

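The obtain a URL search variance algorithm ensures that all URL search variances obey the following constraints:

  * vary params is a list if and only if no-vary params is wildcard
  * no-vary params is a list if and only if vary params is wildcard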

4. Parsing

To parse a URL search variance given a map value:
  1. If value is null, then return the default URL search variance.

  2. If value’s keys contains anything other than "key-order", "params", or "except", then return the default URL search variance.

  3. Let result be a new URL search variance.

  4. Set result’s vary on key order to true.

  5. If value["key-order"] exists:

    1. If value["key-order"] is not a boolean, then return the default URL search variance.

    2. Set result’s vary on key order to the boolean negation of value["key-order"].

  6. If value["params"] exists:

    1. If value["params"] is a boolean:

      1. If value["params"] is true, then:

        1. Set result’s no-vary params to wildcard.

        2. Set result’s vary params to the empty list.

      2. Otherwise:

        1. Set result’s no-vary params to the empty list.

        2. Set result’s vary params to wildcard.

    2. Otherwise, if value["params"] is a list:

      1. If any item in value["params"] is not a string, then return the default URL search variance.

      2. Set result’s no-vary params to the result of applying parse a key to each item in value["params"].

      3. Set result’s vary params to wildcard.

    3. Otherwise, return the default URL search variance.

  7. If value["except"] exists:

    1. If value["params"] is not true, then return the default URL search variance.

    2. If value["except"] is not a list, then return the default URL search variance.

    3. If any item in value["except"] is not a string, then return the default URL search variance.

    4. Set result’s vary params to the result of applying parse a key to each item in value["except"].

  8. Return result.

In general, this algorithm is strict and tends to return the default URL search variance whenever it sees something it doesn’t recognize. This is because the default URL search variance behavior will just cause fewer cache hits, which is an acceptable fallback behavior.
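The parsing steps above could be sketched in Python roughly as follows. This is a non-authoritative illustration under several assumptions: the structured field dictionary has already been converted to a plain `dict` (booleans as `bool`, inner lists as `list`, strings as `str`, with any parameters discarded); `URLSearchVariance`, `WILDCARD`, `parse_key`, and `parse_url_search_variance` are hypothetical names; and any item not explicitly set by a step is assumed to keep its value from the default URL search variance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional
from urllib.parse import unquote_to_bytes

WILDCARD = object()  # assumption: sentinel for the special value "wildcard"


@dataclass
class URLSearchVariance:
    no_vary_params: Any = field(default_factory=list)  # WILDCARD or list of str
    vary_params: Any = WILDCARD                        # WILDCARD or list of str
    vary_on_key_order: bool = True


def parse_key(key_string: str) -> str:
    # Isomorphic encode, map "+" to space, percent-decode, then UTF-8 decode.
    key_bytes = key_string.encode("latin-1").replace(b"+", b" ")
    return unquote_to_bytes(key_bytes).decode("utf-8", errors="replace")


def parse_url_search_variance(value: Optional[Dict[str, Any]]) -> URLSearchVariance:
    default = URLSearchVariance()
    if value is None:
        return default
    if any(key not in ("key-order", "params", "except") for key in value):
        return default

    # Assumption: start from the default values; vary on key order is true.
    result = URLSearchVariance()

    if "key-order" in value:
        if not isinstance(value["key-order"], bool):
            return default
        result.vary_on_key_order = not value["key-order"]

    if "params" in value:
        params = value["params"]
        if isinstance(params, bool):
            if params:
                result.no_vary_params = WILDCARD
                result.vary_params = []
            else:
                result.no_vary_params = []
                result.vary_params = WILDCARD
        elif isinstance(params, list):
            if any(not isinstance(item, str) for item in params):
                return default
            result.no_vary_params = [parse_key(item) for item in params]
            result.vary_params = WILDCARD
        else:
            return default

    if "except" in value:
        if value.get("params") is not True:
            return default
        exceptions = value["except"]
        if not isinstance(exceptions, list):
            return default
        if any(not isinstance(item, str) for item in exceptions):
            return default
        result.vary_params = [parse_key(item) for item in exceptions]

    return result
```

For instance, under these assumptions `parse_url_search_variance({"except": ["x"]})` falls back to the default, because "except" is only honoured when "params" is true.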

To obtain a URL search variance given a response response:
  1. Let fieldValue be the result of getting a structured field value given `No-Vary-Search` and "dictionary" from response’s header list.

  2. Return the result of parsing a URL search variance given fieldValue.

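The following illustrates how various inputs are parsed, in terms of their impact on the resulting no-vary params and vary params:

Input | Result
No-Vary-Search: params | no-vary params: wildcard; vary params: (empty list)
No-Vary-Search: params=("a") | no-vary params: « "a" »; vary params: wildcard
No-Vary-Search: params, except=("x") | no-vary params: wildcard; vary params: « "x" »
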
The following inputs are all invalid and will cause the default URL search variance to be returned:

The following inputs are valid, but somewhat unconventional. They are shown alongside their more conventional form.

Input | Conventional form
No-Vary-Search: params=?1 | No-Vary-Search: params
No-Vary-Search: key-order=?1 | No-Vary-Search: key-order
No-Vary-Search: params, key-order, except=("x") | No-Vary-Search: key-order, params, except=("x")
No-Vary-Search: params=?0 | (omit the header)
No-Vary-Search: params=() | (omit the header)
No-Vary-Search: key-order=?0 | (omit the header)

To obtain a URL search variance hint given a string hintValue:
  1. Let fieldValue be the result of parsing structured fields given hintValue and "dictionary".

  2. If parsing failed, then return the default URL search variance.

  3. Return the result of parsing a URL search variance given fieldValue.

To parse a key given an ASCII string keyString:
  1. Let keyBytes be the isomorphic encoding of keyString.

  2. Replace any 0x2B (+) in keyBytes with 0x20 (SP).

  3. Let keyBytesDecoded be the percent-decoding of keyBytes.

  4. Let keyStringDecoded be the UTF-8 decoding without BOM of keyBytesDecoded.

  5. Return keyStringDecoded.

The parse a key algorithm allows encoding non-ASCII key strings in the ASCII structured header format, similar to how the application/x-www-form-urlencoded format allows encoding an entire entry list of keys and values in ASCII URL format. For example,
No-Vary-Search: params=("%C3%A9+%E6%B0%97")

will result in a URL search variance whose no-vary params are « "é 気" ». As explained in a later example, the canonicalization process during equivalence testing means that URL strings such as the following will be treated as equivalent:

and so on, since they are all parsed as having the same key "é 気".
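A minimal Python sketch of the parse a key steps, applied to the header value from the example above. The function name `parse_key` is an assumption of this sketch. It relies on `urllib.parse.unquote_to_bytes` for percent-decoding and on Python's `utf-8` codec, which does not strip a leading BOM and, with `errors="replace"`, substitutes U+FFFD for invalid sequences.

```python
from urllib.parse import unquote_to_bytes


def parse_key(key_string: str) -> str:
    # Step 1: isomorphic encoding (each code point up to U+00FF becomes one byte).
    key_bytes = key_string.encode("latin-1")
    # Step 2: replace 0x2B (+) with 0x20 (SP).
    key_bytes = key_bytes.replace(b"+", b" ")
    # Step 3: percent-decode; malformed escapes are left untouched.
    key_bytes_decoded = unquote_to_bytes(key_bytes)
    # Steps 4 and 5: UTF-8 decode without BOM handling and return the result.
    return key_bytes_decoded.decode("utf-8", errors="replace")


decoded = parse_key("%C3%A9+%E6%B0%97")  # the key from the example header
```

Since structured field strings are ASCII, the `latin-1` encoding step is not expected to fail for well-formed input; a non-Latin-1 character would raise `UnicodeEncodeError` in this sketch.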

5. Comparing

Two URLs urlA and urlB are equivalent modulo search variance given a URL search variance searchVariance if the following algorithm returns true:

  1. If the scheme, username, password, host, port, or path of urlA and urlB differ, then return false.

  2. If searchVariance is equivalent to the default URL search variance, then:

    1. If urlA’s query equals urlB’s query, then return true.

    2. Return false.

    In this case, even URL pairs that might appear the same after running the application/x-www-form-urlencoded parser on their queries, such as https://example.com/a and https://example.com/a?, or https://example.com/foo?a=b&&&c and https://example.com/foo?a=b&c=, will be treated as inequivalent.

  3. Let searchParamsA and searchParamsB be empty lists.

  4. If urlA’s query is not null, then set searchParamsA to the result of running the application/x-www-form-urlencoded parser given the isomorphic encoding of urlA’s query.

  5. If urlB’s query is not null, then set searchParamsB to the result of running the application/x-www-form-urlencoded parser given the isomorphic encoding of urlB’s query.

  6. If searchVariance’s no-vary params is a list, then:

    1. Set searchParamsA to a list containing those items pair in searchParamsA where searchVariance’s no-vary params does not contain pair[0].

    2. Set searchParamsB to a list containing those items pair in searchParamsB where searchVariance’s no-vary params does not contain pair[0].

  7. Otherwise, if searchVariance’s vary params is a list, then:

    1. Set searchParamsA to a list containing those items pair in searchParamsA where searchVariance’s vary params contains pair[0].

    2. Set searchParamsB to a list containing those items pair in searchParamsB where searchVariance’s vary params contains pair[0].

  8. If searchVariance’s vary on key order is false, then:

    1. Let keyLessThan be an algorithm taking as inputs two pairs (keyA, valueA) and (keyB, valueB), which returns whether keyA is code unit less than keyB.

    2. Set searchParamsA to the result of sorting in ascending order searchParamsA, with keyLessThan.

    3. Set searchParamsB to the result of sorting in ascending order searchParamsB, with keyLessThan.

  9. If searchParamsA’s size is not equal to searchParamsB’s size, then return false.

  10. Let i be 0.

  11. While i < searchParamsA’s size:

    1. If searchParamsA[i][0] does not equal searchParamsB[i][0], then return false.

    2. If searchParamsA[i][1] does not equal searchParamsB[i][1], then return false.

    3. Set i to i + 1.

  12. Return true.
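The comparison steps above could be sketched in Python as follows. This is an approximation, not a conforming implementation: step 1 is simplified to a string comparison of everything before the `?` (a real implementation would compare parsed URL records); `urllib.parse.parse_qsl` with `keep_blank_values=True` stands in for the application/x-www-form-urlencoded parser; and `URLSearchVariance`, `WILDCARD`, `split_url`, and `equivalent_modulo_search_variance` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple
from urllib.parse import parse_qsl

WILDCARD = object()  # assumption: sentinel for the special value "wildcard"


@dataclass
class URLSearchVariance:
    no_vary_params: Any = field(default_factory=list)  # WILDCARD or list of str
    vary_params: Any = WILDCARD                        # WILDCARD or list of str
    vary_on_key_order: bool = True


def split_url(url: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: (text before '?', query or None if there is no '?')."""
    without_fragment = url.partition("#")[0]  # fragments are not compared
    base, separator, query = without_fragment.partition("?")
    return base, (query if separator else None)


def parse_query(query: Optional[str]) -> List[Tuple[str, str]]:
    # A null query yields an empty list, as does an empty string query.
    return [] if query is None else parse_qsl(query, keep_blank_values=True)


def code_unit_key(pair: Tuple[str, str]) -> bytes:
    # Comparing UTF-16BE bytes emulates "code unit less than" on the key.
    return pair[0].encode("utf-16-be", "surrogatepass")


def equivalent_modulo_search_variance(
    url_a: str, url_b: str, variance: URLSearchVariance
) -> bool:
    base_a, query_a = split_url(url_a)
    base_b, query_b = split_url(url_b)
    if base_a != base_b:  # simplified stand-in for step 1
        return False
    if variance == URLSearchVariance():  # default: exact query match only
        return query_a == query_b

    params_a, params_b = parse_query(query_a), parse_query(query_b)
    if isinstance(variance.no_vary_params, list):
        ignored = variance.no_vary_params
        params_a = [pair for pair in params_a if pair[0] not in ignored]
        params_b = [pair for pair in params_b if pair[0] not in ignored]
    elif isinstance(variance.vary_params, list):
        kept = variance.vary_params
        params_a = [pair for pair in params_a if pair[0] in kept]
        params_b = [pair for pair in params_b if pair[0] in kept]

    if not variance.vary_on_key_order:
        # sorted() is stable, so pairs sharing a key keep their relative order.
        params_a = sorted(params_a, key=code_unit_key)
        params_b = sorted(params_b, key=code_unit_key)

    # List equality covers the size check and the pairwise key/value loop.
    return params_a == params_b
```

For example, with `URLSearchVariance(vary_on_key_order=False)` the URLs `https://example.com/?a=1&b=2` and `https://example.com/?b=2&a=1` compare as equivalent in this sketch, while under the default variance they do not.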

Due to how the application/x-www-form-urlencoded parser canonicalizes query strings, some query strings that do not appear obviously equivalent will end up being treated as equivalent after parsing.

So, for example, given any non-default value for No-Vary-Search, such as No-Vary-Search: key-order, we will have the following equivalences:

Equivalent URL strings | Explanation
https://example.com/ | A null query is parsed the same as an empty string query
https://example.com/? |
https://example.com/?a=x | Parsing performs percent-decoding
https://example.com/?%61=%78 |
https://example.com/?a=é | Parsing performs percent-decoding
https://example.com/?a=%C3%A9 |
https://example.com/?a=%f6 | Both values are parsed as U+FFFD (�)
https://example.com/?a=%ef%bf%bd |
https://example.com/?a=x&&&& | Parsing splits on & and discards empty strings
https://example.com/?a=x |
https://example.com/?a= | Both parse as having an empty string value for a
https://example.com/?a |
https://example.com/?a=%20 | + and %20 are both parsed as U+0020 SPACE
https://example.com/?a=+ |
https://example.com/?a= & |
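
The equivalences in the table can be spot-checked with Python's `urllib.parse.parse_qsl`, used here as a stand-in for the application/x-www-form-urlencoded parser. This is an approximation under stated assumptions: `keep_blank_values=True` is needed to retain keys with empty values, and older Python releases also split on `;`. The names `canonical_pairs` and `all_equal` are hypothetical helpers.

```python
from urllib.parse import parse_qsl


def canonical_pairs(query: str):
    # Approximates the parser: splits on "&", drops empty pieces,
    # maps "+" to a space, and percent-decodes keys and values as UTF-8.
    return parse_qsl(query, keep_blank_values=True)


# Query strings taken from the rows of the table above.
equivalent_groups = [
    ["a=x", "%61=%78"],
    ["a=é", "a=%C3%A9"],
    ["a=%f6", "a=%ef%bf%bd"],
    ["a=x&&&&", "a=x"],
    ["a=", "a"],
    ["a=%20", "a=+", "a= &"],
]


def all_equal(queries) -> bool:
    parsed = [canonical_pairs(query) for query in queries]
    return all(pairs == parsed[0] for pairs in parsed)


# One boolean per group: whether every query in the group parses identically.
results = [all_equal(group) for group in equivalent_groups]
```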

6. Security considerations

The main risk to be aware of is the impact of mismatched URLs. In particular, this could cause the user to see a response that was originally fetched from a URL different from the one displayed when they hovered a link, or the URL displayed in the URL bar.

However, since the impact is limited to query parameters, this does not cross the relevant security boundary, which is the origin. (Or perhaps just the host, from the perspective of security UI.) Indeed, we have already given origins complete control over how they present the (URL, response body) pair, including on the client side via technology such as history.replaceState() or service workers.

7. Privacy considerations

This proposal is adjacent to the highly-privacy-relevant space of navigational tracking, which often uses query parameters to pass along user identifiers. However, we believe this proposal itself does not have privacy impacts. It does not interfere with existing navigational tracking mitigations, or any known future ones being contemplated. Indeed, if a page were to encode user identifiers in its URL, the only ability this proposal gives is to reduce such user tracking by preventing server processing of such user IDs (since the server is bypassed in favor of the cache).

References

Normative References

[ENCODING]
Anne van Kesteren. Encoding Standard. Living Standard. URL: https://encoding.spec.whatwg.org/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[NAV-TRACKING-MITIGATIONS]
Navigational-Tracking Mitigations. Editor's Draft. URL: https://privacycg.github.io/nav-tracking-mitigations/
[RFC8941]
M. Nottingham; P-H. Kamp. Structured Field Values for HTTP. February 2021. Proposed Standard. URL: https://httpwg.org/specs/rfc8941.html
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

Informative References

[RFC9111]
R. Fielding, Ed.; M. Nottingham, Ed.; J. Reschke, Ed. HTTP Caching. June 2022. Internet Standard. URL: https://httpwg.org/specs/rfc9111.html