Signature-based Integrity

Unofficial Proposal Draft

More details about this document
This version:
https://mikewest.github.io/signature-based-sri/
Feedback:
public-webappsec@w3.org with subject line “[signature-based-sri] … message topic …” (archives)
Issue Tracking:
GitHub
Inline In Spec
Editor:
(Google LLC.)

Abstract

A monkey-patch spec that enhances SRI with signature-based integrity checks. These are conceptually similar to the content-based checks currently defined, but have different properties that seem interesting to explore.

1. Introduction

Subresource Integrity [SRI] defines a mechanism by which developers can ensure that the scripts or stylesheets loaded into their pages' contexts are _exactly_ those scripts or stylesheets the developer expected. By specifying a SHA-256 hash of a resource’s content, developers ensure that any malicious or accidental deviation will be blocked before being executed. This is an excellent defense, but its deployment turns out to be brittle. If the resource living at a specific URL is dynamic, then content-based integrity checks require pages and the resources they depend upon to update in lockstep. This turns out to be ~impossible in practice, which makes SRI less usable than it could be.
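The content-based check described here boils down to hashing the resource body and comparing the result against the integrity attribute. A minimal sketch of how such an expression is produced (Python, for illustration; the `sha256-` prefix and base64 encoding follow [SRI]'s hash-expression grammar):

```python
import base64
import hashlib

def integrity_value(body: bytes) -> str:
    """Produce a "sha256-<base64 digest>" integrity expression for a
    resource body, in the format used by SRI's integrity attribute."""
    digest = hashlib.sha256(body).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")

# For example, for the script body used throughout this document:
integrity_value(b'console.log("Hello, world!");')
```

A page would then embed the resulting string in its `integrity` attribute; the user agent recomputes the digest over the delivered bytes and blocks execution on mismatch.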

Particularly as the industry becomes more interested in supply-chain integrity (see Shopify’s [PCIv4-SRI-Gaps], for instance), it seems reasonable to explore alternatives to static hashes that could allow wider deployment of these checks, and therefore better understanding of the application experiences that developers are _actually_ composing.

This document outlines the changes that would be necessary to [Fetch] and [SRI] in order to support the simplest version of a signature-based check:

Pages will embed an Ed25519 public key assertion into integrity attributes:
<script src="https://my.cdn/script.js"
        crossorigin="anonymous"
        integrity="ed25519-[base64-encoded-public-key]"></script>

Servers will deliver a signature using the corresponding private key along with the resource as an HTTP response header:

HTTP/1.1 200 OK
Accept-Ranges: none
Vary: Accept-Encoding
Content-Type: text/javascript; charset=UTF-8
Access-Control-Allow-Origin: *
Integrity: ed25519-[base64-encoded result of Ed25519(`console.log("Hello, world!");`)]

console.log("Hello, world!");

The user agent will validate the signature using the expected public key before executing the response.

That’s it!

The goal here is to flesh out the proposal for discussion, recognizing that it might be too simple to ship. Then again, it might be _just_ simple enough...

1.1. Signatures are not Hashes

Subresource Integrity’s existing hash-based checks ensure that specific, known _content_ executes. It doesn’t care who made the file or from which server it was retrieved: as long as the content matches the expectation, we’re good to go. This gives developers the ability to ensure that a specific set of audited scripts are the only ones that can execute in their pages, providing a strong defense against some kinds of threats.

The signature-based checks described briefly above are different. Rather than validating that a specific script or stylesheet is known-good, they instead act as a proof of _provenance_ which ensures that scripts will only execute if they’re signed with a known private key. Assuming good key-management practices (easy, right?), this gives a guarantee which is different in kind, but similarly removes the necessity to trust intermediaries.

With these properties in mind, signature-based integrity checks aim to protect against attackers who might be able to manipulate the content of resources that a site depends upon, but who cannot gain access to the signing key.

2. Monkey Patches

Extending SRI to support signatures will require changes to three specifications, along with some additional infrastructure.

2.1. Patches to SRI

At a high level, we’ll make the following changes to SRI:

  1. We’ll define the accepted algorithm values. Currently, these are left up to user agents in order to allow for future flexibility: given that the years since SRI’s introduction have left the set of accepted algorithms and their practical ordering unchanged, we should define that explicitly.

  2. With known algorithms, we can adjust the prioritization model to return a set of the strongest content-based and signature-based algorithms specified in a given element. This would enable developers to specify both a hash and signature expectation for a resource, ensuring both that known resources load, _and_ that they’re accepted by a trusted party.

    This might not be necessary. It allows us to explain things like packaging constraints in ways that seem useful, but does introduce some additional complexity in developers' mental model. So, consider it a decision point.

  3. Finally, we’ll adjust the matching algorithm to correctly handle signatures by passing the public key into the comparison operation.

The following sections adjust algorithms accordingly.

2.1.1. Parse metadata.

First, we’ll define valid signature algorithms: a string is a valid SRI signature algorithm token if it is an ASCII case-insensitive match for "ed25519", the only signature algorithm this proposal supports.

Then, we’ll adjust SRI’s Parse metadata. algorithm as follows:

This algorithm accepts a string, and returns a map containing one set of hash expressions whose hash functions are understood by the user agent, and one set of signature expressions which are likewise understood:

  1. Let result be the ordered map «[ "hashes" → « », "signatures" → « » ]».

  2. For each item returned by splitting metadata on spaces:

    1. Let expression-and-options be the result of splitting item on U+003F (?).

    2. Let algorithm-expression be expression-and-options[0].

    3. Let base64-value be the empty string.

    4. Let algorithm-and-value be the result of splitting algorithm-expression on U+002D (-).

    5. Let algorithm be algorithm-and-value[0].

    6. If algorithm-and-value[1] exists, set base64-value to algorithm-and-value[1].

    7. If algorithm is neither a valid SRI hash algorithm token nor a valid SRI signature algorithm token, then continue.

    8. Let data be the ordered map «["alg" → algorithm, "val" → base64-value]».

    9. If algorithm is a valid SRI hash algorithm token, then append data to result["hashes"].

    10. Otherwise, if algorithm is a valid SRI signature algorithm token, then append data to result["signatures"].

  3. Return result.
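The parsing steps above can be sketched as follows. This is an illustrative Python rendering, not normative text; the two token sets are assumptions drawn from this proposal (SRI's existing hash algorithms plus "ed25519" as the only signature algorithm token):

```python
def parse_metadata(metadata: str) -> dict:
    """Sketch of the adjusted "Parse metadata." algorithm: sort each
    recognized expression into a "hashes" or "signatures" bucket."""
    HASH_ALGORITHMS = {"sha256", "sha384", "sha512"}
    SIGNATURE_ALGORITHMS = {"ed25519"}

    result = {"hashes": [], "signatures": []}
    for item in metadata.split(" "):
        # Drop any "?options" suffix, then split "alg-base64value".
        expression, _, _options = item.partition("?")
        algorithm, _, base64_value = expression.partition("-")
        if algorithm not in HASH_ALGORITHMS | SIGNATURE_ALGORITHMS:
            continue  # Unknown tokens are skipped, not treated as errors.
        data = {"alg": algorithm, "val": base64_value}
        if algorithm in HASH_ALGORITHMS:
            result["hashes"].append(data)
        else:
            result["signatures"].append(data)
    return result
```

Note how an attribute mixing both kinds, e.g. `"sha256-abc ed25519-xyz"`, yields one entry in each bucket, which is what enables the dual-requirement matching described below.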

2.1.2. Do bytes and header list match metadataList?

Since we adjusted the result of § 2.1.1 Parse metadata. above, we need to adjust the matching algorithm to match. The core change will be processing both hashing and signature algorithms: if only one kind is present, the story will be similar to today, and multiple strong algorithms can be present, allowing multiple distinct resources to match. If both hashing and signature algorithms are present, both will be required to match. This is conceptually similar to the application of multiple Content Security Policies.

In order to validate signatures, we’ll need to change Fetch to pass in the relevant HTTP response header. For the moment, let’s simply pass in the entire header list:

  1. Let parsedMetadata be the result of executing SRI § 3.3.2 Parse metadata on metadataList.

  2. If both parsedMetadata["hashes"] and parsedMetadata["signatures"] are empty, return true.

  3. Let hash-metadata be the result of executing SRI § 3.3.3 Get the strongest metadata from set on parsedMetadata["hashes"].

  4. Let signature-metadata be the result of executing SRI § 3.3.3 Get the strongest metadata from set on parsedMetadata["signatures"].
  5. Let hash-match be true if hash-metadata is empty, and false otherwise.
  6. Let signature-match be true if signature-metadata is empty, and false otherwise.
  7. For each item in hash-metadata:

    1. Let algorithm be item["alg"].

    2. Let expectedValue be item["val"].

    3. Let actualValue be the result of SRI § 3.3.1 Apply algorithm to bytes on algorithm and bytes.

    4. If actualValue is a case-sensitive match for expectedValue, set hash-match to true and break.

  8. For each item in signature-metadata:
    1. Let algorithm be item["alg"].
    2. Let public key be item["val"].
    3. Let result be the result of validating a signature using algorithm over bytes and header list with public key.
    4. If result is true, set signature-match to true and break.
  9. Return true if both hash-match and signature-match are true. Otherwise, return false.
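The shape of the adjusted matching algorithm can be sketched as below. This is an illustrative simplification: it takes the already-parsed metadata map directly, elides the "get the strongest metadata" filtering step, and uses `apply_algorithm` and `validate_signature` as stand-ins for the operations the spec references (the names are hypothetical, not normative):

```python
def bytes_and_header_list_match(parsed, body, header_list,
                                apply_algorithm, validate_signature):
    """Sketch of the adjusted matching algorithm: hash expressions and
    signature expressions are enforced independently, and both kinds
    must be satisfied when both are present."""
    hashes = parsed["hashes"]
    signatures = parsed["signatures"]
    if not hashes and not signatures:
        return True

    # An absent kind vacuously matches; otherwise at least one
    # expression of that kind must match.
    hash_match = not hashes
    for item in hashes:
        if apply_algorithm(item["alg"], body) == item["val"]:
            hash_match = True
            break

    signature_match = not signatures
    for item in signatures:
        if validate_signature(item["alg"], body, header_list, item["val"]):
            signature_match = True
            break

    return hash_match and signature_match
```

This structure is what yields the CSP-like "both policies must pass" behavior: a resource matching only its hash expectation, or only its signature expectation, fails when the page asserted both.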

2.1.3. Validate a signature using algorithm over bytes and header list with public key

The matching algorithm above calls into a new signature validation function. Let’s write that down. At core, it will execute the Ed25519 validation steps from [RFC8032], using signatures extracted from an Integrity header that’s defined in § 2.1.4 Integrity Header.

To validate a signature using a string algorithm over a byte sequence bytes, a header list header list, and string public key, execute the following steps. They return valid if the signature is valid, or invalid otherwise.
  1. If algorithm is an ASCII case-insensitive match for "ed25519", then:

    1. Let signatures be the result of getting, decoding, and splitting `Integrity` from header list.

    2. If signatures is null, return invalid.

    3. For each signature in signatures:

      1. Execute the "Verify" steps for Ed25519 as defined in Section 5.1.7 of [RFC8032], using bytes as the message M, public key as the public key A, and signature as the signature.

      2. If the signature is valid, return valid.

    4. Return invalid.

  2. Assert: We won’t reach this step, because ed25519 is the only valid signature algorithm token.

  3. Return invalid.
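The validation steps above can be sketched as follows. This is illustrative only: the header list is modeled as a list of (name, value) pairs, the comma-splitting is a simplification of Fetch's full "getting, decoding, and splitting" algorithm (which also handles quoted strings), and `ed25519_verify` is a hypothetical stand-in for RFC 8032's Verify operation:

```python
def get_decode_split(name, header_list):
    """Simplified sketch of Fetch's 'getting, decoding, and splitting':
    combine every value for `name`, then split the result on commas."""
    values = [v for (n, v) in header_list if n.lower() == name.lower()]
    if not values:
        return None
    return [part.strip() for part in ", ".join(values).split(",") if part.strip()]

def validate_signature(algorithm, body, header_list, public_key, ed25519_verify):
    """Sketch of 'validate a signature': try each signature carried in
    the Integrity header until one verifies under the expected key."""
    if algorithm.lower() != "ed25519":
        return False  # ed25519 is the only valid signature algorithm token.
    signatures = get_decode_split("Integrity", header_list)
    if signatures is None:
        return False
    for expression in signatures:
        # Each list member is an "ed25519-<base64 signature>" expression.
        prefix, _, signature = expression.partition("-")
        if prefix.lower() != "ed25519":
            continue
        if ed25519_verify(message=body, public_key=public_key, signature=signature):
            return True
    return False
```

Trying every signature in the header (rather than only the first) is what lets a server sign a resource with several keys at once, which matters for the rotation story in § 3.2.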

2.1.4. Integrity Header

Rather than introducing this header, perhaps we could/should reuse the Identity-Digest proposal [ID.pardue-http-identity-digest], along with the Signature and Signature-Input headers from [RFC9421]. That would avoid reinventing the wheel, and seems pretty reasonable. [Issue #16]

The Integrity HTTP response header specifies integrity metadata for a given response. It is a Structured Header whose value MUST be a list of tokens [RFC9651].

Valid list values match the hash-expression grammar as defined in [SRI].

A resource might be delivered with an integrity header specifying a signature that can be used to validate the resource’s provenance:
HTTP/1.1 200 OK
Accept-Ranges: none
Vary: Accept-Encoding
Content-Type: text/javascript; charset=UTF-8
Access-Control-Allow-Origin: *
Integrity: ed25519-[base64-encoded Ed25519 signature]

Do we need a mechanism (another header?) allowing the server to specify the public key used to sign the resource? That might allow developers to discover keys for resources more easily, and could be used to reject the resource without validation if we can determine a priori that the keys don’t match...

Would it be useful to extend this header’s behavior to include client-side content validation for hash algorithms? I think it’s arguably outside SRI’s threat model, but you could imagine an attacker that could change content but not headers, which would make enforcement of an Integrity header on the client meaningful for a variety of resources (including top-level documents, which would help provide a web-accessible explanation for some packaging behavior).

That is, a resource delivered with:

Integrity: sha256-[base64’d hash goes here]

Could throw a network error in Fetch if the hash didn’t match the delivered content. Likewise, a resource delivered with:

Integrity: ed25519-[base64’d signature goes here];public-key=[base64’d public key goes here]

Could throw a network error if the delivered signature and public key didn’t validate against the resource’s content.

Or, sites could go crazy and deliver a header containing both:

Integrity: sha256-[base64’d hash goes here],
           ed25519-[base64’d signature goes here];public-key=[base64’d public key goes here]

Which would enforce both constraints.

Not sure it’s a priority, but it might be an interesting primitive to extract from this proposal (especially if we end up adding a streaming hash primitive like [RFC7693] as suggested in issue #104, or its successor, suggested at TPAC in 2024 ).

2.2. Patches to Fetch

The only change we need to make to Fetch is to pass additional information into the matching algorithm as redefined above.

Step 22.3.1 of Fetch § 4.1 Main fetch should be updated as follows:

  1. If bytes and response’s header list do not match request’s integrity metadata, then run processBodyError and abort these steps. [SRI]

3. Deployment Considerations

3.1. Key Management

Key management is hard. This proposal doesn’t change that.

It aims instead to be very lightweight. Perhaps it errs in that direction, but the goal is to be the simplest possible mechanism that supports known use-cases.

A different take on this proposal could be arbitrarily complex, replicating aspects of the web PKI to chain trust, allow delegation, etc. That seems like more than we need today, and substantially more work. Perhaps something small is good enough?

3.2. Key Rotation

Since this design relies on websites pinning a specific public key in the integrity attribute, this design does not easily support key rotation. If a signing key is compromised, there is no easy way to rotate the key and ensure that reliant websites check signatures against an updated public key.

For now, we think this is probably enough. If the key is compromised, the security model falls back to the status quo web security model, meaning that the impact of a compromised key is limited. In the future if this does turn out to be a significant issue, we could also explore alternate designs that do support key rotation. One simple proposal could be adding support for the client to signal the requested public key in request headers, allowing different parties to specify different public keys. A more complex proposal could support automated key rotation.

Note: This proposal does support pinning multiple keys for a single resource, so it will be possible to support rotation in a coordinated way without requiring each entity to move in lockstep.
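For instance, a page in the middle of a rotation might list both the outgoing and incoming keys in a single attribute, accepting a resource signed with either (the bracketed keys below are placeholders, following the examples above):

```html
<script src="https://my.cdn/script.js"
        crossorigin="anonymous"
        integrity="ed25519-[base64-encoded old key] ed25519-[base64-encoded new key]"></script>
```

Once every dependent page lists the new key, the server can stop signing with the old one, and pages can drop it at their leisure.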

4. Security Considerations

4.1. Secure Contexts

SRI does not require a secure context, nor does it apply only to resources delivered via encrypted and authenticated channels. That means that it’s entirely possible to believe that SRI offers a level of protection that it simply cannot aspire to. Signatures do not change that calculus.

Thus, it remains recommended that developers rely on integrity metadata only within secure contexts. See also [SECURING-WEB].

4.2. Provenance, not Content

Signatures do not provide any assurance that the content delivered is the content a developer expected. They ensure only that the content was signed by the expected entity. This could allow resources signed by the same entity to be substituted for one another in ways that could violate developer expectations.

In some cases, developers can defend against this confusion by using hashes instead of signatures (or, as discussed above, both hashes and signatures). Servers can likewise defend against this risk by minting fresh keys for each interesting resource. This, of course, creates more key-management problems, but it might be a reasonable tradeoff.

4.3. Rollback Attacks

The simple signature checks described in this document only provide proof of provenance, ensuring that a given resource was at one point signed by someone in possession of the relevant private key. They do not say anything about whether that entity intended to deliver a given resource to you now. In other words, these checks do not prevent rollback/downgrade attacks in which old, known-bad versions of a resource might be delivered, along with their known signatures.

This might not be a problem, depending on developers' use cases. If it becomes a problem, it seems possible to add mitigations in the future. These could take various forms, ranging from enforcing freshness by signing additional timestamps through to sending a random challenge along with requests that would be included in the signature.

We’d want to evaluate the tradeoffs in these approaches (the latter, for example, makes offline signing difficult), and might wish to offer several options.

5. Privacy Considerations

Given that the validation of a response’s signature continues to require the response to opt-into legibility via CORS, this mechanism does not seem to add any new data channels from the server to the client. The choice of private key used to sign the resource is potentially interesting, but doesn’t seem to offer any capability that isn’t possible more directly by altering the resource body or headers.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[Fetch]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[RFC8032]
S. Josefsson; I. Liusvaara. Edwards-Curve Digital Signature Algorithm (EdDSA). January 2017. Informational. URL: https://www.rfc-editor.org/rfc/rfc8032
[RFC9651]
M. Nottingham; P-H. Kamp. Structured Field Values for HTTP. September 2024. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc9651
[SRI]
Devdatta Akhawe; et al. Subresource Integrity. URL: https://w3c.github.io/webappsec-subresource-integrity/

Informative References

[ID.pardue-http-identity-digest]
Lucas Pardue. HTTP Identity Digest. URL: https://www.ietf.org/archive/id/draft-pardue-http-identity-digest-01.html
[PCIv4-SRI-Gaps]
Yoav Weiss; Ilya Grigorik. PCIv4: SRI gaps and opportunities. URL: https://docs.google.com/document/d/1RcUpbpWPxXTyW0Qwczs9GCTLPD3-LcbbhL4ooBUevTM/edit?usp=sharing
[RFC7693]
M-J. Saarinen, Ed.; J-P. Aumasson. The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC). November 2015. Informational. URL: https://www.rfc-editor.org/rfc/rfc7693
[RFC9421]
A. Backman, Ed.; J. Richer, Ed.; M. Sporny. HTTP Message Signatures. February 2024. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc9421
[SECURING-WEB]
Mark Nottingham. Securing the Web. TAG Finding. URL: https://www.w3.org/2001/tag/doc/web-https
