Transparency via a Machine-readable Server Identity and Purpose Descriptor.
Mike O'Neill, February 2019
©2019, Baycloud Systems Ltd. All rights reserved.
Web pages often contain many (sometimes hundreds of) elements that initiate transactions with servers other than those managed by the top-level website. These "third-party" servers can collect personal data and link it to data from other sources, usually without the user being aware of it.
Unfortunately, there is no recognised standard way for web servers to declare this information, i.e. to deliver information that allows users to identify the entities involved, their purpose(s) for data collection (if any), who they share data with, how long they keep it, etc.
There is increasing legal pressure around the world for websites to declare their data collection practices, explain how they intend to use the data, and state the legal basis for doing so. In some jurisdictions users must be offered the right to have their previously collected data deleted, while in others prior consent is needed before data is collected.
In addition, user agents have implemented procedures that by default restrict the ability of embedded sub-resources to access cookies. Some of these sub-resources may be managed by the same entity that manages the top-level site, or may previously have been given explicit consent by the user. A machine-readable mechanism to declare these relationships could be useful.
The following is a possible JSON encoding that can deliver the required machine-readable information, so that a user agent can present it to the user in a standardised and easily digestible way, and act on user-specified preferences.
The information would be obtained by sending a secure (HTTPS) GET request to the resource /.well-known/privacy-declaration relative to any origin. For example, the declaration for the domain www.bigco.com would be at https://www.bigco.com/.well-known/privacy-declaration and would return a JSON document with the Content-Type "application/privacy-declaration+json". Alternatively, the objects defined here could be incorporated in an Origin Policy manifest (see "Origin Policy") to minimise the number of round-trips required when accessing a resource.
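A minimal sketch of the lookup just described. It assumes the caller supplies a fetch-like function; the `Fetcher` and `MinimalResponse` types and the function names are hypothetical, not part of any standard. It builds the well-known URL for a host and rejects responses that do not carry the proposed media type.

```typescript
// Hypothetical minimal shape of a fetch-style response.
interface MinimalResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type Fetcher = (url: string) => Promise<MinimalResponse>;

const WELL_KNOWN_PATH = "/.well-known/privacy-declaration";
const MEDIA_TYPE = "application/privacy-declaration+json";

// Build the HTTPS URL of the declaration for a host such as "www.bigco.com".
function privacyDeclarationUrl(host: string): string {
  return "https://" + host.toLowerCase() + WELL_KNOWN_PATH;
}

// Accept the proposed media type, ignoring case and parameters such as charset.
function isDeclarationContentType(header: string | null): boolean {
  if (header === null) return false;
  return header.split(";")[0].trim().toLowerCase() === MEDIA_TYPE;
}

// Fetch and parse one origin's declaration; throws on an HTTP error status
// or an unexpected Content-Type.
async function fetchDeclaration(host: string, fetcher: Fetcher): Promise<unknown> {
  const url = privacyDeclarationUrl(host);
  const res = await fetcher(url);
  if (!res.ok) throw new Error(`GET ${url} failed with status ${res.status}`);
  if (!isDeclarationContentType(res.headers.get("Content-Type"))) {
    throw new Error(`${url} did not return ${MEDIA_TYPE}`);
  }
  return res.json();
}
```

In a browser, `(url) => fetch(url)` fits the `Fetcher` shape, since a standard Response has `ok`, `status`, `headers.get()` and `json()`.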
User agents or scripts could automatically parse the information into a JavaScript object at a standard location, e.g. navigator.privacyDeclaration, which could then be used to display human-readable information to users. First-party sites could ensure this is always available by using an open source JavaScript library. To support this, the privacy-declaration resource should support CORS (Cross-Origin Resource Sharing), so that it can be accessed via a cross-origin fetch or XHR.
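As an illustration of turning the parsed object into something a user can read, here is a sketch that summarises a declaration as plain-text lines. It assumes the object has already been parsed (for example from navigator.privacyDeclaration, if a user agent exposed one) and uses only a subset of the proposed properties; the interface and function names are illustrative.

```typescript
// Illustrative subset of the proposed declaration properties.
interface PurposeSummary {
  name: string;
  description: string;
  consented: boolean;
}
interface DeclarationSummaryInput {
  name: string;
  policy: string;
  mayCollect: boolean;
  mayShare: boolean;
  purposes: PurposeSummary[];
}

// Produce one human-readable line per fact, e.g. for a browser UI panel.
function summarise(decl: DeclarationSummaryInput): string[] {
  const lines: string[] = [`Operated by ${decl.name} (privacy policy: ${decl.policy})`];
  if (!decl.mayCollect) {
    lines.push("Declares that no data is collected.");
    return lines;
  }
  lines.push(decl.mayShare ? "May share data with other entities." : "Declares that data is not shared with other entities.");
  for (const p of decl.purposes) {
    const status = p.consented ? "agreed" : "not agreed";
    lines.push(`${p.name}: ${p.description} [${status}]`);
  }
  return lines;
}
```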
Script can examine the first party's declaration, then use its otherParties and sameParties arrays to fetch the privacy-declaration resources of those origins (possible because those resources are CORS-enabled). The sameParties set of domains could identify sub-resources which can be trusted as "first-party" because they are managed by the entity that manages the top-level site. User agents can check that each origin in a set is referenced by the other origins in their own privacy-declaration resources, i.e. that they all declare exactly the same "sameParties" set. It may be possible for top-level or parent documents to host external privacy-declarations as bundles of "Signed HTTP Exchanges", which would spare user agents the extra round-trips needed to fetch them. See Mike West's (@mikewest) "First-Party Sets" proposal for this.
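The mutual-reference check could be sketched as below. The text leaves open whether an origin lists itself in its own sameParties array; this sketch assumes it does not, and so compares each member's set plus itself against the full set. The input is a plain object mapping each (lower-case) host to the sameParties array from its already-fetched declaration; the function names are hypothetical.

```typescript
// Map from lower-case host name to the "sameParties" array in that host's declaration.
type SamePartiesByHost = { [host: string]: string[] };

// Lower-case, de-duplicate and sort a host list so sets can be compared as strings.
function normaliseHosts(hosts: string[]): string[] {
  const out: string[] = [];
  for (const h of hosts) {
    const lower = h.toLowerCase();
    if (out.indexOf(lower) === -1) out.push(lower);
  }
  return out.sort();
}

// True if every member of the set declared by topHost declares exactly the
// same set (counting itself). A member with no fetched declaration fails.
function isConsistentSameSet(topHost: string, declared: SamePartiesByHost): boolean {
  const top = topHost.toLowerCase();
  const topList = declared[top];
  if (topList === undefined) return false;
  const members = normaliseHosts(topList.concat([top]));
  const expected = members.join(" ");
  return members.every((member) => {
    const list = declared[member];
    return list !== undefined && normaliseHosts(list.concat([member])).join(" ") === expected;
  });
}
```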
Other methods are possible to establish that domains are related; for example, there could be a link to this information in the TLS certificate or in the domain registrar's WHOIS entry. There is also ongoing discussion about using DNS records to express relatedness between domain names.
The privacy-declaration resource could be generated dynamically, so that some properties reflect the state of the requesting user agent as derived from the incoming HTTP request. For example, the server could examine incoming cookies or other headers in order to calculate the correct value of the "consented" property, or the time remaining before consent expires.
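A minimal server-side sketch of this per-request generation. It assumes the server has already worked out, from cookies or other request headers, which purpose indices the user has agreed to (how it does so is left open here), and returns a copy of a static declaration template with each purpose's consented flag filled in. The names are hypothetical.

```typescript
interface PurposeEntry {
  name: string;
  description: string;
  maxRetainedFor: number;
  expiresIn: number;
  consented: boolean;
}

// Return a per-request copy of the template with "consented" set for each
// purpose whose zero-based index appears in `agreed`; the template is not mutated.
function withConsentFlags<T extends { purposes: PurposeEntry[] }>(template: T, agreed: number[]): T {
  return {
    ...template,
    purposes: template.purposes.map((p, index) => ({ ...p, consented: agreed.indexOf(index) !== -1 })),
  };
}
```

Because the response body now depends on request cookies, it should not be reused for other users by shared caches; a `Cache-Control: private` (or `no-store`) response header, or `Vary: Cookie`, is one way to express that.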
Conveying User Agent Registered User Consent.
There should be some standardisation of a low-entropy request header signal. This could be an existing header in widespread use, like DNT, or a specific cookie name such as the IAB EU's "euconsent" cookie. Another avenue could be to extend the cookie "prefix" options described in "Cookies: HTTP State Management Mechanism draft-ietf-httpbis-rfc6265bis-02". For example, here is a way to encode a consent indication cookie:
Set-Cookie: __Consent-eprivacy=1,5,6; Expires=Wed, 06 Nov 2019 08:49:37 GMT
The value 1,5,6 indicates the set of purposes agreed to by this user; each number is an index into the "purposes" array of PurposeType objects.
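The cookie encoding above could be produced and read back as follows. This is a sketch only: the `__Consent-` prefix is the prefix proposed here, not one defined in the cookie specification, and the helper names are illustrative. Indices must be non-negative integers; anything else causes the cookie to be treated as absent.

```typescript
const CONSENT_COOKIE = "__Consent-eprivacy";

// Serialise agreed purpose indices into a Set-Cookie header value.
function consentSetCookie(indices: number[], expires: Date): string {
  return `${CONSENT_COOKIE}=${indices.join(",")}; Expires=${expires.toUTCString()}`;
}

// Extract agreed purpose indices from a Cookie request header;
// returns [] if the consent cookie is absent or malformed.
function parseConsentCookie(cookieHeader: string): number[] {
  for (const pair of cookieHeader.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1 || pair.slice(0, eq).trim() !== CONSENT_COOKIE) continue;
    const value = pair.slice(eq + 1).trim();
    if (value === "") return [];
    const indices: number[] = [];
    for (const part of value.split(",")) {
      const digits = part.trim();
      if (!/^\d+$/.test(digits)) return []; // malformed: ignore the whole cookie
      indices.push(parseInt(digits, 10));
    }
    return indices;
  }
  return [];
}
```

RFC 6265's strict cookie-value grammar excludes the comma, although browsers accept it in practice; a separator such as "." would avoid the question.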
Using a prefix could allow user agents to recognise low-entropy "consent indication" cookies and give them "special treatment". For example, user agents could restrict their scope to the context of a top-level origin, so that all or specified embedded origins could receive "site-specific" consent indications. This behaviour would have to be implemented by user agents, but it would improve the web by enabling users to give "site-specific" consent to certain embedded third-party resources, for instance on publishers' sites.
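The user-agent behaviour suggested here could be modelled roughly as below: consent indications are stored per top-level site and attached to a request from an embedded origin only when that origin is allowed to receive them. This is a hypothetical model of behaviour browsers do not currently implement; the class and its methods are illustrative.

```typescript
// One site-specific consent record, keyed by the top-level site it applies to.
interface ConsentRecord {
  value: string; // e.g. "1,5,6"
  allowedEmbeds: string[] | "all"; // embedded hosts that may receive it
}

class SiteScopedConsentJar {
  private records: { [topLevelHost: string]: ConsentRecord } = {};

  set(topLevelHost: string, value: string, allowedEmbeds: string[] | "all"): void {
    const embeds: string[] | "all" =
      allowedEmbeds === "all" ? "all" : allowedEmbeds.map((h) => h.toLowerCase());
    this.records[topLevelHost.toLowerCase()] = { value, allowedEmbeds: embeds };
  }

  // Consent value to attach to a request from embeddedHost while the user
  // is on topLevelHost, or null if none applies.
  valueFor(topLevelHost: string, embeddedHost: string): string | null {
    const record = this.records[topLevelHost.toLowerCase()];
    if (record === undefined) return null;
    const embeds = record.allowedEmbeds;
    if (embeds === "all" || embeds.indexOf(embeddedHost.toLowerCase()) !== -1) {
      return record.value;
    }
    return null;
  }
}
```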
Root properties
Property | Type | Description |
---|---|---|
name | String | Recognisable & unique entity name e.g. "Google Inc." |
policy | String(Uri) | Human-readable HTML page explaining the entity's privacy policy |
storagePolicy | String(Uri) | Human-readable HTML page explaining the terminal storage policy |
about | String(Uri) | Human-readable HTML page describing the entity |
deleteData | String(Uri) | An HTTP POST to this URI causes all user agent data for this origin to be deleted; e.g. a Clear-Site-Data header could be returned |
mayCollect | Boolean | false declares that no data is collected; true if data may be collected |
mayShare | Boolean | false declares that no data will be shared with other entities; true if it may be shared |
mayCombine | Boolean | false declares that data is not combined or linked with data from other sources; true if it may be |
purposes | Array of PurposeType Objects | Lists all the purposes for which data is collected |
storage | Array of StorageType Objects | Lists the terminal storage items that may be utilised |
otherParties | Array of Strings | Lists the third-party domains of embedded resources that may appear on this page |
sameParties | Array of Strings | Lists the first-party domains of embedded resources, i.e. those managed by the same entity, that may appear on this page |
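A consumer might check that a parsed document has the root properties above with the expected JSON types before using it. A minimal sketch, assuming every listed property is required except deleteData (the table does not say which are optional); nested PurposeType and StorageType objects are only checked to be arrays here. The function names are illustrative.

```typescript
function isStringArray(x: unknown): x is string[] {
  return Array.isArray(x) && x.every((v) => typeof v === "string");
}

// Returns a list of problems; an empty list means the root looks well-formed.
function checkRootProperties(doc: unknown): string[] {
  if (typeof doc !== "object" || doc === null || Array.isArray(doc)) return ["root is not an object"];
  const d = doc as { [key: string]: unknown };
  const problems: string[] = [];
  for (const key of ["name", "policy", "storagePolicy", "about"]) {
    if (typeof d[key] !== "string") problems.push(`${key} must be a string`);
  }
  if (d["deleteData"] !== undefined && typeof d["deleteData"] !== "string") {
    problems.push("deleteData must be a string");
  }
  for (const key of ["mayCollect", "mayShare", "mayCombine"]) {
    if (typeof d[key] !== "boolean") problems.push(`${key} must be a boolean`);
  }
  for (const key of ["purposes", "storage"]) {
    if (!Array.isArray(d[key])) problems.push(`${key} must be an array`);
  }
  for (const key of ["otherParties", "sameParties"]) {
    if (!isStringArray(d[key])) problems.push(`${key} must be an array of strings`);
  }
  return problems;
}
```

Encoding a Boolean as the string "true", for instance, is reported as a problem.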
The user can give their agreement to zero or more purposes. The PurposeType object for each purpose includes a Boolean "consented" property, which can be derived dynamically from the incoming HTTP request headers (e.g. cookies).
The storage objects are linked to the specific purposes they are designed to implement. This gives user agents the fine-grained ability to restrict storage use to the purposes a user has agreed to.
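One way a user agent or script could apply this linkage is to keep only the storage items that serve at least one purpose the user has agreed to. Whether an item serving several purposes should require agreement to any or to all of them is not specified; this sketch assumes any, and the type and function names are illustrative.

```typescript
interface StorageItem {
  type: "cookie" | "local" | "indexed" | "cache";
  name: string;
  purposeList: number[]; // zero-based indices into purposes
}
interface ConsentView {
  purposes: { consented: boolean }[];
  storage: StorageItem[];
}

// Storage items permitted under the current consent state: those linked to at
// least one existing purpose whose "consented" flag is true.
function permittedStorage(decl: ConsentView): StorageItem[] {
  return decl.storage.filter((item) =>
    item.purposeList.some((i) => i >= 0 && i < decl.purposes.length && decl.purposes[i].consented)
  );
}
```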
A browser, browser extension or script executing in the top-level browsing context can use the otherParties and sameParties arrays to fetch the descriptors for those domains' origins (by fetching the resource at https://{domain name}/.well-known/privacy-declaration).
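Fetching the descriptors for all listed parties could be sketched as follows. The fetch function is injected (any function of this shape, such as a wrapper around the Fetch API, would do), failures are recorded rather than aborting the whole run, and a host appearing in both arrays is fetched once. Names and the result shape are hypothetical.

```typescript
type JsonFetcher = (url: string) => Promise<unknown>;

interface PartyResult {
  host: string;
  declaration?: unknown;
  error?: string;
}

// Hosts to visit: otherParties then sameParties, lower-cased and de-duplicated.
function partiesToFetch(decl: { otherParties: string[]; sameParties: string[] }): string[] {
  const hosts: string[] = [];
  for (const h of decl.otherParties.concat(decl.sameParties)) {
    const lower = h.toLowerCase();
    if (hosts.indexOf(lower) === -1) hosts.push(lower);
  }
  return hosts;
}

// Fetch every party's declaration in parallel; one failure does not stop the others.
async function fetchAllParties(
  decl: { otherParties: string[]; sameParties: string[] },
  fetchJson: JsonFetcher
): Promise<PartyResult[]> {
  return Promise.all(
    partiesToFetch(decl).map(async (host) => {
      try {
        const declaration = await fetchJson(`https://${host}/.well-known/privacy-declaration`);
        return { host, declaration };
      } catch (e) {
        return { host, error: String(e) };
      }
    })
  );
}
```

In a browser this might be called as `fetchAllParties(decl, (url) => fetch(url).then((r) => r.json()))`.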
StorageType Object properties
Property | Type | Description |
---|---|---|
type | String | Storage type, one of "cookie", "local" (localStorage), "indexed" (IndexedDB), "cache" (ETag) |
name | String | Cookie name prefix, localStorage item name, or IndexedDB object store name |
purposeList | Array of Integer | Zero-based indices of entries in the "purposes" array, e.g. [0,1] indicates that this storage item serves the first and second purposes |
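Because purposeList holds positions in the purposes array, a dangling index is easy to introduce when purposes are added or removed. A small sketch of a consistency check, assuming zero-based indices as in the table's example; the function name is illustrative.

```typescript
interface StorageEntry {
  type: string;
  name: string;
  purposeList: number[];
}

// Report storage entries whose purposeList refers to a purpose that does not exist.
function danglingPurposeRefs(purposeCount: number, storage: StorageEntry[]): string[] {
  const problems: string[] = [];
  for (const entry of storage) {
    for (const index of entry.purposeList) {
      const valid = Math.floor(index) === index && index >= 0 && index < purposeCount;
      if (!valid) problems.push(`${entry.type} "${entry.name}" refers to purpose ${index}`);
    }
  }
  return problems;
}
```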
PurposeType Object properties
Property | Type | Description |
---|---|---|
name | String | Short identifying label for this purpose |
description | String | Human-readable text clearly describing this purpose in the appropriate language |
maxRetainedFor | Integer | Maximum number of seconds data is retained after collection |
expiresIn | Integer | Number of seconds remaining before collected data is deleted |
consented | Boolean | Dynamic indication of registered user agreement for this purpose |
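The two retention properties are both durations in seconds, but measured from different points: maxRetainedFor from collection, expiresIn from when the declaration is served. A sketch that turns expiresIn into an absolute deletion time and flags values that look inconsistent; treating expiresIn greater than maxRetainedFor as an error is an assumption, not something the table states. The function names are illustrative.

```typescript
interface Purpose {
  name: string;
  description: string;
  maxRetainedFor: number;
  expiresIn: number;
  consented: boolean;
}

function isNonNegativeInteger(n: number): boolean {
  return typeof n === "number" && isFinite(n) && Math.floor(n) === n && n >= 0;
}

// Problems with one purpose entry; an empty list means it looks consistent.
function checkPurpose(p: Purpose): string[] {
  const problems: string[] = [];
  if (!isNonNegativeInteger(p.maxRetainedFor)) problems.push("maxRetainedFor must be a non-negative integer");
  if (!isNonNegativeInteger(p.expiresIn)) problems.push("expiresIn must be a non-negative integer");
  // Assumption: data cannot outlive its declared maximum retention period.
  if (problems.length === 0 && p.expiresIn > p.maxRetainedFor) problems.push("expiresIn exceeds maxRetainedFor");
  return problems;
}

// Absolute deletion time, given when the declaration was received (ms since epoch).
function deletionTime(p: Purpose, receivedAtMs: number): Date {
  return new Date(receivedAtMs + p.expiresIn * 1000);
}
```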
An example of the encoding.
{
  "name": "BigCo Inc",
  "policy": "https://www.bigco.com/privacy.html",
  "storagePolicy": "https://www.bigco.com/cookie.html",
  "about": "https://www.bigco.com/about",
  "mayCollect": true,
  "mayShare": true,
  "mayCombine": false,
  "purposes": [
    {
      "name": "behavioural advertising",
      "description": "compiling history of web sites visited",
      "maxRetainedFor": 1000000,
      "expiresIn": 45667,
      "consented": false
    },
    {
      "name": "website analytics",
      "description": "web audience measurement",
      "maxRetainedFor": 10000,
      "expiresIn": 3456,
      "consented": false
    },
    {
      "name": "authentication",
      "description": "logging in",
      "maxRetainedFor": 1000000,
      "expiresIn": 67854,
      "consented": false
    }
  ],
  "storage": [
    {
      "type": "cookie",
      "name": "_ga",
      "purposeList": [0, 1]
    },
    {
      "type": "cookie",
      "name": "user",
      "purposeList": [2]
    },
    {
      "type": "local",
      "name": "dataname",
      "purposeList": [0]
    }
  ],
  "otherParties": [
    "www.google.com",
    "www.google-analytics.com",
    "adnxs.com"
  ],
  "sameParties": [
    "ourcdn.com"
  ]
}
Prior Art
- Mike West has proposed a way for origins to assert that they belong to a set associated with the same top-level or "first-party" site: "First-Party Sets".
- The Tracking Protection Working Group's "Tracking Preference Expression (DNT)" defined a server transparency declaration at /.well-known/dnt/. This was designed to allow the entity managing any server (first-party or sub-resource) to declare various properties to aid transparency.
- John Wilander has proposed amendments to the Same-Origin Policy so that sets of domains could be trusted as if they were first-party: "Single Trust and Same-Origin Policy v2".
- There is ongoing discussion within the IETF about recognising domain name "relatedness" in DNS records: "Related Domains By DNS".
- Cookie name prefixes are discussed in the under-development replacement for RFC 6265: "Cookies: HTTP State Management Mechanism draft-ietf-httpbis-rfc6265bis-02".
- The IAB EU's "IAB Europe Transparency & Consent Framework" defines an externally hosted JSON resource that identifies advertising technology vendors and a set of defined purposes.
- There is ongoing work on defining an origin-wide Origin Policy manifest file at a well-known location: "Origin Policy".