Updated privacy section with discussion on fingerprinting and navigational tracking. #102
I do not know what "GitHub signature does not match known secret for privacycg/private-click-measurement" means.
Looks good to me. I'd like @martinthomson's eyes on this too.
### Combined With Fingerprinting ### {#combined-with-fingerprinting}

One pre-existing tracking vector is device fingerprinting. The click source and click destination may be able to record fingerprints for all measured events based on e.g. IP address, user agent string, and user-installed fonts. These fingerprints would reduce the number of users who would need a unique ID in event-level attribution reports to be uniquely identified. Combining with fingerprinting is not a problem inherent to event-level measurement such as Private Click Measurement, but rather an existing privacy problem on the web. User agents have to defend against device fingerprinting to prohibit this kind of attack.
The net effect of a system like this is to create a new cross-site flow of information. I'd prefer that you own that rather than attempting to claim that it isn't your problem.
I don't agree with the characterization (as in "...not a problem inherent...") that event-level attribution doesn't need to consider interactions with fingerprinting. Whether fingerprinting is very coarse (Safari vs. Firefox vs. Chrome) or precise (more or less what we have today), the feature is being introduced to a running system and it needs to consider the effect it has on that system.
In general, when we give new platform features a pass on fingerprinting, it is because we are able to consider the incremental effect of the change. We recognize that fingerprinting exists and is hard to combat, but we can look at the overall increase to the entropy of a feature. Where the value of any fingerprinting increase seems worthwhile, that's where we might adjudge that the increase in fingerprinting is tolerable.
This is further tempered by the fact that many fingerprinting elements are highly correlated (for instance, exposing the existence of a microphone is not usually very much information when a webcam is known to be present as those two things are often available at the same time).
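As a rough illustration of the correlation point, the sketch below computes empirical Shannon entropy over a made-up population of eight devices. The device list and the helper name `entropy_bits` are assumptions for illustration only, not measurements of real browsers. It shows that once the webcam bit is known, the microphone bit contributes much less than its stand-alone entropy.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical population of 8 devices: (has_webcam, has_microphone).
# The two features agree on 7 of the 8 devices, so they are highly correlated.
devices = [
    (True, True), (True, True), (True, True), (True, True),
    (False, False), (False, False), (False, False),
    (False, True),
]

h_webcam = entropy_bits([d[0] for d in devices])
h_mic = entropy_bits([d[1] for d in devices])
h_joint = entropy_bits(devices)

# Extra information the microphone reveals once the webcam is known:
# H(mic | webcam) = H(webcam, mic) - H(webcam).
h_mic_given_webcam = h_joint - h_webcam

print(f"H(webcam)       = {h_webcam:.3f} bits")
print(f"H(mic)          = {h_mic:.3f} bits")
print(f"H(mic | webcam) = {h_mic_given_webcam:.3f} bits")
```

Summing the per-feature entropies would overstate the fingerprinting surface here; the joint entropy is the quantity that matters.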
While we don't have a good handle on what the true entropy is, I've heard claims[1] that the total entropy of web APIs is still low enough that individuals are not reliably identifiable. It is under this assumption that we accept a partial bit of extra leakage here or there.
What makes PCM hard to reason about here is that it creates a time-based contribution to the amount of information sites can exchange. Over time, information is constantly released. In theory at least, there is no limit to the information sites can exchange about a user if they are patient.
A better approach here might be to attempt to quantify the release of information somehow. Then, this can be combined with what we know (or might assume) about fingerprinting entropy to produce some sort of understanding of what abuse of the API might look like.
nit: s/e.g./such as/ (and s/and/or/) or s/e.g./e.g.,/.
Footnotes
1. This really is only hearsay, so it's impossible to quantify presently. Understanding what browser fingerprinting looks like is a major blind spot here. For instance, I fear that some users are uniquely identifiable, even if most users are not distinguishable from many others.
The net effect of a system like this is to create a new cross-site flow of information. I'd prefer that you own that rather than attempting to claim that it isn't your problem.
I don't understand this feedback. At the beginning of the Privacy section we go through exactly how much data is allowed to be linked cross-site as part of PCM. Do you think that information needs to be reiterated down here, that the two pieces of text are too far apart, or that the piece about PCM's own data is unclear?
I don't agree with the characterization (as in "...not a problem inherent...") that event-level attribution doesn't need to consider interactions with fingerprinting. Whether fingerprinting is very coarse (Safari vs. Firefox vs. Chrome) or precise (more or less what we have today), the feature is being introduced to a running system and it needs to consider the effect it has on that system.
I've not tried to state that "event-level attribution doesn't need to consider interactions with fingerprinting." On the contrary, this update to the Privacy section tries to be very open about the fact that a) the combination of event-level attribution and fingerprinting is a problem and that b) user agents need to prevent or mitigate fingerprinting to prevent the combination of the two. Safari's implementation uses IP address hiding in PCM's attribution report requests for instance.
In general, when we give new platform features a pass on fingerprinting, it is because we are able to consider the incremental effect of the change. We recognize that fingerprinting exists and is hard to combat, but we can look at the overall increase to the entropy of a feature. Where the value of any fingerprinting increase seems worthwhile, that's where we might adjudge that the increase in fingerprinting is tolerable.
Are you saying PCM adds fingerprinting attack surface? The definition of fingerprinting I use is based on the fact that fingerprinting is stateless. Stateless in that use of web platform features doesn't change the fingerprint. By that definition, PCM does not add fingerprinting attack surface since PCM is stateful.
If your view is that PCM adds fingerprinting attack surface, I need to understand how.
For me, the interesting analysis lies in the fact that stateful cross-site data transfer such as event-level click attribution can be combined with stateless, cross-site, partial re-identification through fingerprinting. It's important to reason about it at the event-level and not just for PCM since there is at least one more proposal that's also event-level. The TAG has asked us to coordinate with them so the two features are very similar except in the amount of data on either side of the click.
This is further tempered by the fact that many fingerprinting elements are highly correlated (for instance, exposing the existence of a microphone is not usually very much information when a webcam is known to be present as those two things are often available at the same time).
While we don't have a good handle on what the true entropy is, I've heard claims[1] that the total entropy of web APIs is still low enough that individuals are not reliably identifiable. It is under this assumption that we accept a partial bit of extra leakage here or there.
What makes PCM hard to reason about here is that it creates a time-based contribution to the amount of information sites can exchange. Over time, information is constantly released. In theory at least, there is no limit to the information sites can exchange about a user if they are patient.
Getting users to click to navigate cross-site is a limiting factor of course. In fact, that is a key piece of the analysis since otherwise we could support view-through attribution with the same mechanism.
A better approach here might be to attempt to quantify the release of information somehow. Then, this can be combined with what we know (or might assume) about fingerprinting entropy to produce some sort of understanding of what abuse of the API might look like.
You mean over time, under the assumption of repeated clicks and collusion, I assume.
nit: s/e.g./such as/ (and s/and/or/) or s/e.g./e.g.,/.
Footnotes
1. This really is only hearsay, so it's impossible to quantify presently. Understanding what browser fingerprinting looks like is a major blind spot here. For instance, I fear that some users are uniquely identifiable, even if most users are not distinguishable from many others.
The net effect of a system like this is to create a new cross-site flow of information. I'd prefer that you own that rather than attempting to claim that it isn't your problem.
I don't understand this feedback. At the beginning of the Privacy section we go through exactly how much data is allowed to be linked cross-site as part of PCM. Do you think that information needs to be reiterated down here, that the two pieces of text are too far apart, or that the piece about PCM's own data is unclear?
The way I'm reading the text in this section is that you are saying that fingerprinting and PCM can combine to enable more effective user tracking, but this isn't your problem because fingerprinting is a known issue and browsers should (or will eventually) just block fingerprinting. To be clear, that's a deliberately pejorative reading, but I think that is a possible interpretation of that text. Given that I don't believe fingerprinting will ever go to zero (or that zero fingerprinting is relevant to this analysis), you need more...
Are you saying PCM adds fingerprinting attack surface? The definition of fingerprinting I use is based on the fact that fingerprinting is stateless. Stateless in that use of web platform features doesn't change the fingerprint. By that definition, PCM does not add fingerprinting attack surface since PCM is stateful.
If your view is that PCM adds fingerprinting attack surface, I need to understand how.
I'm saying that PCM is being deployed into a system where fingerprinting is possible and that in order to understand the resulting privacy properties of the system (and PCM) you need to better understand how the two combine to subvert privacy.
Hypothetical time. Let's say that you need ~30 bits of fingerprinting entropy in order to uniquely identify every web user and the current fingerprinting surface is ~15 bits. If PCM leaks one bit per week, then maybe we might decide that it is OK for a site with repeated interactions over the course of 15 weeks to learn about a person's activity on another site. Or not; we might each reach our own conclusions about that, with different assumptions about fingerprinting.
The point being that there is some amount of time over which the gap between what fingerprinting enables and what a site needs for unique identification can be closed. Whatever the gap is (it will change over time, it will change for different sites with different visitor composition), PCM can bridge it eventually. Being able to understand the contribution PCM makes toward identification (or tracking) is a critical part of any analysis of the API.
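The hypothetical above reduces to simple arithmetic, sketched here. The function name `weeks_to_unique_id` is made up for illustration, the numbers are the hypothetical ones from this comment rather than measurements, and the sketch assumes that leaked bits are independent of the fingerprint and accumulate linearly, which real correlations would weaken.

```python
import math

def weeks_to_unique_id(bits_needed: float,
                       fingerprint_bits: float,
                       leak_bits_per_week: float) -> int:
    """Weeks of repeated interaction until fingerprint plus leaked bits reach bits_needed.

    Assumes leaked bits add linearly on top of the fingerprint (an idealization).
    """
    if leak_bits_per_week <= 0:
        raise ValueError("leak rate must be positive")
    gap = max(0.0, bits_needed - fingerprint_bits)
    return math.ceil(gap / leak_bits_per_week)

# The hypothetical from the comment: ~30 bits needed, ~15 bits of
# fingerprinting surface, one bit leaked per week.
weeks = weeks_to_unique_id(30, 15, 1)
print(weeks)
```

Changing any one input shifts the result proportionally, which is why the assumed fingerprinting entropy matters so much to the conclusion.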
(Separately, I happen to consider the fact that PCM is capable of bridging any gap to be a real deal-breaker, but my purpose here is in helping you produce the best possible documentation of the system you are proposing.)
For me, the interesting analysis lies in the fact that stateful cross-site data transfer such as event-level click attribution can be combined with stateless, cross-site, partial re-identification through fingerprinting. It's important to reason about it at the event-level and not just for PCM since there is at least one more proposal that's also event-level. The TAG has asked us to coordinate with them so the two features are very similar except in the amount of data on either side of the click.
Excellent. That's exactly what I would expect from the TAG, so 👍.
The net effect of a system like this is to create a new cross-site flow of information. I'd prefer that you own that rather than attempting to claim that it isn't your problem.
I don't understand this feedback. At the beginning of the Privacy section we go through exactly how much data is allowed to be linked cross-site as part of PCM. Do you think that information needs to be reiterated down here, that the two pieces of text are too far apart, or that the piece about PCM's own data is unclear?
The way I'm reading the text in this section is that you are saying that fingerprinting and PCM can combine to enable more effective user tracking, but this isn't your problem because fingerprinting is a known issue and browsers should (or will eventually) just block fingerprinting. To be clear, that's a deliberately pejorative reading, but I think that is a possible interpretation of that text. Given that I don't believe fingerprinting will ever go to zero (or that zero fingerprinting is relevant to this analysis), you need more...
I think I see where you're going here. PCM was conceived with two requirements that are highly relevant for this discussion:
- Some data needs to be linked cross-site to get click-through attribution. The data needs to be large enough to allow for useful measurement. We started out with 6 bits on either side but then got strong community feedback suggesting that more bits on the click side were preferable, hence 8+4 bits.
- The feature should not require trusted servers. We didn't want to make servers a requirement for user agents to be standards compliant, we didn't like to have to trust server owners (they all have to be part of the threat model), and we didn't like the financial commitment of setting up or paying for servers to handle web-scale traffic to become a barrier to entry for browser vendors. This requirement is particularly tricky when we deal with IP address hiding so we've not wanted to state that as more than a suggestion for user agents.
Are you saying PCM adds fingerprinting attack surface? The definition of fingerprinting I use is based on the fact that fingerprinting is stateless. Stateless in that use of web platform features doesn't change the fingerprint. By that definition, PCM does not add fingerprinting attack surface since PCM is stateful.
If your view is that PCM adds fingerprinting attack surface, I need to understand how.
I'm saying that PCM is being deployed into a system where fingerprinting is possible and that in order to understand the resulting privacy properties of the system (and PCM) you need to better understand how the two combine to subvert privacy.
Hypothetical time. Let's say that you need ~30 bits of fingerprinting entropy in order to uniquely identify every web user and the current fingerprinting surface is ~15 bits. If PCM leaks one bit per week, then maybe we might decide that it is OK for a site with repeated interactions over the course of 15 weeks to learn about a person's activity on another site. Or not; we might each reach our own conclusions about that, with different assumptions about fingerprinting.
The point being that there is some amount of time over which the gap between what fingerprinting enables and what a site needs for unique identification can be closed. Whatever the gap is (it will change over time, it will change for different sites with different visitor composition), PCM can bridge it eventually. Being able to understand the contribution PCM makes toward identification (or tracking) is a critical part of any analysis of the API.
Yeah, some concrete discussion of the highest number of bits allowed in fingerprinting to not get to ≈30 in total would be useful.
The timeline bit revives a proposal we discussed earlier, namely that PCM should rate limit measurements per site pair and browser instance. I think that is very doable.
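A minimal sketch of what such a rate limit could look like, assuming a sliding one-week window and one report per site pair. The class name, the window length, and the limit are illustrative assumptions, not values from the PCM proposal. The "browser instance" dimension is implicit because the state lives locally in one instance.

```python
from dataclasses import dataclass, field

@dataclass
class SitePairRateLimiter:
    """Allow at most `max_reports` reports per (source, destination) pair per window."""
    max_reports: int = 1                       # illustrative limit, not from the spec
    window_seconds: float = 7 * 24 * 3600      # one week, an illustrative value only
    _sent: dict = field(default_factory=dict)  # (source, destination) -> send times

    def allow(self, source_site: str, destination_site: str, now: float) -> bool:
        key = (source_site, destination_site)
        # Keep only the reports that still fall inside the sliding window.
        recent = [t for t in self._sent.get(key, []) if now - t < self.window_seconds]
        if len(recent) >= self.max_reports:
            self._sent[key] = recent
            return False
        recent.append(now)
        self._sent[key] = recent
        return True

limiter = SitePairRateLimiter()
print(limiter.allow("social.example", "shop.example", now=0.0))      # first report
print(limiter.allow("social.example", "shop.example", now=3600.0))   # same pair, same week
print(limiter.allow("social.example", "other.example", now=3600.0))  # different pair
```

A cap like this bounds the bits released per site pair per unit of time, which is the quantity the timeline argument depends on.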
(Separately, I happen to consider the fact that PCM is capable of bridging any gap to be a real deal-breaker, but my purpose here is in helping you produce the best possible documentation of the system you are proposing.)
For me, the interesting analysis lies in the fact that stateful cross-site data transfer such as event-level click attribution can be combined with stateless, cross-site, partial re-identification through fingerprinting. It's important to reason about it at the event-level and not just for PCM since there is at least one more proposal that's also event-level. The TAG has asked us to coordinate with them so the two features are very similar except in the amount of data on either side of the click.
Excellent. That's exactly what I would expect from the TAG, so 👍.
### Combined With Navigational Tracking ### {#combined-with-navigational-tracking}

Another pre-existing tracking vector is [navigational tracking](https://privacycg.github.io/nav-tracking-mitigations/#terminology), sometimes referred to as link decoration tracking. The click source can include a user ID or click ID in the destination URL and the click destination can read and store that ID for attribution purposes. Under the assumption of navigational tracking, there is no need for a feature like Private Click Measurement but it is of course possible to combine navigational tracking with event-level measurement to try to track individual users cross-site. This problem is not inherent to event-level measurement such as Private Click Measurement, but rather an existing privacy problem on the web. User agents have to defend against navigational tracking to prohibit this kind of attack.
I'm not clear on your interpretation of the definition here. Your cited definition is:
Navigational tracking refers to the general use of one or more navigations to identify that a user on one site is the same person as a user on another site.
I don't consider "link decoration tracking" to be an alternative name for navigation tracking. I consider it a subset: just one technique that can be used.
I do agree that if you have reliable navigation tracking that uses just a single navigation (not all techniques do), then you don't need PCM.
In that light, I might make this more pointed:
If sites use link decoration, it is possible that they will be able to use something like link decoration [LINK] to perform attribution. However, user agents that protect against navigation tracking [LINK] might block the use of link decoration.
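To make the link decoration case concrete, here is a small sketch of a click ID carried in a destination URL and of one way a defense might strip it. The parameter names `click_id` and `user_id` and the example URLs are hypothetical, and this is not a description of how any particular user agent implements its protections.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; real deployments use many different ones.
TRACKING_PARAMS = {"click_id", "user_id"}

def strip_link_decoration(url: str) -> str:
    """Return `url` with the query parameters listed in TRACKING_PARAMS removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# The click source decorates the destination URL with a click ID.
decorated = "https://shop.example/product?id=42&click_id=abc123"
cleaned = strip_link_decoration(decorated)
print(cleaned)
```

A fixed denylist like this is easy to evade by renaming the parameter, which is one reason link decoration is only a subset of the navigation tracking techniques a user agent has to consider.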
I'm not clear on your interpretation of the definition here. Your cited definition is:
Navigational tracking refers to the general use of one or more navigations to identify that a user on one site is the same person as a user on another site.
I include cross-site tracking of user activity across sites and not just linking of identity across sites. Is that your take too? It's not clear from the above.
I don't consider "link decoration tracking" to be an alternative name for navigation tracking. I consider it a subset: just one technique that can be used.
Fair. I'll update that language.
I do agree that if you have reliable navigation tracking that uses just a single navigation (not all techniques do), then you don't need PCM.
In that light, I might make this more pointed:
If sites use link decoration, it is possible that they will be able to use something like link decoration [LINK] to perform attribution. However, user agents that protect against navigation tracking [LINK] might block the use of link decoration.
Would the [LINK] here be to the WHATWG site or the navigational tracking work item?
I include cross-site tracking of user activity across sites and not just linking of identity across sites. Is that your take too? It's not clear from the above.
I agree. I only included this because we're talking specifically about navigation tracking and this is the definition you cited; so it seemed useful context.
Would the [LINK] here be to the WHATWG site or the navigational tracking work item?
[LINK] could be whatever you think is good, though the navigation tracking document defines both terms fairly well, so you could use those.
@@ -241,6 +241,18 @@ In the interest of user privacy, user agents are encouraged to deploy the follow
* The user agent offers users a way to turn Private Click Measurement on and off.
* The user agent doesn't support Private Click Measurement in private/incognito mode.

## Event-Level Measurement Combined With Pre-Existing Tracking Vectors ## {#pre-existing-tracking-vectors}

Any form of event-level measurement, i.e. where there's one report per event, can be combined with pre-existing tracking vectors in an attempt to track users cross-site.
Any form of event-level measurement, i.e. where there's one report per event, can be combined with pre-existing tracking vectors in an attempt to track users cross-site.
Any form of event-level measurement, i.e., where there's one report per event, can be combined with pre-existing tracking vectors in an attempt to track users cross-site.
Fixed.