
Browser ID in k-anonymity submissions #1000

@martinthomson


The explainer talks about having a low-entropy identifier for each browser. This appears to be so that Join requests from the same browser can be linked. That is, a browser can refresh a Join it previously made without contributing toward the threshold.

This might also reduce the total state exposure of the k-anonymity server. If $j$ bits of data are used, the server only needs to keep $2^j$ items for each hash. That's still an awful lot, but there is now a cap.
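To make the $2^j$ cap concrete, here is a minimal sketch, assuming the server simply keeps the set of distinct $j$-bit IDs seen for each hash. The class `JBitIdStore` and its methods are hypothetical illustrations, not the explainer's actual mechanism, and the sketch ignores the time window entirely.

```python
import random


class JBitIdStore:
    """Hypothetical per-hash state keyed by a j-bit browser ID.

    The IDs can take only 2**j values, so each hash holds at most
    2**j entries no matter how many Join requests arrive.
    """

    def __init__(self, j_bits: int):
        self.j_bits = j_bits
        self.ids_by_hash: dict[str, set[int]] = {}

    def join(self, hash_key: str, browser_id: int) -> None:
        if not 0 <= browser_id < 2 ** self.j_bits:
            raise ValueError("browser_id must fit in j bits")
        # A repeated Join from the same ID is absorbed by the set, so a
        # refresh does not count toward the threshold a second time.
        self.ids_by_hash.setdefault(hash_key, set()).add(browser_id)

    def count(self, hash_key: str) -> int:
        return len(self.ids_by_hash.get(hash_key, set()))


store = JBitIdStore(j_bits=4)
rng = random.Random(0)
for _ in range(1000):
    store.join("some-hash", rng.randrange(2 ** 4))
print(store.count("some-hash"))  # at most 2**4 = 16
```

Because distinct browsers can draw the same low-entropy ID, the count in this sketch is only a lower bound on the number of distinct browsers.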

Concentrating on the first one, I think that this is better addressed by having the client simply indicate when it is refreshing an existing item. The client could provide an approximate (noised) estimate of when it last added the entry, or of how long it has been since then. My sense is that you could tolerate a lot of noise on this estimate and still have a functioning system that involves no persistent client identifier, not even a low-entropy one. The time granularity only needs to be the period ($p$).
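A minimal client-side sketch of such a refresh hint, assuming the client remembers when it last joined a given hash and reports elapsed time in whole periods. The function name `refresh_hint`, the one-hour `PERIOD_SECONDS`, and the uniform integer noise are all hypothetical choices for illustration; the noise here is a placeholder, not a calibrated privacy mechanism.

```python
import random
from typing import Optional

PERIOD_SECONDS = 3600  # hypothetical period p; the real value may differ


def refresh_hint(last_join_ts: Optional[float], now_ts: float,
                 max_noise_periods: int, rng: random.Random) -> Optional[int]:
    """Return a noised count of whole periods since this client last joined
    the same hash, or None if this is a first Join (not a refresh)."""
    if last_join_ts is None:
        return None
    true_periods = int((now_ts - last_join_ts) // PERIOD_SECONDS)
    # Hypothetical uniform integer noise in [-max_noise_periods, +max_noise_periods].
    noisy = true_periods + rng.randint(-max_noise_periods, max_noise_periods)
    return max(noisy, 0)  # a negative age is meaningless, so clamp at zero


rng = random.Random(7)
now = 1_000_000.0
print(refresh_hint(None, now, 3, rng))           # None: a fresh Join
print(refresh_hint(now - 5 * 3600, now, 3, rng))  # somewhere in 2..8
```

Nothing in the hint identifies the client; it only tells the server roughly which period's count a refresh should be moved from.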

The state-limiting thing is not really that big of a deal. A server can maintain a count for each period in the tracked window ($w$). Then the server only has to maintain $\frac{w}{p}=720$ numbers for each hash, which falls squarely between $j=9$ and $j=10$ ($2^9=512$, $2^{10}=1024$) if you are just counting state¹. A refresh just moves one count from a previous period to the latest, as opposed to adding a new count to the latest.
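A minimal server-side sketch of the per-period counts, assuming one bucket per period in a sliding window of $\frac{w}{p}$ buckets. `PeriodBucketCounter` and its methods are hypothetical. Because the client's hint is noisy, `refresh` searches outward from the hinted period for the nearest non-empty bucket; this is one illustrative way to absorb the noise, at the cost of sometimes shifting a neighbouring period's count.

```python
from collections import deque
from typing import Optional


class PeriodBucketCounter:
    """Hypothetical per-hash state: one count per period p over a window w,
    i.e. w // p buckets (720 in the text)."""

    def __init__(self, num_buckets: int):
        # buckets[0] is the oldest period, buckets[-1] the current one.
        self.buckets = deque([0] * num_buckets, maxlen=num_buckets)

    def advance_period(self) -> None:
        # Appending to a full deque drops the oldest bucket, which
        # expires every Join that has aged out of the window.
        self.buckets.append(0)

    def join(self) -> None:
        """A first Join from some client: count it in the current period."""
        self.buckets[-1] += 1

    def _nearest_nonempty(self, idx: int) -> Optional[int]:
        # Search outward from idx, since the client's hint is noisy.
        for dist in range(len(self.buckets)):
            for cand in (idx - dist, idx + dist):
                if 0 <= cand < len(self.buckets) and self.buckets[cand] > 0:
                    return cand
        return None

    def refresh(self, periods_ago: int) -> None:
        """Move one count from an earlier period to the current one."""
        n = len(self.buckets)
        idx = n - 1 - min(max(periods_ago, 0), n - 1)
        src = self._nearest_nonempty(idx)
        if src is not None:
            self.buckets[src] -= 1
        # If nothing was found (e.g. the earlier Join already expired),
        # the refresh simply counts as a new Join.
        self.buckets[-1] += 1

    def total(self) -> int:
        return sum(self.buckets)


counter = PeriodBucketCounter(num_buckets=720)
counter.join()                  # client A joins
counter.advance_period()
counter.advance_period()
counter.join()                  # client B joins two periods later
counter.refresh(periods_ago=2)  # client A refreshes
print(counter.total())          # 2: the refresh moved A's count rather than adding one
```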

There is some clever rate-limiting token stuff that can be used to prevent a client from refreshing too often, if you really want to go there. You could also issue unlinkable cookies that are bound to a specific period. I don't think that you want to do any of that, but I'm mentioning it just in case.

Footnotes

  1. If we're talking about state, you don't need to keep 720 values if you have far more than the threshold in recent buckets, so more popular hashes can be even cheaper to track².

  2. I don't know what that does for timing side-channel leaks, but if the hash really is popular, it shouldn't matter because the query will return an over-threshold result anyway. I'd be more concerned about leaks that distinguish zero from one.
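The pruning idea in footnote 1 can be sketched as follows, assuming per-period counts ordered oldest-first. The function `buckets_to_keep` is hypothetical. The reasoning it relies on: older buckets expire before newer ones, so once the newest buckets alone reach the threshold, dropping the older ones cannot change the over-threshold decision (although the stored count itself becomes smaller than the true count).

```python
def buckets_to_keep(counts: list[int], threshold: int) -> int:
    """Return how many of the most recent per-period buckets must be kept.

    counts is ordered oldest-first. Once the newest buckets alone reach
    threshold, the older ones cannot change the over-threshold decision:
    they expire first, so the newer buckets keep the hash over threshold
    for at least as long.
    """
    running = 0
    for kept, count in enumerate(reversed(counts), start=1):
        running += count
        if running >= threshold:
            return kept
    return len(counts)


popular = [5] * 720     # 5 joins in every period of the window
rare = [0] * 719 + [3]  # a few recent joins only
print(buckets_to_keep(popular, threshold=50))  # 10
print(buckets_to_keep(rare, threshold=50))     # 720
```

To get the "far more than the threshold" headroom the footnote mentions, a caller could pass a value larger than the real threshold.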
