-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add candidate TAP for succinct hash bin delegations #120
Add candidate TAP for succinct hash bin delegations #120
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, Marina! Just a few questions...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this nice TAP @mnm678 , reducing metadata overheads on large repositories is very welcome.
Should this require the unnumbered TAP for TUF version management? (aside: I plan to review that over the coming days)
Nit: this tap sometimes says "hash bin" and sometimes "hashed bin", the latter is more common in my experience with the reference implementation. It would be good to be consistent. Note: nowhere in the spec do we use the term hashed bins (or hash bins) – we should probably use this term in the spec so that there is continuity of language.
Nit: Do we have any expectations in the TAPs for line-wrapping? Most of the taps wrap lines around the 80 character point, but this one has some especially long lines (presumably a copy-paste artefact?).
This TAP is not backwards compatible, so it would be better if the version management TAP was accepted first to prevent any compatibility problems. However, this TAP doesn't depend on the changes in the other TAP, and would work with a different version management strategy as well. |
Yes, the current calculations are in a document with a lot of other information, so I will copy those somewhere else and include a link. |
@trishankatdatadog @joshuagl Thanks for the reviews! I believe I addressed all of the comments. Is there anything else before I add a TAP number and merge this as a draft? |
I think we're good. I added one more comment ("MUST" instead of "should", much more prescriptive). |
Co-authored-by: Trishank Karthik Kuppusamy <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few suggestions but I think this TAP looks good.
Co-authored-by: Joshua Lock <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent writeup, @mnm678! I strongly support the proposal. Let me know what you think about my comments, especially the one about the maths. :)
candidate-succinct-delegation-tap.md
Outdated
|
||
This TAP supports the following use case: | ||
|
||
Suppose a single targets file contains a very large number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to say "targets metadata file" here and everywhere to disambiguate it from a target file (w/o "s")?
delegating metadata. | ||
|
||
Similarly, bins are usually named using a numbering system with a | ||
common prefix (i.e. alice.hbd-0, alice.hbd-1, …). The delegating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to mention what "hbd" stands for? It took me a few seconds to get it ... (but it's also the end of the week). :)
2^BIT_LENGTH-1 that represents the bin number. | ||
|
||
The `succinct_hash_delegations` will be prioritized over | ||
`path_hash_prefixes`. If both of these fields appear in a delegation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an advantage of allowing both fields at the same time? Wouldn't it be better to mandate that only one is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows for some backwards compatibility as a repository is switching from path_hash_prefixes to succinct_hash_delegations, but it might be simpler to mandate that only one or the other can be used.
candidate-succinct-delegation-tap.md
Outdated
|
||
The `succinct_hash_delegations` will be prioritized over | ||
`path_hash_prefixes`. If both of these fields appear in a delegation, | ||
the `path_hash_prefixes` MUST (as per RFC 2119) be ignored in favor of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this RFC2119 MUST different from all the other musts in the document? And what about the mays and shoulds? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think RFC2119 is assumed throughout the specification, so I'll remove that here
candidate-succinct-delegation-tap.md
Outdated
Repositories that use access control for file uploading should take | ||
hashed bin delegations into consideration. Upload access for the name | ||
DELEGATING_ROLENAME.hbd-* should have the same permissions as | ||
DELEGATING_ROLENAME. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, why is this important now, but was not important for path_hash_prefixes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There currently isn't much discussion about access control in the specification, but I thought this was worth mentioning in this case.
candidate-succinct-delegation-tap.md
Outdated
resolve these using prioritized delegations. So if a delegation uses | ||
succinct hashed bins and another role delegates to | ||
DELEGATING_ROLENAME.hbd-\*, the client will use the delegation with a | ||
higher priority. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this paragraph necessary here? Prioritized delegations are described elsewhere, right? And the new field in this TAP doesn't change anything about it.
candidate-succinct-delegation-tap.md
Outdated
the first BIT_LENGTH bits of the hash and will be determined for each | ||
bin i by computing i * BIT_LENGTH. Once computed, the | ||
path_hash_prefixes must be represented as the first BIT_LENGTH bits of | ||
the hash represented in base16, followed by a “*”. So for a BIT_LENGTH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor nit: There's a mix of “
and "
in this document. I think we want to stick to the latter.
candidate-succinct-delegation-tap.md
Outdated
the first BIT_LENGTH bits of the hash and will be determined for each | ||
bin i by computing i * BIT_LENGTH. Once computed, the | ||
path_hash_prefixes must be represented as the first BIT_LENGTH bits of | ||
the hash represented in base16, followed by a “*”. So for a BIT_LENGTH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to escape asterisks here and elsewhere in the free text, or you mark it up as verbatim text.
candidate-succinct-delegation-tap.md
Outdated
the hash represented in base16, followed by a “*”. So for a BIT_LENGTH | ||
of 3, the path_hash_prefix of the first bin would be “0*” (binary | ||
values starting with 000) , the second bin would be “2*” (binary | ||
values 001), the third “4*”, then “6*”, “8*”, “a*”, “c*”, and “e*”. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the bin partitioning maths here. Are you sure that this matches what we currently do? Would you mind elaborating how BIT_LENGTH
relates to what we currently call prefix_length
?
candidate-succinct-delegation-tap.md
Outdated
of 3, the path_hash_prefix of the first bin would be “0*” (binary | ||
values starting with 000) , the second bin would be “2*” (binary | ||
values 001), the third “4*”, then “6*”, “8*”, “a*”, “c*”, and “e*”. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I just re-read the paragraph and I understand it now. :) The math seems to match what's in the reference implementation, but the binary value in the parentheses for bin "2*" should be 010, right?
Also, I was confused by the word starting, do you mean the binary representation of the hash starts with 000, or the range of hash path prefixes for bin 0* starts with 000? Either way, I think it would be helpful to list all path prefixes for every bin in this example, it's only 8 bins and 2 prefixes per bins.
And I have another question, why (and where) do we need to represent the hash_path_prefixes
as the first BIT_LENGTH bits of the hash represented in base16, followed by a “*”? Isn't the point about succinct_hash_delegations
that we no longer need the path_hash_prefixes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this, but I think the relationship between the binary prefixes and their base64 representations might be a confusing way to explain this (especially because, as you mention, we no longer need to use path_hash_prefixes). I'll revisit this in a bit to see if I can clarify it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this again and removed all the mentions of path_hash_prefixes
in this paragraph.
``` | ||
("succinct_hash_delegations" : { | ||
"prefix_bit_length" : BIT_LENGTH | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the POC implementation at theupdateframework/python-tuf#1106, having a dictionary with a single item is a bit wordy. Would it make more sense to instead add something like:
("generate_succinct_delegations" : BIT_LENGTH)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generate_succinct_delegations
sounds a bit like a boolean to me. What about "delegation_hash_prefix_len": BIT_LENGTH
? We could also keep the dictionary and add an additional optional role_name_prefix
to allow the delegating role to override the default DELEGATING_ROLENAME.hbd-
prefix for delegated bin roles. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea of being able to specify role_name_prefix
! Rather than have it be a field in the succinct_hash_delegation
dictionary, should it be part of the DELEGATIONS
object so that it can also be used for path_hash_prefixes
and give us consistently named hashed bin delegations, regardless of which mechanism they were created with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that specifying the role_name_prefix
would be helpful to give the delegating metadata more flexibility. With the existing mechanism, it is already possible to use any role name for hashed bins (although the api does not provide this flexibility). Maybe as part of the implementation we can add the ability to define role_name_prefix
for both succinct hashed bin delegations (by adding it to the metadata) and for legacy hashed bin delegations (by adding it to the api).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, a role_name_prefix
in the delegations object is not needed for normal/legacy hashed bin delegations, because each delegated hash bin role name is listed explicitly in the delegation object. But it would definitely be great to support a prefix parameter in the API for any type of hashed bin delegation (see theupdateframework/python-tuf#1106 (comment))
What do you think about using the name
field to specify the prefix? It is already re-purposed, by assigning the delegating role name in the case of succinct hbd, and if we specify a prefix we don't need that anymore, right? OTOH, it might be better to aim for semantic correctness and not call it name if it is only a name prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simplicity, I like the idea of re-purposing the name
field. This field doesn't have any other use for succinct hashed bin delegations (as the name is generated from the prefix), so it won't interfere with any other functionality.
If there are no objections, I will update this pr and theupdateframework/python-tuf#1106 to use delegation_hash_prefix_len
and describe the use of custom role name prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been a while, are we still planning to repurpose the name
field as the prefix and drop the bin_name_prefix
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, we decided against that to make the field names more clear, but I'm still open to using the name field.
… prefixes. This commit replaces the wordy notation: ("succinct_hash_delegations" : { "prefix_bit_length" : BIT_LENGTH }) with the simpler: ("delegation_hash_prefix_len" : BIT_LENGTH) In addition, it allows the user to specify a prefix for succinct hashed bin delegations instead of always using the default DELEGATING_ROLENAME.hbd- prefix. Signed-off-by: marinamoore <[email protected]>
candidate-succinct-delegation-tap.md
Outdated
name of the delegating entity. It will be structured as | ||
DELEGATING_ROLENAME.hbd-COUNT where DELEGATING_ROLENAME is the name of | ||
the role that delegated to the hashed bins and COUNT is a value between 0 and | ||
PREFIX listed in the `name` field of the delegation. By default, the PREFIX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default, when the name field is empty? or something else? If the former, does that make the name field optional?
candidate-succinct-delegation-tap.md
Outdated
("succinct_hash_delegations" : { | ||
"prefix_bit_length" : BIT_LENGTH | ||
}) | ||
("delegation_hash_prefix_len" : BIT_LENGTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mildly concerned that path_hash_prefixes
and delegation_hash_prefix_len
are not obviously related and exclusive of each other. We can address this somewhat by improving wording in the spec, but ideally the names would be more indicative of their relation.
Maybe this is something we should file an issue about to address in a major version update once TAP 14 is Final?
The `NAME_PREFIX` field more explicitly indicates the prefix that should be used when generating bin names for succinct hashed bin delegations. In order to include this field, this pr also creates a dictionary to include all fields associated with succinct hashed bin delegations. Signed-off-by: marinamoore <[email protected]>
…ions The name field is overwritten by the generated names, so to remove confusion, this field should not be used with succinct hashed bin delegations, Signed-off-by: marinamoore <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the format of the delegations object could be cleaned up, but I'm not sure my proposals here make it any clearer.
If I'm following correctly a delegations object needs either name
or succinct_hash_delegations
and path_hash_prefixes
is optional. If the delegations object includes a succinct_hash_delegations
field then any path_hash_prefixes
field is ignored.
Is that correct? Can we rewrite the object format for the specification to make this clearer?
KEYID : KEY, | ||
... }, | ||
"roles" : [{ | ||
("name": ROLENAME,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this effectively becomes
("name": ROLENAME,) | |
("name": ROLENAME | | |
"succinct_hash_delegations" : { | |
"delegation_hash_prefix_len" : BIT_LENGTH, | |
"bin_name_prefix" : NAME_PREFIX}), |
("path_hash_prefixes" : [ HEX_DIGEST, ... ] | | ||
("succinct_hash_delegations" : { | ||
"delegation_hash_prefix_len" : BIT_LENGTH, | ||
"bin_name_prefix" : NAME_PREFIX | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then this would become:
("path_hash_prefixes" : [ HEX_DIGEST, ... ] | | |
("succinct_hash_delegations" : { | |
"delegation_hash_prefix_len" : BIT_LENGTH, | |
"bin_name_prefix" : NAME_PREFIX | |
}) | |
("path_hash_prefixes" : [ HEX_DIGEST, ... ],) |
Yes, and I agree that we should make this more clear in the object format. However, it'd be nice if we could still relate that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not looked at the implementation yet but this looks very useful 👍
1,600,000 bytes for each target. Using succinct hashed bin delegations, | ||
the snapshot and targets metadata overhead for a target can be reduced | ||
to about 550,000 bytes. For more detail about how these overheads were | ||
calculated, see [this spreadsheet](https://docs.google.com/spreadsheets/d/10AKDsHsM2mmh45CWCNFxihJ9f-SP6gXYv7WcWpt-fDQ/edit#gid=0). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a link to google docs is appropriate here? I guess this is tricky to express otherwise and link is better than nothing...
Could you make these items in the spreadsheet clearer: What do "worst case path bytes" and "target filename" refer to specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to other ways to present this. I thought it was too many calculations for a markdown table.
I also added some comments to the spreadsheet to clarify those terms.
This commit removes the 'paths' field when 'path_hash_prefixes' are used and adds an example of delegations metadata using the succinct hashed bin delegation format. It also clarifies the the 'COUNT' term refers to a hexadecimal representation. Signed-off-by: marinamoore <[email protected]>
Looking a this with fresh eyes, I think it's more important in the object definition to tie |
@joshuagl @JustinCappos @trishankatdatadog @lukpueh Can we merge this TAP as a draft? I think the only open question is the use of |
Yes, let's get this out ASAP for PyPA, maybe a nice Christmas gift! |
Agreed, would be great to get this TAP implemented in the reference implementation for PyPI to use. |
No description provided.