Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add candidate TAP for succinct hash bin delegations #120

Merged

Conversation

mnm678
Copy link
Contributor

@mnm678 mnm678 commented Jul 2, 2020

No description provided.

Copy link
Member

@trishankatdatadog trishankatdatadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, Marina! Just a few questions...

candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Show resolved Hide resolved
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this nice TAP @mnm678 , reducing metadata overheads on large repositories is very welcome.

Should this require the unnumbered TAP for TUF version management? (aside: I plan to review that over the coming days)

Nit: this tap sometimes says "hash bin" and sometimes "hashed bin", the latter is more common in my experience with the reference implementation. It would be good to be consistent. Note: nowhere in the spec do we use the term hashed bins (or hash bins) – we should probably use this term in the spec so that there is continuity of language.

Nit: Do we have any expectations in the TAPs for line-wrapping? Most of the taps wrap lines around the 80 character point, but this one has some especially long lines (presumably a copy-paste artefact?).

candidate-succinct-delegation-tap.md Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
@mnm678
Copy link
Contributor Author

mnm678 commented Jul 6, 2020

Should this require the unnumbered TAP for TUF version management? (aside: I plan to review that over the coming days)

This TAP is not backwards compatible, so it would be better if the version management TAP was accepted first to prevent any compatibility problems. However, this TAP doesn't depend on the changes in the other TAP, and would work with a different version management strategy as well.

@mnm678
Copy link
Contributor Author

mnm678 commented Jul 6, 2020

Would you please briefly describe how these numbers were calculated for the unfamiliar reader? Thanks!

Yes, the current calculations are in a document with a lot of other information, so I will copy those somewhere else and include a link.

@mnm678
Copy link
Contributor Author

mnm678 commented Jul 7, 2020

@trishankatdatadog @joshuagl Thanks for the reviews! I believe I addressed all of the comments. Is there anything else before I add a TAP number and merge this as a draft?

@trishankatdatadog
Copy link
Member

@trishankatdatadog @joshuagl Thanks for the reviews! I believe I addressed all of the comments. Is there anything else before I add a TAP number and merge this as a draft?

I think we're good. I added one more comment ("MUST" instead of "should", much more prescriptive).

Co-authored-by: Trishank Karthik Kuppusamy <[email protected]>
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few suggestions but I think this TAP looks good.

candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent writeup, @mnm678! I strongly support the proposal. Let me know what you think about my comments, especially the one about the maths. :)


This TAP supports the following use case:

Suppose a single targets file contains a very large number of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth to say "targets metadata file" here and everywhere to disambiguate it from a target file (w/o "s")?

delegating metadata.

Similarly, bins are usually named using a numbering system with a
common prefix (i.e. alice.hbd-0, alice.hbd-1, …). The delegating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth to mention what "hbd" stands for? It took me a few seconds to get it ... (but it's also the end of the week). :)

candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
2^BIT_LENGTH-1 that represents the bin number.

The `succinct_hash_delegations` will be prioritized over
`path_hash_prefixes`. If both of these fields appear in a delegation,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage of allowing both fields at the same time? Wouldn't it be better to mandate that only one is specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows for some backwards compatibility as a repository is switching from path_hash_prefixes to succinct_hash_delegations, but it might be simpler to mandate that only one or the other can be used.


The `succinct_hash_delegations` will be prioritized over
`path_hash_prefixes`. If both of these fields appear in a delegation,
the `path_hash_prefixes` MUST (as per RFC 2119) be ignored in favor of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this RFC2119 MUST different from all the other musts in the document? And what about the mays and shoulds? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RFC2119 is assumed throughout the specification, so I'll remove that here

Repositories that use access control for file uploading should take
hashed bin delegations into consideration. Upload access for the name
DELEGATING_ROLENAME.hbd-* should have the same permissions as
DELEGATING_ROLENAME.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, why is this important now, but was not important for path_hash_prefixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There currently isn't much discussion about access control in the specification, but I thought this was worth mentioning in this case.

resolve these using prioritized delegations. So if a delegation uses
succinct hashed bins and another role delegates to
DELEGATING_ROLENAME.hbd-\*, the client will use the delegation with a
higher priority.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this paragraph necessary here? Prioritized delegations are described elsewhere, right? And the new field in this TAP doesn't change anything about it.

the first BIT_LENGTH bits of the hash and will be determined for each
bin i by computing i * BIT_LENGTH. Once computed, the
path_hash_prefixes must be represented as the first BIT_LENGTH bits of
the hash represented in base16, followed by a “*”. So for a BIT_LENGTH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor nit: There's a mix of and " in this document. I think we want to stick to the latter.

the first BIT_LENGTH bits of the hash and will be determined for each
bin i by computing i * BIT_LENGTH. Once computed, the
path_hash_prefixes must be represented as the first BIT_LENGTH bits of
the hash represented in base16, followed by a “*”. So for a BIT_LENGTH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to escape asterisks here and elsewhere in the free text, or you mark it up as verbatim text.

the hash represented in base16, followed by a “*”. So for a BIT_LENGTH
of 3, the path_hash_prefix of the first bin would be “0*” (binary
values starting with 000) , the second bin would be “2*” (binary
values 001), the third “4*”, then “6*”, “8*”, “a*”, “c*”, and “e*”.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the bin partitioning maths here. Are you sure that this matches what we currently do? Would you mind elaborating how BIT_LENGTH relates to what we currently call prefix_length?

Comment on lines 152 to 154
of 3, the path_hash_prefix of the first bin would be “0*” (binary
values starting with 000) , the second bin would be “2*” (binary
values 001), the third “4*”, then “6*”, “8*”, “a*”, “c*”, and “e*”.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I just re-read the paragraph and I understand it now. :) The math seems to match what's in the reference implementation, but the binary value in the parentheses for bin "2*" should be 010, right?

Also, I was confused by the word starting, do you mean the binary representation of the hash starts with 000, or the range of hash path prefixes for bin 0* starts with 000? Either way, I think it would be helpful to list all path prefixes for every bin in this example, it's only 8 bins and 2 prefixes per bins.

And I have another question, why (and where) do we need to represent the hash_path_prefixes as the first BIT_LENGTH bits of the hash represented in base16, followed by a “*”? Isn't the point about succinct_hash_delegations that we no longer need the path_hash_prefixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this, but I think the relationship between the binary prefixes and their base64 representations might be a confusing way to explain this (especially because, as you mention, we no longer need to use path_hash_prefixes). I'll revisit this in a bit to see if I can clarify it better.

Copy link
Contributor Author

@mnm678 mnm678 Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this again and removed all the mentions of path_hash_prefixes in this paragraph.

```
("succinct_hash_delegations" : {
"prefix_bit_length" : BIT_LENGTH
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the POC implementation at theupdateframework/python-tuf#1106, having a dictionary with a single item is a bit wordy. Would it make more sense to instead add something like:

("generate_succinct_delegations" : BIT_LENGTH)

@lukpueh @JustinCappos @joshuagl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate_succinct_delegations sounds a bit like a boolean to me. What about "delegation_hash_prefix_len": BIT_LENGTH? We could also keep the dictionary and add an additional optional role_name_prefix to allow the delegating role to override the default DELEGATING_ROLENAME.hbd- prefix for delegated bin roles. What do you think?

Copy link
Member

@joshuagl joshuagl Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of being able to specify role_name_prefix! Rather than have it be a field in the succinct_hash_delegation dictionary, should it be part of the DELEGATIONS object so that it can also be used for path_hash_prefixes and give us consistently named hashed bin delegations, regardless of which mechanism they were created with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that specifying the role_name_prefix would be helpful to give the delegating metadata more flexibility. With the existing mechanism, it is already possible to use any role name for hashed bins (although the api does not provide this flexibility). Maybe as part of the implementation we can add the ability to define role_name_prefix for both succinct hashed bin delegations (by adding it to the metadata) and for legacy hashed bin delegations (by adding it to the api).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, a role_name_prefix in the delegations object is not needed for normal/legacy hashed bin delegations, because each delegated hash bin role name is listed explicitly in the delegation object. But it would definitely be great to support a prefix parameter in the API for any type of hashed bin delegation (see theupdateframework/python-tuf#1106 (comment))

What do you think about using the name field to specify the prefix? It is already re-purposed, by assigning the delegating role name in the case of succinct hbd, and if we specify a prefix we don't need that anymore, right? OTOH, it might be better to aim for semantic correctness and not call it name if it is only a name prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, I like the idea of re-purposing the name field. This field doesn't have any other use for succinct hashed bin delegations (as the name is generated from the prefix), so it won't interfere with any other functionality.

If there are no objections, I will update this pr and theupdateframework/python-tuf#1106 to use delegation_hash_prefix_len and describe the use of custom role name prefixes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been a while, are we still planning to repurpose the name field as the prefix and drop the bin_name_prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, we decided against that to make the field names more clear, but I'm still open to using the name field.

… prefixes.

This commit replaces the wordy notation:

("succinct_hash_delegations" : {
    "prefix_bit_length" : BIT_LENGTH
})

with the simpler:

("delegation_hash_prefix_len" : BIT_LENGTH)

In addition, it allows the user to specify a prefix for succinct
hashed bin delegations instead of always using the default
DELEGATING_ROLENAME.hbd- prefix.

Signed-off-by: marinamoore <[email protected]>
name of the delegating entity. It will be structured as
DELEGATING_ROLENAME.hbd-COUNT where DELEGATING_ROLENAME is the name of
the role that delegated to the hashed bins and COUNT is a value between 0 and
PREFIX listed in the `name` field of the delegation. By default, the PREFIX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default, when the name field is empty? or something else? If the former, does that make the name field optional?

("succinct_hash_delegations" : {
"prefix_bit_length" : BIT_LENGTH
})
("delegation_hash_prefix_len" : BIT_LENGTH)
Copy link
Member

@joshuagl joshuagl Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mildly concerned that path_hash_prefixes and delegation_hash_prefix_len are not obviously related and exclusive of each other. We can address this somewhat by improving wording in the spec, but ideally the names would be more indicative of their relation.

Maybe this is something we should file an issue about to address in a major version update once TAP 14 is Final?

The `NAME_PREFIX` field more explicitly indicates the prefix
that should be used when generating bin names for succinct
hashed bin delegations. In order to include this field, this
pr also creates a dictionary to include all fields associated
with succinct hashed bin delegations.

Signed-off-by: marinamoore <[email protected]>
@mnm678
Copy link
Contributor Author

mnm678 commented Oct 1, 2020

@lukpueh @joshuagl Per our discussion today, I updated the TAP to include a name_prefix field to more explicitly describe the generated bins.

…ions

The name field is overwritten by the generated names, so to
remove confusion, this field should not be used with succinct hashed
bin delegations,

Signed-off-by: marinamoore <[email protected]>
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the format of the delegations object could be cleaned up, but I'm not sure my proposals here make it any clearer.

If I'm following correctly a delegations object needs either name or succinct_hash_delegations and path_hash_prefixes is optional. If the delegations object includes a succinct_hash_delegations field then any path_hash_prefixes field is ignored.

Is that correct? Can we rewrite the object format for the specification to make this clearer?

KEYID : KEY,
... },
"roles" : [{
("name": ROLENAME,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this effectively becomes

Suggested change
("name": ROLENAME,)
("name": ROLENAME |
"succinct_hash_delegations" : {
"delegation_hash_prefix_len" : BIT_LENGTH,
"bin_name_prefix" : NAME_PREFIX}),

Comment on lines +184 to +188
("path_hash_prefixes" : [ HEX_DIGEST, ... ] |
("succinct_hash_delegations" : {
"delegation_hash_prefix_len" : BIT_LENGTH,
"bin_name_prefix" : NAME_PREFIX
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this would become:

Suggested change
("path_hash_prefixes" : [ HEX_DIGEST, ... ] |
("succinct_hash_delegations" : {
"delegation_hash_prefix_len" : BIT_LENGTH,
"bin_name_prefix" : NAME_PREFIX
})
("path_hash_prefixes" : [ HEX_DIGEST, ... ],)

@mnm678
Copy link
Contributor Author

mnm678 commented Oct 5, 2020

If I'm following correctly a delegations object needs either name or succinct_hash_delegations and path_hash_prefixes is optional. If the delegations object includes a succinct_hash_delegations field then any path_hash_prefixes field is ignored.

Is that correct? Can we rewrite the object format for the specification to make this clearer?

Yes, and I agree that we should make this more clear in the object format. However, it'd be nice if we could still relate that succinct_hash_delegations supersedes path_hash_prefixes. I'll think about this some more and see if there's a clear way to relate all of that.

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked at the implementation yet but this looks very useful 👍

candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
1,600,000 bytes for each target. Using succinct hashed bin delegations,
the snapshot and targets metadata overhead for a target can be reduced
to about 550,000 bytes. For more detail about how these overheads were
calculated, see [this spreadsheet](https://docs.google.com/spreadsheets/d/10AKDsHsM2mmh45CWCNFxihJ9f-SP6gXYv7WcWpt-fDQ/edit#gid=0).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a link to google docs is appropriate here? I guess this is tricky to express otherwise and link is better than nothing...

Could you make these items in the spreadsheet clearer: What do "worst case path bytes" and "target filename" refer to specifically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to other ways to present this. I thought it was too many calculations for a markdown table.

I also added some comments to the spreadsheet to clarify those terms.

candidate-succinct-delegation-tap.md Outdated Show resolved Hide resolved
This commit removes the 'paths' field when 'path_hash_prefixes'
are used and adds an example of delegations metadata using the
succinct hashed bin delegation format.

It also clarifies the the 'COUNT' term refers to a hexadecimal
representation.

Signed-off-by: marinamoore <[email protected]>
@joshuagl
Copy link
Member

joshuagl commented Nov 3, 2020

If I'm following correctly a delegations object needs either name or succinct_hash_delegations and path_hash_prefixes is optional. If the delegations object includes a succinct_hash_delegations field then any path_hash_prefixes field is ignored.
Is that correct? Can we rewrite the object format for the specification to make this clearer?

Yes, and I agree that we should make this more clear in the object format. However, it'd be nice if we could still relate that succinct_hash_delegations supersedes path_hash_prefixes. I'll think about this some more and see if there's a clear way to relate all of that.

Looking a this with fresh eyes, I think it's more important in the object definition to tie succinct_hash_delegations and path_hash_prefixes together. I've been unable to come up with a better alternative. I think the object format you originally proposed is the most sensible.

@mnm678
Copy link
Contributor Author

mnm678 commented Dec 15, 2020

@joshuagl @JustinCappos @trishankatdatadog @lukpueh Can we merge this TAP as a draft? I think the only open question is the use of name vs bin_name_prefix, and that will probably be resolved through some experimenting on the PoC.

@trishankatdatadog
Copy link
Member

@joshuagl @JustinCappos @trishankatdatadog @lukpueh Can we merge this TAP as a draft? I think the only open question is the use of name vs bin_name_prefix, and that will probably be resolved through some experimenting on the PoC.

Yes, let's get this out ASAP for PyPA, maybe a nice Christmas gift!

@trishankatdatadog trishankatdatadog merged commit fd5671b into theupdateframework:master Dec 16, 2020
@joshuagl
Copy link
Member

Agreed, would be great to get this TAP implemented in the reference implementation for PyPI to use.
I just filed #132 to track outstanding items for this TAP.

@mnm678 mnm678 deleted the succinct-delegations branch March 1, 2023 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants