Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TAP 15 #1948

Closed
wants to merge 14 commits into from
Closed

Conversation

MVrachev
Copy link
Collaborator

@MVrachev MVrachev commented Apr 8, 2022

Related to issue: #1909

Description of the changes being introduced by the pull request:

TAP 15 was created on June 23-rd 2020 and was last modified in July 6-th 2020.
Since then the TAP has been put to draft status meaning it needs a prototype implementation before it's accepted as a future specification change.
Given that this TAP underlines an efficient way of handling hash bin delegations, meaning it would be really useful when python-tuf is integrated into Warehouse, it's logical that we should not only create a prototype but directly work to integrate it in python-tuf.

The outcome of the TAP 15 implementation is

  1. Support reading, updating, and saving target metadata files that are using succinct hashed bin delegations.
  2. Support in ngclient for calculating the delegated role name on a given target path.
  3. An exemplary document inside examples/ showcasing how succinct hashed bin delegations can be utilized in practice.

This pr covers points 1 and 2.
Point 3 will be done in another pr.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@MVrachev
Copy link
Collaborator Author

MVrachev commented Apr 8, 2022

This pr is a draft because of warnings given by mypy.
Those warnings are connected with the issue discussed in #1943.

I am not sure what is the problem for python3.7 https://github.com/theupdateframework/python-tuf/runs/5888251078?check_suite_focus=true

@coveralls
Copy link

coveralls commented Apr 8, 2022

Pull Request Test Coverage Report for Build 2271063967

  • 86 of 88 (97.73%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 98.238%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tuf/api/metadata.py 82 84 97.62%
Totals Coverage Status
Change from base Build 2268232200: -0.1%
Covered Lines: 1273
Relevant Lines: 1291

💛 - Coveralls

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

  • I left a complaint about using dict to "hide" API, that one I fee strongly about, let's figure out a solution for it.
  • changing API in incompatible ways (like arg order in DelegatedRole.__init__()) may be inevitable, but let's at least try to avoid it
  • other comments are not strictly change requests.

Is the current implementation using None as the key in Delegations.roles dictionary? I'm quite surprised mypy does not complain about it in Delegations.from_dict() ...

tuf/api/metadata.py Outdated Show resolved Hide resolved
paths: Optional[List[str]] = None,
path_hash_prefixes: Optional[List[str]] = None,
succinct_hash_info: Optional[Dict[str, Any]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for using a different name than the TAP does?

Copy link
Collaborator Author

@MVrachev MVrachev Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it boils down to that it's just shorter.
succinct_hash_delegations is 25 characters and because of our characters per line restriction, I had to often split lines.
succinct_hash_info is 18 characters which save a good amount of characters while still hinting at the purpose of this argument.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/ngclient/updater.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
@MVrachev
Copy link
Collaborator Author

I have made multiple changes:

  1. Made DelegatedRole.name as the first attribute in the initialization list (the same way as it's now in the develop branch) and as thus removing the incompatible change
  2. Renamed DelegatedRole.get_rolename to Delegated.find_delegations and used find_delegations replacing the need for DelegatedRole.is_delegated_path.
  3. Simplified checks with sum suggestions by @jku
  4. Experimented with creating a new container helper class HashDelegations containing the information about succinct hash delegations. This allowed me to make sure the requirements from TAP 15 are met inside the class without making DelegatedRole even more complicated and to add support for unrecognized fields inside the succinct_hash_delegations dict.

@jku it's ready for another review.

@MVrachev MVrachev requested a review from jku April 12, 2022 14:45
@MVrachev MVrachev force-pushed the tap15 branch 3 times, most recently from 1bf36b5 to fac307e Compare April 12, 2022 14:53
Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think the direction in this version can work. It's a new class just to hold two variables: this feels like overkill at first but there's some specific problems (the bin math plus the handling unrecognized fields in the new dict) that we can contain within that class so I think it works out.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
@@ -1384,6 +1465,58 @@ def _is_target_in_pathpattern(targetpath: str, pathpattern: str) -> bool:

return True

def _calculate_delegation(self, hash_bits_representation: str) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method now seems like it should be part of the new class (HashDelegations.find_delegation() maybe)?

I also think the argument to the function should be targetpath : this way the arguments and output make sense to a reader: you feed in the targetpath and get the real delegated role name as output -- The fact that this specific implementation uses a string of ones and zeros as an intermediate value is not important and you only need to move a few lines of code to get this better abstraction.

Then DelegatedRole.find_delegation() ends up being quite clean, something like

if self.succinct_hash_info:
    return self.succinct_hash_info.find_delegation(target_filepath)
elif self.is_delegated_path(target_filepath):
    return self.name
return None

Opinions?

Copy link
Member

@jku jku Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this a little more complex: we don't have an API for someone who

  • uses non-succinct hash delegations
  • wants to find out which bin might provide (or shoudl provide) a targetpath

This API clearly needs to be above the DelegatedRole level -- at Delegations maybe, but internally could use some of the same code

This question is not really TAP 15 related... but makes sense to solve (or at least discuss) at same time (and is related to #1952 and warehouse work)

Copy link
Collaborator Author

@MVrachev MVrachev Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I don't pass targetpath as an argument is the testing.
I want to test the actual bin calculation math and if I pass the targetpath this means that the hash calculation has to be moved inside _calculate_delegation and I won't be able to pass the binary representation in a string as do now.
That way the test inside test_api.py won't be clear for the reader if we do that.
Given that this is an internal API I think it's allowed to require a little more specific argument for easier testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think I'll leave this to you and @lukpueh as I think you've been talking about this part of the code.

Personally I'm not fond of making production code harder to read based on unit testing needs, or the basic idea of building a string of ones and zeros just to turn it into an int later ... but I can live with those so I'll leave these for you

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
Comment on lines 1359 to 1360
succinct_hash_info: Optional[Dict[str, Any]] = None,
unrecognized_fields: Optional[Mapping[str, Any]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly speaking this is still breaking API.
DelegatedRole("a", [], 1, False, [], None, {})
used to be valid code. Now it will fail I assume.

New arguments should generally speaking be added in the end of the list... I'm not saying we can't break API here: it's not a major breakage by any means but maybe I'd still vote for adding the new arg in the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add it to the end as in my mind unrecognized_fields are at the end of each class, but maybe that's not so important.

@MVrachev
Copy link
Collaborator Author

@jku I updated the pr once again making the following changes (not in order):

  • renaming HashDelegations to SuccinctHashDelegations - please leave a comment what do you think
  • left a deprecation messaged for is_delegated_path and made is_delegated_path work when succinct_hash_delegations is used
  • removed a complex comment inside from_delegations
  • moved the helper function _calculate_delegation inside SuccinctHashDelegations
  • added additional comments about the exclusive arguments in the DelegatedRole docstring

@MVrachev
Copy link
Collaborator Author

Made an update to one of the commits to remove the f string feature that caused problems for all python3.7 CI runs.

@MVrachev MVrachev requested a review from jku April 15, 2022 11:23
Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • SuccinctHashDelegations works better as a name I think: it's long but not used v commonly.
  • moving _calculate_delegation() without modifying it otherwise probably does not make sense: having to disable pylints is a good hint that something is not right. my opinion is still that it would make sense for SuccinctHashDelegations to expose an actual delegation lookup method that accepts target path as argument but I'll just leave that as opinion
  • I think it mostly looks good: our naming is now not perfect as DelegatedRole can now represent multiple roles and as result we need DelegatedRole.find_delegation() which really does not make sense as a name ... but it seems best we can do without breaking API: I'm fine with that solution

@@ -1258,17 +1259,99 @@ def to_dict(self) -> Dict[str, Any]:
return snapshot_dict


class SuccinctHashDelegations:
"""A container with information about a succinct hash delegation.
Copy link
Member

@jku jku Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still needs an explanation: "succinct hash delegation" is probably word salad for anyone except about 20 people on the planet.

It does not need to explain every detail but the core idea: that it desribes delegations of all possible target paths to variable number of bins, the names of which are generated based on the two fields.

tests/test_metadata_eq_.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
@MVrachev
Copy link
Collaborator Author

MVrachev commented Apr 20, 2022

I addressed all of the @jku comments.
I found a way to add a good looking warning by using the sphinx format: https://sublime-and-sphinx-guide.readthedocs.io/en/latest/notes_warnings.html#warnings which looks like this:
image
Additionally, I added the proposed solution by @jku inside #1943 to use the first bin name as the unique identifier inside DelegatedRole.name when succinct hash delegation is used.

@lukpueh and @jku it's ready for a review.

@MVrachev MVrachev requested review from jku and lukpueh April 20, 2022 16:37
@MVrachev MVrachev marked this pull request as ready for review April 20, 2022 16:37
tuf/api/metadata.py Outdated Show resolved Hide resolved
Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not looked at all of it but the naming scheme seems important enough to raise the issue now (see code comments)

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! I love how all of this can be implemented in the metadata classes and does not need any changes in ngclient.

I mostly reviewed the SuccinctHashDelegations class in this round and left a couple of minor comments, let me know what you think.

Regarding the changes in Delegations and DelegatedRole, I lean towards revisiting the TAP 15 metadata definition. See my suggestion for an alternative format that might make the implementation a little less "hackish" in: #1943 (comment)

I haven't looked closely at the tests yet, but will follow up shortly. Do you plan to add a client test that validates a target delegated to a hash bin?


Args:
hash_prefix_len: Number of bits between 1 and 32 used to separate the
different bins.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a better explanation, either here or in the function description above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better explanation about what is a hash deletion or specifically about succinct hash bin delegation?
Can you give some reference points to what you mean?

if hash_prefix_len <= 0 or hash_prefix_len > 32:
raise ValueError("hash_prefix_len must be between 1 and 32")
if not isinstance(bin_name_prefix, str):
raise ValueError("bin_name_prefix must be a string")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why you don't check if hash_prefix_len is an int, but do check if bin_name_prefix is a string?

Copy link
Collaborator Author

@MVrachev MVrachev Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If hash_prefix_len is not a number a comparison like the one done on line 1283 will throw a TypeError, for example if hash_prefix_len is a string this comparison will throw TypeError: '<' not supported between instances of 'str' and 'int'
I am not convinced there is a point in checking that hash_prefix_len is an int, not a float.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
@MVrachev
Copy link
Collaborator Author

MVrachev commented May 2, 2022

I have updated my pr and addressed all comments made by @jku and @lukpueh.
I am preparing another commit following @lukpueh suggestions in #1943 (comment) and I am will commit it in another branch.

@MVrachev MVrachev requested review from jku and lukpueh May 2, 2022 07:08
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
MVrachev added 14 commits May 4, 2022 19:18
Start supporting succinct_hash_delegations attribute inside
DelegatedRole. This change is part of TAP 15 implementation.

Unfortunatly, this change is not compatible with all previos python-tuf
versions as the "name" attribute is described as optional in TAP 15,
meaning it has to be replaced to become the fourth argument
instead of being the first when creating a DelegatedRole instance.

Additionally, serialization tests covering the new use cases as TAP 15
states that only one of "paths", "path_hash_prefixes" and
"succinct_hash_delegations" is allowed in a DelegatedRole instance.

Signed-off-by: Martin Vrachev <[email protected]>
Inside test_updater_delegation_graphs.py we define the TestDelegation
class. There the "paths" attribute is defined as a "List[str]" and
it has a default value of a "[*]". This is often used so that all
targets are considered as delegated by the delegated role that will
consequently be created with a paths attribute with a value of "[*]".
Still, there are a couple of test cases where we want to give a "None"
value to the "paths" attribute and mypy complains that we can't assign
a "None" value when the expected type is a list of strings.
That's why we have to make "paths" optional.

Signed-off-by: Martin Vrachev <[email protected]>
As part of the implementation of TAP 15 I had to make sure that ngclient
supports target metadata files using succinct hash delegations.

Some things to keep in mind while reviewing:
- first, it should be noted that when succinct hash info is added inside
a particular delegated role this means that every possible target has
a bin assigned to it
- the "name" attribute for DelegatedRole objects has been changed to
optional inside TAP 15, because when succinct hash delegations is used
"name" becomes redundant
- I have added an additional helper function
`DelegatedRole._calculate_rolename" for easier testing
- I intentially added more tests inside the "get_rolename_dataset"
to showcase how succinct hash delegations is using binary prefix instead
of hash prefix.

Signed-off-by: Martin Vrachev <[email protected]>
Add SuccinctHashDelegations class containing the information from the
succinct_hash_delegation dict described in TAP 15.
This allows for easy mypy checks on the types, easy enforcement on
TAP 15 restrictions (as for example that "delegation_hash_prefix_len"
must be between 1 and 32) and support for unrecognized fields
inside succinct_hash_delegation without much of a hassle.

Signed-off-by: Martin Vrachev <[email protected]>
Add zero padding to bin names inside SuccinctHashDelegations.
Zero padding ensures that the bin names always have the same length.

This characteristic is implied in the example given by TAP 15 where
the third bin is named "alice.hbd-03"
For context read TAP 15:
https://github.com/theupdateframework/taps/blob/master/tap15.md

Signed-off-by: Martin Vrachev <[email protected]>
Change the test_find_delegation_with_succinct_hash_info() test, so it
can test the whole calculation of bin name from a given target path.

Signed-off-by: Martin Vrachev <[email protected]>
Add a SuccinctHashDelegations.get_bin_name() API call which returns the
bin name of a given sequential bin number.
This is useful to easily get the name of the first bin which can be
used as a unique identifier for delegated roles using a succinct hash
delegations.

A unique identifier for such delegated roles with succinct hash
is required by our implementation of Delegations.roles where the keys
are DelegatedRole.name attributes and values are DelegatedRole objects
and the fact that when succinct hash delegations are used
DelegatedRole.name should be populated with "None".

Solves: theupdateframework#1943

Signed-off-by: Martin Vrachev <[email protected]>
Add a public method to calculate and receive the bin name when succinct
hash delegation is used.
It's logical that all of the logic required to calculate the bin name
should be inside the class responsible for that SuccinctHashDelegations.

This will eliminate the need of
SuccinctHashDelegation._find_bin_for_bits() and its test.

Signed-off-by: Martin Vrachev <[email protected]>
Remove the need of calculation of all hash bits and then filtering
those that aren't needed with simple bit shifting.
That way it's clearer what we actually do without using complicated
python tricks.

Signed-off-by: Martin Vrachev <[email protected]>
Deprecate DelegatedRole.is_delegated_path() in favor of
DelegatedRole.find_delegations() which will not only check if the
given targetpath is one of the paths DelegatedRole is trusted to
provide, but it will also return the role name.
This is especially useful when using succinct hash delegations.

Signed-off-by: Martin Vrachev <[email protected]>
Add get_all_bin_names and is_bin in SuccinctHashDelegations.
Those methods proved useful in the testing code, but I believe they have
a potential value for production code as well.

Signed-off-by: Martin Vrachev <[email protected]>
Change Metadata.verify_delegate(), so it can handle delegated roles
where succinct hash delegation is used.
In the case of succinct hash delegation, when the delegator is "targets"
we have to make sure that a given role "r" contains a succinct hash
delegation information and that the "delegated_role" is a role name is a
bin handled by "r".
This complication comes from the idea that we could have two delegated
roles with succinct hash delegation information and there could be
other non-succinct roles as well.

Signed-off-by: Martin Vrachev <[email protected]>
Change RepositorySimulator.add_delegations() to work with delegated
role "role" that uses succinct hash delegations.
This case is special as for that role we are not creating one target,
but we are creating target metadata instances for all bins.

Signed-off-by: Martin Vrachev <[email protected]>
Test traversing the delegation tree when there is a delegated role
using succinct hash delegation.

Signed-off-by: Martin Vrachev <[email protected]>
@MVrachev
Copy link
Collaborator Author

MVrachev commented May 4, 2022

I think I have applied all suggestions by @jku and @lukpueh.
Most notably I changed:

  • the bin names suffix is now hex
  • calculating the bin names is done by bit shifting
  • test_find_delegation_with_succinct_hash_info is changed, so it's testing the whole process of calling find_delegations to reciving the actual bin name
  • added multiple bin names case inside test_succinct_hash_bins_graph_traversal to make sure all bin files are existing in cases where zero padding has to be used (the bins are more than 10).

@MVrachev MVrachev requested review from jku and lukpueh May 4, 2022 16:36
@MVrachev
Copy link
Collaborator Author

MVrachev commented May 9, 2022

I have implemented @lukpueh suggestion from the comment here https://github.com/MVrachev/tuf/tree/new-suggestion.
For those interested, please comment on issue #1943.

@MVrachev
Copy link
Collaborator Author

MVrachev commented May 20, 2022

Together with @jku, @lukpueh, and the TAP author - @mnm678 we met to discuss if we should change the design on TAP 15 based on multiple suggestions.
We end up deciding that there are multiple changes that we want to make which are all listed in my comment here.
Marina updated the TAP 15 content in her pr theupdateframework/taps#154 and there were multiple changes that I had to do in this pr to be up to date.
That's why I decided it's probably best if I close this pr, make the changes from scratch (reusing big chunks of this code of course) and present a cleaner history in a new pr.
This pr is superseded by #2010

@MVrachev MVrachev closed this May 20, 2022
@MVrachev MVrachev deleted the tap15 branch June 17, 2022 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants