Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace RequestsFetcher for Urllib3Fetcher #2762

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

NicholasTanz
Copy link
Contributor

@NicholasTanz NicholasTanz commented Jan 6, 2025

Description of the changes being introduced by the pull request:

Fixes #2751

… ngclient, replace requestsFecther with urllibFetcher in requestsFetcher unit tests.

Signed-off-by: NicholasTanz <[email protected]>
@NicholasTanz NicholasTanz requested a review from a team as a code owner January 6, 2025 07:19
@NicholasTanz NicholasTanz marked this pull request as draft January 6, 2025 07:20
@NicholasTanz
Copy link
Contributor Author

NicholasTanz commented Jan 6, 2025

Hi @jku, I implemented a FetcherInterface that utilizes urllib3, most of the implementation is quite similar to RequestsFetcher. For the tests, I swapped out the RequestsFetcher for the Urllib3Fetcher in test_fetcher_ng.py. (I haven't removed the requests_fetcher.py yet)

@NicholasTanz NicholasTanz marked this pull request as ready for review January 6, 2025 08:13
@jku
Copy link
Member

jku commented Jan 7, 2025

On first pass this looks really good, thanks -- I was worried there might be some differences that makes this complicated but your work seems very promising. Still, we'll have to be careful not to change any details we don't intend to as this is very core client code that easily affects applications (I mention this in case the review takes a while: it's only because I know this is a potentially tricky change).

I think we can probably even drop the multiple poolmanagers: this manual connection pool management was added at a time when this wasn't part of the http library but now I believe the poolmanager handles exactly the issues the _get_poolmanager() code is working around. I'm going to review this better in the next days so I'll look into this.

@@ -102,7 +102,7 @@ def __init__(
if fetcher is not None:
self._fetcher = fetcher
else:
self._fetcher = requests_fetcher.RequestsFetcher(
self._fetcher = urllib3_fetcher.Urllib3Fetcher(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we change the default fetcher already in this PR then we'd want to change the requirements as well (pyprojects.toml, requirements/main.txt and regenerate requirements/pinned.txt) but we can still discuss if there's benefit to shipping this as alternative (non-default) fetcher first. Maybe there isn't -- if we don't think anyone is going to use and test it as an alternative then our only real choice is to swap the defaults

However, I think this change in dependencies would remove certifi from the indirect dependencies -- we'll have to take look at that and decide what is correct for python-tuf:

  • what should the default certificate source be (certifi is the status quo but we may need a few lines of code to get the connectionpool to use it)
  • do we allow users to change this default? how?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need a few lines of code to get the connectionpool to use it

this is correct: by default urllib3 uses the OS certificate store

Copy link
Contributor Author

@NicholasTanz NicholasTanz Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to swap out the defaults in this PR, if it isn't too much overhead for changing the requirements. For the certificate source, I think it makes sense to utilize certifi instead of the OS certificate source, based on this page it seems like a quick swap.

@NicholasTanz
Copy link
Contributor Author

NicholasTanz commented Jan 7, 2025

I think we can probably even drop the multiple poolmanagers: this manual connection pool management was added at a time when this wasn't part of the http library but now I believe the poolmanager handles exactly the issues the _get_poolmanager() code is working around. I'm going to review this better in the next days so I'll look into this.

sounds good, I was looking at this page and I think it is supported, so I can remove _get_poolmanager() / _poolManagers and just use one of them. I was wondering if I should pass in a different value for the num_pools parameter, or if the default amount of 10 works.

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some feedback now:

  • let's leave the certificate store decision open for now: William has a good point that certifi is problematic in the linked sigstore-python issue...
  • I believe we can drop the manual session management and just use a single PoolManager
  • The timeouts need a bit of work: we are supposed to handle timeouts specifically so we can raise SlowRetrievalError... but it looks to me like that will actually mean little more work with urllib3 (since by default we get automatic retries which I believe means any timeouts are actually wapped in MaxRetryError)
    • I think we could add some error parsing in _fetch and _chunks to only raise SlowRetrievalError on these cases
    • but on the other hand I don't want complex code for a feature I don't really think is useful (defence against slow retrieval attacks is not really possible at library level IMO) -- so let's not overdo this

tuf/ngclient/_internal/urllib3_fetcher.py Outdated Show resolved Hide resolved
tuf/ngclient/_internal/urllib3_fetcher.py Outdated Show resolved Hide resolved
tuf/ngclient/_internal/urllib3_fetcher.py Outdated Show resolved Hide resolved
tuf/ngclient/_internal/urllib3_fetcher.py Outdated Show resolved Hide resolved
tests/test_fetcher_ng.py Outdated Show resolved Hide resolved
tests/test_fetcher_ng.py Outdated Show resolved Hide resolved
@jku
Copy link
Member

jku commented Jan 9, 2025

For the record I was looking at the timeout results with these simple tests:

import urllib3
# Test connection timeout: MaxRetryError caused by NewConnectionError
urllib3.request("GET", "https://github.com", timeout=urllib3.Timeout(connect=0.0001))
# Test read timeout: MaxRetryError caused by ReadTimeoutError
urllib3.request("GET", "https://github.com", timeout=urllib3.Timeout(read=0.0001))

(note that this simple test reuses the default connection pool so a successful connection will prevent retesting the connection timeout without reloading the module)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate switching to urllib3
2 participants