Respect NO_PROXY environment variable. #668

Open
zicklag opened this issue Oct 4, 2023 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@zicklag

zicklag commented Oct 4, 2023

With the try_proxy_from_env(true) option, ureq correctly reads the HTTP[S]_PROXY environment variables, but it doesn't honor the NO_PROXY environment variables. Also, it's not possible to efficiently implement from outside of ureq, because you would have to create a new Agent for each request to a different domain, instead of being able to share the same agent state among all domains.

@mcr
Contributor

mcr commented Oct 4, 2023 via email

@algesten
Owner

algesten commented Oct 5, 2023

@zicklag

but it doesn't honor the NO_PROXY environment variables.

I know very little about proxies. PR welcome.

Also, it's not possible to efficiently implement from outside of ureq, because you would have to create a new Agent for each request to a different domain, instead of being able to share the same agent state among all domains.

Agent is reusable for multiple requests.

@mcr

Also redirects that had different proxy needs wouldn't work.

Not sure I follow. Do you mean there should be different proxy settings per request host?

@mcr
Contributor

mcr commented Oct 5, 2023 via email

@algesten
Owner

ureq 3.x needs support for this. We should do all the options curl does:

https://github.com/curl/curl/blob/master/docs/libcurl/opts/CURLOPT_NOPROXY.md
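As a rough illustration of the host-matching part of those rules, here is a minimal Rust sketch. It is not ureq's API and not a full port of curl's behaviour: the function name `bypasses_proxy` is made up for this example, and it assumes only the `*` wildcard, exact host names, and domain suffixes with an optional leading dot. Ports and IP/CIDR entries are deliberately left out.

```rust
/// Hypothetical helper (not part of ureq): returns true when `host` should
/// skip the proxy according to a NO_PROXY-style comma-separated list.
///
/// Simplifying assumptions: `*` matches everything, an entry matches the
/// host itself and any subdomain of it, and a leading dot on an entry is
/// ignored. Ports and IP ranges are not handled.
pub fn bypasses_proxy(no_proxy: &str, host: &str) -> bool {
    // Host names are case-insensitive; also drop a trailing root dot.
    let host = host.trim_end_matches('.').to_ascii_lowercase();

    no_proxy
        .split(',')
        .map(|entry| entry.trim().to_ascii_lowercase())
        .filter(|entry| !entry.is_empty())
        .any(|entry| {
            if entry == "*" {
                return true;
            }
            let pattern = entry.trim_start_matches('.');
            if pattern.is_empty() {
                return false;
            }
            // Match the host itself, or a subdomain on a label boundary,
            // so that "example.com" does not match "notexample.com".
            let suffix = format!(".{pattern}");
            host.as_str() == pattern || host.ends_with(suffix.as_str())
        })
}
```

A caller would typically pass the value of `NO_PROXY` (or `no_proxy`) and the host of the request URL, and connect directly whenever the function returns `true`.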

@swsnr

swsnr commented Nov 12, 2024

In general, proxy lookup should use the interface lookup(url: Url) -> Vec<Url> provided by the user of the API. ureq would call this function for every request, pass the target, and get a list of proxy URLs in return, in order of priority.

If the list is empty or contains direct:// then a direct connection can be used, otherwise ureq would use the first proxy in the list whose URL has a supported protocol. That's the interface used by Gio (see g_proxy_resolver_lookup, easily the best proxy implementation on Linux) and WinHTTP (see WinHttpGetProxyForUrl). I don't know much about macOS, but as far as I remember it also supports PAC URLs, so it'd necessarily have a similar interface.
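The selection rule described above could look roughly like the following sketch. Everything here is hypothetical and only for illustration: the `ProxyResolver` trait, the `Route` enum, the `FixedList` resolver, and the list of supported schemes are not part of ureq. URLs are plain strings to keep the example free of dependencies, and returning `None` when no candidate is usable is an assumption, since the comment does not say what should happen in that case.

```rust
/// Hypothetical hook: given the target URL, return candidate proxy URLs
/// in priority order.
pub trait ProxyResolver {
    fn lookup(&self, target_url: &str) -> Vec<String>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Direct,
    Proxy(String),
}

/// Proxy schemes this imaginary client supports (an assumption).
const SUPPORTED_SCHEMES: &[&str] = &["http", "https", "socks5"];

/// Lower-cased scheme of a URL, or None if there is no "://" separator.
fn scheme_of(url: &str) -> Option<String> {
    url.split_once("://")
        .map(|(scheme, _)| scheme.to_ascii_lowercase())
}

/// Empty list or a direct:// entry means a direct connection; otherwise
/// the first candidate with a supported scheme wins.
pub fn choose_route(candidates: &[String]) -> Option<Route> {
    let wants_direct = candidates
        .iter()
        .any(|c| scheme_of(c.as_str()).as_deref() == Some("direct"));
    if candidates.is_empty() || wants_direct {
        return Some(Route::Direct);
    }
    candidates
        .iter()
        .find(|c| {
            scheme_of(c.as_str())
                .map_or(false, |s| SUPPORTED_SCHEMES.iter().any(|ok| *ok == s))
        })
        .map(|c| Route::Proxy(c.clone()))
}

/// Called once per request with the target URL.
pub fn route_for<R: ProxyResolver>(resolver: &R, target_url: &str) -> Option<Route> {
    choose_route(&resolver.lookup(target_url))
}

/// Trivial resolver that returns the same list for every target.
pub struct FixedList(pub Vec<String>);

impl ProxyResolver for FixedList {
    fn lookup(&self, _target_url: &str) -> Vec<String> {
        self.0.clone()
    }
}
```

Because the resolver sees the full target URL on every call, exclusion rules, PAC evaluation, or a system lookup can all live behind the same `lookup` method.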

This makes it possible to implement generic per-URL proxy lookup with complex exclusion rules, PAC URL lookup (see seanmonstar/reqwest#1764 for some diagrams showing real-world proxy lookup with PAC URLs), multiple proxy types (e.g. return both an HTTP and a SOCKS proxy URL, and ureq automatically picks the one it supports), dynamic per-network proxy lookup (e.g. use no proxy on the public WiFi connection on the train to work, and then automatically use the corporate proxy once at the office, without restarting the running application), etc.

On top of this interface it's trivial to implement NO_PROXY, and this interface would be mandatory for automatic support of the configured system proxy (e.g. automatically using the proxy from Windows settings, or from GNOME settings, which is somewhat expected in a desktop application).
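To make the dynamic per-network case a bit more concrete: since the lookup is consulted for every request, the answer can change while the application keeps running. The sketch below is one hypothetical way to do that with shared state. `SwitchableResolver` and its methods are invented for this example, and how a network change is detected is left out entirely.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical resolver whose proxy list can be swapped at runtime,
/// e.g. by a network-change listener running on another thread.
#[derive(Clone, Default)]
pub struct SwitchableResolver {
    current: Arc<RwLock<Vec<String>>>,
}

impl SwitchableResolver {
    /// Replace the proxy list; an empty list means "connect directly".
    pub fn set_proxies(&self, proxies: Vec<String>) {
        *self.current.write().unwrap() = proxies;
    }

    /// Called per request; returns whatever list is current right now.
    /// The target URL is ignored in this simplified version.
    pub fn lookup(&self, _target_url: &str) -> Vec<String> {
        self.current.read().unwrap().clone()
    }
}
```

Clones share the same list, so one clone can be handed to the HTTP client while another is updated when the machine moves from the train WiFi to the office network.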

@algesten
Owner

Thanks! Very useful info!

Is there a crate that abstracts parts of this? I don't want ureq to directly call glibc or winapi.

@swsnr

swsnr commented Nov 12, 2024

Oh I'm sorry, that's not what I meant. I don't think ureq should call system-specific APIs to perform proxy lookups, nor link to a crate which does that.

That's a lot of work (e.g. the Windows API around this is exceptionally ugly), and very brittle, as some of these APIs are not always available (e.g. the GIO API is usually only available in a desktop environment, but not e.g. in a container or a headless server).

ureq should just provide the appropriate interface to hook into, i.e. the function outlined in my previous comment. Then users of ureq can implement whatever proxy support they require: in a server application I'd perhaps just not use proxies at all; in a command line tool I could use env-proxy for curl-style environment variables; in e.g. a Linux desktop application I could use the GIO proxy resolver, or the equivalent API in Qt, or directly talk to the corresponding portal API; and on Windows I could get my hands dirty with the Windows API.

But I think ureq should leave that to users, and not implement any default.

With the above API you wouldn't even have to implement environment variables in ureq, and instead just refer to the env-proxy crate for $HTTPS_PROXY and $NO_PROXY support.

TL;DR: ureq should just have an interface for a custom lookup(url: Url) -> Vec<Url> implementation.

@algesten
Owner

Ah. Got ya. Should be fairly trivial.

In ureq 3.x we might not need to do anything. The Transport abstraction means you can open a connection any way you want.

@swsnr

swsnr commented Nov 12, 2024

I think I'd appreciate a convenience API on the ConfigBuilder. Proxy support is fairly common, especially in corporate environments, whereas passing an actual custom transport seems to be a more exotic use case. As such, it's probably not immediately obvious that creating a transport is the way to go for generic proxy support, less so since the current docs for rc2 even say that with_parts is "low level API that isn’t for regular use of ureq."

@algesten
Owner

As such, it's probably not immediately obvious that creating a transport is the way to go for generic proxy support, less so since the current docs for rc2 even say that with_parts is "low level API that isn’t for regular use of ureq."

I guess I'm a bit confused on how common these proxy configurations are. On the one hand you say we don't want to build in support for calling these brittle syscalls (for which I'm grateful), on the other we want something simpler than building our own Transport (which isn't very hard tbh).

One intention with making the Transport pluggable is that ureq itself won't need to support every possible case out there (supporting various TLS libraries is another area). A transport targeting, say, the Windows version of these proxy server strategies could potentially be maintained in a generic form outside of ureq.

@swsnr

swsnr commented Nov 12, 2024

I can't comment on transports, as I haven't tried building my own transport yet.

I just noticed that the documentation doesn't mention transports prominently, and as I said even states that they aren't for regular use, so I didn't associate them with proxy support, which is pretty regular use of an HTTP library I'd say.

So perhaps it's just a matter of documenting transports more prominently?

algesten added the "help wanted" label Nov 26, 2024
@algesten
Owner

This is a fairly trivial feature to implement in ureq 3.x. I mark it as help wanted.
