Expand normalize to handle special delimiters in path segment and (possibly) handle normalization of dot segments #24

Open
FiV0 opened this issue Jan 4, 2022 · 3 comments

FiV0 commented Jan 4, 2022

From the name, I gathered that the intended use case for normalize is to give semantically equivalent URIs a canonical form. Some examples where this is currently not the case:

(require '[lambdaisland.uri.normalize :as norm]
         '[lambdaisland.uri :as uri])

Path segments:

(norm/normalize (uri/uri "https://foobar.org/foo/../bar"))
(norm/normalize (uri/uri "https://foobar.org/foo/./bar"))
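
For illustration, here is a minimal sketch of the dot-segment removal that RFC 3986 section 5.2.4 describes, which would make both paths above canonical. It is not the library's implementation: remove-dot-segments is a hypothetical helper, it assumes JVM Clojure, and it assumes the path is absolute (starts with /).

```clojure
(require '[clojure.string :as str])

(defn remove-dot-segments
  "Hypothetical helper, not part of lambdaisland.uri.
  Resolves . and .. segments in an absolute path (one starting with /)."
  [path]
  (let [;; A limit of -1 keeps trailing empty strings, so /a/b/ keeps its slash.
        ;; `rest` drops the empty string that precedes the leading slash.
        segments  (rest (str/split path #"/" -1))
        ;; A trailing . or .. leaves a trailing slash in the result.
        trailing? (contains? #{"." ".."} (last segments))
        kept      (reduce (fn [acc seg]
                            (case seg
                              "."  acc
                              ;; Never pop past the root.
                              ".." (if (seq acc) (pop acc) acc)
                              (conj acc seg)))
                          []
                          segments)]
    (str "/" (str/join "/" (if trailing? (conj kept "") kept)))))

(remove-dot-segments "/foo/../bar") ; => "/bar"
(remove-dot-segments "/foo/./bar")  ; => "/foo/bar"
```

Relative paths and empty paths are deliberately left out of this sketch; an empty path comes back as "/".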

Scheme-based normalization (the following URLs are equivalent):

      http://example.com
      http://example.com/
      http://example.com:/
      http://example.com:80/

See here.
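
As a rough sketch of what scheme-based normalization could look like for the four URLs above: drop an empty or default port and turn an empty path into /. The function and the default-ports table are hypothetical, and the code works on a plain map with :scheme, :host, :port and :path keys holding strings, which is an assumption about the parsed representation rather than a statement about the library.

```clojure
;; Hypothetical table of default ports, kept as strings (an assumption).
(def default-ports {"http" "80", "https" "443"})

(defn default-port?
  "True when the port is empty or equals the scheme's default port."
  [scheme port]
  (or (= "" port)
      (and (some? port) (= port (get default-ports scheme)))))

(defn normalize-scheme-defaults
  "Hypothetical helper: removes a redundant port and fills in an empty path
  for schemes listed in default-ports."
  [{:keys [scheme port path] :as u}]
  (cond-> u
    (default-port? scheme port)
    (assoc :port nil)

    (and (contains? default-ports scheme) (empty? path))
    (assoc :path "/")))

;; http://example.com:80/ and http://example.com end up as the same map.
(= (normalize-scheme-defaults {:scheme "http" :host "example.com" :port "80" :path "/"})
   (normalize-scheme-defaults {:scheme "http" :host "example.com" :port nil :path nil}))
```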

Is this sort of thing out of scope for this library, or should it be added?

Also, from my understanding of RFC 3986 (which might be wrong), if I want to use a reserved character as part of the data, I need to percent-encode it. For example, if I want to use / in my data, a request to an endpoint /api/{data} could look like "http://foobar.com/api/some%2Fdata". normalize would then conflate the following two URIs:

(norm/normalize (uri/uri "http://foobar.com/api/some%2Fdata"))
(norm/normalize (uri/uri "http://foobar.com/api/some/data"))

So my question is, shouldn't only unreserved characters be decoded as part of the normalization?
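
To make the question concrete, here is a minimal sketch of decoding only unreserved characters (ALPHA, DIGIT, -, ., _, ~ in RFC 3986) and leaving every other percent-encoded octet in place with uppercase hex digits. decode-unreserved is a hypothetical helper, not an existing function in lambdaisland.uri.normalize, and it assumes JVM Clojure.

```clojure
(require '[clojure.string :as str])

(defn decode-unreserved
  "Hypothetical helper: decodes %XX only when it encodes an unreserved
  character; any other escape is kept and its hex digits are uppercased."
  [s]
  (str/replace s #"%([0-9A-Fa-f]{2})"
               (fn [[match hex]]
                 (let [ch (str (char (Integer/parseInt hex 16)))]
                   (if (re-matches #"[A-Za-z0-9._~-]" ch)
                     ch
                     (str/upper-case match))))))

(decode-unreserved "/api/some%2Fdata") ; => "/api/some%2Fdata"
(decode-unreserved "/%7Euser")         ; => "/~user"
```

With this approach the encoded slash survives, so /api/some%2Fdata and /api/some/data stay distinct.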


plexus commented Jan 4, 2022

Normalize does not currently handle dot segments or default ports; at the moment it only deals with percent-encoding. We do have the algorithm for dot segment resolution implemented as part of uri/join, and I think it could make sense to add that to normalize.

(norm/normalize (uri/uri "http://foobar.com/api/some%2Fdata"))
(norm/normalize (uri/uri "http://foobar.com/api/some/data"))

This is indeed a bug. We handle special delimiters in the query segment, but not in the path segment. A PR for that is welcome as well.


FiV0 commented Jan 5, 2022

I will try to have a look when I find the time.

alysbrooks added the bug label Jul 17, 2023
alysbrooks changed the title from "Intended use case for normalize" to "Expand normalize to handle special delimiters in path segment and (possibly) handle normalization of dot segments" Jul 17, 2023
alysbrooks commented

This issue includes a bug, so it makes sense to keep it open.
