Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] URL variables are not properly escaped #827

Closed
ssm951 opened this issue Feb 3, 2024 · 4 comments
Closed

[BUG] URL variables are not properly escaped #827

ssm951 opened this issue Feb 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@ssm951
Copy link

ssm951 commented Feb 3, 2024

What is the bug?

When a GetRequest is referenced in a GET <index>/_doc/<id> call, the ID variable gets URL encoded for all special characters other than punctuation. This is causing issue with IDs that are a JSON formatted string. The braces character and quotes character gets encoded, but not the , or the :. Here is a sample GET request logged (scrubbed):

Executing request GET /<index>/_doc/%7B%22id%22:%22test-id%22,%22version%22:0%7D HTTP/1.1

This would result in a 404 returned by the Java client. When I entered the same request in the OpenSearch dashboard dev console, the same result occurs. However, when I encode the , and : properly in the console request, the document I wanted gets returned.

How can one reproduce the bug?

Write a document into the OpenSearch collection with a JSON string as the ID.

Pass in that JSON string in a OpenSearchClient.get() request as the ID.

What is the expected behavior?

The ID gets encoded properly into this request:

GET /<index>/_doc/%7B%22id%22%3A%22test-id%22%2C%22version%22%3A0%7D

What is your host/environment?

AWS Lambda Java 17

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

When digging into the codebase, I found that the encoding is done by org.apache.http.client.utils.URLEncodedUtils.formatSegment, which encodes only escaping characters that would affect the path. I used java.net.URLEncoder.encode to get the expected behavior, but I'm not sure how it affects other use cases (such as spaces).

@ssm951 ssm951 added bug Something isn't working untriaged labels Feb 3, 2024
@dblock dblock removed the untriaged label Feb 5, 2024
@dblock
Copy link
Member

dblock commented Feb 5, 2024

Is this encoding happening incorrectly in the client? Can you encode before calling a client?

Want to try to add a test that reproduces this, and maybe a fix?

@ssm951
Copy link
Author

ssm951 commented Feb 5, 2024

When encoding before calling the OpenSearchClient.get(), the encoded % characters would then get decoded as well.

The problem is this line of code:
https://github.com/opensearch-project/opensearch-java/blob/main/java-client/src/main/java/org/opensearch/client/opensearch/core/GetRequest.java#L507

Which calls this method:

public static void pathEncode(String src, StringBuilder dest) {

Unfortunately, this encoding isn't configurable by users of OpenSearch client, so I don't have a way to override this behavior.

@dblock
Copy link
Member

dblock commented Feb 6, 2024

Thanks, so it's just a bug where those arguments should not be encoded at all? I would write some tests of the expected behavior and fix it.

@Xtansia
Copy link
Collaborator

Xtansia commented Oct 24, 2024

This was the same root cause as #1068, A fix/feature to support this was released as part of v2.13.0.
For safety/backwards-compatibility reasons we couldn't change the default behavior in 2.x, however you should be able to opt into "safer" encoding behavior which will correctly encode , and : by setting the following system property: org.opensearch.path.encoding=HTTP_CLIENT_V5_EQUIV

@Xtansia Xtansia closed this as completed Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants