Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17221: Http2SolrClient merges case sensitive solr params #3028

Merged
merged 4 commits into from
Jan 16, 2025

Conversation

831973741yy
Copy link
Contributor

@831973741yy 831973741yy commented Jan 12, 2025

https://issues.apache.org/jira/browse/SOLR-17221

Description

The Request object composed by Http2SolrClient via POST/PUT method merges case sensitive solr params. This is because it uses the default constructor of the Fields class (org/eclipse/jetty/util/Fields.java) which sets caseSensitive=false

Solution

Set caseSensitive=true when instantiate Fields object

Tests

Added unit tests to cover case sensitive solr param cases, even for other methods such as GET, etc

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you propose a short description that shall go in CHANGES.txt ? Basically, how might a user experience this bug?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the only test changes needed are the ones you did here in this file. The other two source files extend this one and thus will run the same test code (right?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. The other two prepare the requests that go through the flow. The case sensitive params are added there

Comment on lines 786 to 789
// application/x-www-form-urlencoded
Fields fields = new Fields();
Fields fields = new Fields(true);
Iterator<String> iter = wparams.getParameterNamesIterator();
while (iter.hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole else block could be replaced with something very close to this, notwisthanding the annoying leading question mark to chop off of toQueryString:

req.body(new StringRequestContent("application/x-www-form-urlencoded", wparams.toQueryString(), FALLBACK_CHARSET));

I suppose if there are no params then we don't even need to send a body.

WDYT @jdyer1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, the logic in this else block + convert function in FormRequestContent is largely the same as wparams.toQueryString() except:

  • leading question mark from wparams.toQueryString()
  • wparams.toQueryString() uses hard coded utf-8 charset as oppose to the FALLBACK_CHARSET passed to FormRequestContent

I guess we can replace it (I tried it, and all unit tests passed), but we need to

  1. put in some logic (if wparams.toQueryString().startwith("?") then wparams.toQueryString().substring(1)) to handle the leading question mark from wparams.toQueryString(). Although not likely, if the toQueryString function changes the question mark logic in the future, we might have problem here.
  2. assume FALLBACK_CHARSET remains utf-8 or pass a charset to SolrParams.toQueryString function (not sure if adding a new toQueryString function in SolrParams is justified for this special use case)

The current implementation with Fields has less dependencies stated above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code here is longer and goes through a Fields intermediary mapping. This is why I find SolrParams existing method toQueryString a tempting substitute. I'm doubtful we'll change toQueryString's leading question mark because it's not worth the disturbance on users for a triviality.

I think FALLBACK_CHARSET must be UTF8 always be design. It's not clear what purpose this constant serves vs referencing UTF8 directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. updated the PR. I think it's still safer to check if wparams.toQueryString() has leading "?" before chop it off

@831973741yy
Copy link
Contributor Author

831973741yy commented Jan 13, 2025

@dsmiley

Can you propose a short description that shall go in CHANGES.txt ? Basically, how might a user experience this bug?

Case sensitive solr params does not work reliably in multi-shard setting. For example, faceting per field params such as f.CASE_SENSITIVE_FIELD.facet.limit=50.

…ueryString(). This will avoid to use Fields as an intermediary mapping
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM thanks! I'll add a CHANGES.txt entry

SOLR-17221: If multiple query param keys are sent that only vary by case, they were wrongly merged when doing distributed search (sharded collections). This could likely occur for fielded parameters such as f.CASE_SENSITIVE_FIELD.facet.limit=50.
(Yue Yu, David Smiley)

@dsmiley dsmiley merged commit 82083ea into apache:main Jan 16, 2025
3 checks passed
dsmiley pushed a commit that referenced this pull request Jan 16, 2025
…3028)

If multiple query param keys are sent that only vary by case, they were wrongly merged when doing distributed search (sharded collections). This could likely occur for fielded parameters such as f.CASE_SENSITIVE_FIELD.facet.limit=50

Also: compose the request content directly using SolrParams.toQueryString(). This will avoid to use Jetty Fields as an intermediary mapping

---------

Co-authored-by: David Smiley <[email protected]>
(cherry picked from commit 82083ea)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants