Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pattern replace by making flag optional as on api #895

Merged

Conversation

grouh
Copy link
Contributor

@grouh grouh commented Mar 14, 2024

Description

This PR makes the flag optional on the pattern replace char filter, as on OS API

Issues Resolved

Fixes #886

@reta
Copy link
Collaborator

reta commented Mar 14, 2024

Thank you, @grouh , please add CHANGELOG entry and at least one test for the change.

@grouh grouh force-pushed the fix/pattern-replace-optional-flag branch from f894817 to efe0d51 Compare March 14, 2024 15:32
reta
reta previously approved these changes Mar 14, 2024
VachaShah
VachaShah previously approved these changes Mar 14, 2024
CHANGELOG.md Outdated Show resolved Hide resolved
@harawata
Copy link

I could be wrong, but replacement also is optional, isn't it?

@reta
Copy link
Collaborator

reta commented Mar 14, 2024

I could be wrong, but replacement also is optional, isn't it?

Hm ... I don't think so, the filter is not very useful if it replaces nothing (by and large, this is replace charfilter)

@harawata
Copy link

@reta ,

Thank you for the reply!

I don't disagree about the usefulness and replacement being required does not affect me, so please carry on. :)

I commented only because 1) the server accepts pattern_replace without replacement and 2) the replacement is optional in the Elasticsearch doc (I couldn't find OpenSeach doc about the filter).
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(?<=\\p{Lower})(?=\\p{Upper})"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

@reta
Copy link
Collaborator

reta commented Mar 14, 2024

I don't disagree about the usefulness and replacement being required does not affect me, so please carry on. :)

Thank you @harawata , what would be the function of the char filter when pattern_replace is provided without replacement? The fact server accepts this combination points out there could be a problem there (and yes, we do have large gaps in documentation sadly).

@harawata
Copy link

harawata commented Mar 14, 2024

@reta ,

I just did a quick test and it seems that when replacement is omitted, it simply removes the matched string (so, same behavior as replacement: "", probably?).

For example, with this index...

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "Ba."
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

... the following returns ["the", "foo", "method"].

POST my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The fooBarBaz method"
}

I'm still new to Elasticsearch/OpenSearch, so please forgive me if I'm missing something basic.

@reta
Copy link
Collaborator

reta commented Mar 14, 2024

I just did a quick test and it seems that when replacement is omitted, it simply removes the matched string (so, same behavior as replacement: "", probably?).

Thanks @harawata , you are totally and absolutely correct in your observations (pasting below the server side code that uses empty string if value is not provided):

replacement = settings.get("replacement", ""); // when not set or set to "", use "".

With that being said, I would prefer the client to require replacement, as an explicit intent what the replacement value should be, thank you.

@harawata
Copy link

harawata commented Mar 15, 2024

Personally, I am OK with that.

I should, however, mention that the mismatch between server and client could cause an issue when retrieving the index.
Assuming that the index in my previous comment exists on the server, the following code throws the MissingRequiredPropertyException when the client internally deserializes the server response.

osClient.indices().get(x -> x.index("my-index-000001"));

This was actually how I encountered #886 at first.

Signed-off-by: Grouh <[email protected]>
@grouh grouh dismissed stale reviews from VachaShah and reta via 0742ae2 March 15, 2024 10:58
@grouh
Copy link
Contributor Author

grouh commented Mar 15, 2024

If the consensus is to let replacement as mandatory for explicit use of the replace value, can this PR considered for merging ? or is there any other blocking point ?

@reta
Copy link
Collaborator

reta commented Mar 15, 2024

If the consensus si to let replacement as mandatory for explicit use of the replace value, can this PR considered for merging ? or is there any other blocking point ?

I think the deserialization failure mentioned by @harawata (mismatch between server and client could cause an issue when retrieving the index) leave us no choice - we have to make replacement optional. Could you please address that and also have a test for it? Thank you and thanks to @harawata to digging it through.

@grouh grouh force-pushed the fix/pattern-replace-optional-flag branch from 6e3144e to 5692fed Compare March 15, 2024 13:43
@dblock dblock merged commit e240b27 into opensearch-project:main Mar 15, 2024
49 checks passed
@grouh
Copy link
Contributor Author

grouh commented Mar 15, 2024

Thanks @reta @dblock @harawata @VachaShah

@reta reta added the backport 2.x Backport to 2.x branch label Mar 15, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 15, 2024
* Fix pattern replace by making flag optional as on api

Signed-off-by: Grouh <[email protected]>

* Update changelog

Signed-off-by: Grouh <[email protected]>

* Add unit test for pattern replace char filter analyzer

Signed-off-by: Grouh <[email protected]>

* fix changelog

Signed-off-by: Grouh <[email protected]>

* fix char pattern replace by making replcement optional

Signed-off-by: Grouh <[email protected]>

* Add deserialze test with only pattern set

Signed-off-by: Grouh <[email protected]>

---------

Signed-off-by: Grouh <[email protected]>
(cherry picked from commit e240b27)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request Mar 15, 2024
* Fix pattern replace by making flag optional as on api



* Update changelog



* Add unit test for pattern replace char filter analyzer



* fix changelog



* fix char pattern replace by making replcement optional



* Add deserialze test with only pattern set



---------


(cherry picked from commit e240b27)

Signed-off-by: Grouh <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@BrendonFaleiro BrendonFaleiro mentioned this pull request Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Unable to create a pattern replace character filter without flags
5 participants