-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modification to syntax.js parser RegEx to provide better Unicode support #4390
base: main
Are you sure you want to change the base?
Conversation
This regex is pure evil: quotes disappear, chips to arbitrary metadata are broken, ligatures… But thank you, it fixes diacritics at least. So (almost) fixes #1900. |
I've had a good rack of my brains at what the problem we encountered was with this when we attempted previously, my best guess is that we also tried to fix the problem of searching with a chip with a quoted field name (@paperboyo 's example for this was This inspired me to have another go again at getting the regex to play nicely with quoted field names, and building on top of your changes, I got to 964a6f0 which currently seems to cover all the cases I can think of. @paperboyo is back on Monday, and I'll ask him to give it all one more look, but I can't see any reason to object to this. That said, it looks like a couple of unrelated changes to the start scrips have crept into this branch, can we drop those commits? Then we should be able to get this merged |
Hi @andrew-nowak, thanks for taking a look at this - I couldn't find and example where the proposed changes failed either - I'll take a look at your new proposal and deploy it on INT for product to test at this end. Thanks |
What does this change?
Currently if a user inputs unicode characters into the general search string (e.g Möbius) then after the user has viewed a matching image and returned to the grid view via 'Back to Search' (or performed a page refresh) the unicode characters are lost from the search bar content...
This PR corrects this problem by modifying the parser RegEx that is used to analyse the query content to create the structured query including chips. After implementation the resulting search bar content retains the unicode characters...
The characters are also retained within chips within the search bar.
How should a reviewer test this change?
Ensure that any unicode characters entered into the search bar are retained on page refresh.
Who should look at this?
Tested? Documented?