Strip all tags from fields with mixed content #507
Comments
Perhaps something like:
I wonder if we can just get away with the regex find/replace? The string that would be
Good point, though it may be more common than you would expect. One of our primary cases would be where a Wrapping the This makes me wonder, though, how ASpace deals with namespaces in user-entered content. Can an archivist really enter HTML or EAD tags interchangeably in ASpace and expect correct function?
I think the short answer to your question is no. From a display perspective entering Part of what we need to remember is that we're transforming the JSON response. In the use case that you provided earlier, what comes back in the JSON is
Which is what we want. HOWEVER, the regex approach produces the same result, so I'm not seeing the benefit of introducing additional complexity with HTML parsing, unless I'm missing something?
No objection to just using a regexp; I only proposed the XML library as forward-looking to #508, and because someone else has done a lot more work on parsing XML than a trivial regexp would cover. I'm sure the regexp has unconsidered false positives and missed edge cases.
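For comparison with the regexp route, here is a minimal sketch of what a parser-based strip could look like. It uses Python's standard-library `html.parser` as a stand-in, which is an assumption: it is not necessarily the library proposed above, and `TextCollector` and `strip_tags_with_parser` are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps character data and discards every tag."""

    def __init__(self):
        # convert_charrefs=True also decodes entities such as &amp; into
        # plain characters, which a regexp-only strip would not do.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are never stored.
        self.chunks.append(data)


def strip_tags_with_parser(text):
    """Return text with start, end, and self-closing tags removed."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(strip_tags_with_parser('<emph render="italic">Title</emph> of work'))
```

The trade-off is the one raised in this thread: the parser handles attributes and entities without a hand-written pattern, but it adds more machinery than a single substitution.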
Is your feature request related to a problem? Please describe.
XML or HTML tags embedded in text are displayed as literal strings rather than being interpreted as markup.
Describe the solution you'd like
Strip all HTML and XML tags from text before indexing.
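One way this could look, as a rough sketch only: a single regular expression that removes anything shaped like a tag (a `<`, an optional `/`, a name starting with a letter, optional attributes, then `>`). The pattern and the names `TAG_PATTERN` and `strip_tags` are assumptions for illustration, not code from this project.

```python
import re

# Hypothetical pattern: an opening, closing, or self-closing tag whose name
# starts with a letter and may include a namespace prefix (e.g. ead:p).
# A "<" followed by a space or digit is not treated as a tag.
TAG_PATTERN = re.compile(r"</?[A-Za-z][\w:.-]*(?:\s[^<>]*)?/?>")


def strip_tags(text):
    """Remove tag-like substrings and leave other angle brackets in place."""
    return TAG_PATTERN.sub("", text)


print(strip_tags('<emph render="italic">Title</emph> of work'))
print(strip_tags("3 < 5 and 7 > 2"))
```

Because the tag name must start with a letter immediately after `<`, spaced comparisons such as `3 < 5` are left alone. An unspaced expression like `a<b and c>d` still looks like a tag to this pattern and would be stripped, so it is a short-term heuristic rather than a complete answer.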
Describe alternatives you've considered
This is a short-term solution. A more permanent solution will be articulated in another issue.
Additional context
Tags are most likely to be encountered in note content but may also be present in other fields, for example titles.
We will need to accommodate the presence of angle brackets which are not tags, for example mathematical content such as: