Is your feature request related to a problem? Please describe.
We recently had a customer who was trying to use } and }} as delimiters for splitting very long strings (see rapidsai/cudf#14087). Our transpiler did not recognize these patterns as equivalent to static strings because the characters are not escaped. However, because they have no matching { character, they end up being treated as literals, as if they had been escaped, when the regular expression is processed. We should update the transpiler to handle this if possible. It might also be good to look at other special characters that need a matching opener, like ] and ).
I should mention that the related customer issue manifests as a hang, so this is a little more complicated than the description above suggests.
We should certainly update the simple string transpilation optimization, which avoids regex entirely, to handle cases like these. However, for a dangling ), ], or } in an otherwise valid regex, the transpiler might need to escape the character when it detects a closer without a matching opener. That will require the transpiler to track matching pairs in valid regexes.
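As a rough illustration of the pair tracking described above, here is a minimal sketch that escapes a ] or } with no matching opener. The class and method names are hypothetical and this is not the spark-rapids transpiler code. It assumes the input is already a valid java.util.regex pattern, and it deliberately ignores edge cases such as \Q...\E quoting, nested character classes, and a ] placed first in a character class. It leaves ) alone because Java already rejects an unmatched ).

```java
// Hypothetical helper, not part of spark-rapids: escapes closers that have no opener.
final class UnmatchedCloserEscaper {
    private UnmatchedCloserEscaper() {}

    static String escapeUnmatchedClosers(String pattern) {
        StringBuilder out = new StringBuilder(pattern.length() + 8);
        boolean inCharClass = false; // between '[' and its closing ']'
        boolean inBraces = false;    // between '{' and its closing '}'
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            // Copy an escape sequence through untouched, including the escaped char.
            if (c == '\\' && i + 1 < pattern.length()) {
                out.append(c).append(pattern.charAt(++i));
                continue;
            }
            // Inside a character class only ']' is tracked; braces are literal there.
            if (inCharClass) {
                if (c == ']') {
                    inCharClass = false;
                }
                out.append(c);
                continue;
            }
            switch (c) {
                case '[':
                    inCharClass = true;
                    out.append(c);
                    break;
                case '{':
                    inBraces = true;
                    out.append(c);
                    break;
                case '}':
                    if (inBraces) {
                        inBraces = false;
                        out.append(c);
                    } else {
                        out.append("\\}"); // no opener seen, so make the literal explicit
                    }
                    break;
                case ']':
                    out.append("\\]"); // a ']' outside any class has no opener
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, }} becomes \}\} and [ab]] becomes [ab]\], while a{2} and an already escaped \} pass through unchanged. Once the closers are explicitly escaped, the pattern contains only literals and could in principle be picked up by the static string optimization.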
A couple of interesting notes. On the CPU, Spark uses java.util.regex.Pattern to perform regular expression operations, so these cases throw exceptions on the CPU:
( - "Unclosed group"
) - "Unmatched closing ')'"
{ - "Illegal repetition"
[ - "Unclosed character class"
Since those cases already fail on the CPU, we mostly just need to handle ], ]], }, and }}.
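The CPU behavior noted above can be checked with a small standalone program. This is only a sketch for reproducing the observation; the class name LoneBracketCheck is made up for illustration, and it uses nothing beyond Pattern.compile and PatternSyntaxException from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical check: which lone bracket characters does java.util.regex accept?
final class LoneBracketCheck {
    private LoneBracketCheck() {}

    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"(", ")", "{", "[", "]", "]]", "}", "}}"};
        for (String regex : candidates) {
            String result = compiles(regex) ? "compiles" : "PatternSyntaxException";
            System.out.println(regex + " -> " + result);
        }
    }
}
```

The openers and the lone ) are rejected, while ], ]], }, and }} compile and match themselves literally, which is why those four are the patterns the GPU path needs to treat as plain strings.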