Replace Consolidation Feature with Pattern Replacement #1020

Open
scottkleinman opened this issue Jun 8, 2020 · 3 comments
@scottkleinman
Contributor

scottkleinman commented Jun 8, 2020

This proposal is to replace the Consolidation feature with a more general Pattern Replacement feature. The only real difference is that, instead of entering a, b: c (replace all occurrences of a and b with c), the user will enter a, b (replace all occurrences of a with b). That should be very easy to change on the back end. We could then add a regex option so that the user could enter a regex pattern 'a|b', 'c' (replace all occurrences of a OR b with c).

The addition of regex pattern matching will allow greater flexibility and solve problems like issue #956. And it should only require a few more lines of code.
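
To make the proposal concrete, here is a minimal sketch of the two modes, not the actual back-end code. The function name `replace_pattern` and the `use_regex` flag are hypothetical, chosen only for illustration; the sketch assumes literal mode is a plain substring replacement.

```python
import re


def replace_pattern(text: str, pattern: str, replacement: str,
                    use_regex: bool = False) -> str:
    """Replace every occurrence of `pattern` in `text` with `replacement`.

    Hypothetical helper: with use_regex=False the pattern is treated as a
    literal substring; with use_regex=True it is treated as a regex.
    """
    if use_regex:
        return re.sub(pattern, replacement, text)
    return text.replace(pattern, replacement)


# Literal mode: "a, b" style rule (replace a with b).
literal_result = replace_pattern("a cat", "a", "b")

# Regex mode: 'a|b', 'c' covers what Consolidation did with "a, b: c".
regex_result = replace_pattern("a bat", "a|b", "c", use_regex=True)
```

With the regex option, the old consolidation rule becomes a single alternation pattern, so no separate Consolidation code path would be needed.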

@scottkleinman
Contributor Author

scottkleinman commented Jun 9, 2020

Update: The regex_replace branch implements a basic proof-of-concept regex pattern replacement. I chose to use the syntax a > b (for a non-regex replacement of a with b) and REGEX:^a > b (for a regex replacement of a at the start of a string with b). A > inside the string must be escaped with a backslash. It doesn't yet handle capture groups, and it might be possible to make it faster, but it works.

Oops! Actually, capture groups do work. They can be referenced as \1, \2, etc.
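
For readers who want to see how the rule syntax above could be parsed, here is a rough sketch. It is not the code on the branch. The names `parse_rule` and `apply_rule` are hypothetical, and the sketch assumes that whitespace around the `>` separator is not part of the pattern or replacement. It also does not handle a literal backslash immediately before the separator.

```python
import re

REGEX_PREFIX = "REGEX:"
# Split on the first ">" that is not preceded by a backslash.
_SEPARATOR = re.compile(r"(?<!\\)>")


def parse_rule(line):
    """Return (is_regex, pattern, replacement) for one rule line (hypothetical)."""
    is_regex = line.startswith(REGEX_PREFIX)
    if is_regex:
        line = line[len(REGEX_PREFIX):]
    parts = _SEPARATOR.split(line, maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"no unescaped '>' separator in rule: {line!r}")
    # Assumption: surrounding whitespace is dropped, and "\>" means a literal ">".
    pattern, replacement = (part.strip().replace("\\>", ">") for part in parts)
    return is_regex, pattern, replacement


def apply_rule(text, line):
    """Apply a single rule line to text."""
    is_regex, pattern, replacement = parse_rule(line)
    if is_regex:
        # Capture groups can be referenced as \1, \2, etc. in the replacement.
        return re.sub(pattern, replacement, text)
    return text.replace(pattern, replacement)


start_anchor = apply_rule("apple and avocado", "REGEX:^a > b")
swapped = apply_rule("John Smith", r"REGEX:(\w+) (\w+) > \2, \1")
escaped = apply_rule("x -> y", r"-\> > to")
```

Here `start_anchor` changes only the leading a, `swapped` shows the `\1`, `\2` references, and `escaped` shows a rule whose pattern itself contains `>`.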

@mleblanc321
Contributor

mleblanc321 commented Jul 1, 2020

I have refactored the pattern replacement method scrubber.py::pattern_replacement_handler() (and added one method), but I'm pleading for help on how best to push these changes to the pattern_replacement branch that Scott made :( (Sorry, I'm using VS Code, I'm on Scott's branch, and I've committed and pulled, but I'm just not sure how to push back to pattern_replacement.)

  • the code now handles multiple replacement changes (one per line)
  • allows for (once the UI provides a checkbox) "apply regex to each token" (will only work for languages where tokens are separated by whitespace); note: the code uses a single list comprehension, in which each of the individual (regex) pattern replacements is applied to each token, in order
  • timing shows two edits running on mobyDict.txt in 1.5 sec, so it appears to be working relatively fast
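
The per-token behaviour described in the list above can be sketched roughly as follows. This is an illustration, not the refactored pattern_replacement_handler(): the name `apply_rules` and the `per_token` flag are hypothetical, and the sketch assumes rules arrive as (regex pattern, replacement) pairs. Note that splitting on whitespace and re-joining with single spaces normalizes the original spacing.

```python
import re


def apply_rules(text, rules, per_token=False):
    """Apply each (pattern, replacement) rule in order (hypothetical helper).

    With per_token=True, the full rule sequence is applied to every
    whitespace-separated token, so anchors like ^ and $ match per token.
    """
    compiled = [(re.compile(pattern), replacement) for pattern, replacement in rules]

    def run(segment):
        for pattern, replacement in compiled:
            segment = pattern.sub(replacement, segment)
        return segment

    if per_token:
        # A single list comprehension over the tokens; rules run in order per token.
        return " ".join([run(token) for token in text.split()])
    return run(text)


rules = [(r"^a", "b"), (r"s$", "")]
whole_text = apply_rules("apples and avocados", rules)
each_token = apply_rules("apples and avocados", rules, per_token=True)
```

In `whole_text` the anchors match only at the start and end of the entire string, while in `each_token` they match at the start and end of every token.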

@mleblanc321
Contributor

Note: the Upload File button is not working in this branch, so someone who wanted to upload their list of (regex) patterns to apply (one per line) cannot currently do so.
