A simple shell script for ordinary humans who want to write complex Perl-compatible regular expressions (PCREs).
A "regular expression" is a sequence of characters (a, b, c, *, +, etc.) that forms a search pattern according to the rules of a particular programming language. Most of the time, programmers use the phrase to refer to a "match string" for finding instances of a piece of text in a given text file. In a standard search engine (e.g., Google), the "match string" is whatever you type into the search bar, and each character you type stands only for itself: a single letter, number, symbol, etc. A regular expression is more flexible and precise than that. For example, using a regular expression, you can tell your computer to search for every whole word that begins with the letters "lov" followed by at least one more letter, which would match "love," "loves," "loving," "lover," etc. That's the power of regular expressions.
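As a rough illustration of the "lov" example, here is a minimal sketch in Python. It assumes Python's built-in `re` module, which is not PCRE itself but reads this particular pattern the same way; the sample sentence is made up for the demonstration.

```python
import re

# \b    = word boundary, so the match must be a whole word
# lov   = the literal letters "lov"
# [a-z]+ = at least one more letter
pattern = re.compile(r"\blov[a-z]+\b", re.IGNORECASE)

# Made-up sample text: "lov" alone is too short, and "glove" does not begin with "lov".
text = "I love that she loves loving; lov is not a word, and glove is not a match."

matches = pattern.findall(text)
print(matches)
```

The bare "lov" is skipped because nothing follows it, and "glove" is skipped because the word boundary has to fall right before the "l".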
"Perl" is a computer programming language. A Perl-compatible regular expression (PCRE), then, is a regular expression written in the syntax that Perl understands. PCREs can be difficult for ordinary humans to read and write, so I wrote a simplified version of the syntax, one that I found easier to read and write, and I called it "Perlish." I also wrote a shell script that converts Perlish expressions into PCREs, so you can write the Perlish expression and then copy and paste the resulting PCRE wherever you want. I use PCREs to run complex searches in texts, where I want to set precise conditions for a match so that I can find what I'm looking for more quickly and more reliably (e.g., topics of interest, key passages, etc.).
Here's how to write Perlish expressions.
Expression | Interpretation |
---|---|
words | "words" must appear on the page |
[words] x | "words" must appear on the page, before x |
x [words] | "words" must appear on the page, after x |
~[words] x | "words" must not appear on the page, before x |
x ~[words] | "words" must not appear on the page, after x |
[words]{n} x | with n words between "words" and x |
{n1-nz} | n1, n2, n3 . . . or nz words |
{n+/n-} | n or more/less words |
{>n/<n} | greater/less than n words |
(characters) | "characters" may appear, too |
* | one or more letters and/or digits |
a/b | a or b |
/string/ | "string" is a perl-compatible regular expression |
`\` | interpret the next character literally: `\[` `\]` `\(` `\)` `\*` `\/` `\"` |
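
To give a feel for what a few rows of the table might look like as regular expressions, here is a small sketch in Python. The three Perlish queries and their hand-written translations are my own assumptions for illustration; they are not output from the script, and the script may produce different PCREs.

```python
import re

sample = "tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide"

# Hypothetical hand translations of three table rows (NOT generated by the script):
#   *            -> one or more letters and/or digits
#   (characters) -> an optional group
#   a/b          -> an alternation
hand_translations = {
    "tyran*": r"\btyran[A-Za-z0-9]+\b",
    "tyrant(s)": r"\btyrant(?:s)?\b",
    "tyranny/tyrannies": r"\b(?:tyranny|tyrannies)\b",
}

# Collect every match each pattern finds in the sample line.
results = {query: re.findall(regex, sample) for query, regex in hand_translations.items()}

for query, found in results.items():
    print(f"{query!r} -> {found}")
```

Under these assumed translations, the first query picks up every word in the sample, the second only "tyrant" and "tyrants," and the third only "tyranny" and "tyrannies."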

1 | 2 | 3 |
---|---|---|
[words] | words | [words] |
context before | core | context after |

1a | 1b | 2 | 3 |
---|---|---|---|
[words] | [words] | words | [words] |
context before1 | context before2 | core | context after |

1 | 2 |
---|---|
[ [ words ] words [ words ] ] | words |
context before | core |
context / core / context | |

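
The split between core and context can be sketched with lookaheads, which check the surrounding text without making it part of the match. This is a minimal sketch in Python under my own assumptions: the two Perlish queries in the comments are hypothetical, and the patterns are hand-written guesses at their meaning, not what the script emits.

```python
import re

page = "You your best thing, Sethe. You are."

# Assumed reading of `best [Sethe]`: core "best", with "Sethe" somewhere after it.
# (?=...) is a positive lookahead: it must succeed, but it consumes no text.
context_after = re.compile(r"\bbest\b(?=.*\bSethe\b)", re.DOTALL)

# Assumed reading of `best ~[are]`: core "best", with no "are" anywhere after it.
# (?!...) is a negative lookahead: the match fails if the context is found.
no_context_after = re.compile(r"\bbest\b(?!.*\bare\b)", re.DOTALL)

core = context_after.search(page)
blocked = no_context_after.search(page)

print(core.group(0), core.span())
print(blocked)
```

Only the core ends up in the match; the context just decides whether the core counts. The second pattern finds nothing here because "are" does appear after "best."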
Observe the matches generated by each query (if any) in bold.
tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide
Observe the matches generated by each query (if any) in bold.
You your best thing, Sethe. You are.