bthomaslee/perlish

Perlish

A simple shell script for ordinary humans who want to write complex, perl-compatible regular expressions (PCREs).

A "regular expression" is a sequence of characters (a, b, c, *, +, etc.) that describes a pattern according to the rules of a particular programming language. Most of the time, programmers use the phrase "regular expression" to refer to a "match string" for finding instances of a string of text in a given text file. In a standard search engine (e.g., Google), a "match string" is the string of characters you type into the search bar, where each character you type corresponds to a single letter, number, symbol, etc. A regular expression is more flexible and precise than that. For example, using a regular expression, you can tell your computer to search for every whole word that begins with the letters "lov" followed by at least one more letter, which would match "love," "loves," "loving," "lover," etc. That's the power of regular expressions.
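To make the "lov" example concrete, here is a minimal sketch in Python, whose `re` module accepts Perl-style syntax for a pattern this simple. The sample sentence is made up for illustration, and the pattern is just one way to express "a whole word that begins with lov and has at least one more letter."

```python
import re

# \b marks a word boundary; [a-z]+ requires at least one more letter.
pattern = re.compile(r"\blov[a-z]+\b", re.IGNORECASE)

text = "She loves loving; a lover of love, not lov or glove."
matches = pattern.findall(text)
print(matches)  # ['loves', 'loving', 'lover', 'love']
```

Note that "lov" on its own is skipped because nothing follows it, and "glove" is skipped because the match must start at the beginning of a word.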

"Perl" is a computer programming language. A Perl-compatible regular expression (PCRE), then, is a regular expression that programs written in Perl can interpret and use. PCREs can be difficult for ordinary humans to read and write, so I wrote a simplified version of the PCRE syntax, one that I found easier to read and write, and I called it "Perlish." I also wrote a shell script that converts Perlish expressions into PCREs, so you can write the Perlish expression and then copy and paste the PCRE wherever you want. I use PCREs when I run complex searches in texts, where I want to set precise conditions for a match so that I can find what I want more quickly and more reliably (e.g., finding topics of interest in a text, finding key passages, etc.).

Here's how to write Perlish expressions.

Building Blocks

Expression        Interpretation
words             "words" must appear on the page
[words] x         "words" must appear on the page, before x
x [words]         "words" must appear on the page, after x
~[words] x        "words" must not appear on the page, before x
x ~[words]        "words" must not appear on the page, after x
[words]{n} x      with n words between "words" and x
    {n1-nz}       n1, n2, n3 . . . or nz words
    {n+/n-}       n or more/fewer words
    {>n/<n}       more/fewer than n words
(characters)      "characters" may appear, too
*                 one or more letters and/or digits
a/b               a or b
/string/          "string" is a perl-compatible regular expression
\                 interpret the next character literally
    \[    \]    \(    \)    \*    \/    \"
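As an illustration of how the word-level building blocks could map onto an ordinary regular expression, here is a rough sketch in Python. It is not the Perlish shell script and makes no claim about the PCREs that script produces. The function name `perlish_word_to_regex`, the ASCII-only reading of "letters and/or digits," and the lookaround-based whole-word check are all assumptions made for this example. It covers only `(characters)`, `*`, `a/b`, and `\`, and leaves out `[context]` parts and `/string/`.

```python
import re

ALNUM = "[A-Za-z0-9]"  # "letters and/or digits"; ASCII only, by assumption


def perlish_word_to_regex(expr):
    """Translate one Perlish word (no [context], no /raw PCRE/) to a regex string."""

    def parse(i, closer):
        # Collect "/"-separated alternatives until `closer` or end of input.
        branches, current = [], []
        while i < len(expr) and expr[i] != closer:
            ch = expr[i]
            if ch == "\\" and i + 1 < len(expr):
                current.append(re.escape(expr[i + 1]))  # \x: literal x
                i += 2
            elif ch == "(":
                inner, i = parse(i + 1, ")")
                current.append(f"(?:{inner})?")  # (chars): may appear
                i += 1  # step past ")"
            elif ch == "*":
                current.append(f"{ALNUM}+")  # *: one or more letters/digits
                i += 1
            elif ch == "/":
                branches.append("".join(current))  # a/b: a or b
                current = []
                i += 1
            else:
                current.append(re.escape(ch))
                i += 1
        branches.append("".join(current))
        return "|".join(branches), i

    body, _ = parse(0, None)
    # Whole words only: no letter or digit directly before or after the match.
    return f"(?<!{ALNUM})(?:{body})(?!{ALNUM})"


words = "tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide"


def found(expr):
    return re.findall(perlish_word_to_regex(expr), words)


print(found("tyrant"))                        # ['tyrant']
print(found("tyrant(s)"))                     # ['tyrant', 'tyrants']
print(found("tyranny/tyrannize/tyrannical"))  # ['tyranny', 'tyrannize', 'tyrannical']
print(found(r"\(tyrannies\)"))                # ['(tyrannies)']
```

The whole-word check uses lookarounds rather than `\b` so that an escaped symbol such as `\(` can still start a match after a space.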
 

Rules

a. Queries may have three parts.

1                2       3
[words]          words   [words]
context before   core    context after

b. Context parts may be multiplied.

1a                1b                2       3
[words]           [words]           words   [words]
context before1   context before2   core    context after

c. Context parts may be divided.

1                                          2
[ [ words ] words [ words ] ]              words
/------------context before------------/   core
/---context---/---core---/--context---/

d. Queries match whole words only.
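Because queries work on whole words, the `{n}` counts in a context part can be checked by hand by splitting the text into words and counting. The sketch below does that in Python. The helper name `words_between` and the choice to treat any run of letters and digits as one word are assumptions for illustration, not the script's actual method.

```python
import re


def words_between(text, context, core):
    """Count the words between a `context` word and the first later `core` word.

    Returns None when `context` never occurs before `core`.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    for i, tok in enumerate(tokens):
        if tok == context:
            for j in range(i + 1, len(tokens)):
                if tokens[j] == core:
                    return j - i - 1
    return None


text = "You your best thing, Sethe. You are."

print(words_between(text, "You", "thing"))  # 2  ("your", "best")
print(words_between(text, "best", "are"))   # 3  ("thing", "Sethe", "You")
print(words_between(text, "are", "best"))   # None
```

Under this reading, a count of 2 satisfies `{2+}` but not `{3+}`, and a count of 3 falls inside `{1-3}` but outside `{1-2}`.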

Example Set a

Observe the matches generated by each query (if any) in bold.

tyrant

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrant(s)

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyran

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyran*

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrann*

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrant*

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrant(*)

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyranny/tyrannize/tyrannical

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrann(y)(ize)(ical)

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrann(y)(i)(ze)(c)(al)(ide)

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

tyrannies

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

\(tyrannies\)

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

/tyrann\w{4}/

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

/tyrann*/

tyrant, tyrants, tyranny, (tyrannies), tyrannize, tyrannical, tyrannicide

Example Set b

Observe the matches generated by each query (if any) in bold.

Sethe

You your best thing, Sethe. You are.

[best] Sethe

You your best thing, Sethe. You are.

[You your]{2} Sethe

You your best thing, Sethe. You are.

best [are]{1-3}

You your best thing, Sethe. You are.

best [are]{1-2}

You your best thing, Sethe. You are.

[You]{2+} thing [You]

You your best thing, Sethe. You are.

[You]{3+} thing [You]

You your best thing, Sethe. You are.

best thing ~[Ella]

You your best thing, Sethe. You are.

[things] are

You your best thing, Sethe. You are.

~[things] are

You your best thing, Sethe. You are.

[thing(s)] are

You your best thing, Sethe. You are.

~[thing(s)] are

You your best thing, Sethe. You are.

[ [your] thing [You] ] Sethe

You your best thing, Sethe. You are.

[ [your] thing [ [best] You ] ] Sethe

You your best thing, Sethe. You are.

[ [your] thing [ ~[best] You ] ] Sethe

You your best thing, Sethe. You are.

(Y/y)ou(*)

You your best thing, Sethe. You are.

(Y/y)ou*

You your best thing, Sethe. You are.

(Y/y)ou

You your best thing, Sethe. You are.

you(*)

You your best thing, Sethe. You are.

you*

You your best thing, Sethe. You are.

you

You your best thing, Sethe. You are.

[you(*)]{<4} ~[worst/worse] You ~[/\d+/]

You your best thing, Sethe. You are.
