-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy pathsing.lhs
112 lines (95 loc) · 3.47 KB
/
sing.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
= The Compiler Singularity =
Our next compiler is self-hosting: it can read its 100% organic artisanal
handmade source and produce the corresponding ION assembly.
----------------------------------------------------------------
include::singularity[]
----------------------------------------------------------------
As the comment says, our parser now handles comments, whitespace, and variables
consisting of lowercase letters. We also added code to look up the index of a
top-level definition. Prefixing a character with `@` gives us direct access to
the primitive combinators. (Not to be confused with the `@` operator of ION
assembly.)
Our language is now friendly enough that we are willing to work in it
with our bare hands. However, bootstrapping to reach this singularity requires
more effort. For the sake of our previous compiler, we:
* Remove spaces and newlines.
* Remove comments.
* Rename variables like `xs` and `tab` to single letters.
* Strip the `@` tag from combinators like `@K`.
* Rewrite `foo x y =` as `\x.\y.`
* Rewrite `+\x y ->+` as `\x.\y.`
* Replace defined symbols with `@` and a character indicating where they
appeared: we refer to the nth definition with the character with ASCII code
n + 31.
Doing this by hand, while feasible, grows tiresome and error-prone if we often
make changes. Thus we automate with `sed` and `awk`, which are great tools for
the job. For example, we can generate a lookup table, where we assume an equals
sign is preceded by a space if and only if it's the equals sign of a
definition.
----------------------------------------------------------------
$ cat singularity | sed -n '/.* =/{s/ .*//;p}' | awk '{printf "@%c",NR+31; print " " $0}'
@ pair
@! just
@" pure
@# bind
@$ ap
@% fmap
@& alt
@' liftaa
@( many
@) some
@* liftki
@+ liftk
@, sat
@- char
@. lcr
@/ lcv
@0 lca
@1 lcl
@2 com
@3 sp
@4 spc
@5 spch
@6 var
@7 foldr
@8 anyone
@9 pre
@: lam
@; atom
@< apps
@= expr
@> def
@? program
@@ lsteq
@A rank
@B shows
@C isfree
@D unlam
@E babs
@F dump
@G main
----------------------------------------------------------------
Then the following script performs our list of chores.
----------------------------------------------------------------
include::bootsingularity.sh[]
----------------------------------------------------------------
Caveats:
* We assume there are at most 5 variables in a lambda. One regex
rewrites `+\a .. y z ->+` as `+\a .. y -> \z.+`, and we simply repeat this
another 4 times, followed by a regex that removes empty lambda abstractions.
* Due to lack of regex lookaround, we temporarily replace patterns like
literal spaces and newlines with underscores, which will be restored after
certain other patterns are removed. Thus we assume no underscores appear in
the original source.
* We need an extra regex to remove spaces after an escaped escape character.
This assumes our source never contains an escaped space after an escaped
escape character.
* The ampersand has a special meaning in regexes that we must escape.
* We've hardwired the variables that need renaming, along with replacements
carefully chosen to avoid conflicts with the rest of the code. We could
have written the source with single-letter variables, but then we'd miss
an opportunity to show off and test new features.
Our previous compiler should accept the result:
----------------------------------------------------------------
include::singularity.boot[]
----------------------------------------------------------------