-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathrefinement-parsers.lagda
2105 lines (1949 loc) · 108 KB
/
refinement-parsers.lagda
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[submission,copyright,creativecommons]{eptcs}
% Uncomment the following line to add the appendix on Context-free grammars:
% \def\includeCFGs{}
\providecommand{\event}{MSFP 2020}
\usepackage[style=numeric,natbib=true]{biblatex}
\addbibresource{handlers.bib}
\setcounter{biburlnumpenalty}{100}
%include agda.fmt
%include refinement-parsers.fmt
%include preamble.tex
%if style == newcode
\begin{code}
open import Prelude
open import Axiom.Extensionality.Propositional
open import Level
postulate extensionality : ∀ {l₁ l₂} -> Extensionality l₁ l₂
_⊆_ : {a : Set} -> (P Q : a -> Set) -> Set
P ⊆ Q = ∀ x -> P x -> Q x
\end{code}
%endif
\begin{document}
\title{Combining predicate transformer semantics for effects:\\a case study in parsing regular languages}
\def\titlerunning{Combining predicate transformer semantics for effects}
\def\authorrunning{Anne Baanen and Wouter Swierstra}
% Of misschien iets als:
% A predicate transformer semantics for parsing
% Verifying parser combinators using predicate transformers
\author{Anne Baanen
\institute{Vrije Universiteit Amsterdam}
\and
Wouter Swierstra
\institute{Utrecht University}
}
%\email{\{t.baanen@@vu.nl,w.s.swierstra@@uu.nl\}}}
%
\maketitle % typeset the header of the contribution
\begin{abstract}
This paper describes how to verify a parser for regular expressions in a functional programming language using predicate transformer semantics for a variety of effects.
%
Where our previous work in this area focused on the semantics for a
single effect, parsing requires a combination of effects:
non-determinism, general recursion and mutable state. Reasoning about
such combinations of effects is notoriously difficult, yet our
approach using predicate transformers enables the careful separation
of program syntax, correctness proofs and termination proofs.
\end{abstract}
\section{Introduction}
\label{sec:intro}
There is a significant body of work on parsing using combinators
in functional programming langua\-ges~\cite{list-of-successes, hutton, functional-parsers, swierstra-duponcheel, leijen2001parsec, efficient-combinator-parsers, parser-combinators-for-left-recursive, parsing-with-derivatives}, among many others.
Yet how can we ensure that these parsers are correct? There is notably
less work that attempts to answer this
question~\cite{total-parser-combinators, firsov-certification-context-free-grammars}.
Reasoning about such parser combinators is not at all trivial. They
use a variety of effects: state to store the string being parsed;
non-determinism to handle backtracking; and general recursion to deal
with recursive grammars. Proof techniques, such as equational
reasoning, that are commonly used when reasoning about pure functional programs, are less
suitable when verifying effectful programs~\cite{just-do-it,hutton2008reasoning}.
In this paper, we explore a novel approach, drawing inspiration from recent
work on algebraic effects~\cite{eff, effect-handlers-in-scope,
McBride-totally-free}. We demonstrate how to reason about all parsers
uniformly using \emph{predicate transformers}~\cite{pt-semantics-for-effects}.
We extend our previous work that uses predicate transformer semantics to reason
about a single effect, to handle the combinations of effects used by parsers.
Our semantics is modular, meaning we can introduce new effects (|Rec| in
Section \ref{sec:combinations}), semantics (|hParser| in Section \ref{sec:dmatch}) and specifications (|terminates-in| in Section \ref{sec:dmatch-correct}) when they are
needed, without having to
rework the previous definitions. In particular, our careful treatment of
general recursion lets us separate partial correctness
from the proof of termination cleanly. Most existing proofs require combinators to
guarantee that the string being parsed decreases, conflating these two issues.
In particular, the sections of this paper make the following contributions:
\begin{itemize}
\item After quickly revisiting our previous work on predicate
transformer semantics for effects (Section~\ref{sec:recap}), we show how the
non-recursive fragment of regular expressions can be correctly
parsed using non-determinism (Section \ref{sec:regex-nondet}).
\item By combining non-determinism with general recursion (Section \ref{sec:combinations}),
support for the Kleene star can be added without compromising our previous definitions.
\item Although the resulting parser is not guaranteed to terminate,
we can define another implementation using Brzozowski derivatives (Section \ref{sec:dmatch}),
introducing an additional effect and its semantics in the process.
\item Finally, we show that the derivative-based implementation terminates
and \emph{refines} the original parser (Section \ref{sec:dmatch-correct}).
\ifdefined\includeCFGs
\item Next, we show how this approach may be extended to handle
context-free languages. To do so, we show how to write parsers using
algebraic effects (Section \ref{sec:parser}), and map grammars to parsers (Section
\ref{sec:fromProductions}). Once again, we can cleanly separate the proofs of partial
correctness (Section \ref{sec:partialCorrectness}) and termination (Section \ref{sec:fromProds-terminates}).
\fi
\end{itemize}
The goal of our work is not so much the verification of a parser for regular languages,
which has been done before~\cite{harper-regex, intrinsic-verification-regex}.
Instead, we aim to illustrate the steps of incrementally developing and verifying a parser
using predicate transformers and algebraic effects.
This work is in the spirit of a Theoretical Pearl~\cite{harper-regex}:
we begin by defining a |match| function that does not terminate. The remainder of the paper
then shows how to fix this function, without having to redo the complete proof of correctness.
All the programs and proofs in this paper are written in the dependently typed language Agda~\cite{agda-thesis}.
The full source code, including lemmas we have chosen to omit for sake of readability,
is available online.\footnote{\url{https://github.com/Vierkantor/refinement-parsers}}
Apart from postulating function extensionality,
we remain entirely within Agda's default theory.
\section{Recap: algebraic effects and predicate transformers}
\label{sec:recap}
Algebraic effects separate the \emph{syntax} and \emph{semantics} of
effectful operations. In this paper, we will model them by taking the
free monad over a given signature~\cite{extensible-effects}, describing certain
operations. Signatures are represented by the type |Sig|, as follows:
\begin{code}
record Sig : Set₁ where
constructor mkSig
field
C : Set
R : C -> Set
\end{code}
This is Agda syntax for defining a type |Sig| with constructor |mkSig : (C : Set) -> (C -> Set) -> Sig| and two fields, |C : Sig -> Set| and |R : (e : Sig) -> C e -> Set|.
Here the type |C| contains the `commands', or effectful operations
that a given effect supports. For each command |c : C|, the type |R c|
describes the possible responses.
The structure on a signature is that of a \emph{container}~\cite{categories-of-containers}.
The following signature describes two commands: the
non-deterministic choice between two values, |Choice|; and a failure
operator, |Fail|. The response type |RNondet| is defined by pattern matching on the command.
If the command is |Choice|, the response is a |Bool|; the |Fail| command gives no response, indicated by the empty type |⊥|.
\begin{code}
data CNondet : Set where
Choice : CNondet
Fail : CNondet
RNondet : CNondet -> Set
RNondet Choice = Bool
RNondet Fail = ⊥
Nondet = mkSig CNondet RNondet
\end{code}
%if style == newcode
\begin{code}
module NoCombination where
open Sig
\end{code}
%endif
We represent effectful programs that use a particular effect using the
corresponding \emph{free monad}:
\begin{code}
data Free (e : Sig) (a : Set) : Set where
Pure : a -> Free e a
Op : (c : C e) -> (R e c -> Free e a) -> Free e a
\end{code}
This gives a monad, with the bind operator defined as follows.
\begin{code}
_>>=_ : (Forall(a b e)) Free e a -> (a -> Free e b) -> Free e b
Pure x >>= f = f x
Op c k >>= f = Op c (λ x -> k x >>= f)
\end{code}
%if style == newcode
\begin{code}
_<$>_ : ∀ {a b e} -> (a → b) → Free e a → Free e b
f <$> mx = mx >>= λ x → Pure (f x)
\end{code}
%endif
To facilitate programming with effects, we define the following smart
constructors, sometimes referred to as \emph{generic effects} in the
literature~\cite{algebraic-operations-and-generic-effects}:
\begin{code}
fail : (Forall(a)) Free Nondet a
fail = Op Fail (λ ())
choice : (Forall(a)) Free Nondet a -> Free Nondet a -> Free Nondet a
choice S₁ S₂ = Op Choice (λ b -> if b then S₁ else S₂)
\end{code}
The empty parentheses |()| in the definition of |fail| are Agda syntax for an argument
in an uninhabited type, hence no body for the lambda is provided.
In this paper, we will assign \emph{semantics} to effectful programs
by mapping them to \emph{predicate transformers}.
Each semantics will be computed by a fold over the free monad, mapping
some predicate |P : a -> Set| to a predicate of the entire computation of type |Free (mkSig C R) a -> Set|.
\begin{code}
wp'' : (implicit(C : Set)) (implicit(R : C -> Set)) (implicit(a : Set)) (alg : (c : C) -> (R c -> Set) -> Set) -> Free (mkSig C R) a -> (a -> Set) -> Set
(wp alg (Pure x)) P = P x
(wp alg (Op c k)) P = alg c (λ x -> (wp (alg) (k x)) P)
\end{code}
The \emph{predicate transformer} nature of these semantics
becomes evident when we assume the type of responses |R| does not depend on the command |c : C|.
The type of |alg : (c : C) → (R c → Set) → Set| then becomes |C → (R → Set) → Set|,
which is isomorphic to |(R → Set) → (C → Set)|.
Thus, |alg| has the form of a predicate transformer from postconditions of type
|R → Set| into preconditions of type |C → Set|.
% The type of |(wp' alg)| under the same isomorphism becomes
% |(a -> Set) -> (Free e a → Set)|.
Two considerations lead us to define the types as |alg : (c : C) → (R c → Set) → Set|
and |(wp' alg) : Free (mkSig C R) a → (a → Set) → Set|.
By passing the command |c : C| as first argument to |alg|, we allow |R| to depend on |c|.
Moreover, |(wp' alg)| computes semantics,
so it should take a program |S : Free (mkSig C R) a| as its argument
and return the semantics of |S|, which is then of type |(a → Set) → Set|.
In the case of non-determinism, for example, we may want to require that a given
predicate |P| holds for all possible results that may be returned:
%if style == newcode
\begin{code}
module Nondet where
\end{code}
%endif
\begin{code}
ptAll : (c : CNondet) -> (RNondet c -> Set) -> Set
ptAll Fail P = ⊤
ptAll Choice P = P True ∧ P False
\end{code}
A different semantics may instead require that |P| holds on any of the return values:
\begin{code}
ptAny : (c : CNondet) -> (RNondet c -> Set) -> Set
ptAny Fail P = ⊥
ptAny Choice P = P True ∨ P False
\end{code}
%if style == newcode
\begin{code}
module Spec where
\end{code}
%endif
Predicate transformers provide a single semantic domain to relate
programs and specifications~\cite{prog-from-spec}.
Throughout this paper, we will consider specifications consisting of a
pre- and postcondition:
\begin{samepage}
\begin{code}
record Spec (a : Set) : Set₁ where
constructor [[_,_]]
field
pre : Set
post : a -> Set
\end{code}
\end{samepage}
Inspired by work on the refinement calculus, we can assign a predicate
transformer semantics to specifications as follows:
\begin{code}
wpSpec' : (Forall(a)) Spec a -> (a -> Set) -> Set
wpSpec [[ pre , post ]] P = pre ∧ (∀ o -> post o -> P o)
\end{code}
This computes the `weakest precondition' necessary for a specification
to imply that the desired postcondition |P| holds. In particular, the
precondition |pre| should hold and any possible result satisfying the
postcondition |post| should imply the postcondition |P|.
Finally, we use the \emph{refinement relation} to compare programs and specifications:
\begin{code}
_⊑_ : (Forall(a : Set)) (pt1 pt2 : (a -> Set) -> Set) -> Set₁
pt1 ⊑ pt2 = ∀ P -> pt1 P -> pt2 P
\end{code}
Together with the predicate transformer semantics we have defined
above, this refinement relation can be used to relate programs to
their specifications. The refinement relation is both transitive and
reflexive.
%if style == newcode
\begin{code}
⊑-refl : (Forall(a)) {pt : (a -> Set) -> Set} -> pt ⊑ pt
⊑-refl P x = x
\end{code}
%endif
\section{Regular languages without recursion} \label{sec:regex-nondet}
%if style == newcode
\begin{code}
open import Data.Char using (Char; _≟_)
String = List Char
\end{code}
%endif
To illustrate how to reason about non-deterministic code, we will
define and verify a regular expression matcher. Initially, we will restrict
ourselves to non-recursive regular expressions; we will add recursion
in the next section.
We begin by defining the |Regex| datatype for regular expressions.
An element of this type represents the syntax of a regular expression.
% Introduce formatting for the * operator later, to avoid confusion with multiplication.
%if style == poly
%format _* = "\_\!\mathbin{\star}"
%format * = "\mathbin{\star}"
%endif
\begin{code}
data Regex : Set where
Empty : Regex
Epsilon : Regex
Singleton : Char -> Regex
_∣_ : Regex -> Regex -> Regex
_·_ : Regex -> Regex -> Regex
_* : Regex -> Regex
\end{code}
The |Empty| regular expression corresponds to the empty
language, while the |Epsilon| expression only matches the empty
string. Furthermore, our language for regular expressions is closed
under choice (|_∣_|), concatenation (|_·_|) and linear repetition
denoted by the Kleene star (|_*|).
The input to the regular expression matcher will be a |String| together with a |Regex| denoting the language to match the string against.
What should our matcher return? A Boolean value is
not particularly informative; yet we also choose not to provide an
intrinsically correct definition, instead performing extrinsic
verification using our predicate transformer semantics. The |tree|
data type below captures a potential parse tree associated with a
given regular expression:
\begin{code}
tree : Regex -> Set
tree Empty = ⊥
tree Epsilon = ⊤
tree (Singleton _) = Char
tree (l ∣ r) = Either (tree l) (tree r)
tree (l · r) = Pair (tree l) (tree r)
tree (r *) = List (tree r)
\end{code}
In the remainder of this section, we will develop a regular expression
matcher with the following type:
\begin{spec}
match : (r : Regex) (xs : String) -> Free Nondet (tree r)
\end{spec}
Before we do so, however, we will complete our specification. Although
the type above guarantees that we return a parse tree matching the
regular expression |r|, there is no relation between the tree and the
input string. To capture this relation, we define the following
|Match| data type. A value of type |Match r xs t| states that the
string |xs| is in the language given by the regular expression |r| as
witnessed by the parse tree |t|:
\begin{code}
data Match : (r : Regex) -> String -> tree r -> Set where
Epsilon : Match Epsilon Nil tt
Singleton : (implicit(x : Char)) Match (Singleton x) (x :: Nil) x
OrLeft : (implicit(l r : Regex)) (implicit(xs : String)) (implicit(x : tree l)) Match l xs x -> Match (l ∣ r) xs (Inl x)
OrRight : (implicit(l r : Regex)) (implicit(xs : String)) (implicit(x : tree r)) Match r xs x -> Match (l ∣ r) xs (Inr x)
Concat : (implicit(l r : Regex)) (implicit(ys zs : String)) (implicit(y : tree l)) (implicit(z : tree r)) Match l ys y -> Match r zs z -> Match (l · r) (ys ++ zs) (y , z)
StarNil : (implicit(r : Regex)) Match (r *) Nil Nil
StarConcat : (implicit(r : Regex)) (implicit(xs : String)) (implicit(y : tree r)) (implicit(ys : List (tree r))) Match (r · (r *)) xs (y , ys) -> Match (r *) xs (y :: ys)
\end{code}
Note that there is no constructor for |Match Empty xs ms| for any |xs|
or |ms|, as there is no way to match the |Empty| language with a
string |xs|. Similarly, the only constructor for |Match Epsilon xs
ms| is where |xs| is the empty string |Nil|. There are two
constructors that produce a |Match| for a regular expression of the
form |l ∣ r|, corresponding to the choice of matching either |l| or
|r|.
The cases for concatenation and iteration are more
interesting. Crucially the |Concat| constructor constructs a match on
the concatenation of the strings |ys| and |zs| -- although there may
be many possible ways to decompose a string into two
substrings. Finally, the two constructors for the Kleene star, |r *|,
match zero (|StarNil|) or many (|StarConcat|) repetitions of |r|.
We will now turn our attention to the |match| function. The complete
definition, by induction on the argument regular expression, can be
found in Figure~\ref{fig:match}. Most of the cases are
straightforward---the most difficult case is that for concatenation,
where we non-deterministically consider all possible splittings of the
input string |xs| into a pair of strings |ys| and |zs|. The
|allSplits| function, defined below, computes all possible splittings:
%if style == newcode
\begin{code}
module AlmostRegex where
open NoCombination
open Nondet
open Spec
\end{code}
%endif
\begin{code}
allSplits : (Forall(a)) (xs : List a) -> Free Nondet (List a × List a)
allSplits Nil = Pure (Nil , Nil)
allSplits (x :: xs) = choice
(Pure (Nil , (x :: xs)))
(allSplits xs >>= λ {(ys , zs) → Pure ((x :: ys) , zs)})
\end{code}
\begin{figure}
\begin{code}
match : (r : Regex) (xs : String) -> Free Nondet (tree r)
match Empty xs = fail
match Epsilon Nil = Pure tt
match Epsilon (_ :: _) = fail
match (Singleton c) Nil = fail
match (Singleton c) (x :: Nil) with c ≟ x
match (Singleton c) (c :: Nil) | yes refl = Pure c
match (Singleton c) (x :: Nil) | no ¬p = fail
match (Singleton c) (_ :: _ :: _) = fail
match (l ∣ r) xs = choice (Inl <$> match l xs) (Inr <$> match r xs)
match (l · r) xs = do (ys , zs) <- allSplits xs
y <- match l ys
z <- match r zs
Pure (y , z)
match (r *) xs = match (r · (r *)) xs
\end{code}
\caption{The definition of the |match| function}
\label{fig:match}
\end{figure}
Finally, we cannot yet implement the case for the Kleene star. We could
attempt to mimic the case for concatenation, attempting to match |r · (r *)|.
This definition, however, is rejected by Agda as it is not structurally
recursive. For now we choose to simply fail on all such regular expressions.
In Section \ref{sec:combinations} we will fix this issue, after introducing
the auxiliary definitions.
Still, we can prove that the |match| function behaves correctly on all
regular expressions that do not contain iteration. We introduce a |hasNo*|
predicate, which holds of all such iteration-free regular expressions:
\begin{code}
hasNo* : Regex -> Set
\end{code}
%if style == newcode
\begin{code}
hasNo* Empty = ⊤
hasNo* Epsilon = ⊤
hasNo* (Singleton x) = ⊤
hasNo* (l · r) = hasNo* l ∧ hasNo* r
hasNo* (l ∣ r) = hasNo* l ∧ hasNo* r
hasNo* (r *) = ⊥
\end{code}
%endif
To verify our matcher is correct, we need to prove that it satisfies
the specification consisting of the following pre- and postcondition:
\begin{code}
pre : (r : Regex) (xs : String) -> Set
pre r xs = hasNo* r
post : (r : Regex) (xs : String) -> tree r -> Set
post = Match
\end{code}
The main correctness result can now be formulated as follows:
\begin{code}
matchSound : ∀ r xs -> (wpSpec [[ (pre r xs) , (post r xs) ]]) ⊑ (wp ptAll (match r xs))
\end{code}
This lemma guarantees that all the parse trees computed by the |match|
function satisfy the |Match| relation, provided the input regular
expression does not contain iteration. The proof goes by induction on the
regular expression |r|. Although we have omitted the proof, we will sketch the
key lemmas and definitions that are necessary to complete it.
In most of the cases for |r|, the definition of |match r| is uncomplicated and
the proof is similarly simple. As soon as we need to reason about programs
composed using the monadic bind operator, we quickly run into issues. In
particular, when verifying the case for |l · r|, we would like to use
our induction hypotheses on two recursive calls. To do so, we prove the
following lemma that allows us to replace the semantics of a composite
program built using the monadic bind operation with the composition of
the underlying predicate transformers:
\begin{code}
consequence : (Forall(a b es P)) ∀ pt (mx : Free es a) (f : a -> Free es b) ->
(wp pt mx) (λ x -> (wp pt (f x)) P) == (wp pt (mx >>= f)) P
\end{code}
%if style == newcode
\begin{code}
consequence pt (Pure x) f = refl
consequence pt (Op c k) f = cong (pt c)
(extensionality λ x -> consequence pt (k x) f)
\end{code}
%endif
Substituting along this equality gives us the lemmas we need to deal with the |_>>=_| operator:
\begin{code}
wpToBind : (Forall (a b es pt P)) (mx : Free es a) (f : a -> Free es b) ->
(wp pt mx) (λ x -> (wp pt (f x)) P) -> (wp pt (mx >>= f)) P
wpFromBind : (Forall (a b es pt P)) (mx : Free es a) (f : a -> Free es b) ->
(wp pt (mx >>= f)) P -> (wp pt mx) (λ x -> (wp pt (f x)) P)
\end{code}
%if style == newcode
\begin{code}
wpToBind (hidden(pt = pt)) mx f H = subst id (consequence pt mx f) H
wpFromBind (hidden(pt = pt)) mx f H = subst id (sym (consequence pt mx f)) H
\end{code}
%endif
Not only does |match (l · r)| result in two recursive calls,
it also makes a call to a helper function |allSplits|.
Thus, we also need to formulate and prove the correctness of that function, as follows:
\begin{code}
allSplitsPost : String → String × String → Set
allSplitsPost xs (ys , zs) = xs == ys ++ zs
allSplitsSound : ∀ xs -> (wpSpec [[ ⊤ , (allSplitsPost xs) ]]) ⊑ (wp ptAll (allSplits xs))
\end{code}
Using |wpToBind|, we can incorporate the correctness proof of |allSplits|
in the correctness proof of |match|.
We refer to the accompanying code for the complete details of these
proofs.
%if style == newcode
\begin{code}
allSplitsSound Nil P (preH , postH) = postH _ refl
allSplitsSound (x :: xs) P (preH , postH) = postH _ refl ,
wpToBind (allSplits xs) _ (allSplitsSound xs _ (tt ,
λ _ H → postH _ (cong (x ::_) H)))
matchSound Empty xs P (preH , postH) = tt
matchSound Epsilon Nil P (preH , postH) = postH _ Epsilon
matchSound Epsilon (x :: xs) P (preH , postH) = tt
matchSound (Singleton x) Nil P (preH , postH) = tt
matchSound (Singleton x) (c :: Nil) P (preH , postH) with x ≟ c
... | yes refl = postH _ Singleton
... | no ¬p = tt
matchSound (Singleton x) (_ :: _ :: _) P (preH , postH) = tt
matchSound (l · r) xs P ((preL , preR) , postH) =
wpToBind (allSplits xs) _ (allSplitsSound xs _ (tt ,
λ {(ys , zs) splitH → wpToBind (match l ys) _ (matchSound l ys _ (preL ,
λ y lH → wpToBind (match r zs) _ ((matchSound r zs _ (preR ,
λ z rH → postH (y , z) (subst (λ xs → Match _ xs _) (sym splitH)
(Concat lH rH)))))))}))
matchSound (l ∣ r) xs P ((preL , preR) , postH) =
wpToBind (match l xs) _ (matchSound l xs _ (preL ,
λ _ lH → postH _ (OrLeft lH))) ,
wpToBind (match r xs) _ (matchSound r xs _ (preR ,
λ _ rH → postH _ (OrRight rH)))
matchSound (r *) xs P (() , postH)
\end{code}
%endif
\section{General recursion and non-determinism} \label{sec:combinations}
The matcher we have defined in the previous section is incomplete,
since it fails to handle regular expressions that use the Kleene star.
The fundamental issue is that the Kleene star allows for arbitrarily many matches in certain cases, that
in turn, leads to problems with Agda's termination checker.
For example, matching |Epsilon *| with the empty string |""| may unfold the Kleene star infinitely often
without ever terminating.
As a result, we cannot implement |match| for the Kleene star using recursion directly.
Instead, we will deal with this (possibly unbounded) recursion by introducing a new \emph{effect}.
We will represent a recursively defined dependent function of type |(i : I) -> O i|
as an element of the type |(i : I) -> Free (Rec I O) (O i)|.
Here |Rec I O| is a synonym of the the signature type we saw previously~\cite{McBride-totally-free}:
\begin{code}
Rec : (I : Set) (O : I -> Set) -> Sig
Rec I O = mkSig I O
\end{code}
Intuitively, you may want to think of values of type |(i : I) -> Free (Rec I O)
(O i)| as computing a (finite) call graph for every input |i : I|. Instead of
recurring directly, the `effects' that this signature supports require an input
|i : I| corresponding to the argument of the recursive call; the continuation
abstracts over a value of type |O i|, corresponding to the result of a
recursive call. Note that the functions defined in this style are \emph{not}
recursive; instead we will need to write handlers to unfold the function
definition or prove termination separately. A handler for the |Rec| effect,
under the intended semantics, thus behaves like a fixed-point combinator,
introducing recursion to an otherwise recursion-free language by
substituting the function body in each recursive call.
We cannot, however, define a |match| function of the form |Free (Rec _
_)| directly, as our previous definition also used non-determinism. To
account for both non-determinism and unbounded recursion, we need a
way to combine effects. Fortunately, free monads are known to be
closed under coproducts; there is a substantial body of work that
exploits this to (syntactically) compose separate
effects~\cite{effect-handlers-in-scope, la-carte}.
% We can combine two effects in a straightforward way: given signatures |mkSig C₁ R₁| and |mkSig C₂ R₂|,
% we can define a combined signature by taking the disjoint union of the commands and responses,
% resulting in |mkSig (Either C₁ C₂) [ R₁ , R₂ ]|,
% where |[ R₁ , R₂ ]| is the unique map given by applying |R₁| to values in |C₁| and |R₂| on |C₂|~\cite{effect-handlers-in-scope}.
% If we want to support more effects, we can repeat this process of disjoint unions,
% but this quickly becomes somewhat cumbersome.
% For example, the disjoint union construction is associative semantically, but not syntactically.
% If two programs have the same set of effects that is associated differently, we cannot directly compose them.
Rather than restrict ourselves to the binary composition using
coproducts, we modify the |Free| monad to take a \emph{list} of
signatures as its argument, taking the coproduct of the elements of
the list as its signature functor. The |Pure| constructor remains
unchanged, while the |Op| constructor additionally takes an index into the
list to specify the effect that is invoked.
%if style == newcode
\begin{code}
module Combinations where
open Sig
open AlmostRegex using (allSplitsPost)
\end{code}
%endif
\begin{code}
data Free (es : List Sig) (a : Set) : Set₁ where
Pure : a -> Free es a
Op : (hidden(e : Sig)) (i : e ∈ es) (c : C e) (k : R e c -> Free es a) -> Free es a
\end{code}
By using a list of effects instead of allowing arbitrary disjoint unions,
we have effectively chosen that the disjoint unions canonically associate to the right.
% Since the disjoint union is also commutative,
% it would be cleaner to have the collection of effects be unordered as well.
% Unfortunately, working with such multisets in Agda is type that is easy to work with.
We choose to use the same names and (almost) the same syntax for this new definition of |Free|,
since all the definitions that we have seen previously can be readily adapted to work with this data type instead.
%if style == newcode
\begin{code}
_>>=_ : ∀ {a b es} -> Free es a -> (a -> Free es b) -> Free es b
Pure x >>= f = f x
Op i c k >>= f = Op i c λ x -> k x >>= f
_>>_ : ∀ {a b es} → Free es a → Free es b → Free es b
mx >> my = mx >>= const my
>>=-assoc : ∀ {a b c es} (S : Free es a) (f : a -> Free es b) (g : b -> Free es c) ->
(S >>= f) >>= g == S >>= (λ x -> f x >>= g)
>>=-assoc (Pure x) f g = refl
>>=-assoc (Op i c k) f g = cong (Op i c) (extensionality λ x -> >>=-assoc (k x) _ _)
>>=-Pure : ∀ {a es} (S : Free es a) ->
S >>= Pure == S
>>=-Pure (Pure x) = refl
>>=-Pure (Op i c k) = cong (Op i c) (extensionality λ x -> >>=-Pure (k x))
_<$>_ : ∀ {a b es} -> (a → b) → Free es a → Free es b
f <$> mx = mx >>= λ x → Pure (f x)
\end{code}
%endif
Most of this bookkeeping involved with different effects can be inferred using Agda's \emph{instance arguments}~\cite{instance-arguments-agda}.
Instance arguments, marked using the double curly braces |⦃ ^^ ⦄|, are
automatically filled in by Agda, provided a unique value of the
required type can be found. For example, we can define the generic
effects that we saw previously as follows:
\begin{code}
fail : (Forall(a es)) ⦃ iND : Nondet ∈ es ⦄ -> Free es a
fail ⦃ iND ⦄ = Op iND Fail (λ ())
choice : (Forall(a es)) ⦃ iND : Nondet ∈ es ⦄ -> Free es a -> Free es a -> Free es a
choice ⦃ iND ⦄ S₁ S₂ = Op iND Choice (λ b -> if b then S₁ else S₂)
call : (Forall(I O es)) ⦃ iRec : Rec I O ∈ es ⦄ -> (i : I) -> Free es (O i)
call ⦃ iRec ⦄ i = Op iRec i Pure
\end{code}
These now operate over any free monad with effects given by |es|,
provided we can show that the list |es| contains the |Nondet| and
|Rec| effects respectively.
%if style == newcode
\begin{code}
choices : ∀ {es a} ⦃ iND : Nondet ∈ es ⦄ → List (Free es a) → Free es a
choices ⦃ iND ⦄ = foldr choice fail
\end{code}
%endif
For convenience of notation, we introduce the |(RecArr _ es _)| notation for the type of generally recursive functions with effects in |es|,
i.e. Kleisli arrows into |Free (Rec _ _ :: es)|.
\begin{code}
RecArr' : (I : Set) (es : List Sig) (O : I → Set) → Set₁
(RecArr I es O) = (i : I) -> Free (Rec I O :: es) (O i)
\end{code}
%if style == newcode
\begin{code}
instance
inHead : ∀ {l} {a : Set l} {x : a} {xs : List a} → x ∈ (x :: xs)
inHead = ∈Head
inTail : ∀ {l} {a : Set l} {x x' : a} {xs : List a} → ⦃ i : x ∈ xs ⦄ → x ∈ (x' :: xs)
inTail ⦃ i ⦄ = ∈Tail i
\end{code}
%endif
With the syntax for combinations of effects defined, let us turn to semantics.
Since the weakest precondition predicate transformer for a single effect is given as a fold
over the effect's signature,
the semantics for a combination of effects can be given by a list of such semantics.
%if style == newcode
\begin{code}
module Stateless where
open Combinations
open Sig
open Spec
open AlmostRegex using (allSplitsPost)
\end{code}
%endif
\begin{code}
record PT (e : Sig) : Set₁ where
constructor mkPT
field
pt : (c : C e) → (R e c → Set) → Set
mono : ∀ c P P' → (∀ x -> P x -> P' x) → pt c P → pt c P'
\end{code}
\begin{samepage}
\begin{code}
data PTs : List Sig -> Set₁ where
Nil : PTs Nil
_::_ : (Forall(e es)) PT e -> PTs es -> PTs (e :: es)
\end{code}
\end{samepage}
The record type |PT| not only contains a predicate transformer |pt|,
but also a proof that this predicate transformer is
\emph{monotone}. Several lemmas throughout this paper, such as the
|terminates-fmap| lemma of Section~\ref{sec:dmatch-correct}, rely on the monotonicity of the
underlying predicate transformers;
for each semantics we present, the proof of monotonicity is immediate.
Given such a list of predicate transformers,
defining the semantics of an effectful program is a straightforward generalization of the previously defined semantics.
The |Pure| case is identical, and in the |Op| case we can apply the predicate transformer returned by the |lookupPT| helper function.
\begin{code}
lookupPT : (Forall(C R es)) (pts : PTs es) (i : mkSig C R ∈ es) -> (c : C) -> (R c -> Set) -> Set
lookupPT (pt :: pts) ∈Head = PT.pt pt
lookupPT (pt :: pts) (∈Tail i) = lookupPT pts i
\end{code}
%if style == newcode
\begin{code}
lookupMono : ∀ {C R es} (pts : PTs es) (i : mkSig C R ∈ es) -> ∀ c P P' → P ⊆ P' → lookupPT pts i c P → lookupPT pts i c P'
lookupMono (pt :: pts) ∈Head = PT.mono pt
lookupMono (pt :: pts) (∈Tail i) = lookupMono pts i
\end{code}
%endif
This results in the following definition of the semantics for combinations of effects.
\begin{code}
wp'' : (Forall(a es)) (pts : PTs es) -> Free es a -> (a -> Set) -> Set
(wp pts (Pure x)) P = P x
(wp pts (Op i c k)) P = lookupPT pts i c (λ x -> (wp pts (k x)) P)
\end{code}
The effects that we will use for our |match| function consist of a
combination of non-determinism and general recursion. Although we can
reuse the |ptAll| semantics of non-determinism, we have not yet given
the semantics for recursion. However, it is not as easy to give a
predicate transformer semantics for recursion in general, since the
intended semantics of a recursive call depend on the function that is
being defined. Instead, to give semantics to a recursive function, we
assume that we have been provided with a relation of the type |(i : I)
-> O i -> Set|, reminiscent of a loop invariant in an imperative
program. The semantics then establishes whether or not the recursive
function adheres to this invariant or not:
\begin{code}
ptRec : (Forall(I O)) ((i : I) -> O i -> Set) -> PT (Rec I O)
PT.pt (ptRec R) i P = ∀ o -> R i o -> P o
\end{code}
%if style == newcode
\begin{code}
PT.mono (ptRec R) c P P' imp asm o h = imp _ (asm _ h)
\end{code}
%endif
As we shall see shortly, when revisiting the |match| function, the
|Match| relation defined previously will fulfill the role of this
`invariant.'
% In the case of verifying the |match| function, the |Match| relation will play the role of |R|.
% If we use |ptRec R| as a predicate transformer to check that a recursive function satisfies the relation |R|,
% then we are proving \emph{partial correctness}, since we assume each recursive call successfully returns
% a correct value according to the relation |R|.
To deal with the Kleene star, we rewrite |match| as a generally recursive function using a combination of effects.
Since |match| makes use of |allSplits|, we also rewrite that function to use a combination of effects.
The types become:
\begin{code}
allSplits : (Forall(a es)) ⦃ iND : Nondet ∈ es ⦄ -> List a -> Free es (List a × List a)
match : (Forall(es)) ⦃ iND : Nondet ∈ es ⦄ → (RecArrBinding (x) (Regex × String) es (tree (Pair.fst x)))
\end{code}
%if style == newcode
\begin{code}
allSplits Nil = Pure (Nil , Nil)
allSplits (x :: xs) = choice
(Pure (Nil , (x :: xs)))
(allSplits xs >>= λ {(ys , zs) → Pure ((x :: ys) , zs)})
match (Empty , xs) = fail
match (Epsilon , Nil) = Pure tt
match (Epsilon , xs@(_ :: _)) = fail
match ((Singleton c) , Nil) = fail
match ((Singleton c) , (x :: Nil)) with c ≟ x
match ((Singleton c) , (.c :: Nil)) | yes refl = Pure c
match ((Singleton c) , (x :: Nil)) | no ¬p = fail
match ((Singleton c) , (_ :: _ :: _)) = fail
match ((l · r) , xs) = do
(ys , zs) <- allSplits xs
y <- call (hiddenInstance(∈Head)) (l , ys)
z <- call (hiddenInstance(∈Head)) (r , zs)
Pure (y , z)
match ((l ∣ r) , xs) = choice
(Inl <$> call (hiddenInstance(∈Head)) (l , xs))
(Inr <$> call (hiddenInstance(∈Head)) (r , xs))
\end{code}
%endif
Since the index argument to the smart constructor is inferred by Agda,
the only change in the definition of |match| and |allSplits| will be
that |match| now does have a meaningful branch for the Kleene star case:
\begin{samepage}
\begin{code}
match ((r *) , Nil) = Pure Nil
match ((r *) , xs@(_ :: _)) = do
(y , ys) <- call (hiddenInstance(∈Head)) ((r · (r *)) , xs)
Pure (y :: ys)
\end{code}
\end{samepage}
The effects we need to use for running |match| are a combination of non-determinism and general recursion.
As discussed, we first need to give the specification for |match| before we can verify a program that performs a recursive |call| to |match|.
%if style == newcode
\begin{code}
ptAll : PT Nondet
PT.pt ptAll Fail P = ⊤
PT.pt ptAll Choice P = P True ∧ P False
PT.mono ptAll Fail P P' imp asm = asm
PT.mono ptAll Choice P P' imp (fst , snd) = imp _ fst , imp _ snd
ptAny : PT Nondet
PT.pt ptAny Fail P = ⊥
PT.pt ptAny Choice P = P True ∨ P False
PT.mono ptAny Fail P P' imp asm = asm
PT.mono ptAny Choice P P' imp (Inl asm) = Inl (imp _ asm)
PT.mono ptAny Choice P P' imp (Inr asm) = Inr (imp _ asm)
\end{code}
%endif
\begin{code}
matchSpec : (r,xs : Pair Regex String) -> tree (Pair.fst r,xs) -> Set
matchSpec (r , xs) ms = Match r xs ms
wpMatch' : (Forall(a)) Free (Rec (Pair Regex String) (tree ∘ Pair.fst) :: Nondet :: Nil) a ->
(a -> Set) -> Set
wpMatch S = (wp (ptRec matchSpec :: ptAll :: Nil) S)
\end{code}
%if style == newcode
\begin{code}
consequence : ∀ {a b es P} pts (mx : Free es a) (f : a -> Free es b) ->
wp pts mx (λ x -> wp pts (f x) P) == wp pts (mx >>= f) P
consequence pts (Pure x) f = refl
consequence pts (Op i c k) f = cong (lookupPT pts i c) (extensionality λ x -> consequence pts (k x) f)
wpToBind : ∀ {a b es pts P} (mx : Free es a) (f : a -> Free es b) ->
wp pts mx (λ x -> wp pts (f x) P) -> wp pts (mx >>= f) P
wpToBind {pts = pts} mx f H = subst id (consequence pts mx f) H
wpFromBind : ∀ {a b es pts P} (mx : Free es a) (f : a -> Free es b) ->
wp pts (mx >>= f) P -> wp pts mx (λ x -> wp pts (f x) P)
wpFromBind {pts = pts} mx f H = subst id (sym (consequence pts mx f)) H
\end{code}
%endif
We can reuse exactly our proof that |allSplits| is correct,
since we use the same semantics for the non-determinism used in the definition of |allSplits|.
Similarly, the partial correctness proof of |match| will be the same on all cases except the Kleene star.
%if style == newcode
\begin{code}
allSplitsSound : ∀ (xs : String) ->
(wpSpec [[ ⊤ , (λ {(ys , zs) → xs == ys ++ zs})]]) ⊑ (wpMatch (allSplits (hiddenInstance(∈Tail ∈Head)) xs))
allSplitsSound Nil P (fst , snd) = snd _ refl
allSplitsSound (x :: xs) P (fst , snd) = snd _ refl ,
wpToBind (allSplits xs) _ (allSplitsSound xs _ (tt , λ {(ys , zs) H → snd _ (cong (x ::_) H)}))
\end{code}
On the other hand, the correctness proof for |match| needs a bit of tweaking to deal with the difference in the recursive steps.
\begin{code}
matchSound : ∀ r,xs -> (wpSpec [[ ⊤ , matchSpec r,xs ]]) ⊑ wpMatch (match (hiddenInstance(∈Head)) r,xs)
matchSound (Empty , xs) P (preH , postH) = tt
matchSound (Epsilon , Nil) P (preH , postH) = postH _ Epsilon
matchSound (Epsilon , (_ :: _)) P (preH , postH) = tt
matchSound (Singleton c , Nil) P (preH , postH) = tt
matchSound (Singleton c , (x :: Nil)) P (preH , postH) with c ≟ x
matchSound (Singleton c , (.c :: Nil)) P (preH , postH) | yes refl = postH _ Singleton
matchSound (Singleton c , (x :: Nil)) P (preH , postH) | no ¬p = tt
matchSound (Singleton c , (_ :: _ :: _)) P (preH , postH) = tt
matchSound ((l · r) , xs) P (preH , postH) = wpToBind (allSplits xs) _
(allSplitsSound xs _ (_ , (λ {(ys , zs) splitH y lH z rH → postH (y , z)
(coerce (cong (λ xs → Match _ xs _) (sym splitH)) (Concat lH rH))})))
matchSound ((l ∣ r) , xs) P (preH , postH) =
(λ o H -> postH _ (OrLeft H)) ,
(λ o H -> postH _ (OrRight H))
\end{code}
%endif
Now we are able to prove correctness of |match| on a Kleene star.
\begin{code}
matchSound ((r *) , Nil) P (preH , postH) = postH _ StarNil
matchSound ((r *) , (x :: xs)) P (preH , postH) o H = postH _ (StarConcat H)
\end{code}
At this point, we have defined a matcher for regular languages and
formally proven that when it succeeds in recognizing a given string,
this string is indeed in the language generated by the argument
regular expression. However, the |match| function does not
necessarily terminate: if |r| is a regular expression that accepts the
empty string, then calling |match| on |r *| and a string |xs| will
diverge. In the next section, we will write a new parser that is guaranteed to
terminate and show that this parser refines the |match| function
defined above.
\section{Derivatives and handlers} \label{sec:dmatch}
Since recursion on the structure of a regular expression does not guarantee
termination of the parser, we can instead perform recursion on the string to be
parsed, changing the regular expression to be matched based on the characters
we have seen.
The \emph{Brzozowski derivative} of a formal language |L| with respect to a character |x| consists of all strings |xs| such that |x :: xs ∈ L|~\cite{Brzozowski}.
Crucially, if |L| is regular, so are all its derivatives.
Thus, let |r| be a regular expression, and |d r /d x| an expression for the derivative with respect to |x|,
then |r| matches a string |x :: xs| if and only if |d r /d x| matches |xs|.
This suggests the following implementation of matching an expression |r| with a string |xs|:
if |xs| is empty, check whether |r| matches the empty string;
otherwise remove the head |x| of the string and try to match |d r /d x|.
The first step in implementing a parser using the Brzozowski derivative is to compute the derivative for a given regular expression.
Following \citet{Brzozowski}, we use a helper function |matchEpsilon| that decides whether an expression matches the empty string.
\begin{code}
matchEpsilon : (r : Regex) -> Dec (Sigma (tree r) (Match r Nil))
\end{code}
%if style == newcode
\begin{code}
nilConcat : (Forall(l r xs x,y)) Match (l · r) xs x,y -> xs == Nil -> Pair (Sigma (tree l) (Match l Nil)) (Sigma (tree r) (Match r Nil))
nilConcat (Concat {ys = Nil} {Nil} y z) cat = (_ , y) , (_ , z)
nilConcat (Concat {ys = Nil} {x :: zs} ml mr) ()
nilConcat (Concat {ys = x :: ys} {zs} ml mr) ()
matchEpsilon Empty = no (λ ())
matchEpsilon Epsilon = yes (tt , Epsilon)
matchEpsilon (Singleton x) = no (λ ())
matchEpsilon (l · r) with matchEpsilon l | matchEpsilon r
matchEpsilon (l · r) | yes (ml , pl) | yes (mr , pr) = yes ((ml , mr) , (Concat pl pr))
matchEpsilon (l · r) | yes pl | no ¬pr = no λ {(m , x) -> ¬pr (Pair.snd (nilConcat x refl))}
matchEpsilon (l · r) | no ¬pl | yes pr = no λ {(m , x) -> ¬pl (Pair.fst (nilConcat x refl))}
matchEpsilon (l · r) | no ¬pl | no ¬pr = no λ {(m , x) -> ¬pl (Pair.fst (nilConcat x refl))}
matchEpsilon (l ∣ r) with matchEpsilon l | matchEpsilon r
matchEpsilon (l ∣ r) | yes (ml , pl) | yes pr = yes (Inl ml , OrLeft pl)
matchEpsilon (l ∣ r) | yes (ml , pl) | no ¬pr = yes (Inl ml , OrLeft pl)
matchEpsilon (l ∣ r) | no ¬pl | yes (mr , pr) = yes (Inr mr , OrRight pr)
matchEpsilon (l ∣ r) | no ¬pl | no ¬pr = no λ {(Inl ml , OrLeft pl) -> ¬pl (ml , pl) ; (Inr mr , OrRight pr) -> ¬pr (mr , pr)}
matchEpsilon (r *) = yes (Nil , StarNil)
\end{code}
%endif
The definition of |matchEpsilon| is given by structural recursion on the regular expression, just as the derivative operator is:
\begin{samepage}
\begin{code}
d_/d_ : Regex -> Char -> Regex
d Empty /d c = Empty
d Epsilon /d c = Empty
d Singleton x /d c with c ≟ x
... | yes p = Epsilon
... | no ¬p = Empty
d l · r /d c with matchEpsilon l
... | yes p = ((d l /d c) · r) ∣ (d r /d c)
... | no ¬p = (d l /d c) · r
d l ∣ r /d c = (d l /d c) ∣ (d r /d c)
d r * /d c = (d r /d c) · (r *)
\end{code}
\end{samepage}
To use the derivative of |r| to compute a parse tree for |r|,
we need to be able to convert a tree for |d r /d x| to a tree for |r|.
As this function `inverts' the result of differentiation, we name it |integralTree|:
\begin{code}
integralTree : (implicit(x : Char)) (r : Regex) -> tree (d r /d x) → tree r
\end{code}
Its definition closely follows the pattern matching performed in the definition of |d_/d_|.
%if style == newcode
\begin{code}
integralTree Empty ()
integralTree Epsilon ()
integralTree (Singleton x) t = x
integralTree (l ∣ r) (Inl x) = Inl (integralTree l x)
integralTree (l ∣ r) (Inr x) = Inr (integralTree r x)
integralTree (l · r) t with matchEpsilon l
integralTree (l · r) (Inl (fst , snd)) | yes p = integralTree l fst , snd
integralTree (l · r) (Inr x) | yes (y , m) = y , integralTree r x
integralTree (l · r) (fst , snd) | no ¬p = integralTree l fst , snd
integralTree (r *) (fst , snd) = integralTree r fst :: snd
\end{code}
%endif
The description of a derivative-based matcher is stateful:
we perform a step by \emph{removing} a character from the input string.
To match the description, we introduce new effect |Parser| which provides a parser-specific interface to this state.
The |Parser| effect has one command |Symbol| that returns a |Maybe Char|.
Calling |Symbol| will return |just| the head of the unparsed remainder (advancing the string by one character) or |nothing| if the string has been totally consumed.
\begin{code}
data CParser : Set where
Symbol : CParser
RParser : CParser -> Set
RParser Symbol = Maybe Char
Parser = mkSig CParser RParser
symbol : (Forall(es)) ⦃ iP : Parser ∈ es ⦄ -> Free es (Maybe Char)
symbol ⦃ iP ⦄ = Op iP Symbol Pure
\end{code}
The code for the new parser, |dmatch|, is now only a few lines long. When the
input contains at least one character, we use the derivative to match the first
character and recurse; when the input string is empty, we check that the
expression matches the empty string.
%if style == newcode
\begin{code}
-- Dependent If-Then-Else syntax inspired by Lean
dite : ∀ {l k} {P : Set l} {a : Set k} → Dec P → (P → a) → (¬ P → a) → a
dite P t f = if' P then t else f
syntax dite P (λ p → t) f = if p <- P then t else f
\end{code}
%endif
\begin{code}
dmatch : (Forall(es)) ⦃ iP : Parser ∈ es ⦄ ⦃ iND : Nondet ∈ es ⦄ -> (RecArr Regex es tree)
dmatch r = symbol >>= maybe
(λ x -> integralTree r <$> call (hiddenInstance(∈Head)) (d r /d x))
(if p <- matchEpsilon r then Pure (Sigma.fst p) else (hiddenConst(fail)))
\end{code}
Here, |maybe f y| takes a |Maybe| value and applies |f| to the value in |just|, or returns |y| if it is |nothing|.