-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdoc.texi
1785 lines (1302 loc) · 73 KB
/
doc.texi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\input texinfo
@settitle JRM's Syntax-rules Primer for the Merely Eccentric
@dircategory The Algorithmic Language Scheme
@direntry
* joe-marshall-syntax-rules-primer: Joe R Marshall's Syntax-Rules Primer for the Merely Eccentric
@end direntry
@author Joe Marshall
@documentencoding UTF-8
@node Top
@ifinfo
@heading Joe R Marshall's Syntax-Rules Primer for the Merely Eccentric
@noindent
Unofficial Texinfo Format version.
@end ifinfo
@node Foreword, Very simple macros, Top
In learning to write Scheme macros, I have noticed that it is easy to find both trivial examples and extraordinarily complex examples, but there seem to be no intermediate ones. I have discovered a few tricks in writing macros and perhaps some people will find them helpful.
The basic purpose of a macro is *syntactic* abstraction. As functions allow you to extend the functionality of the underlying Scheme language, macros allow you to extend the syntax. A well designed macro can greatly increase the readability of a program, but a poorly designed one can make a program completely unreadable.
Macros are also often used as a substitute for functions to improve performance. In an ideal world this would be unnecessary, but compilers have limitations and a macro can often provide a workaround. In these cases, the macro should be a `drop-in' replacement for the equivalent function, and the design goal is not to extend the syntax of the language but to mimic the existing syntax as much as possible.
@node Very simple macros, Debugging macros, Foreword
SYNTAX-RULES provides very powerful pattern-matching and destructuring facilities. With very simple macros, however, most of this power is unused. Here is an example:
@lisp
(define-syntax nth-value
(syntax-rules ()
((nth-value n values-producing-form)
(call-with-values
(lambda () values-producing-form)
(lambda all-values
(list-ref all-values n))))))
@end lisp
When using functions that return multiple values, it is occasionally the case that you are interested in only one of the return values. The NTH-VALUE macro evaluates the VALUES-PRODUCING-FORM and extracts the Nth return value.
Before the macro has been evaluated, Scheme would treat a form that begins with NTH-VALUE as it would any other form: it would look up the value of the variable NTH-VALUE in the current environment and apply it to the values produced by evaluating the arguments.
DEFINE-SYNTAX introduces a new special form to Scheme. Forms that begin with NTH-VALUE are no longer simple procedure applications. When Scheme processes such a form, it uses the SYNTAX-RULES we provide to rewrite the form. The resulting rewrite is then processed in place of the original form.
Forms are only rewritten if the operator position is an IDENTIFIER that has been DEFINE-SYNTAX'd. Other uses of the identifier are not rewritten.
You cannot write an `infix' macro, nor can you write a macro in a `nested position', i.e., an expression like @code{((foo x) y z)} is always considered a procedure application. (The subexpression @code{(foo x)} will be rewritten of course, but it will not be able to affect the processing of subexpressions @code{y} and @code{z}.) This will be important later on.
SYNTAX-RULES is based on token-replacement. SYNTAX-RULES defines a series of patterns and templates. The form is matched against the pattern and the various pieces are transcribed into the template. This seems simple enough, but there is one important thing to always keep in mind.
THE SYNTAX-RULES SUBLANGUAGE IS NOT SCHEME!
This is a crucial point that is easy to forget. In the example,
@lisp
(define-syntax nth-value
(syntax-rules ()
((nth-value n values-producing-form)
(call-with-values
(lambda () values-producing-form)
(lambda all-values
(list-ref all-values n))))))
@end lisp
the pattern @code{(nth-value n values-producing-form)} looks like Scheme code
and the template
@lisp
(call-with-values
(lambda () values-producing-form)
(lambda all-values
(list-ref all-values n)))
@end lisp
Really seems to be Scheme code, but when Scheme is applying the syntax-rules rewrite there is NO SEMANTIC MEANING attached to the tokens. The meaning will be attached at a later point in the process, but not here.
One reason this is easy to forget is that in a large number of cases it doesn't make a difference, but when you write more complicated rules you may find unexpected expansions. Keeping in mind that syntax-rules only manipulates patterns and templates will help avoid confusion.
This example makes good use of patterns and templates. Consider the form @code{(nth-value 1 (let ((q (get-number))) (quotient/remainder q d)))} During the expansion of NTH-VALUE, the *entire* subform @code{(let ((q (get-number))) (quotient/remainder q d))} is bound to VALUES-PRODUCING-FORM and transcribed into the template at @code{(lambda () values-producing-form)} The example would not work if VALUES-PRODUCING-FORM could only be bound to a symbol or number.
A pattern consists of a symbol, a constant, a list (proper or improper) or vector of more patterns, or the special token "..." (A series of three consecutive dots in this paper will *always* mean the literal token "..." and @b{never} be used for any other reason.) It is not allowed to use the same symbol twice in a pattern. Since the pattern is matched against the form, and since the form @b{always} starts with the defined keyword, it does not participate in the match.
You may reserve some symbols as `literals' by placing them in the list that is the first argument to syntax-rules. Essentially, they will be treated as if they were constants but there is some trickiness that ensures that users of the macro can still use those names as variables. The trickiness `does the right thing' so I won't go into details.
You may find macros written using the token "_" rather than repeating the name of the macro:
@lisp
(define-syntax nth-value
(syntax-rules ()
((_ n values-producing-form)
(call-with-values
(lambda () values-producing-form)
(lambda all-values
(list-ref all-values n))))))
@end lisp
I personally find this to be confusing and would rather duplicate the
macro name.
Here are the rules for pattern matching:
@itemize @minus
@item
A constant pattern will only match against an EQUAL? constant.
We'll exploit this later on.
@item
A symbol that is one of the `literals' can only match against the exact same symbol in the form, and then only if the macro user hasn't shadowed it.
@item
A symbol that is *not* one of the literals can match against *any* complete form. (Forgetting this can lead to surprising bugs.)
@item
A proper list of patterns can only match against a list form of the same length and only if the subpatterns match.
@item
An improper list of patterns can only match against a list form of the same or greater length and only if the subpatterns match. The `dotted tail' of the pattern will be matched against the remaining elements of the form. It rarely makes sense to use anything but a symbol in the dotted tail of the pattern.
@item
The ... token is special and will be discussed a bit later.
@end itemize
@node Debugging macros, N-ary macros, Very simple macros
As macros get more complicated, they become trickier to debug. Most Scheme systems have a mechanism by which you can invoke the macro expansion system on a piece of list structure and get back the expanded form. In MzScheme you could do this:
@lisp
(syntax-object->datum
(expand '(nth-value 1 (let ((q (get-number)))
(quotient/remainder q d)))))
@end lisp
In MIT Scheme,
@lisp
(unsyntax (syntax '(nth-value 1 (let ((q (get-number)))
(quotient/remainder q d)))
(nearest-repl/environment)))
@end lisp
Be prepared for some interesting output --- you may not realize how many forms are really macros and how much code is produced. The macro system may recognize and optimize certain patterns of function usage as well. It would not be unusual to see @code{(caddr x)} expand into @code{(car (cdr (cdr x)))} or into @code{(%general-car-cdr 6 x)}.
Be prepared, too, for some inexplicable constructs. Some syntax objects may refer to bindings that are only lexically visible from within the expander. Syntax objects may contain information that is lost when they are converted back into list structure. You may encounter apparently illegal expansions like this:
@lisp
(lambda (temp temp temp) (set! a temp) (set! b temp) (set! c temp))
@end lisp
There are three internal syntax objects that represent the three different parameters to the lambda expression, and each assignment referred to a unique one, but each individual syntax object had the same symbolic name, so their unique identity was lost when they were turned back into list structure.
Debugging trick
One very easy debugging trick is to wrap the template with a quote:
@lisp
(define-syntax nth-value
(syntax-rules ()
((_ n values-producing-form)
'(call-with-values ;; @r{Note the quote!}
(lambda () values-producing-form)
(lambda all-values
(list-ref all-values n))))))
@end lisp
Now the macro returns the filled-in template as a quoted list:
@lisp
(nth-value (compute-n) (compute-values))
=> (call-with-values (lambda () (compute-values))
(lambda all-values (list-ref all-values (compute-n))))
@end lisp
Sometimes it is difficult to understand why a pattern didn't match something you thought it should or why it did match something it shouldn't. It is easy to write a pattern testing macro:
@lisp
(define-syntax test-pattern
(syntax-rules ()
((test-pattern one two) "match 1")
((test-pattern one two three) "match 2")
((test-pattern . default) "fail")))
@end lisp
Debugging trick
A second trick is to write a debugging template:
@lisp
(define-syntax nth-value
(syntax-rules ()
((_ n values-producing-form)
'("Debugging template for nth-value"
"n is" n
"values-producing-form is" values-producing-form))))
@end lisp
@node N-ary macros, Syntax errors, Debugging macros
By using a dotted tail in the pattern we can write macros that take an arbitrary number of arguments.
@lisp
(define-syntax when
(syntax-rules ()
((when condition . body) (if condition (begin . body) #f))))
@end lisp
An example usage is
@lisp
(when (negative? x)
(newline)
(display "Bad number: negative."))
@end lisp
The pattern matches as follows:
@example
condition = (negative? x)
body = ((newline) (display "Bad number: negative."))
@end example
Since the pattern variable `body' is in the dotted tail position, it is matched against the list of remaining elements in the form. This can lead to unusual errors. Suppose I had written the macro this way:
@lisp
(define-syntax when
(syntax-rules ()
((when condition . body) (if condition (begin body) #f))))
@end lisp
The pattern is matched against the list of remaining arguments, so in the template it will expand to a list:
@lisp
(when (negative? x)
(newline)
(display "Bad number: negative."))
@end lisp
expands to
@lisp
(if (negative? x)
(begin ((newline) (display "Bad number: negative.")))
#f)
@end lisp
But this @b{almost} works. The consequence of the condition is evaluated as if it were a function call. The `function' is the return value of the call to newline and the `argument' the return value from display. Since the rules for evaluation are to evaluate the subforms and then apply the resulting procedure to the resulting arguments, this may actually print a newline and display the string "Bad number: negative." before raising an error. One could easily be fooled into thinking that the WHEN form succesfully ran to completion and it was the code *subsequent* to the WHEN that raised the error.
The original code had this in the template: @code{(begin . body)}
When the template is filled in, the body is placed in `the dotted tail'. Since the body is a list of forms, the effect is as if you had used CONS rather than LIST to create the resultant form. Unfortunately, this trick does not generalize; you can only prefix things in this manner.
Macro `rest args' get bound to a list of forms, so remember to `unlist' them at some point.
There is another bug in the original form:
@lisp
(define-syntax when
(syntax-rules ()
((when condition . body) (if condition (begin . body) #f))))
(when (< x 5))
@end lisp
expands into
@lisp
(if (< x 5) (begin) #f)
@end lisp
Recall that pattern variables match anything, including the empty list. The pattern variable BODY is bound to the empty list. When the template form @code{(begin . body)} is filled in, the token BEGIN is consed to the empty list resulting in an illegal (BEGIN) form.
Since @code{(when (< x 5))} is unusual itself, one solution is to modify the macro like this:
@lisp
(define-syntax when
(syntax-rules ()
((when condition form . forms)
(if condition (begin form . forms) #f))))
@end lisp
Now the pattern will match only if the WHEN expression has at least one consequent form and the resulting BEGIN subform is guaranteed to contain at least one form.
This sort of macro --- one that takes an arbitrary number of subforms and evaluates them in an `implicit begin' --- is extremely common. It is valuable to know this macro idiom:
Implicit Begin idiom
Use this idiom for a macro that allows an arbitrary number of subforms and processes them sequentially (possibly returning the value of the last subform).
The pattern should end in "FORM . FORMS)" to ensure a minimum of one subform.
The template has either @code{(begin form . forms)} or uses the implicit begin of another special form, e.g. @code{(lambda () form . forms)}
A strange and subtle bug
This section describes a bug in the syntax-rules expander for MzScheme v207. A fix has been made to the sources, so versions later than this ought to work. You can skip this section (about 1 page) if you wish.
Suppose you are a former INTERCAL hacker and you truly miss the language. You wish to write a macro PLEASE that simply removes itself from the expansion:
@lisp
(please display "foo") => (display "foo")
@end lisp
Here is an attempt:
@lisp
(define-syntax please
(syntax-rules ()
((please . forms) forms)))
@end lisp
This works on some Scheme systems, but not on others and the reason is quite subtle. In Scheme, function application is indicated syntactically by a list consisting of a function and zero or more arguments. Above macro, although it returns such a list, creates that list as part of the pattern matching process. When a macro is expanded careful attention is paid to ensure that subforms from the point of use continue to refer to the lexical environment at that point while subforms that are introduced by the template continue to refer to the lexical environment of the macro definition. The list that is returned from the PLEASE macro, however, is a subform that was not created at either the macro use point *or* at the macro definition point, but rather in the environment of the pattern matcher. In MzScheme, that environment does not have a syntactic mapping to interpret a list as a function call, so the following error results: @code {"compile: bad syntax; function application is not allowed, because no #%app syntax transformer is bound in: (display 33)"}
The fix is trivial:
@lisp
(define-syntax please
(syntax-rules ()
((please function . arguments) (function . arguments))))
@end lisp
The resulting expansion is now a list constructed within the template of the macro rather than one constructed by the pattern matcher. The template environment is used and the resulting list is therefore interpreted as a function call.
Again, this is an extremely subtle point, but it is easy to remember this rule of thumb:
Don't use macro `rest' arguments as an implicit function call. Use a template with an explicit (function . arguments) element.
END OF STRANGE AND SUBTLE BUG SECTION
Syntax rules allows for an arbitrary number of pattern/template pairs. When a form is to be rewritten, a match is attempted against the first pattern. If the pattern cannot be matched, the next pattern is examined. The template associated with the first matching pattern is the one used for the rewrite. If no pattern matches, an error is raised. We will exploit this heavily.
@node Syntax errors, `Accidental' matching, N-ary macros
The macro system will raise an error if no pattern matches the form, but it will become useful to us to write patterns that explicitly reject a form if the pattern *does* match. This is easily accomplished by making the template for a pattern expand into poorly formed code, but the resulting error message is rather unhelpful:
@lisp
(define-syntax prohibit-one-arg
(syntax-rules ()
((prohibit-one-arg function argument) (if)) ;; @r{bogus expansion}
((prohibit-one-arg function . arguments)
(function . arguments))))
(prohibit-one-arg + 2 3) => 5
(prohibit-one-arg display 'foo)
@r{if: bad syntax (has 0 parts after keyword) in: (if)}
@end lisp
To make a more helpful error message, and to indicate in the macro definition that the bogus expansion is intentional, we'll define a macro designed to raise a syntax error. (There is, no doubt a procedural way of doing this, but we wish to raise this error during macro expansion and the pattern language provides no way to do this directly. A more complicated macro system could be used, but this is nice, easy, and portable.)
@lisp
(define-syntax syntax-error
(syntax-rules ()
((syntax-error) (syntax-error "Bad use of syntax error!"))))
@end lisp
We can now write macros that expand into `calls' to syntax error. If the call contains any arguments, the pattern will not match and an error will be raised. Most scheme macro systems display the form that failed to match, so we can put our debugging messages there.
@lisp
(define-syntax prohibit-one-arg
(syntax-rules ()
((prohibit-one-arg function argument)
(syntax-error
"Prohibit-one-arg cannot be used with one argument."
function argument))
((prohibit-one-arg function . arguments)
(function . arguments))))
(prohibit-one-arg display 3)
=> syntax-error: bad syntax in: (syntax-error "Prohibit-one-arg cannot
be used with one argument." display 3)
@end lisp
Write a syntax-error macro.
Write `rejection' patterns by expanding into a call to syntax-error.
@node `Accidental' matching, Recursive expansion, Syntax errors
The pattern roughly resembles what it matches, but this can be a source of confusion. A pattern like this:
@lisp
(my-named-let name (binding . more-bindings) . body)
@end lisp
will match this form:
@lisp
(my-named-let ((x 22)
(y "computing square root"))
(display y)
(display (sqrt x)))
@end lisp
as follows:
@example
name = ((x 22) (y "computing square root"))
binding = display
more-bindings = (y)
body = ((display (sqrt x)))
@end example
Nested list structure in the pattern will match similar nested list structure in the form, but symbols in the pattern will match *anything*.
In this example we want to prohibit matching NAME with a list. We do this by `guarding' the intended pattern with patterns that should not be allowed.
@lisp
(define-syntax my-named-let
(syntax-rules ()
((my-named-let () . ignore)
(syntax-error "NAME must not be the empty list."))
((my-named-let (car . cdr) . ignore)
(syntax-error "NAME must be a symbol." (car . cdr)))
((my-named-let name bindings form . forms) ;; implicit begin
(let name bindings form . forms))))
@end lisp
Protect against accidental pattern matching by writing guard patterns that match the bad syntax.
@node Recursive expansion, List Recursion Idiom, `Accidental' matching
Our syntax-error macro can expand into a syntax-error form. If the result of a macro expansion is itself a macro form, that resulting macro form will be expanded. This process continues until either the resulting form is not a macro call or the resulting form fails to match a pattern. The syntax-error macro is designed to fail to match anything but a no-argument call. A no-argument call to syntax-error expands into a one-argument call to syntax-error which fails to match.
The template for a macro can expand to a form that embeds a call to the same macro within it. The embedded code will be expanded normally if the surrounding code treats it as a normal form. The embedded call will normally contain fewer forms than the original call so that the expansion eventually terminates with a trivial expansion.
Note, however, that the recursive macro will not be expanded unless the intermediate code uses it as a macro call. This is why the debugging trick of quoting the template works: the macro form is expanded to a quote form. The quote form just treats its argument as data so no further expansion is done.
Recursive expansion always produces a nested result. This is used to good effect in this example:
@lisp
(define-syntax bind-variables
(syntax-rules ()
((bind-variables () form . forms)
(begin form . forms))
((bind-variables ((variable value0 value1 . more) . more-bindings) form . forms)
(syntax-error "bind-variables illegal binding" (variable value0 value1 . more)))
((bind-variables ((variable value) . more-bindings) form . forms)
(let ((variable value)) (bind-variables more-bindings form . forms)))
((bind-variables ((variable) . more-bindings) form . forms)
(let ((variable #f)) (bind-variables more-bindings form . forms)))
((bind-variables (variable . more-bindings) form . forms)
(let ((variable #f)) (bind-variables more-bindings form . forms)))
((bind-variables bindings form . forms)
(syntax-error "Bindings must be a list." bindings))))
@end lisp
@code{BIND-VARIABLES} is much like @code{LET*}, but you are allowed to omit the
value in the binding list. If you omit the value, the variable will
be bound to #F. You can also omit the parenthesis around the variable
name.
@lisp
(bind-variables ((a 1) ;; @r{a will be 1}
(b) ;; @r{b will be #F}
c ;; @r{so will c}
(d (+ a 3))) ;; @r{a is visible in this scope.}
(list a b c d))
@end lisp
When bind-variables is processed, its immediate expansion is this:
@lisp
(let ((a 1))
(bind-variables ((b)
c
(d (+ a 3)))
(list a b c d)))
@end lisp
The LET form is now processed and as part of that processing the body
of the LET will be expanded.
@lisp
(bind-variables ((b)
c
(d (+ a 3)))
(list a b c d))
@end lisp
This expands into another LET form:
@lisp
(let ((b #f))
(bind-variables (c
(d (+ a 3)))
(list a b c d)))
@end lisp
The process terminates when the binding list is the empty list. At this point each level of expansion is placed back in its surrounding context to yield this nested structure:
@lisp
(let ((a 1))
(let ((b #f))
(let ((c #f))
(let ((d (+ a 3)))
(list a b c d)))))
@end lisp
There are several things to note here:
@enumerate
@item
The macro uses the implicit begin idiom.
@item
There is a `guard' patterns to match bindings with more than one value form.
@item
In each expansion, the first binding will be removed. The remaining bindings appear in the binding position at the recursive call, thus the macro is inductive over the list of bindings.
@item
There is a base case of the induction that matches the empty binding list.
@item
On each iteration a match is attempted on first binding against these patterns in order:
@itemize @minus
@item
@code{(variable value0 value1 . more)} a list of more than two elements
@item
@code{(variable value)} a list of exactly two elements
@item
@code{(variable)} a list of one element
@item
variable not a list
@end itemize
@end enumerate
This order is used because the last pattern will match anything and would prevent the previous matches from being matched.
This macro idiom is also extremely common.
@node List Recursion Idiom, Ellipses, Recursive expansion
This idiom is used to recursively process a list of macro subforms.
The macro has a parenthesised list of items in a fixed position.
This list may be empty (), but it may not be omitted.
A pattern that matches the empty list at that position preceeds the other matches. The template for this pattern does not include another use of this macro.
One or more patterns that have dotted tails in the list position are present. The patterns are ordered from most-specific to most-general to ensure that the later matches do not `shadow' the earlier ones. The associated templates are either syntax-errors or contain another use of this macro. The list position in the recursive call will contain the dotted tail component.
A minimal list recursion looks like this:
@lisp
(define-syntax mli
(syntax-rules ()
((mli ()) (base-case))
((mli (item . remaining)) (f item (mli remaining)))
((mli non-list) (syntax-error "not a list")))))
@end lisp
Note that the recursive call in the second clause uses the pattern variable REMAINING and that it is NOT dotted. Each recursive call therefore contains a shorter list of forms at the same point.
At this point, things are starting to get complicated. We can no longer look upon macros as `simple rewrites'. We are starting to write macros whose purpose is to control the actions of the macro processing engine. We will be writing macros whose purpose is not to produce code but rather to perform computation.
A macro is a compiler. It takes source code written in one language, i.e. Scheme with some syntactic extension, and it generates object code written in another language, i.e. Scheme *without* that syntactic extension. The language in which we will be writing these compilers is NOT Scheme. It is the pattern and template language of syntax-rules.
A compiler consists of three basic phases: a parser that reads the source language, a transformation and translation process that maps the source language semantics into constructs in the target language, and a generator that turns the resulting target constructs into code. Our macros will have these three identifiable parts. The parser phase will use pattern language to extract code fragments from the source code. The translator will operate on these code fragments and use them to construct and manipulate Scheme code fragments. The generator phase will assemble the resulting Scheme fragments of Scheme code with a template that will be the final result.
The pattern and template language of syntax-rules is an unusual implementation language. The pattern-driven rule model makes it easy to write powerful parsers. The template-driven output model makes code generation a snap. The automatic hygiene tracks the context of the code fragments and essentially allows us to manipulate the intermediate code fragments as elements in an abstract syntax tree. There is just one problem: the model of computation is non-procedural. Simple standard programming abstractions such as subroutines, named variables, structured data, and conditionals are not only different from Scheme, they don't exist in a recognizable form!
But if we look carefully, we will find our familiar programming abstractions haven't disappeared at all --- they have been destructured, taken apart, and re-formed into strange shapes. We will be writing some strange-looking code. This code will be written in the form of a macro transformer, but it will not be a macro in the traditional sense.
When a macro is expanded the original form is rewritten into a new form. If the result of macro expansion is a new macro form, the expander then expands that result. So if the macro expands into another call to itself, we have written a tail-recursive loop. We made a minimal use of this above with the syntax-error macro, but now we will be doing this as a standard practice.
We encountered the list recursion idiom above. We can create an analagous list iteration:
@lisp
(define-syntax bind-variables1
(syntax-rules ()
((bind-variables1 () form . forms)
(begin form . forms))
((bind-variables1 ((variable value0 value1 . more) . more-bindings) form . forms)
(syntax-error "bind-variables illegal binding" (variable value0 value1 . more)))
((bind-variables1 ((variable value) . more-bindings) form . forms)
(bind-variables1 more-bindings (let ((variable value)) form . forms)))
((bind-variables1 ((variable) . more-bindings) form . forms)
(bind-variables1 more-bindings (let ((variable value)) form . forms)))
((bind-variables1 (variable . more-bindings) form . forms)
(bind-variables1 more-bindings (let ((variable #f)) form . forms)))
((bind-variables1 bindings form . forms)
(syntax-error "Bindings must be a list." bindings))))
@end lisp
Because we process the leftmost variable first, the resulting form will be nested in the reverse order from the recursive version. We will deal with this issue later and just write the bindings list backwards for now.
@lisp
(bind-variables1 ((d (+ a 3)) ;; @r{a is visible in this scope.}
c ;; @r{c will be bound to #f}
(b) ;; @r{so will b}
(a 1)) ;; @r{a will be 1}
(list a b c d))
@end lisp
This macro first expands into this:
@lisp
(bind-variables1 (c
(b)
(a 1))
(let ((d (+ a 3)))
(list a b c d)))
@end lisp
But since this form is a macro, the expander is run again. This
second expansion results in this:
@lisp
(bind-variables1 ((b)
(a 1))
(let ((c #f))
(let ((d (+ a 3)))
(list a b c d))))
@end lisp
The expander is run once again to produce this:
@lisp
(bind-variables1 ((a 1))
(let ((b #f))
(let ((c #f))
(let ((d (+ a 3)))
(list a b c d)))))
@end lisp
Another iteration produces this:
@lisp
(bind-variables1 ()
(let ((a 1))
(let ((b #f))
(let ((c #f))
(let ((d (+ a 3)))
(list a b c d))))))
@end lisp
The next expansion does not contain a call to the macro:
@lisp
(begin
(let ((a 1))
(let ((b #f))
(let ((c #f))
(let ((d (+ a 3)))
(list a b c d))))))
@end lisp
We could call this the `List Iteration Idiom', but let's take this in a completely different direction.
Notice that the template for most of the rules begins with the macro name itself in order to cause the macro expander to immediately re-expand the result. But let's ignore the expander and pretend that *the template form is a tail-recursive function call*.
By removing the error checking and the multiple formats for argument bindings, we can see the essence of what is going on:
@lisp
(define-syntax bind-variables1
(syntax-rules ()
((bind-variables1 () . forms)
(begin . forms))
((bind-variables1 ((variable value) . more-bindings) . forms)
(bind-variables1 more-bindings (let ((variable value)) . forms)))))
@end lisp
BIND-VARIABLES1 is tail-recursive function. SYNTAX-RULES functions as a COND expression. The arguments to bind-variables1 are unnamed, but by placing patterns in the appropriate positions, we can destructure the values passed into named variables.
Let us demonstrate this insight through a simple example.
We want to mimic the Common Lisp macro MULTIPLE-VALUE-SETQ. This form takes a list of variables and form that returns multiple values. The form is invoked and the variables are SET! to the respective return values. A putative expansion might be this:
@lisp
(multiple-value-set! (a b c) (generate-values))
=> (call-with-values (lambda () (generate-values))
(lambda (temp-a temp-b temp-c)
(set! a temp-a)
(set! b temp-b)
(set! c temp-c)))
@end lisp
For the sake of clarity, we'll start out with no error checking. Since our macros are tail-recursive, we can write separate subroutines for each part of the expansion.
First, we write the `entry-point' macro. This macro is the one that would be exported to the user. This macro will call the one that creates the temporaries and the necessary assignment expressions. It passes in empty lists as the initial values for these expressions.
@lisp
(define-syntax multiple-value-set!
(syntax-rules ()
((multiple-value-set! variables values-form)
(gen-temps-and-sets
variables
() ;; initial value of temps
() ;; initial value of assignments
values-form))))
@end lisp
Assuming that gen-temps-and-sets does the right thing, we will want to emit the resulting code. The code is obvious:
@lisp
(define-syntax emit-cwv-form
(syntax-rules ()
((emit-cwv-form temps assignments values-form)
(call-with-values (lambda () values-form)
(lambda temps . assignments)))))
@end lisp
The temps and assignments are just pasted into the right spots.
Now we need to write the routine that generates a temporary and an assignment for each variable. We'll again use induction on the list of variables. When that list is empty, we'll call EMIT-CMV-FORM with the collected results. On each iteration we'll remove one variable, generate the temporary and assignment for that variable, and collect them with the other temporaries and assignments.
@lisp
(define-syntax gen-temps-and-sets
(syntax-rules ()
((gen-temps-and-sets () temps assignments values-form)
(emit-cwv-form temps assignments values-form))
((gen-temps-and-sets (variable . more) temps assignments values-form)
(gen-temps-and-sets
more
(temp . temps)
((set! variable temp) . assignments)
values-form))))
@end lisp
Before we develop this further, though, there are some important points about this macro that should not be overlooked.
@quotation
Did I ever tell you that Mrs. McCave
Had twenty-three sons, and she named them all Dave?
-- Dr. Seuss
@end quotation
Our MULTIPLE-VALUE-SET! macro generates a temporary variable for each of the variables that will be assigned (this is in the second clause of GEN-TEMPS-AND-SETS). Unfortunately, all the temporary variables are named `TEMP'. We can see this if we print the expanded code:
@lisp
(call-with-values (lambda () (generate-values))
(lambda (temp temp temp)
(set! c temp)
(set! b temp)
(set! a temp)))
@end lisp
@quotation
Well, she did. And that wasn't a smart thing to do.
You see, when she wants one, and calls out "Yoo-Hoo!
Come into the house, Dave!" she doesn't get one.
All twenty-three Daves of hers come on the run!
-- Ibid
@end quotation
The importance of unique identifiers to avoid name collisions is taught at a very young age.
The funny thing is, though, the code works. There are actually three different syntactic objects (all named temp) that will be bound by the LAMBDA form, and each SET! refers to the appropriate one. But there are six identifiers here with the name TEMP. Why did the macro expander decide to create three pairs of associated syntax objects rather than six individual ones or two triplets? The answer lies in this template:
@lisp
(gen-temps-and-sets
more
(temp . temps)
((set! variable temp) . assignments)
values-form)
@end lisp
The variable TEMP that is being prepended to the list of temps and the variable TEMP in the newly created (SET! VARIABLE TEMP) form will refer to the *same* syntactic object because they are created during the same expansion step. It will be a *different* syntactic object than any created during any other expansion step.
Since we run the expansion step three times, one for each variable to be assigned, we get three variables named temp. They are paired up properly because we generated all references to them at the same time.
@enumerate
@item
Introduce associated code fragments in a single expansion step.
@item
Introduce duplicated, but unassociated fragments in different expansion steps.
@end enumerate
Let's explore two variants of this program. We will separate the genaration of the temps and the assignments into two different functions, GEN-TEMPS and GEN-SETS.
Because both GEN-TEMPS and GEN-SETS iterate over the list of variables as they operate, we modify MULTIPLE-VALUE-SET! to pass the list twice. GEN-TEMPS will do induction over one copy, GEN-SETS will work with the other.
@lisp
(define-syntax multiple-value-set!
(syntax-rules ()
((multiple-value-set! variables values-form)
(gen-temps
variables ;; provided for GEN-TEMPS
() ;; initial value of temps
variables ;; provided for GEN-SETS
values-form))))
@end lisp
GEN-TEMPS does induction over the first list of variables and creates a list of temp variables.
@lisp
(define-syntax gen-temps
(syntax-rules ()
((gen-temps () temps vars-for-gen-set values-form)
(gen-sets temps
vars-for-gen-set
() ;; @r{initial set of assignments}
values-form))
((gen-temps (variable . more) temps vars-for-gen-set values-form)
(gen-temps
more
(temp . temps)
vars-for-gen-set
values-form))))
@end lisp
GEN-SETS also does induction over the list of variables and creates a list of assignment forms.
@lisp
(define-syntax gen-sets
(syntax-rules ()
((gen-sets temps () assignments values-form)
(emit-cwv-form temps assignments values-form))
((gen-sets temps (variable . more) assignments values-form)
(gen-sets
temps
more
((set! variable temp) . assignments)
values-form))))
@end lisp
Although the result of expansion looks similar, this version of the macro does not work. The name TEMP that is generated in the GEN-TEMPS expansion is different from the name TEMP generated in the GEN-SETS expansion. The expansion will contain six unrelated identifiers (all named TEMP).
But we can fix this. Notice that GEN-SETS carries around the list of temps generated by GEN-TEMPS. Instead of generating a new variable named TEMP, we can extract the previously generated TEMP from the list and use that.
First, we arrange for GEN-TEMPS to pass the temps twice. We need to destructure one of the lists to do the iteration, but we need the whole list for the final expansion.
@lisp
(define-syntax gen-temps
(syntax-rules ()
((gen-temps () temps vars-for-gen-set values-form)
(gen-sets temps
temps ;; @r{another copy for gen-sets}
vars-for-gen-set
() ;; @r{initial set of assignments}
values-form))
((gen-temps (variable . more) temps vars-for-gen-set values-form)
(gen-temps
more
(temp . temps)
vars-for-gen-set
values-form))))
@end lisp
GEN-SETS again uses list induction, but on two parallel lists rather than a single list (the lists are the same length). Each time around the loop we bind TEMP to a previously generated temporary.
@lisp
(define-syntax gen-sets
(syntax-rules ()
((gen-sets temps () () assignments values-form)
(emit-cwv-form temps assignments values-form))
((gen-sets temps (temp . more-temps) (variable . more-vars) assignments values-form)
(gen-sets
temps
more-temps
more-vars
((set! variable temp) . assignments)
values-form))))
@end lisp
Now we aren't generating new temps in GEN-SETS, we are re-using the
old ones generated by GEN-SETS.
@node Ellipses, , List Recursion Idiom
Ellipses are not really that difficult to understand, but they have three serious problems that make them seem mysterious when you first encounter them.
@itemize @minus
@item
The ... operator is very strange looking. It isn't a `normal' token and it has a prior established meaning in the English language that is somewhat at odds with what it does. Macros that use ellipses look like `sound bites' from movie reviews.
@item
The ... token is a reserved word in the macro language. Unlike all other tokens, it cannot be re-bound or quoted. In the macro language, ... acts like a syntactic mark of the same nature as a double-quote, dotted tail, or parenthesis. (It is nonsensical to try to use an open-parenthesis as a variable name! The same is true for ... in the macro language.) But in standard scheme, ... is an indetifier token and could conceivably be used as a variable.
@item
Out of every form in Scheme, the ... operator is the *only* one that is POSTFIX! The ... operator modifies how the PREVIOUS form is interpreted by the macro language. This one exception is probably responsible for most of the confusion. (Of course, if you were to use ... as a function in normal scheme, it is a prefix operator.)
@end itemize
In a macro pattern, ellipses can only appear as the last token in a LIST or VECTOR pattern. There must be a pattern immediately before it (because it is a postfix operator), so ellipses cannot appear right after an unmatched open-paren. In a pattern, the ellipses cause the pattern immediately before it to be able to match repeated occurrances of itself in the tail of the enclosing list or vector pattern. Here is an example.
Suppose our pattern were this: @code{(foo var #t ((a . b) c) ...)}
This pattern would match: @code{(foo (* tax x) #t ((moe larry curly) stooges))} because var matches anything, @var{#t} matches @var{#t}, and one occurrance of @code{((a . b) c)} would match . a would match moe, b would match the tail @code{(larry curly)} and c would match stooges.
This pattern would match:
@lisp
(foo 11 #t ((moe larry curly) stooges)
((carthago delendum est) cato)
((egad) (mild oath)))
@end lisp
because var matches anything, #t matches #t, and three occurrances of the @code{((a . b) c)} pattern would each match the three remaining elements. Note that the third element matches with a being egad, b being empty and the pattern c matching the entire list @code{(mild oath)}.
This pattern would *not* match:
@lisp
(foo 11 #t ((moe larry curly) stooges)
((carthago delendum est) cato)
((egad) mild oath))
@end lisp
The reason is that the final element @code{((egad) mild oath)} cannot match @code{((a . b) c)} because it the pattern is a list of two elements and we supplied a list of three elements.
This pattern *would* match: @code{(foo #f #t)} because var matches anything, #t matches #t, and zero occurrances of the @code{((a . b) c)} pattern would match the empty tail.
@enumerate
@item
Ellipses must be last in a list or vector pattern. The list or vector must have at least one initial pattern.
@item
Ellipses is a postfix operator that operates on the pattern before it.
@item
Ellipses allows the containing list or vector pattern to match provided that the head elements of the pattern match the head elements of the list or vector up to (but not including) the pattern before the ellipses, *and* zero or more copies of that pattern match *each and all* of the remaining elements.
@item
Remember that zero occurrance of the pattern can `match'.
When a pattern has an ellipses, the pattern variables within the clause prior to the ellipses work differently from normal. When you use one of these pattern variables in the template, it must be suffixed with an ellipses, or it must be contained in a template subform that is suffixed with an ellipses. In the template, anything suffixed with an ellipses will be repeated as many times as the enclosed pattern variables matched.
@end enumerate
Suppose our pattern is @code{(foo var #t ((a . b) c) ...)} and it is matched against
@lisp
(foo 11 #t ((moe larry curly) stooges)
((carthago delendum est) cato)
((egad) (mild oath)))
@end lisp
These are example templates and the forms produced:
@lisp
((a ...) (b ...) var (c ...))
=> ((moe carthago egad)
((larry curly) (delendum est) ())
11
(stooges cato (mild-oath)))
((a b c) ... var)
=> ((moe (larry curly) stooges)
(carthago (delendum est) cato)
(egad () (mild oath)) 11)
((c . b) ... a ...)
=> ((stooges larry curly)
(cato delendum est)
((mild oath))
moe carthago egad)
(let ((c 'b) ...) (a 'x var c) ...)
=> (let ((stooges '(larry curly))
(cato '(delendum est))
((mild oath) '()))