<!DOCTYPE html>
<html>
<head>
<title>Lonang 8086 Compiler</title>
<style>
body {
background: #eee;
}
#main {
background: #fff;
width: 800px;
margin: 40px auto;
padding: 20px;
border-radius: 2px;
}
pre {
background: #222;
color: #fff;
font-family: monospace;
padding: 8px 24px;
}
code {
padding: 1px 4px;
margin: 1px 0;
font-size: 90%;
color: #b32;
background-color: #fee;
border-radius: 2px;
}
#goup {
position: fixed;
right: 40px;
bottom: 40px;
text-align: center;
opacity: 0.4;
transition: opacity 400ms;
}
#goup:hover {
opacity: 1;
}
.arrow-up {
width: 0;
height: 0;
border-left: 18px solid transparent;
border-right: 18px solid transparent;
border-bottom: 24px solid #448;
}
table {
border-collapse: collapse;
}
table, td, th {
border: 1px solid #ddd;
}
td {
padding: 4px 4px;
}
</style>
<meta charset="utf-8" />
</head>
<body>
<a id="goup" href="#index">
<div class="arrow-up" title="Go up"></div>
</a>
<div id="main">
<h1>Lonang 8086 Compiler</h1>
<p>This document describes the basic usage of <b>Lonang 8086</b>,
a C++-like language which is used to generate <i>8086 assembly</i>
executable code.</p>
<h2 id="index">Index</h2>
<ul>
<li><a href="#general">General</a></li>
<li>
<a href="#variables">Variables</a>
<ul>
<li><a href="#registers">Registers</a></li>
<li><a href="#vectors">Vectors</a></li>
</ul>
</li>
<li><a href="#comments">Comments</a></li>
<li>
<a href="#control-flow">Control flow</a>
<ul>
<li><a href="#if-else-statements"><code>if-else</code> statements</a></li>
<li><a href="#repeat-loop"><code>repeat</code> loop</a></li>
<li><a href="#while-loop"><code>while</code> loop</a></li>
<li><a href="#forever-loop"><code>forever</code> loop</a></li>
<li><a href="#break-statement"><code>break</code> statement</a></li>
</ul>
</li>
<li><a href="#functions">Functions</a></li>
<li>
<a href="#builtin-functions">Built-in functions</a>
<ul>
<li><a href="#divmod"><code>divmod</code> function</a></li>
</ul>
</li>
<li>
<a href="#basic-io">Basic IO</a>
<ul>
<li><a href="#printf"><code>printf</code> function</a></li>
<li><a href="#putchar"><code>putchar</code> function</a></li>
<li><a href="#getchar"><code>getchar</code> function</a></li>
<li><a href="#setcursor"><code>setcursor</code> function</a></li>
<li><a href="#cls"><code>cls</code> function</a></li>
</ul>
</li>
<li><a href="#compiling">Compiling</a></li>
<li><a href="#debugging">Debugging</a></li>
</ul>
<h2 id="general">General</h2>
<p>Only <b>one</b> statement per line is allowed, since the statement
separator used is the new line character.</p>
<p>Once you have one or more <i>variables</i> at your disposal, you can
perform basic operations such as:</p>
<pre>
; Add 'cx' to 'dx'
dx += cx
; Subtract 'bx' from 'ax'
ax -= bx
; Bitwise operations (they work on the bits level)
ax |= bx ; OR
dx ^= cx ; XOR
ax &= bx ; AND
</pre>
<p>Internally, numbers are represented in base 2. For instance, <i>5</i> is
<i>101</i> (<i>= 1×2<sup>2</sup> + 0×2<sup>1</sup> + 1×2<sup>0</sup></i>).
Bitwise operations let you work with numbers at the level of these
<i>1</i>s and <i>0</i>s. The following tables show what each bitwise operation
does (the first two columns are the inputs, the third one is the output):</p>
<table><tr>
<td><table>
<tr><td colspan=3>OR</td></tr>
<tr><td>0</td><td>0</td><td><b>= 0</b></td></tr>
<tr><td>0</td><td>1</td><td><b>= 1</b></td></tr>
<tr><td>1</td><td>0</td><td><b>= 1</b></td></tr>
<tr><td>1</td><td>1</td><td><b>= 1</b></td></tr>
</table></td>
<td><table>
<tr><td colspan=3>XOR</td></tr>
<tr><td>0</td><td>0</td><td><b>= 0</b></td></tr>
<tr><td>0</td><td>1</td><td><b>= 1</b></td></tr>
<tr><td>1</td><td>0</td><td><b>= 1</b></td></tr>
<tr><td>1</td><td>1</td><td><b>= 0</b></td></tr>
</table></td>
<td><table>
<tr><td colspan=3>AND</td></tr>
<tr><td>0</td><td>0</td><td><b>= 0</b></td></tr>
<tr><td>0</td><td>1</td><td><b>= 0</b></td></tr>
<tr><td>1</td><td>0</td><td><b>= 0</b></td></tr>
<tr><td>1</td><td>1</td><td><b>= 1</b></td></tr>
</table></td>
</tr></table>
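<p>As a small worked example of the tables above (the values are chosen
only for illustration), each bit of the result is computed independently
from the matching bits of the two operands:</p>
<pre>
ax = 12  ; 1100 in binary
cx = 12
dx = 12
bx = 10  ; 1010 in binary
ax |= bx ; OR:  1100 | 1010 = 1110, so ax is 14
cx ^= bx ; XOR: 1100 ^ 1010 = 0110, so cx is 6
dx &= bx ; AND: 1100 & 1010 = 1000, so dx is 8
</pre>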
<p><b>Multiplication</b> is slightly special, and may be more expensive in
certain cases. As a general rule of thumb, the destination should be either
<i>ax</i> or <i>al</i> and the second product either <i>dx</i> or <i>ah</i>.
The type of multiplication used, either 8 or 16 bits, is determined by
the <b>largest</b> size of the operands, be it the destination or the
source. For instance, <code>al *= 7</code> uses 8-bit mode since 7 can
be represented with 8 bits, but <code>al *= 260</code> performs a 16-bit
multiplication and discards the bits that don't fit in <i>al</i>.</p>
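<p>To make the truncation concrete, here is a minimal sketch with an
arbitrarily chosen starting value. The result in the comment follows from
the rule described above rather than from a verified run of the compiler:</p>
<pre>
al = 2
al *= 260 ; 16-bit multiplication: 2 × 260 = 520 = 0208h
          ; Only the low byte (08h) fits in 'al', so 'al' should end up as 8
</pre>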
<p>In the same way, both <b>division</b> <i>and</i> the modulo operator
are special too. As a general rule of thumb, the destination should be either
<i>ax</i> or <i>al</i> for division, and <i>dx</i> or <i>ah</i> for modulo.
The type of division used, either 8 or 16 bits, is determined by
the <b>largest</b> size of the operands, be it the destination or the
source. For instance, <code>al /= 3</code> uses 8-bit mode since 3 can
be represented with 8 bits, but <code>al /= 277</code> performs a 16-bit
division and discards the bits that don't fit in <i>al</i>.</p>
<p>When possible, you should <b>always</b> use a register instead of an
immediate value, which is especially important in loops, <i>unless</i> the
value is a <b>power of two</b>. In that case, an immediate value is better,
and even preferred.</p>
<h2 id="variables">Variables</h2>
<p>Variables can be defined anywhere in the code, although it is
recommended to define them at the <i>top</i> of the file. Currently
the types supported are:</p>
<ul>
<li><code>byte</code>: Defines an 8-bit integer value.</li>
<li><code>short</code>: Defines a 16-bit integer value.</li>
<li><code>string</code>: Defines a string variable (sequence of bytes).</li>
<li><code>const</code>: Defines a constant value, replaced at compile time.</li>
</ul>
<p>A <code>string</code> <b>must</b> contain only ASCII values. Escape
sequences are supported, such as <i>\r\n</i> for a new line, <i>\t</i> for
TAB, etc.</p>
<p>Some examples are shown below:</p>
<pre>
; Declaration
byte mybyte = 0x7F
short myint = 0x7FFF
string str = "Some string...\r\n"
const VALUE = 0x7FFF
; Assignment
ax, mybyte = 42, 13
; Swapping values.
; The order in which they get assigned is <b>not</b> guaranteed to be kept!
ax, bx = bx, ax
</pre>
<p>All variables are <b>global</b>, which means that when a function
modifies a variable, the change is visible throughout the whole file.
One should be careful with this. Names must always start with a
letter (upper or lower case), and may be followed by any number of
alphanumeric characters.</p>
<h3 id="registers">Registers</h3>
<p>There exist certain "special" variables which directly correspond
with the <b>registers</b> of the underlying machine. These are also global.
One should always use registers when possible, since they're faster to
access than any other variable. The available registers are,
for <code>short</code> integers:</p>
<pre>ax, cx, dx, bx, si, di</pre>
<p>The <i>first</i> three are recommended for general use, and the
<i>last</i> three are recommended when indexing a vector, since this
allows certain optimizations.</p>
<p>One can access the <code>byte</code> versions of these by specifying:</p>
<pre>ah, al, bh, bl, ch, cl, dh, dl</pre>
<p>Note, however, that <b>these point to the same registers</b> as the
first letter indicates. One can choose to use the <b>h</b>igh
or the <b>l</b>ow part of the register, but modifying either will also
modify the <b>x</b> version.</p>
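<p>A short sketch of this overlap, using arbitrary values. The results in
the comments follow from how the 8086 splits each of these 16-bit registers
into a high and a low byte:</p>
<pre>
ax = 0
ah = 1 ; 'ax' is now 256 (0100h)
al = 2 ; 'ax' is now 258 (0102h)
ax = 0 ; Both 'ah' and 'al' are 0 again
</pre>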
<p>The following short registers are also available, although using them
is <b>not recommended</b>:</p>
<pre>cs, ds, es, ss, sp, bp</pre>
<p>The first four point to the application segments, such as data and
code, and the last two point to the stack. Modifying any of these will
lead to unpredictable results.</p>
<h3 id="vectors">Vectors</h3>
<p>It's possible to define a <i>vector</i> of values by appending
<code>[]</code> to the variable type. Vectors, also known as <i>arrays</i>
in other languages, can hold one or more values under the same
variable, which can later be <i>indexed</i> if one desires to access
its contents. The <code>length</code> of the vector will be inferred from
the number of comma-separated values specified. It's also possible to
explicitly define its length inside the square brackets. If only one
value is supplied, and the given size is different, this value will be
duplicated across the vector.</p>
<p>Some <b>valid</b> examples for defining vectors are:</p>
<pre>byte[] v1 = 14
short[] v2 = 10, 20, 30
short[1] v3 = 6
byte[5] v4 = 2, 3, 5, 7, 11
byte[42] v5
byte[6] v6 = ? ; The '?' denotes we don't care about its actual value
byte[9] v7 = 7
</pre>
<p>Some <b>invalid</b> examples for defining vectors are:</p>
<pre>short[1] v1 = 6, 5
byte[5] v2 = 2, 3, 5
</pre>
<p>In order to access a vector, one can make use of either an immediate
value, such as <code>[2]</code> to access the <b>third</b> item of the
vector (since <code>[0]</code> denotes the <b>first</b> element), any
variable, or several added together. It's important to note, however,
that in order to be as efficient as possible, one should use the
<b>i</b>ndex registers (<i>si</i> and <i>di</i>), possibly adding <b>b</b>x
to them as the <b>b</b>ase register. The following combinations would cause
no trouble, assuming <code>vector</code> is a vector variable:</p>
<pre>vector[bx] = 1
vector[si] = 2
vector[di] = 3
vector[bx+4] = 4
vector[si-5] = 5
vector[di+6] = 6
vector[bx+si] = 7
vector[bx-di] = 8
vector[bx-si+9] = 9
vector[bx+di-10] = 10
</pre>
<p><b>Any other combination</b> will need to be translated to some of these,
probably causing a <b>lot of overhead</b>. It is strongly encouraged to
use the mentioned registers whenever possible.</p>
<p>The length of the vector cannot be changed at runtime, and can be
queried at compile time by accessing <code>vector.length</code>, which
will be replaced with its actual length.</p>
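<p>For instance, <code>vector.length</code> could be used as the count of
a <code>repeat</code> loop, assuming the replaced value is accepted there
like any other number. The vector name <code>buffer</code> is only
illustrative:</p>
<pre>
byte[5] buffer = ?
si = 0
repeat buffer.length with cx { ; Same as 'repeat 5 with cx'
    buffer[si] = 0
    si += 1
}
</pre>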
<p><b>Note</b> that the index used to access the vector elements is
the <i>offset in bytes</i>, not the <i>n</i>'th element. One should
be careful with this.</p>
<!-- TODO This should probably be fixed -->
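<p>As an illustration of byte offsets, consider a <code>short</code>
vector, whose elements take two bytes each. The names used are only
illustrative, and the sketch assumes elements are stored contiguously:</p>
<pre>
short[3] values = 10, 20, 30
values[0] = 11 ; First element, at byte offset 0
values[2] = 22 ; Second element, at byte offset 1 × 2 = 2
values[4] = 33 ; Third element, at byte offset 2 × 2 = 4
</pre>
<p>For <code>byte</code> vectors the offset and the element number
coincide, since each element takes one byte.</p>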
<h2 id="comments">Comments</h2>
<p>Inline comments can be written after a statement, separated from it by
the semicolon character <code>;</code>. Multiline comments are also possible:
they start and end with a line containing a semicolon alone:</p>
<pre>
;
; This comment is multiline
; The starting semicolon here is optional
It can be omitted too
As shown here
;
byte value = 42 ; Inlined comment
</pre>
<h2 id="control-flow">Control flow</h2>
<h3 id="if-else-statements"><code>if-else</code> statements</h3>
<p>Conditional blocks allow executing a piece of code if a criterion is met.
It is also possible to execute a different piece of code if that initial
condition is not met, although this is optional. The braces <b>must</b>
be on the same line as the statements.</p>
<p>It is possible to optionally label <i>if</i> and <i>else</i> statements
by appending <code>@labelName</code> to the end, before any comments. This
will result in different label names on the generated assembly code. When
omitted, a unique ID will be used.</p>
<p>The supported operators for use on conditional expressions are:</p>
<ul>
<li><code>==</code>. Met when both sides are equal.</li>
<li><code>!=</code>. Met when both sides are different.</li>
<li><code>&lt;</code>. Met when the left side is less than the right side.</li>
<li><code>&gt;</code>. Met when the left side is greater than the right side.</li>
<li><code>&lt;=</code>. Met when the left side is less than or equal to the right side.</li>
<li><code>&gt;=</code>. Met when the left side is greater than or equal to the right side.</li>
</ul>
<p>As an example:</p>
<pre>
if ax < bx { @myFirstIf
;
; code to execute when 'ax < bx'
;
} else { @myFirstElse
;
; code to execute when 'ax >= bx'
;
}
</pre>
<p>Since checking whether a number is even or odd is a common case, a
different form of the <i>if</i> is available: <code>if var is even</code> and
<code>if var is odd</code> will enter the if block if the specified
<b>var</b>iable is even or odd respectively.</p>
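<p>A minimal sketch of this form, with an illustrative label name:</p>
<pre>
if cx is odd { @makeEven
    cx += 1 ; Make 'cx' even by adding one
}
</pre>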
<h3 id="repeat-loop"><code>repeat</code> loop</h3>
<p>The <code>repeat</code> loop can be used to repeat a specific part of
the code <i>n</i> times. Its syntax is defined as follows:</p>
<pre>
repeat <i>n</i> with <i>var</i> {
;
; code to repeat <i>n</i> times
;
}
</pre>
<p>This statement can also be labeled. The reason why the <code>with</code>
part is explicit is because it implies that <code>var</code> will be used
to count down <i>n</i> times until <i>0</i> is reached.</p>
<p>The possible values for <b>n</b> are a number (decimal by default, or
hexadecimal or binary when written with the <i>h</i> or <i>b</i>
suffix), a register or a variable.</p>
<p>The possible values for <code>var</code> are either a register or a
variable. Certain optimizations will be made if the register used is <b>cx</b>:</p>
<pre>
; Multiplication with repeated sums
ax = 0
repeat 3 with cx { @myLoop
ax += 2
}
</pre>
<h3 id="while-loop"><code>while</code> loop</h3>
<p>The <code>while</code> loop can be used to repeat a specific part of the
code an indefinite number of times, depending on an arbitrary condition. Once
this condition is no longer met, the loop will end:</p>
<pre>
while ax < 10 {
;
; code to repeat <i>while</i> ax < 10
;
}
</pre>
<p>The code will <b>not</b> be executed even once if the condition is
not met in the first place (<i>e.g.</i> ax turns out to be >= 10).
To solve this, it is possible to specify the loop to execute <b>at least
once</b> with the <code>or once</code> modifier:</p>
<pre>
ax = 9
while ax == 10 <b>or once</b> {
ax += 1
}
</pre>
<p>The loop is entered the first time thanks to the <code>or once</code>
part. In the loop, <i>ax</i> will become 10. Then the loop will repeat
because the condition is met, and exit once <i>ax</i> becomes 11.</p>
<h3 id="forever-loop"><code>forever</code> loop</h3>
<p>This loop can be used when one desires to execute a block of code
indefinitely. The <code>forever</code> loop proves extremely useful with
the <code>break</code> keyword, since it allows arbitrarily complex
conditions to be met before the loop is exited:</p>
<pre>
; Collatz conjecture starting at 7
ax = 7
forever {
printf "%d -> ", ax
if ax is even {
ax, dx = divmod ax, 2
} else {
ax *= 3
ax += 1
}
if ax == 1 {
break 2
}
}
put digit 1
</pre>
<h3 id="break-statement"><code>break</code> statement</h3>
<p>If one desires to <code>break</code> free from a block, it's possible
to do so with the <code>break</code> keyword, which terminates
a block early:</p>
<pre>while ax < 10 {
ax += 1
<b>break</b>
ax -= 1 ; This line will never be executed
}
; Execution will continue here</pre>
<p>However, this is not very useful per se. For this reason, it's possible
to specify the number of blocks one desires to break out of, by writing that
number after the keyword:</p>
<pre>while ax < 10 {
ax += 1
if ax > 7 {
break 2 ; This will break two levels
}
ax -= 1
}
; Execution will continue here on break</pre>
<h2 id="functions">Functions</h2>
<p>Functions <b>must</b> be declared at the top of the file, since
their signature has to be known beforehand to determine whether a call
to them is legal or not.</p>
<p>Functions are defined by the <code>function</code> keyword, followed by
the function name to be used. After this, a list of comma-separated
registers or variables must be specified inside parentheses. Optionally,
one can use the <code>returns</code> keyword to determine where the
returned value will be passed. If omitted, the function will be considered
to return nothing. Functions can also be labeled.</p>
<pre>
function multiply(ax, bx) returns dx {
dx = 0
if ax < bx {
repeat ax with cx {
dx += bx
}
} else {
repeat bx with cx {
dx += ax
}
}
}
</pre>
<p>To call this function, one can first specify to which variable
or register the result will be assigned. This is illegal when the
function returns no value:</p>
<pre>
dx = multiply(7, 5)
</pre>
<p>Optimizations will be made if the arguments passed match the
original parameter list, and likewise for the resulting assignment.
In the example above, no copying is made from <i>dx</i> to <i>dx</i>
since they're the same. If one were to call:</p>
<pre>
ax = 7
bx = 5
dx = multiply(bx, ax)
</pre>
<p>The value of <i>ax</i> would first have to be saved, then updated
with the right value, and then the saved value restored into <i>bx</i>;
that is, <b>three</b> operations in contrast to no copying at all
(assuming that the right registers contain the right values, which
may not always be the case).</p>
<p>It's possible to pass a vector variable <i>by reference</i>,
by prepending <code>&</code> to the variable name. This also works
on assignments and any other kind of operations, since its address
is known at compile time.</p>
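<p>A minimal sketch of taking the address on an assignment. The vector
name is only illustrative:</p>
<pre>
byte[4] values = 1, 2, 3, 4
bx = &values ; 'bx' holds the address of 'values', not its contents
</pre>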
<h2 id="builtin-functions">Built-in functions</h2>
<p>Built-in functions are called in a similar way to user-defined functions,
although they do <b>not</b> use parentheses around the argument
list. Each of these is treated as a distinct statement.</p>
<h3 id="divmod"><code>divmod</code> function</h3>
<p>The <code>divmod</code> function can be used to perform both
<i>division</i> and the <i>modulo</i> operator (to obtain the remainder).
It will calculate both values on dividing the first argument by the second
one:</p>
<pre>
; Store (ax divided by bx) in ax
; Store (ax modulo bx) in dx
ax, dx = divmod ax, bx
</pre>
<p>Note that, due to how this function works and to guarantee that no data is
lost, certain parameters may perform worse than others. For instance,
calling the function with <i>dx, ax</i> and storing the result in
<i>dx, ax</i> too will generate <b>nine</b> instructions, while
invoking the function with <i>ax, bx</i> and storing the result in
<i>ax, dx</i> will generate only <b>two</b>.
<br />
Some of these worse cases are those in which the divisor is in
either <i>ax</i> or <i>dx</i> and the result isn't stored in any register
or variable other than these, or in which the result is stored in neither
<i>ax</i> nor <i>dx</i>.</p>
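<p>As a worked example of the cheap form, here is a sketch that splits a
two-digit number into its digits (the values are arbitrary):</p>
<pre>
ax = 47
bx = 10
ax, dx = divmod ax, bx ; ax = 4 (47 divided by 10), dx = 7 (47 modulo 10)
put digit ax ; Prints the tens digit
put digit dx ; Prints the units digit
</pre>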
<h2 id="basic-io">Basic IO</h2>
<p>It's possible to perform basic input/output operations using the console
as an interface with the user.</p>
<h3 id="printf"><code>printf</code> function</h3>
<p>This function allows <b>print</b>ing a possibly <b>f</b>ormatted string
to the console. No formatting can be made when printing an already-defined
string variable, since special handling is performed to generate a
string which allows being formatted.</p>
<p>The parameters accepted by this function are either a single variable,
or an immediate string. In the latter case it may contain <b>%X</b>, where
<b>X</b> indicates the type of the value to be formatted (either a
<b>s</b>tring or a <b>d</b>ecimal value), followed by the arguments
to be formatted, separated by commas:</p>
<pre>string hello = "Hello, world!\r\n"
string person = "User"
short favourite = 42
printf hello
printf "Hello %s! I see you like the number %d", person, favourite
printf "\r\nExiting...\r\n"</pre>
<p>If two subsequent calls were to be made to <code>printf</code>,
it's recommended to use <code>printf "%s%s", str1, str2</code> instead, since
further optimization will be made.</p>
<h3 id="putchar"><code>putchar</code> function</h3>
<p>Used to <b>put</b> a single <b>char</b>acter (or digit) to the screen,
advancing the cursor position.</p>
<p>Either a variable or an immediate value can be passed to the function:</p>
<pre>put char 'H'
put char 'I'
put char ' '
put digit cx</pre>
<h3 id="getchar"><code>getchar</code> function</h3>
<p>Used to <b>get</b> a single <b>char</b>acter (or digit) from the
keyboard input. In addition, the input can either be echoed (the default)
or hidden by appending <code>show</code> or <code>hide</code>:</p>
<pre>dl = get digit hide
dh = get digit hide
dl += dh
printf dl</pre>
<h3 id="setcursor"><code>setcursor</code> function</h3>
<p>This function allows freely <b>set</b>ting the <b>cursor</b> position on
the screen. The parameters to be supplied are, in this order, the <i>row</i>
and then the <i>column</i> to which the cursor should move:</p>
<pre>setcursor 12, 37
printf "Middle"</pre>
<p>Since the screen defaults to 80 columns and 25 rows, in this case
<code>12, 37</code> prints <code>"Middle"</code> in
the middle of the screen.</p>
<h3 id="cls"><code>cls</code> function</h3>
<p>Used to <b>cl</b>ear the <b>s</b>creen. It takes no parameters.</p>
<h2 id="compiling">Compiling</h2>
<p>To compile <code>.lnn</code> files (containing the source code of your
program in Lonang), simply use the <code>./lnn</code> tool. As a basic
example, to compile the code in <code>main.lnn</code>:</p>
<pre>./lnn main.lnn</pre>
<h2 id="debugging">Debugging</h2>
<p>If one desires to debug the generated code, one can use the
<code>tag</code> statement, which will add a comment along with a
<code>nop</code> instruction so that execution can be run up to that point.
A message must always be specified, for instance:</p>
<pre>tag Starting the part which needs debugging</pre>
</div>
</body>
</html>