binary-introduction/assembly-language: Refactor files structure

Refactor files structure to match OpenEdu Methodology

Signed-off-by: Rares Croicia <[email protected]>
security-summer-school · Jun 28, 2024 · bcc9241 · bcc9241
1 parent 4086e3d
commit bcc9241
Showing 26 changed files with 600 additions and 597 deletions.
diff --git a/...y-introduction/assembly-language/drills/tasks/call-me-little-sunshine/README.md b/...y-introduction/assembly-language/drills/tasks/call-me-little-sunshine/README.md
@@ -0,0 +1,6 @@
+# Call Me Little Sunshine
+
+Do what the binary asks you to do.
+What, it doesn't work?
+
+If you're having difficulties solving this exercise, go through [this](../../../reading/reading-assembly.md#objdump) reading material.
diff --git a/...s/call-me-little-sunshine/sol/solution.sh → ...l-me-little-sunshine/solution/solution.sh b/...s/call-me-little-sunshine/sol/solution.sh → ...l-me-little-sunshine/solution/solution.sh
diff --git a/...rills/call-me-little-sunshine/public/main → ...asks/call-me-little-sunshine/support/main b/...rills/call-me-little-sunshine/public/main → ...asks/call-me-little-sunshine/support/main
diff --git a/chapters/binary-introduction/assembly-language/drills/tasks/crypto/README.md b/chapters/binary-introduction/assembly-language/drills/tasks/crypto/README.md
@@ -0,0 +1,5 @@
+# Crypto
+
+Is it really about crypto?
+
+If you're having difficulties solving this exercise, go through [this](../../../reading/reading-assembly.md#gdb) reading material.
diff --git a/...mbly-language/drills/crypto/sol/README.md → ...ge/drills/tasks/crypto/solution/README.md b/...mbly-language/drills/crypto/sol/README.md → ...ge/drills/tasks/crypto/solution/README.md
diff --git a/...mbly-language/drills/crypto/public/crypto → ...nguage/drills/tasks/crypto/support/crypto b/...mbly-language/drills/crypto/public/crypto → ...nguage/drills/tasks/crypto/support/crypto
diff --git a/.../binary-introduction/assembly-language/drills/tasks/gotta-link-em-all/README.md b/.../binary-introduction/assembly-language/drills/tasks/gotta-link-em-all/README.md
@@ -0,0 +1,5 @@
+# Gotta Link Em All
+
+I wonder what hides in all those object files...
+
+If you're having difficulties solving this exercise, go through [this](../../../reading/registers.md) reading material.
diff --git a/...age/drills/gotta-link-em-all/sol/main.asm → ...tasks/gotta-link-em-all/solution/main.asm b/...age/drills/gotta-link-em-all/sol/main.asm → ...tasks/gotta-link-em-all/solution/main.asm
diff --git a/...rills/gotta-link-em-all/public/.gitignore → ...asks/gotta-link-em-all/support/.gitignore b/...rills/gotta-link-em-all/public/.gitignore → ...asks/gotta-link-em-all/support/.gitignore
diff --git a/.../drills/gotta-link-em-all/public/Makefile → .../tasks/gotta-link-em-all/support/Makefile b/.../drills/gotta-link-em-all/public/Makefile → .../tasks/gotta-link-em-all/support/Makefile
diff --git a/.../drills/gotta-link-em-all/public/main.asm → .../tasks/gotta-link-em-all/support/main.asm b/.../drills/gotta-link-em-all/public/main.asm → .../tasks/gotta-link-em-all/support/main.asm
diff --git a/...ge/drills/gotta-link-em-all/public/run.sh → ...ls/tasks/gotta-link-em-all/support/run.sh b/...ge/drills/gotta-link-em-all/public/run.sh → ...ls/tasks/gotta-link-em-all/support/run.sh
diff --git a/.../binary-introduction/assembly-language/drills/tasks/in-plain-assembly/README.md b/.../binary-introduction/assembly-language/drills/tasks/in-plain-assembly/README.md
@@ -0,0 +1,5 @@
+# In Plain Assembly
+
+The flag is almost right there in your face.
+
+If you're having difficulties solving this exercise, go through [this](../../../reading/reading-assembly.md#gdb) reading material.
diff --git a/...ge/drills/in-plain-assembly/sol/README.md → ...asks/in-plain-assembly/solution/README.md b/...ge/drills/in-plain-assembly/sol/README.md → ...asks/in-plain-assembly/solution/README.md
diff --git a/...age/drills/in-plain-assembly/public/plain → ...lls/tasks/in-plain-assembly/support/plain b/...age/drills/in-plain-assembly/public/plain → ...lls/tasks/in-plain-assembly/support/plain
diff --git a/chapters/binary-introduction/assembly-language/drills/tasks/jump-maze/README.md b/chapters/binary-introduction/assembly-language/drills/tasks/jump-maze/README.md
@@ -0,0 +1,6 @@
+# Jump Maze
+
+Theseus has nothing on you!
+Navigate the maze and get the flag.
+
+If you're having difficulties solving this exercise, go through [this](../../../reading/assembly-instructions.md#jmp) reading material.
diff --git a/...y-language/drills/jump-maze/sol/README.md → ...drills/tasks/jump-maze/solution/README.md b/...y-language/drills/jump-maze/sol/README.md → ...drills/tasks/jump-maze/solution/README.md
diff --git a/chapters/binary-introduction/assembly-language/reading/README.md b/chapters/binary-introduction/assembly-language/reading/README.md
diff --git a/chapters/binary-introduction/assembly-language/reading/assembly-instructions.md b/chapters/binary-introduction/assembly-language/reading/assembly-instructions.md
@@ -0,0 +1,199 @@
+# Assembly Instructions
+
+We've now learned what assembly is theoretically and what registers are, but how do we use them?
+Each CPU exposes an **ISA (Instruction Set Architecture)**: a set of instructions with which to modify and interact with its registers and with the RAM.
+There are over 1000 instructions in the x64 ISA.
+There are even instructions for efficiently encrypting data.
+Find out more about them by enrolling in the [Hardware Assisted Security track](https://github.com/security-summer-school/hardware-sec/).
+
+Before we dive into the instructions themselves, it's useful to first look at their generic syntax:
+
+```asm
+instruction_name destination, source
+```
+
+Most Assembly instructions have 2 operands: a source and a destination.
+For some operations, such as arithmetic, the destination is also an input: `add rax, 2` both reads and writes `rax`.
+When an instruction produces a result, it is stored in the destination.
+
+Below we'll list some fundamental instructions.
+We will be using the Intel Assembly syntax.
+
+## `mov`
+
+`mov` is the most basic instruction in Assembly.
+It _copies_ (or _moves_) data from the source to the destination.
+Also note that comments in Assembly are preceded by `;` and that instruction and register names are case-insensitive.
+
+```asm
+mov eax, 3              ; eax = 3
+
+mov rbx, "SSS Rulz"     ; place the string "SSS Rulz" in `rbx`
+; This places each byte of the string "SSS Rulz" in rbx.
+
+mov r8b, bl             ; r8b = bl
+; The sizes of the operands must be equal (1 byte each in this case).
+; Note: ah, bh, ch and dh cannot be combined with r8b-r15b in the same instruction.
+```
+
+## Data Manipulation
+
+Now that we've learnt how to place data in registers we need to learn how to do math with it.
+As you've seen so far, Assembly instructions are really simple.
+Below is a table with the most common and useful arithmetic instructions.
+Try to figure out what each example does.
+Use the fact that the general anatomy of an instruction is usually `instruction destination, source`.
+The result is always stored in the `destination`.
+
+| Instruction          | Description     | Examples                           |
+|:--------------------:|:---------------:|:----------------------------------:|
+| `add <dest>, <src>`  | `dest += src`   | `add rbx, 5`<br/>`add r11, 0x99`   |
+| `sub <dest>, <src>`  | `dest -= src`   | `sub ecx, 'a'`<br/>`sub r9, r8`    |
+| `shl <dest>, <bits>` | `dest <<= bits` | `shl rax, 3`<br/>`shl rdi, cl`     |
+| `shr <dest>, <bits>` | `dest >>= bits` | `shr r15, 5`<br/>`shr rsi, cl`     |
+| `and <dest>, <src>`  | `dest &= src`   | `and al, ah`<br/>`and bx, 13`      |
+| `or <dest>, <src>`   | `dest \|= src`  | `or r10b, cl`<br/>`or r14, 0x2000` |
+| `xor <dest>, <src>`  | `dest ^= src`   | `xor ebx, edx`<br/>`xor rcx, 1`    |
+| `inc <dest>`         | `dest++`        | `inc rsi`                          |
+| `dec <dest>`         | `dest--`        | `dec r10w`                         |
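+
+To check your answers, here is a minimal sketch that chains several of these instructions and tracks the value of `eax` in the comments.
+It assumes NASM on x86-64 Linux; the starting value, the `_start` entry point and the final `exit` system call (number 60, status in `edi`) are our own additions for illustration.
+
+```asm
+section .text
+global _start
+
+_start:
+    mov eax, 5          ; eax = 5
+    add eax, 3          ; eax = 5 + 3 = 8
+    shl eax, 2          ; eax = 8 << 2 = 32
+    sub eax, 2          ; eax = 32 - 2 = 30
+    and eax, 0x1c       ; eax = 30 & 28 = 28
+    or  eax, 1          ; eax = 28 | 1 = 29
+    xor eax, 0xf        ; eax = 29 ^ 15 = 18
+    inc eax             ; eax = 19
+
+    mov edi, eax        ; use the result as the exit status
+    mov eax, 60         ; Linux `exit` system call number
+    syscall
+```
+
+After assembling and linking it (for instance with `nasm -f elf64` and `ld`), running the program and then `echo $?` should print `19`.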
+
+## Control Flow
+
+Now we know how to do maths and move bits around.
+This is all good, but we still can't write full programs.
+We need a mechanism similar to `if`s from Python and also loops in order to make the code run based on conditions.
+
+### `jmp`
+
+The simplest instruction for control flow is the `jmp` instruction.
+It simply loads an address into the `rip` register.
+But when Assembly code is generated or written either by the compiler or by us, instructions don't have addresses yet.
+These addresses are assigned during the **linking** or **loading** phase, as you know from the [Application Lifetime session](../../Application%20Lifetime/).
+
+For this reason, we use **labels** as some sort of anchors.
+We `jmp` to them and then the assembler will replace them with relative addresses which are then replaced with full addresses during linking.
+The way in which `jmp` and labels function is very simple.
+Remember that in the absence of `jmp`s, Assembly code is executed linearly just like a script.
+
+```asm
+    jmp skip_next_section
+
+    ; Whatever code is here is never executed.
+
+skip_next_section:
+    ; Only the code below this label is executed.
+```
+
+> **Warning**
+> Do not confuse labels with functions.
+> A label does not stop the execution of code when it's reached.
+> Labels are simply names for addresses, so execution falls straight through them; only jump instructions make use of them.
+
+For example, in the following code, both instructions are executed in the absence of `jmp`s:
+
+```asm
+    mov rax, 2
+some_label:
+    mov rbx, 3
+    ; rax = 2; rbx = 3
+```
+
+### `eflags`
+
+Most arithmetic and logic instructions (but not `mov`) change the **inner state of the CPU**.
+In other words, several aspects regarding the result of the instruction are stored in a special register that we cannot access directly, called `eflags` (`rflags` on 64-bit CPUs).
+There are [instructions](https://stackoverflow.com/questions/1406783/how-to-read-and-write-x86-flags-registers-directly) that can set or clear some flags in `eflags`, but we cannot write something like `mov eflags, 2`.
+
+As its name implies, each bit in `eflags` is a flag that is activated (i.e. set to 1) if a certain condition is true about the result of the last executed instruction.
+We won't be using these flags per se with one exception: `ZF` - the **zero flag**.
+When active, it means that the result of the last instruction was... 0, duh!
+This is useful for testing if numbers are equal for example.
+We'll talk about this in the next section.
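+
+As a small illustration, the sketch below follows `ZF` through a few instructions.
+It assumes NASM on x86-64 Linux; the values and the closing `exit` system call are only there to make it a complete program.
+
+```asm
+section .text
+global _start
+
+_start:
+    mov rax, 5          ; `mov` does not touch the flags
+    sub rax, 5          ; the result is 0, so ZF = 1
+    mov rbx, 7          ; `mov` again: ZF is still 1
+    add rbx, 1          ; the result is 8, so ZF = 0
+
+    xor edi, edi        ; exit status 0 (the result is 0, so ZF = 1 once more)
+    mov eax, 60         ; Linux `exit` system call number
+    syscall
+```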
+
+### Conditional jumps
+
+Now we know that there is an internal state of the CPU which is modified by most arithmetic and logic instructions, but not by `mov`.
+We still need a way to leverage this state.
+We can do this via **conditional jumps**.
+
+They are like `jmp` instructions, but the jump is made only when certain conditions are met.
+Otherwise, code execution continues from the next instruction.
+The general syntax of a conditional jump is:
+
+```asm
+j[n]<cond> label
+```
+
+where the letter `n` is optional and means the jump will be made if the condition is **not** met.
+
+#### `cmp` and `test`
+
+We can use the regular arithmetic instructions that we've learned so far to modify `eflags`.
+But this has the drawback of also modifying our data.
+It would be great if we had a means to modify `eflags` without changing the data that we evaluate.
+We can do this using `cmp` and `test`.
+
+`cmp dest, src` modifies `eflags` as if you were **subtracting** `src` from `dest`, but without modifying `dest`.
+This is great for testing if 2 things are equal, or for testing which is greater or lower.
+
+`test dest, src` is similar to `cmp`, but modifies `eflags` according to the `and` instruction.
+This comes in handy when we want to check if a register is 0.
+
+```asm
+test rax, rax
+jz rax_is_zero
+```
+
+is equivalent to
+
+```asm
+cmp rax, 0
+jz rax_is_zero
+```
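+
+Because `test` behaves like `and`, it is also a convenient way to inspect individual bits.
+The sketch below, assuming NASM on x86-64 Linux, checks whether a number is even by testing its lowest bit; the value 42, the label names and the use of the exit status to report the answer are arbitrary choices made for this example.
+
+```asm
+section .text
+global _start
+
+_start:
+    mov rax, 42
+    test rax, 1         ; computes rax & 1 and throws it away; ZF = 1 if bit 0 is clear
+    jz is_even
+
+    mov edi, 1          ; odd number: exit status 1
+    jmp finish
+
+is_even:
+    xor edi, edi        ; even number: exit status 0
+
+finish:
+    mov eax, 60         ; Linux `exit` system call number
+    syscall
+```
+
+Since 42 is even, `rax` is left untouched, the jump is taken and the program should exit with status `0`.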
+
+Now let's have a look at some conditional jumps:
+
+| Conditional jump            | Meaning                                                          |
+|:---------------------------:|:----------------------------------------------------------------:|
+| `jz` / `je`                 | Jump if the Zero Flag is active                                  |
+| `jnz` / `jne`               | Jump if the Zero Flag is not active                              |
+| `cmp rax, rbx`<br/>`j[n]g`  | Jump if `rax` is (not) greater than `rbx` (signed)               |
+| `cmp rax, rbx`<br/>`j[n]a`  | Jump if `rax` is (not) greater than `rbx` (unsigned)             |
+| `cmp rax, rbx`<br/>`j[n]ge` | Jump if `rax` is (not) greater than or equal to `rbx` (signed)   |
+| `cmp rax, rbx`<br/>`j[n]ae` | Jump if `rax` is (not) greater than or equal to `rbx` (unsigned) |
+| `cmp rax, rbx`<br/>`j[n]l`  | Jump if `rax` is (not) lower than `rbx` (signed)                 |
+| `cmp rax, rbx`<br/>`j[n]b`  | Jump if `rax` is (not) lower than `rbx` (unsigned)               |
+| `cmp rax, rbx`<br/>`j[n]le` | Jump if `rax` is (not) lower than or equal to `rbx` (signed)     |
+| `cmp rax, rbx`<br/>`j[n]be` | Jump if `rax` is (not) lower than or equal to `rbx` (unsigned)   |
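+
+The signed and unsigned variants can disagree on the very same bits.
+The sketch below, assuming NASM on x86-64 Linux, compares `-1` with `1` both ways; the label names and the bits collected in `edi` are hypothetical and only serve to report which comparison held.
+
+```asm
+section .text
+global _start
+
+_start:
+    mov rax, -1                 ; all bits set: -1 if signed, 2^64 - 1 if unsigned
+    mov rbx, 1
+    xor edi, edi                ; edi = 0, will become the exit status
+
+    cmp rax, rbx
+    jng not_signed_greater      ; taken: as signed numbers, -1 is not greater than 1
+    or edi, 1                   ; skipped
+
+not_signed_greater:
+    cmp rax, rbx                ; compare again, because `or` would overwrite the flags
+    jna finish                  ; not taken: as unsigned numbers, 2^64 - 1 is above 1
+    or edi, 2                   ; executed
+
+finish:
+    mov eax, 60                 ; Linux `exit` system call number
+    syscall
+```
+
+The program should exit with status `2`: only the unsigned "greater" comparison holds.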
+
+### Loops
+
+We can create loops simply by combining labels and conditional jumps.
+For example, `for i in range(0, 10)` from Python is equivalent to:
+
+```asm
+    xor rcx, rcx    ; i = rcx; same as mov rcx, 0
+for_loop:
+    cmp rcx, 10
+    je done_loop    ; leave the loop once i == 10
+
+    ; The body of the for loop.
+
+    inc rcx         ; rcx++
+    jmp for_loop    ; re-evaluate the condition
+
+done_loop:
+```
+
+Or alternatively, we can verify `rcx < 10` at the end of the loop:
+
+```asm
+    xor rcx, rcx
+for_loop:
+    ; The body of the for loop.
+
+    inc rcx         ; rcx++
+    cmp rcx, 10
+    jb for_loop     ; loop again while i < 10
+
+    ; The code here is executed only after the loop ends.
+```
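+
+To see such a loop do some work, here is a minimal sketch that computes `sum(range(0, 10))`.
+It assumes NASM on x86-64 Linux; using `rax` as the accumulator and returning the sum as the exit status are choices made for this example only.
+
+```asm
+section .text
+global _start
+
+_start:
+    xor rax, rax        ; sum = 0
+    xor rcx, rcx        ; i = 0
+
+for_loop:
+    cmp rcx, 10
+    jae done_loop       ; leave the loop once i >= 10
+
+    add rax, rcx        ; sum += i
+
+    inc rcx             ; i++
+    jmp for_loop        ; re-evaluate the condition
+
+done_loop:
+    mov rdi, rax        ; exit status = 0 + 1 + ... + 9 = 45
+    mov eax, 60         ; Linux `exit` system call number
+    syscall
+```
+
+Running it and then `echo $?` should print `45`.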
diff --git a/chapters/binary-introduction/assembly-language/reading/dereferencing-addresses.md b/chapters/binary-introduction/assembly-language/reading/dereferencing-addresses.md
@@ -0,0 +1,74 @@
+# Dereferencing Addresses
+
+Up to this point we know how to operate with data and can write complex programs using conditional jumps.
+But we know that data is stored mostly in the RAM.
+How do we fetch it from there to our registers?
+
+Imagine the RAM is one giant array.
+Each byte is a cell in this array.
+Therefore, each byte is found at a given **index** in this array.
+Indices start at 0, so the first byte is found at index 0, the third at index 2 and so on.
+These indices are also called **memory addresses**, or simply **addresses**.
+
+In order to load data from the RAM into our registers or vice-versa, we need to tell the CPU which RAM address to access.
+This is called **dereferencing that address**.
+Syntactically, this is very easy and is done by wrapping the address in `[]`.
+The address can be either a raw number, or a register, or an expression:
+
+```asm
+mov rax, [0xdeadbeef]   ; load 8 bytes from the address 0xdeadbeef into rax
+mov bx, [0xdeadbeef]    ; load 2 bytes from the address 0xdeadbeef into bx
+mov [0xdeadbeef], ecx   ; store 4 bytes from ecx at the address 0xdeadbeef
+```
+
+Notice that the number of bytes that are transferred between the RAM and registers is given by the size of the register.
+But what happens when we don't use a register?
+An instruction such as `mov [0xdeadbeef], 0x69` is incorrect because it is impossible to tell how many bytes to use to write 0x69.
+We could write it using one byte of course, but what if we wanted to write it as the 4-byte value `0x00000069`?
+To eliminate such ambiguities, we must specify the number of bytes that we want to write to the RAM:
+
+```asm
+mov [0xdeadbeef], byte 0x2      ; writes 1 byte
+mov [0xdeadbeef], word 0x2      ; writes 2 bytes: the value 0x0002
+mov [0xdeadbeef], dword 0x2     ; writes 4 bytes
+mov [0xdeadbeef], qword 0x2     ; writes 8 bytes
+```
+
+Instead of a hardcoded value, we can express addresses as complex expressions which the CPU computes for us.
+In the snippet below, the CPU computes the address given by `rdi + rcx * 4` and then writes the contents of `edx` there.
+
+```asm
+mov [rdi + rcx * 4], edx
+```
+
+This is equivalent to `v[i] = something` where `v` is an array of 4-byte values (hence `rcx * 4`):
+
+- `rdi` = starting address of `v`
+- `rcx` = `i`
+- `edx` = `something`
+
+Therefore, whenever you see `[...]` in Assembly, whatever is between the square brackets is being dereferenced [**with one exception**](further-reading.md#lea).
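+
+The `v[i] = something` pattern can be tried out with the minimal sketch below.
+It assumes NASM on x86-64 Linux and a statically linked, non-PIE binary (what plain `ld` produces by default); the array `v`, the index 2 and the value 7 are made up for this example.
+
+```asm
+section .bss
+v:  resd 4                      ; hypothetical array of four 4-byte elements, initially 0
+
+section .text
+global _start
+
+_start:
+    mov rdi, v                  ; rdi = starting address of v
+    mov rcx, 2                  ; i = 2
+    mov edx, 7                  ; something = 7
+    mov [rdi + rcx * 4], edx    ; v[2] = 7
+
+    mov edi, [v + 8]            ; read v[2] back: 8 = 2 * 4 bytes past the start of v
+    mov eax, 60                 ; Linux `exit` system call number
+    syscall
+```
+
+The exit status should be `7`, the value that was stored in `v[2]`.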
+
+## Endianness
+
+This is all nice, but what does all this look like in memory?
+The order in which the bytes are stored in the RAM is called **endianness**.
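+Most CPUs, x86 included, store the bytes of a number **in reverse order**, called **little endian** order, because the least significant byte comes first (at the lowest address).
+When data is fetched back from the RAM, the order is reversed again, so the register ends up holding the original value: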
+
+```asm
+mov [0x100], dword 0x12345678       ; the RAM at 0x100: [ 0x78 | 0x56 | 0x34 | 0x12 ]
+mov ax, [0x100]     ; ax = 0x5678
+mov bx, [0x101]     ; bx = 0x3456
+```
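+
+The addresses above are only illustrative, since a real program cannot write to `0x100`.
+A runnable variant of the same idea, sketched for NASM on x86-64 Linux (non-PIE static binary assumed), uses a hypothetical 4-byte buffer called `buf` and reads back its first byte:
+
+```asm
+section .bss
+buf: resb 4                         ; hypothetical 4-byte buffer
+
+section .text
+global _start
+
+_start:
+    mov [buf], dword 0x12345678     ; buf: [ 0x78 | 0x56 | 0x34 | 0x12 ]
+
+    xor edi, edi                    ; clear the exit status
+    mov dil, [buf]                  ; dil = byte at the lowest address = 0x78
+    mov eax, 60                     ; Linux `exit` system call number
+    syscall
+```
+
+The exit status should be `120` (`0x78`), the least significant byte of the stored value.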
+
+However, endianness does not apply to strings.
+The code below writes the string `SSS Rulz` at the address 0x100.
+Notice we don't have to write it in reverse order like `zluR SSS`.
+
+```asm
+mov rax, "SSS Rulz"
+mov [0x100], rax
+; We need to use a register because mov cannot take both an address and a 64-bit immediate as operands.
+; https://www.felixcloutier.com/x86/mov
+```
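+
+One way to convince yourself that the characters land in memory in reading order is the sketch below.
+It assumes NASM on x86-64 Linux (non-PIE static binary); `buf` is a hypothetical 8-byte buffer and the offset 4 was picked to fetch the letter `R`.
+
+```asm
+section .bss
+buf: resb 8                 ; hypothetical 8-byte buffer
+
+section .text
+global _start
+
+_start:
+    mov rax, "SSS Rulz"     ; NASM puts the first character in the lowest byte of rax
+    mov [buf], rax          ; buf: 'S' 'S' 'S' ' ' 'R' 'u' 'l' 'z'
+
+    xor edi, edi            ; clear the exit status
+    mov dil, [buf + 4]      ; dil = fifth character = 'R' = 0x52
+    mov eax, 60             ; Linux `exit` system call number
+    syscall
+```
+
+The exit status should be `82`, the ASCII code of `R`.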
diff --git a/chapters/binary-introduction/assembly-language/reading/further-reading.md b/chapters/binary-introduction/assembly-language/reading/further-reading.md
@@ -0,0 +1,65 @@
+# Further Reading
+
+## The Whole ISA
+
+If you want to search for an instruction, use [this](https://www.felixcloutier.com/x86/) website.
+Each instruction has its own table with all possible operands and what they do.
+Note that `imm8` means "8-bit immediate" (a regular 1-byte number), `imm64` means a 64-bit immediate and so on.
+Similarly, `reg32` means a 32-bit register and `m16` for example means a 16-bit (2-byte) memory area.
+You'll see `reg`, `imm` and `m` combined with `8`, `16`, `32` and `64` depending on what each instruction does.
+
+## Caches
+
+Many programs access the same addresses repeatedly over a short period of time.
+Take a short 1000-step loop.
+It uses the same code 1000 times.
+It would be inefficient for the CPU to read the instructions directly from the RAM 1000 times.
+For this reason, there is an intermediary level of memory between the RAM and the registers, called **the cache**.
+
+As their name implies, caches store the contents of some memory addresses that are frequently requested by the CPU.
+We say _caches_, in the plural, because they are laid out hierarchically, each lower-numbered level being faster and smaller than the ones after it.
+Usually, CPUs have 3 levels of cache memory.
+You can query their sizes with the `lscpu` command:
+
+```console
+root@kali:~$ lscpu
+[...]
+L1d cache:                       128 KiB
+L1i cache:                       128 KiB
+L2 cache:                        1 MiB
+L3 cache:                        6 MiB
+[...]
+```
+
+Notice the L1 (level 1) cache is split between a data cache (`L1d`) and an instruction cache (`L1i`).
+The other caches do not store data and instructions separately.
+
+## Assembly Syntaxes
+
+In this session we've used the Intel syntax for writing and displaying Assembly.
+We did so because it's more straightforward than its alternative: the AT&T syntax.
+You can find the differences on [Wikipedia](https://en.wikipedia.org/wiki/X86_assembly_language#Syntax).
+
+## `lea`
+
+`lea` stands for "Load Effective Address".
+Its syntax is:
+
+```asm
+lea dest, [address]
+```
+
+It loads `address` into the `dest` register (it can only be a register).
+What's interesting about it is that it also uses the `[...]` syntax, but **does not dereference the address**.
+In the snippet below, `0xdeadbeef` is simply copied to `rax`.
+
+```asm
+lea rax, [0xdeadbeef]
+```
+
+Its true power comes from the fact that it can also compute an address.
+For example, the code below will first compute the address given by `rdi + rcx * 8 + 7` and then write this address into `rax`.
+
+```asm
+lea rax, [rdi + rcx * 8 + 7]
+```
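+
+Because no memory is accessed, `lea` is often used as a compact way of doing arithmetic.
+A minimal sketch, assuming NASM on x86-64 Linux; the input value 6 and the use of the exit status to show the result are arbitrary:
+
+```asm
+section .text
+global _start
+
+_start:
+    mov rdi, 6
+    lea rax, [rdi + rdi * 4]    ; rax = 6 + 6 * 4 = 30; nothing is read from address 30
+    lea rdi, [rax + 12]         ; rdi = 30 + 12 = 42
+
+    mov eax, 60                 ; Linux `exit` system call number
+    syscall
+```
+
+The exit status should be `42`.
+Replacing either `lea` with `mov` would instead try to read from those small addresses and most likely crash the program.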