forked from open-education-hub/essentials-security
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
binary-introduction/assembly-language: Refactor files structure
Refactor files structure to match OpenEdu Methodology
Signed-off-by: Rares Croicia <[email protected]>
1 parent 4086e3d, commit bcc9241
Showing 26 changed files with 600 additions and 597 deletions.
6 changes: 6 additions & 0 deletions
...y-introduction/assembly-language/drills/tasks/call-me-little-sunshine/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Call Me Little Sunshine | ||
|
||
Do what the binary asks you to do. | ||
What, it doesn't work? | ||
|
||
If you're having difficulties solving this exercise, go through [this](../../../reading/reading-assembly.md#objdump) reading material. |
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions
chapters/binary-introduction/assembly-language/drills/tasks/crypto/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Crypto | ||
|
||
Is it really about crypto? | ||
|
||
If you're having difficulties solving this exercise, go through [this](../../../reading/reading-assembly.md#gdb) reading material. |
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions
.../binary-introduction/assembly-language/drills/tasks/gotta-link-em-all/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Gotta Link Em All | ||
|
||
I wonder what hides in all those object files... | ||
|
||
If you're having difficulties solving this exercise, go through [this](../../../reading/registers.md) reading material. |
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions
.../binary-introduction/assembly-language/drills/tasks/in-plain-assembly/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# In Plain Assembly | ||
|
||
The flag is almost right there in your face. | ||
|
||
If you're having difficulties solving this exercise, go through [this](../../../reading/reading-assembly.md#gdb) reading material. |
File renamed without changes.
File renamed without changes.
6 changes: 6 additions & 0 deletions
chapters/binary-introduction/assembly-language/drills/tasks/jump-maze/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Jump Maze | ||
|
||
Theseus has nothing on you! | ||
Navigate the maze and get the flag. | ||
|
||
If you're having difficulties solving this exercise, go through [this](../../../reading/assembly-instructions.md#jmp) reading material. |
File renamed without changes.
596 changes: 0 additions & 596 deletions
chapters/binary-introduction/assembly-language/reading/README.md
This file was deleted.
199 changes: 199 additions & 0 deletions
chapters/binary-introduction/assembly-language/reading/assembly-instructions.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
# Assembly Instructions | ||
|
||
We've now learned what assembly is theoretically and what registers are, but how do we use them? | ||
Each CPU exposes an **ISA (Instruction Set Architecture)**: a set of instructions with which to modify and interact with its registers and with the RAM. | ||
There are over 1000 instructions in the x64 ISA. | ||
There are even instructions for efficiently encrypting data. | ||
Find out more about them by enrolling in the [Hardware Assisted Security track](https://github.com/security-summer-school/hardware-sec/). | ||
|
||
Before we dive into the instructions themselves, it's useful to first look at their generic syntax: | ||
|
||
```asm | ||
instruction_name destination, source | ||
``` | ||
|
||
Most Assembly instructions have 2 operands: a destination and a source. | ||
For some operations, such as arithmetic, the destination is also read as an input: `add rax, rbx` computes `rax + rbx`. | ||
When an instruction produces a result, that result is stored in the destination. | ||
|
||
Below we'll list some fundamental instructions. | ||
We will be using the Intel Assembly syntax. | ||
|
||
## `mov` | ||
|
||
`mov` is the most basic instruction in Assembly. | ||
It _copies_ (or _moves_) data from the source to the destination. | ||
Also note that comments in Assembly are preceded by `;` and that instruction and register names are case-insensitive. | ||
|
||
```asm | ||
mov eax, 3 ; eax = 3 | ||
mov rbx, "SSS Rulz" ; place the string "SSS Rulz" in `rbx` | ||
; This places each byte of the string "SSS Rulz" in rbx. | ||
mov r8b, bh ; r8b = bh | ||
; The sizes of the operands must be equal (1 byte each in this case). | ||
``` | ||
|
||
## Data Manipulation | ||
|
||
Now that we've learnt how to place data in registers we need to learn how to do math with it. | ||
As you've seen so far, Assembly instructions are really simple. | ||
Below is a table with the most common and useful arithmetic instructions. | ||
Try to figure out what each example does. | ||
Use the fact that the general anatomy of an instruction is usually `instruction destination, source`. | ||
The result is always stored in the `destination`. | ||
|
||
| Instruction | Description | Examples | | ||
|:--------------------:|:---------------:|:---------------------------------:| | ||
| `add <dest>, <src>` | `dest += src` | `add rbx, 5`<br/>`add r11, 0x99` | | ||
| `sub <dest>, <src>` | `dest -= src` | `sub ecx, 'a'`<br/>`sub r9, r8` | | ||
| `shl <dest>, <bits>` | `dest <<= bits` | `shl rax, 3`<br/>`shl rdi, cl` | | ||
| `shr <dest>, <bits>` | `dest >>= bits` | `shr r15, 5`<br/>`shr rsi, cl` | | ||
| `and <dest>, <src>` | `dest &= src` | `and al, ah`<br/>`and bx, 13` | | ||
| `or <dest>, <src>` | `dest \|= src` | `or r10b, cl`<br/>`or r14, 0x2000` | | ||
| `xor <dest>, <src>` | `dest ^= src` | `xor ebx, edx`<br/>`xor rcx, 1` | | ||
| `inc <dest>` | `dest++` | `inc rsi` | | ||
| `dec <dest>` | `dest--` | `dec r10w` | | ||
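
To check your reading of the table, here is a small Python sketch that models a few of these instructions on 64-bit registers. It is only an illustration: the helper names (`add64`, `sub64`, `shl64`, `shr64`) are made up for this example, shift counts are assumed to be below 64, and the only CPU behaviour being mimicked is that results wrap around to the width of the register.

```python
MASK64 = (1 << 64) - 1  # a 64-bit register holds values in [0, 2**64 - 1]


def add64(dest: int, src: int) -> int:
    """Model of `add dest, src`: the sum wraps around at 64 bits."""
    return (dest + src) & MASK64


def sub64(dest: int, src: int) -> int:
    """Model of `sub dest, src`: the difference wraps around at 64 bits."""
    return (dest - src) & MASK64


def shl64(dest: int, bits: int) -> int:
    """Model of `shl dest, bits` (bits < 64): bits shifted past the top are lost."""
    return (dest << bits) & MASK64


def shr64(dest: int, bits: int) -> int:
    """Model of `shr dest, bits` (bits < 64): a zero-filling right shift."""
    return (dest & MASK64) >> bits


rbx = add64(10, 5)   # add rbx, 5 with rbx = 10
rax = shl64(1, 3)    # shl rax, 3 with rax = 1, same as multiplying by 2**3
rcx = sub64(0, 1)    # sub rcx, 1 with rcx = 0 wraps around
print(rbx, rax, hex(rcx))  # 15 8 0xffffffffffffffff
```

The last line shows why the mask matters: subtracting 1 from 0 does not give a negative number in a register, it gives the largest 64-bit value.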
|
||
## Control Flow | ||
|
||
Now we know how to do maths and move bits around. | ||
This is all good, but we still can't write full programs. | ||
We need a mechanism similar to `if`s from Python and also loops in order to make the code run based on conditions. | ||
|
||
### `jmp` | ||
|
||
The simplest instruction for control flow is the `jmp` instruction. | ||
It simply loads an address into the `rip` register. | ||
But when Assembly code is generated or written either by the compiler or by us, instructions don't have addresses yet. | ||
These addresses are assigned during the **linking** or **loading** phase, as you know from the [Application Lifetime session](../../Application%20Lifetime/). | ||
|
||
For this reason, we use **labels** as some sort of anchors. | ||
We `jmp` to them and then the assembler will replace them with relative addresses which are then replaced with full addresses during linking. | ||
The way in which `jmp` and labels function is very simple. | ||
Remember that in the absence of `jmp`s, Assembly code is executed linearly just like a script. | ||
|
||
```asm | ||
jmp skip_next_section | ||
; Whatever code is here is never executed. | ||
skip_next_section: | ||
; Only the code below this label is executed. | ||
``` | ||
|
||
> **Warning** | ||
> Do not confuse labels with functions. | ||
> A label does not stop the execution of code when it's reached. | ||
> Labels are simply ignored by everything except jump instructions such as `jmp`. | ||
For example, in the following code, both instructions are executed in the absence of `jmp`s: | ||
|
||
```asm | ||
mov rax, 2 | ||
some_label: | ||
mov rbx, 3 | ||
; rax = 2; rbx = 3 | ||
``` | ||
|
||
### `eflags` | ||
|
||
Most instructions that compute something, such as the arithmetic instructions above, change the **inner state of the CPU**; `mov` does not. | ||
In other words, several aspects regarding the result of the instruction are stored in a special register that we cannot access directly, called `eflags`. | ||
There are [instructions](https://stackoverflow.com/questions/1406783/how-to-read-and-write-x86-flags-registers-directly) that can set or clear some flags in `eflags`, but we cannot write something like `mov eflags, 2`. | ||
|
||
As its name implies, each bit in `eflags` is a flag that is activated (i.e. set to 1) if a certain condition is true about the result of the last executed instruction. | ||
We won't be using these flags per se with one exception: `ZF` - the **zero flag**. | ||
When active, it means that the result of the last instruction was... 0, duh! | ||
This is useful for testing if numbers are equal for example. | ||
We'll talk about this in the next section. | ||
|
||
### Conditional jumps | ||
|
||
Now we know that there is an internal state of the CPU which is modified by arithmetic and logic instructions, but not by `mov`. | ||
We still need a way to leverage this state. | ||
We can do this via **conditional jumps**. | ||
|
||
They are like `jmp` instructions, but the jump is made only when certain conditions are met. | ||
Otherwise, code execution continues from the next instruction. | ||
The general syntax of a conditional jump is | ||
|
||
```asm | ||
j[n]<cond> label | ||
``` | ||
|
||
where the letter `n` is optional and means the jump will be made if the condition is **not** met. | ||
|
||
#### `cmp` and `test` | ||
|
||
We can use the regular arithmetic instructions that we've learned so far to modify `eflags`. | ||
But this has the drawback of also modifying our data. | ||
It would be great if we had a means to modify `eflags` without changing the data that we evaluate. | ||
We can do this using `cmp` and `test`. | ||
|
||
`cmp dest, src` modifies `eflags` as if you were **subtracting** `src` from `dest`, but without modifying `dest`. | ||
This is great for testing if 2 things are equal, or for testing which is greater or lower. | ||
|
||
`test dest, src` is similar to `cmp`, but modifies `eflags` according to the `and` instruction. | ||
This comes in handy when we want to check if a register is 0. | ||
|
||
```asm | ||
test rax, rax | ||
jz rax_is_zero | ||
``` | ||
|
||
is equivalent to | ||
|
||
```asm | ||
cmp rax, 0 | ||
jz rax_is_zero | ||
``` | ||
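
The equivalence above can be checked with a short Python model. This is a sketch under simplifying assumptions: it tracks only the zero flag, and the function names `zf_after_cmp` and `zf_after_test` are invented for the example.

```python
MASK64 = (1 << 64) - 1  # results are truncated to the 64-bit register width


def zf_after_cmp(dest: int, src: int) -> bool:
    """ZF after `cmp dest, src`: set when dest - src is 0 (the result is discarded)."""
    return ((dest - src) & MASK64) == 0


def zf_after_test(dest: int, src: int) -> bool:
    """ZF after `test dest, src`: set when dest & src is 0 (the result is discarded)."""
    return ((dest & src) & MASK64) == 0


for rax in (0, 1, 0x2000, MASK64):
    # `test rax, rax` and `cmp rax, 0` agree on ZF for each of these values.
    assert zf_after_test(rax, rax) == zf_after_cmp(rax, 0)

print(zf_after_test(0, 0), zf_after_test(5, 5), zf_after_test(0b0100, 0b0011))
# True False True
```

The last call shows what `test` does with two different operands: ZF is set when they have no bits in common.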
|
||
Now let's have a look at some conditional jumps: | ||
|
||
| Conditional jump | Meaning | | ||
|:--------------------------:|:-------------------------------------------------------------:| | ||
| `jz` / `je` | Jump if the Zero Flag is active | | ||
| `jnz` / `jne` | Jump if the Zero Flag is not active | | ||
| `cmp rax, rbx`<br/>`j[n]g` | Jump if `rax` is (not) greater than `rbx` (signed) | | ||
| `cmp rax, rbx`<br/>`j[n]a` | Jump if `rax` is (not) greater than `rbx` (unsigned) | | ||
| `cmp rax, rbx`<br/>`j[n]ge` | Jump if `rax` is (not) greater than or equal to `rbx` (signed) | | ||
| `cmp rax, rbx`<br/>`j[n]ae` | Jump if `rax` is (not) greater than or equal to `rbx` (unsigned) | | ||
| `cmp rax, rbx`<br/>`j[n]l` | Jump if `rax` is (not) lower than `rbx` (signed) | | ||
| `cmp rax, rbx`<br/>`j[n]b` | Jump if `rax` is (not) lower than `rbx` (unsigned) | | ||
| `cmp rax, rbx`<br/>`j[n]le` | Jump if `rax` is (not) lower than or equal to `rbx` (signed) | | ||
| `cmp rax, rbx`<br/>`j[n]be` | Jump if `rax` is (not) lower than or equal to `rbx` (unsigned) | | ||
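
The difference between the signed and unsigned variants is easiest to see on a value whose top bit is set. The Python sketch below models only the outcome of the comparison, not the individual flags the CPU uses to reach it; `as_signed64`, `jg_taken` and `ja_taken` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1


def as_signed64(value: int) -> int:
    """Reinterpret a 64-bit pattern as a two's complement signed number."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def jg_taken(rax: int, rbx: int) -> bool:
    """Would `jg` jump after `cmp rax, rbx`? Both values are read as signed."""
    return as_signed64(rax) > as_signed64(rbx)


def ja_taken(rax: int, rbx: int) -> bool:
    """Would `ja` jump after `cmp rax, rbx`? Both values are read as unsigned."""
    return (rax & MASK64) > (rbx & MASK64)


rax = 0xFFFFFFFFFFFFFFFF  # -1 when read as signed, 2**64 - 1 when read as unsigned
rbx = 1
print(jg_taken(rax, rbx))  # False: -1 is not greater than 1
print(ja_taken(rax, rbx))  # True: 2**64 - 1 is greater than 1
```

The bits in `rax` are the same in both cases; only the choice of jump decides how they are interpreted.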
|
||
### Loops | ||
|
||
We can create loops simply by combining labels and conditional jumps. | ||
For example, `for i in range(0, 10)` from Python is equivalent to: | ||
|
||
```asm | ||
xor rcx, rcx ; i = rcx; same as mov rcx, 0 | ||
for_loop: | ||
cmp rcx, 10 | ||
je done_loop ; leave the loop once i == 10 | ||
; The body of the for loop. | ||
inc rcx ; rcx++ | ||
jmp for_loop ; re-evaluate the condition | ||
done_loop: | ||
``` | ||
|
||
Or alternatively, we can verify `rcx < 10` at the end of the loop: | ||
|
||
```asm | ||
xor rcx, rcx | ||
for_loop: | ||
; The body of the for loop. | ||
inc rcx ; rcx++ | ||
cmp rcx, 10 | ||
jb for_loop ; verify i < 10 | ||
; The code here is executed only after the loop ends. | ||
``` |
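
The two loop shapes are not fully interchangeable: the second one runs its body before checking the condition. The Python sketch below mirrors the control flow of each version and counts how many times the body runs; `check_first` and `check_last` are names invented for this illustration, and `limit` is assumed to be non-negative.

```python
def check_first(limit: int) -> int:
    """Shape of the first loop: test the condition, then run the body."""
    rcx = 0                  # xor rcx, rcx
    iterations = 0
    while True:
        if rcx == limit:     # cmp rcx, limit / je done_loop
            break
        iterations += 1      # the body of the loop
        rcx += 1             # inc rcx
    return iterations


def check_last(limit: int) -> int:
    """Shape of the second loop: run the body, then test the condition."""
    rcx = 0                  # xor rcx, rcx
    iterations = 0
    while True:
        iterations += 1      # the body of the loop
        rcx += 1             # inc rcx
        if not rcx < limit:  # cmp rcx, limit / jb for_loop falls through
            break
    return iterations


print(check_first(10), check_last(10))  # 10 10
print(check_first(0), check_last(0))    # 0 1
```

With a bound of 10 both shapes behave the same, but with a bound of 0 the second shape still executes the body once, like a `do ... while` loop in C.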
74 changes: 74 additions & 0 deletions
chapters/binary-introduction/assembly-language/reading/dereferencing-addresses.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Dereferencing Addresses | ||
|
||
Up to this point we know how to operate with data and can write complex programs using conditional jumps. | ||
But we know that data is stored mostly in the RAM. | ||
How do we fetch it from there to our registers? | ||
|
||
Imagine the RAM is one giant array. | ||
Each byte is a cell in this array. | ||
Therefore, each byte is found at a given **index** in this array. | ||
Indices start at 0, so the first byte is found at index 0, the third at index 2 and so on. | ||
These indices are also called **memory addresses**, or simply **addresses**. | ||
|
||
In order to load data from the RAM into our registers or vice-versa, we need to tell the CPU which RAM address to access. | ||
This is called **dereferencing that address**. | ||
Syntactically, this is very easy and is done by wrapping the address in `[]`. | ||
The address can be either a raw number, or a register, or an expression: | ||
|
||
```asm | ||
mov rax, [0xdeadbeef] ; load 8 bytes from the address 0xdeadbeef into rax | ||
mov bx, [0xdeadbeef] ; load 2 bytes from the address 0xdeadbeef into bx | ||
mov [0xdeadbeef], ecx ; store 4 bytes from ecx at the address 0xdeadbeef | ||
``` | ||
|
||
Notice that the number of bytes that are transferred between the RAM and registers is given by the size of the register. | ||
But what happens when we don't use a register? | ||
An instruction such as `mov [0xdeadbeef], 0x69` is incorrect because it is impossible to tell how many bytes to use to write 0x69. | ||
We could write it using one byte of course, but what if we wanted to write it on 4 bytes, as the value `0x00000069`? | ||
To eliminate such ambiguities, we must specify the number of bytes that we want to write to the RAM: | ||
|
||
```asm | ||
mov [0xdeadbeef], byte 0x2 ; writes 1 byte | ||
mov [0xdeadbeef], word 0x2 ; writes 2 bytes: 0x02, then 0x00 | ||
mov [0xdeadbeef], dword 0x2 ; writes 4 bytes | ||
mov [0xdeadbeef], qword 0x2 ; writes 8 bytes | ||
``` | ||
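
To see why the size matters, the sketch below uses a Python `bytearray` as a stand-in for the RAM, pre-filled with `0xff` so that untouched bytes are easy to spot. The `store` helper and the addresses 0 and 8 are assumptions made for the example; the byte order used is the one x86 uses.

```python
ram = bytearray(b"\xff" * 16)  # a tiny stand-in for the RAM, filled with 0xff


def store(address: int, value: int, size: int) -> None:
    """Write `value` into `ram` using exactly `size` bytes, x86 byte order."""
    ram[address:address + size] = value.to_bytes(size, "little")


store(0, 0x69, 1)  # like `mov [0], byte 0x69`
store(8, 0x69, 4)  # like `mov [8], dword 0x69`

print(ram[0:4].hex(" "))   # 69 ff ff ff -> only one byte was overwritten
print(ram[8:12].hex(" "))  # 69 00 00 00 -> four bytes were overwritten
```

Both stores write the number 0x69, yet they leave the memory in different states, which is exactly the ambiguity the size specifier resolves.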
|
||
Instead of a hardcoded value, we can express addresses as complex expressions which the CPU computes for us. | ||
In the snippet below, the CPU computes the address given by `rdi + rcx * 4` and then writes the contents of `edx` there. | ||
|
||
```asm | ||
mov [rdi + rcx * 4], edx | ||
``` | ||
|
||
This is equivalent to `v[i] = something` where `v` is an array of 4-byte values (hence `rcx * 4`): | ||
|
||
- `rdi` = starting address of `v` | ||
- `rcx` = `i` | ||
- `edx` = `something` | ||
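
The mapping above can be replayed in Python, again with a `bytearray` standing in for the RAM. The concrete values chosen for `rdi`, `rcx` and `edx` are arbitrary and only serve the illustration.

```python
import struct

ram = bytearray(64)  # stand-in for the RAM, all zeroes
rdi = 16             # hypothetical starting address of the array `v`
rcx = 3              # the index `i`
edx = 0xCAFE         # the value `something`

address = rdi + rcx * 4                    # what the CPU computes inside [...]
struct.pack_into("<I", ram, address, edx)  # mov [rdi + rcx * 4], edx

# Reading 8 four-byte integers starting at `rdi` shows that only v[3] changed.
v = list(struct.unpack_from("<8I", ram, rdi))
print(address, v)  # 28 [0, 0, 0, 51966, 0, 0, 0, 0]
```

The multiplication by 4 is what turns an element index into a byte offset, because each element of `v` occupies 4 bytes.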
|
||
Therefore, whenever you see `[...]` in Assembly, whatever is between the square brackets is being dereferenced [**with one exception**](further-reading.md#lea). | ||
|
||
## Endianness | ||
|
||
This is all nice, but what does all this look like in memory? | ||
The order in which the bytes of a multi-byte value are stored in the RAM is called **endianness**. | ||
Most CPUs, x86 included, store bytes **in reverse order**, or **little endian** order: the least significant byte comes first, at the lowest address. | ||
When data is fetched back from the RAM, the order is reversed again, so the register ends up holding the original value: | ||
|
||
```asm | ||
mov [0x100], dword 0x12345678 ; the RAM at 0x100: [ 0x78 | 0x56 | 0x34 | 0x12 ] | ||
mov ax, [0x100] ; ax = 0x5678 | ||
mov bx, [0x101] ; bx = 0x3456 | ||
``` | ||
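
The same byte layout can be reproduced in Python, where `int.to_bytes` and `int.from_bytes` with `"little"` play the role of the store and the load. This is only a model of the memory contents; the `bytearray` named `ram` is a stand-in for real RAM.

```python
ram = bytearray(0x200)  # stand-in for the RAM

# Store the 4-byte value 0x12345678 at address 0x100, least significant byte first.
ram[0x100:0x104] = (0x12345678).to_bytes(4, "little")
print(ram[0x100:0x104].hex(" "))  # 78 56 34 12

ax = int.from_bytes(ram[0x100:0x102], "little")  # load 2 bytes starting at 0x100
bx = int.from_bytes(ram[0x101:0x103], "little")  # load 2 bytes starting at 0x101
print(hex(ax), hex(bx))  # 0x5678 0x3456
```

Each load picks up two consecutive bytes and treats the one at the lower address as the least significant.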
|
||
However, endianness does not apply to strings. | ||
The code below writes the string `SSS Rulz` at the address 0x100. | ||
Notice we don't have to write it in reverse order like `zluR SSS`. | ||
|
||
```asm | ||
mov rax, "SSS Rulz" | ||
mov [0x100], rax | ||
; We need to use a register because mov cannot take both an address and a 64-bit immediate as operands. | ||
; https://www.felixcloutier.com/x86/mov | ||
``` |
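
One way to see why the string does not need to be reversed is to look at the number that ends up in `rax`. The sketch below assumes the NASM convention for character constants, where the first character becomes the least significant byte of the value.

```python
# mov rax, "SSS Rulz": the first character lands in the least significant byte.
rax = int.from_bytes(b"SSS Rulz", "little")
print(hex(rax))  # 0x7a6c755220535353 ('z' is 0x7a, 'S' is 0x53)

# mov [0x100], rax: the least significant byte is written first.
stored = rax.to_bytes(8, "little")
print(stored)  # b'SSS Rulz'
```

The assembler reverses the characters once when it builds the number and the little endian store reverses them again, so the bytes in memory appear in reading order.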
65 changes: 65 additions & 0 deletions
chapters/binary-introduction/assembly-language/reading/further-reading.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Further Reading | ||
|
||
## The Whole ISA | ||
|
||
If you want to search for an instruction, use [this](https://www.felixcloutier.com/x86/) website. | ||
Each instruction has its own table with all possible operands and what they do. | ||
Note that `imm8` means "8-bit immediate" (a regular 1-byte number), `imm64` means a 64-bit immediate and so on. | ||
Similarly, `r32` means a 32-bit register and `m16` for example means a 16-bit (2-byte) memory area. | ||
You'll see `r`, `imm` and `m` combined with `8`, `16`, `32` and `64` depending on what each instruction does. | ||
|
||
## Caches | ||
|
||
Many programs access the same addresses repeatedly over a short period of time. | ||
Take a short 1000-step loop. | ||
It uses the same code 1000 times. | ||
It would be inefficient for the CPU to read the instructions directly from the RAM 1000 times. | ||
For this reason, there is an intermediary level of memory between the RAM and the registers, called **the cache**. | ||
|
||
As their name implies, caches store the contents of some memory addresses that are frequently requested by the CPU. | ||
We say _caches_, in plural, because they are laid out hierarchically, each level being faster and smaller than the next one (L1 is the smallest and fastest, L3 the largest and slowest). | ||
Usually, CPUs have 3 levels of cache memory. | ||
You can query their sizes with the `lscpu` command: | ||
|
||
```console | ||
root@kali:~$ lscpu | ||
[...] | ||
L1d cache: 128 KiB | ||
L1i cache: 128 KiB | ||
L2 cache: 1 MiB | ||
L3 cache: 6 MiB | ||
[...] | ||
``` | ||
|
||
Notice the L1 (level 1) cache is split between a data cache (`L1d`) and an instruction cache (`L1i`). | ||
The other caches do not store data and instructions separately. | ||
|
||
## Assembly Syntaxes | ||
|
||
In this session we've used the Intel syntax for writing and displaying Assembly. | ||
We did so because it's more straightforward than its alternative: the AT&T syntax. | ||
You can find the differences on [Wikipedia](https://en.wikipedia.org/wiki/X86_assembly_language#Syntax). | ||
|
||
## `lea` | ||
|
||
`lea` stands for "Load Effective Address". | ||
Its syntax is: | ||
|
||
```asm | ||
lea dest, [address] | ||
``` | ||
|
||
It loads `address` into the `dest` register (it can only be a register). | ||
What's interesting about it is that it also uses the `[...]` syntax, but **does not dereference the address**. | ||
In the snippet below, `0xdeadbeef` is simply copied to `rax`. | ||
|
||
```asm | ||
lea rax, [0xdeadbeef] | ||
``` | ||
|
||
Its true power comes from the fact that it can also compute an address. | ||
For example, the code below will first compute the address given by `rdi + rcx * 8 + 7` and then write this address into `rax`. | ||
|
||
```asm | ||
lea rax, [rdi + rcx * 8 + 7] | ||
``` |
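
The contrast between `lea` and a `mov` that uses the same brackets can be sketched in Python, with a `bytearray` standing in for the RAM. The register values and the 8-byte pattern placed in memory are arbitrary choices for the illustration.

```python
import struct

ram = bytearray(128)  # stand-in for the RAM
rdi, rcx = 8, 2       # arbitrary register values

address = rdi + rcx * 8 + 7  # the expression inside [...]
struct.pack_into("<Q", ram, address, 0x1122334455667788)  # some data at that address

lea_rax = address                                    # lea rax, [rdi + rcx * 8 + 7]
mov_rax = struct.unpack_from("<Q", ram, address)[0]  # mov rax, [rdi + rcx * 8 + 7]
print(lea_rax, hex(mov_rax))  # 31 0x1122334455667788
```

`lea` stops after the arithmetic and hands back the address itself, while `mov` goes one step further and reads the 8 bytes stored there.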