Decompiling a function
GUIDE IN PROGRESS
Function decompilation is one of the core areas of a decomp project. In the ROM, functions are encoded as ARM (or THUMB) assembly code, and the goal is to transform this assembly into C code that produces the exact same lines of assembly when compiled. C code is easier to read and modify than assembly, making a decompiled C function easier to hack or research with.
To decompile a function, you need to know both ARM assembly and C. It is also helpful (but not required) to use a reverse engineering tool like Ghidra or IDA.
If you are not familiar with ARM assembly or Ghidra, you can check out Reverse Engineering a DS Game for a primer on reverse engineering, including steps to set up Ghidra with EoS symbols and an introduction to reading ARM assembly. You can also look at Whirlwind Tour of ARM Assembly for a more thorough ARM assembly reference.
The first order of business is to pick a function to decompile. The functions are located in `.s` files within the `asm` directory, surrounded by an `arm_func_start` and `arm_func_end` (or their THUMB equivalents). As for which function to pick, this is up to you: perhaps you are new to function decomp and want a small function to ease into the process, or you are a hacker who wants to decomp a specific function to edit that function as C code instead of assembly, or you don't mind either way and just pick the first function in a file.
Once you pick a function, you'll need a workflow where you can write some C code, compile it to assembly, and compare the compiled assembly code with the original assembly code to see if they match. A common website for this is decomp.me.
Click "Start decomping" to begin setting up a function decomp environment ("scratch"). You can optionally sign into your GitHub account in decomp.me to keep track of all scratches you've created.
You'll be prompted to create a new scratch by filling in a couple of fields.
- Choose the DS (ARMv5TE) platform and the Pokémon HeartGold/SoulSilver preset (which EoS also uses currently). This will set up the compiler and flags to match the compiler used by the EoS decomp.
- In "Diff label", enter the name of the function you plan to decomp.
- In "Target assembly", place the entire function from the `.s` file, including the `arm_func_start` and `arm_func_end`.
- "Context" can contain definitions such as typedefs, structs, enums, and `extern` functions. It is technically not required, but using it will keep function source code clean when working on the scratch. You can grab a default context from nitro/types.h; exclude all the `#ifdef`s and take the typedefs along with the `#define`s for `TRUE`, `FALSE`, and `NULL`.
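As a rough idea of what a starting context might contain, here is a minimal sketch. The underlying types chosen for each typedef are assumptions based on common DS conventions, not a copy of nitro/types.h, so check the header in the repo for the real definitions.

```c
/* Minimal scratch context (a sketch). The underlying types are assumptions;
   compare against nitro/types.h before relying on them. */
typedef unsigned char  u8;
typedef unsigned short u16;
typedef unsigned int   u32;
typedef signed char    s8;
typedef signed short   s16;
typedef signed int     s32;

#define TRUE  1
#define FALSE 0

/* The guard is only here so the sketch also compiles next to standard
   headers on a host compiler; a scratch context would not need it. */
#ifndef NULL
#define NULL ((void *)0)
#endif
```

Structs, enums, and `extern` declarations that the function depends on can be appended below these definitions as the scratch grows.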
If you filled in all fields correctly, "Create scratch" will create the scratch. The creation may fail if there are errors parsing the target assembly code, in which case you should review the parsing errors and the above instructions to see what went wrong. When the scratch is created, you'll be taken to the screen below.
You'll see an empty C function on the left and the assembly comparison on the right, including the target assembly you inputted during setup. With an empty C function, the compiled (current) assembly is only a `bx lr` to return from the function.
At this point, you can begin decompiling the function. A common approach is to start with the output from an automated decompiler, like the ones in Ghidra or IDA, and clean up the code from there. Alternatively, you can write C code from scratch by looking at the target assembly. If you haven't decompiled before, I recommend starting from scratch to learn the function decompiling process. You can then try using an automated decompiler on later functions to see if you prefer this approach.
The function can be cleaned up a bit. For example, the `param_1 == 0x0` would be clearer as `param_1 == NULL`. If you know what the function does in the context of game functionality, you can clean up further by naming variables. Cleanup is not strictly required, but it will help people who are reading the function you decomped.
If you haven't already set up Ghidra, follow this guide to do so. Once Ghidra is set up, choose the overlay of the function you're decompiling and find the function within the overlay. Copy the decompiler output into decomp.me as a starting point.
Decompiled function in Ghidra
decomp.me with the Ghidra decompiler's output
This function is already labeled in Ghidra because of previous reverse engineering work done with pmdsky-debug. This knowledge can be helpful if present, but it won't always be there.
Ghidra uses primitive C types, but the decomp uses custom typedefs for its types, so the primitive types should be converted to the custom types. For example, `int` becomes `s32` and `bool` becomes `u8`. Also, use the macros `FALSE` and `TRUE` for booleans instead of `false` and `true`. Here's what the function looks like after cleaning up these types and macros.
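To illustrate the conversion in text form, here is a sketch of how the decompiler output for this example might look before and after replacing the primitive types and boolean literals. The name `ExampleFunc` is a placeholder, the typedef definitions are stand-ins, and the signature is inferred from the snippets in this guide rather than copied from the project.

```c
/* Stand-ins for the project's typedefs and macros (assumed definitions). */
typedef unsigned char u8;
typedef signed int    s32;
#define FALSE 0
#define TRUE  1

/* Ghidra-style output before cleanup, kept as a comment for comparison:
 *
 *   bool ExampleFunc(int *param_1) {
 *       if (param_1 != (int *)0x0) {
 *           return *param_1 != 0;
 *       }
 *       return false;
 *   }
 */

/* After converting int -> s32, bool -> u8, and false -> FALSE.
   ExampleFunc is a hypothetical name. */
u8 ExampleFunc(s32 *param_1) {
    if (param_1 != (s32 *)0x0) {
        return *param_1 != 0;
    }
    return FALSE;
}
```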
The function now compiles successfully, but the compiled assembly does not match the target assembly. In the vast majority of cases, the automated decompiler will not produce matching output. You'll have to read the target assembly and the mismatches and see what changes can be made to the C code to possibly produce a match.
Breaking down the diff, the target assembly has the following:
cmp r0, #0
moveq r0, #0
bxeq lr
If r0 (the function's first argument) is 0, r0 is set to 0 as the return value and the function returns immediately.
Meanwhile, the current assembly has the following instead:
cmp r0, #0
beq 20
...
20: mov r0, #0
The logic is the same, but the `mov r0, #0` operation is at the end of the function instead of right after the `cmp`. The two branches in this function (`return *param_1 != 0` and `return FALSE`) are swapped in the assembly.
One way to change the compiled assembly is to flip the branches in the C code. Instead of this:
if (param_1 != (s32 *)0x0) {
    return *param_1 != 0;
}
return FALSE;
Invert the `if` statement and swap the branching logic accordingly:
if (param_1 == (s32 *)0x0) {
    return FALSE;
}
return *param_1 != 0;
That did the trick! The compiled and target assembly are now matching.
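Both orderings behave identically in C; only the instruction layout the compiler emits differs. The sketch below puts the two versions side by side so the equivalence can be checked on any host compiler. The names `Check_Original` and `Check_Flipped` are hypothetical, as are the typedef definitions. Whether a given ordering actually matches still has to be confirmed in the decomp.me diff.

```c
/* Stand-ins for the project's typedefs and macros (assumed definitions). */
typedef unsigned char u8;
typedef signed int    s32;
#define FALSE 0

/* Branch order as produced by the automated decompiler:
   the non-null case comes first. (Hypothetical function name.) */
u8 Check_Original(s32 *param_1) {
    if (param_1 != (s32 *)0x0) {
        return *param_1 != 0;
    }
    return FALSE;
}

/* Branch order after inverting the condition:
   the null check and early return come first. (Hypothetical function name.) */
u8 Check_Flipped(s32 *param_1) {
    if (param_1 == (s32 *)0x0) {
        return FALSE;
    }
    return *param_1 != 0;
}
```

Because the two functions return the same result for every input, reordering branches like this is a safe tweak to try whenever the diff shows the right instructions in the wrong order.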
Note that not all functions will be this simple to match with automated decompiler output. Longer functions and more complicated logic give automated decompilers more trouble, and will take more tweaks and possibly large refactors to match. Some people prefer to avoid automated decompilers and stick to writing the function from scratch, and it is up to you to decide which approach you prefer.
Now that the function has been decompiled, you'll need to add it into the decomp project and remove the corresponding raw assembly code.
1. Create a `.c` file in the `src` folder and a corresponding `.h` file in `include`.
   - Alternatively, if the function is at the beginning or end of the `.s` file, you may be able to add it to an existing C file. Check `main.lsf` to see which C (`.o`) file is right before/after the `.s` file.
2. Add the decompiled function to the new `.c` file, along with its corresponding header in the `.h` file.
3. Search for any externs in other files that reference the newly decompiled function. These externs can be removed and replaced with an `#include` of the new `.h` file.
4. Remove the function's assembly code from the `.s` file.
5. Split the `.s` file in two at the location where the function's assembly code was. The new `.s` file should be named according to the offset of the first function in the new file (e.g., `overlay_29_022E0378.s`). If the function you decompiled was at the beginning or end of the file, you can skip this step and step 6.
6. Split the `.inc` file (in `asm/include`) that corresponds to the `.s` file you split.
7. Find the split file in `main.lsf` and add two files after it: the `.o` corresponding to the new `.c` file, and the newly split-off `.s` file.
8. Run `make tidy` and `make` to ensure that the project compiles and produces a matching ROM. If the ROM doesn't match, you can compare the mismatched files with the asmdiff tool or a hex editor to troubleshoot the issue.
9. If you want to decompile more functions, repeat the decomp process by finding a new function and creating a new scratch. If you are done, make a PR to the main `pmd-sky` repo.
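As a rough sketch of the first two steps, a new header and source pair might look like the following. The file names, include guard, and function name are all hypothetical, and the project may have its own conventions for guards and includes, so mirror an existing file in `src` and `include` when in doubt.

```c
/* ---- include/example_func.h (hypothetical file name) ---- */
#ifndef PMDSKY_EXAMPLE_FUNC_H
#define PMDSKY_EXAMPLE_FUNC_H

/* In the real project these would come from a shared types header.
   They are defined inline here only so the sketch compiles on its own. */
typedef unsigned char u8;
typedef signed int    s32;
#define FALSE 0

/* Declaration that other files pick up via #include instead of an extern. */
u8 ExampleFunc(s32 *param_1);

#endif /* PMDSKY_EXAMPLE_FUNC_H */

/* ---- src/example_func.c (hypothetical file name) ---- */
/* In a separate file this would start with: #include "example_func.h" */

u8 ExampleFunc(s32 *param_1) {
    if (param_1 == (s32 *)0x0) {
        return FALSE;
    }
    return *param_1 != 0;
}
```

Keeping the declaration in the header means any file that previously declared the function with its own `extern` can switch to including the header.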