
Decompiling a function

AnonymousRandomPerson edited this page Aug 24, 2023 · 19 revisions

GUIDE IN PROGRESS

Function decompilation is one of the core areas of a decomp project. In the ROM, functions are encoded as ARM (or THUMB) assembly code, and the goal is to transform this assembly into C code that produces the exact same lines of assembly when compiled. C code is easier to read and modify than assembly, making a decompiled C function easier to hack or research with.

To decompile a function, you need to know both ARM assembly and C. It is also helpful (but not required) to use a reverse engineering tool like Ghidra or IDA.

If you are not familiar with ARM assembly or Ghidra, you can check out Reverse Engineering a DS Game for a primer on reverse engineering, including steps to set up Ghidra with EoS symbols and an introduction to reading ARM assembly. You can also look at Whirlwind Tour of ARM Assembly for a more thorough ARM assembly reference.

Setting up a function for decomp

The first order of business is to pick a function to decompile. The functions are located in .s files within the asm directory, each surrounded by arm_func_start and arm_func_end (or their THUMB equivalents). Which function you pick is up to you: perhaps you are new to function decomp and want a small function to ease into the process, or you are a hacker who wants to decomp a specific function so you can edit it as C code instead of assembly, or you don't mind either way and just pick the first function in a file.

Once you pick a function, you'll need a workflow where you can write some C code, compile it to assembly, and compare the compiled assembly with the original assembly to see if they match. A popular website for this is decomp.me.

[Screenshot: decomp.me]

Click "Start decomping" to begin setting up a function decomp environment ("scratch"). You can optionally sign into your GitHub account in decomp.me to keep track of all scratches you've created.

You'll be prompted to create a new scratch by filling in a couple of fields.

  • Choose the DS (ARMv5TE) platform and the Pokémon HeartGold/SoulSilver preset (which EoS also uses currently). This will set up the compiler and flags to match the compiler used by the EoS decomp.
  • In "Diff label", enter the name of the function you plan to decomp.
  • In "Target assembly", paste the entire function from the .s file, including the arm_func_start and arm_func_end lines.
  • "Context" can contain definitions such as typedefs, structs, enums, and extern functions. It is technically not required, but using it will keep function source code clean when working on the scratch. You can grab a default context from nitro/types.h; exclude all the #ifdefs and take the typedefs along with the #defines for TRUE, FALSE, and NULL.
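As a rough sketch, the resulting context might look like the block below. The exact type widths are assumptions based on the usual conventions for DS decomp projects (32-bit int, 64-bit long long); nitro/types.h in the repo is the authoritative source, so copy the definitions from there rather than from here.

```c
/* Minimal decomp.me context sketch.
 * Assumption: these typedefs mirror nitro/types.h; the repo's header wins
 * if anything here differs. */
typedef unsigned char      u8;
typedef unsigned short     u16;
typedef unsigned int       u32;
typedef unsigned long long u64;
typedef signed char        s8;
typedef signed short       s16;
typedef signed int         s32;
typedef signed long long   s64;

/* Boolean and null-pointer macros used throughout the decomp. */
#define TRUE  1
#define FALSE 0
#define NULL  ((void *)0)
```

Keeping the context separate from the function body means the scratch's source pane only contains the function you are matching.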

[Screenshot: the new scratch form]

If you filled in all fields correctly, "Create scratch" will create the scratch. The creation may fail if there are errors parsing the target assembly code, in which case you should review the parsing errors and the above instructions to see what went wrong. When the scratch is created, you'll be taken to the screen below.

[Screenshot: the newly created scratch]

You'll see an empty C function on the left and the assembly comparison on the right, including the target assembly you entered during setup. Because the C function is empty, the compiled (current) assembly is just a bx lr that returns from the function.

Decompiling the function

At this point, you can begin decompiling the function. A common approach is to start with the output from an automated decompiler, like the ones in Ghidra or IDA, and clean up the code from there. Alternatively, you can write C code from scratch by looking at the target assembly. If you haven't decompiled before, I recommend starting from scratch to learn the function decompiling process. You can then try using an automated decompiler on later functions to see if you prefer this approach.

Decompiling a function from scratch

Whichever approach you use, a matching function can usually be cleaned up a bit. For example, in the function matched in the next section, param_1 == (s32 *)0x0 would be clearer as param_1 == NULL. If you know what the function does in the context of game functionality, you can clean up further by naming variables. Cleanup is not strictly required for a match, but it will help people who read the function you decomped.
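For illustration, here is how the small function matched later on this page might look after this kind of cleanup. The function name IsValueNonzero and the parameter name value are hypothetical placeholders, not names from the project, and the typedefs stand in for the scratch context. Renaming variables and writing NULL instead of a casted 0x0 should not change the generated code, but it is worth rechecking the diff after any cleanup.

```c
/* Stand-ins for the scratch context (assumed to match nitro/types.h). */
typedef unsigned char u8;
typedef signed int    s32;
#define TRUE  1
#define FALSE 0
#define NULL  ((void *)0)

/* Hypothetical name: substitute the function's real symbol from the .s file. */
u8 IsValueNonzero(s32 *value)
{
    /* A null pointer is treated as "not set". */
    if (value == NULL) {
        return FALSE;
    }
    /* Otherwise report whether the pointed-to value is nonzero. */
    return *value != 0;
}
```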

[Screenshot: the cleaned-up function]

Starting with automated decompiler output

If you haven't already set up Ghidra, follow this guide to do so. Once Ghidra is set up, choose the overlay of the function you're decompiling and find the function within the overlay. Copy the decompiler output into decomp.me as a starting point.

[Screenshot: Decompiled function in Ghidra]

[Screenshot: decomp.me with the Ghidra decompiler's output]

This function is already labeled in Ghidra thanks to previous reverse engineering work done in pmdsky-debug. Such labels are helpful when present, but many functions won't have them.

Ghidra outputs primitive C types, but the decomp uses its own typedefs, so the primitive types should be converted. For example, int becomes s32 and bool becomes u8. Also, use the macros FALSE and TRUE for booleans instead of false and true. Here's what the function looks like after cleaning up these types and macros.

[Screenshot: the function after converting types and macros]
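The same conversion is sketched below as text. The Ghidra-style version in the comment is reconstructed from the snippets later on this page rather than copied from Ghidra, and the name FUN_example is a hypothetical placeholder, so your own output will likely differ in details.

```c
/* Stand-ins for the scratch context (assumed to match nitro/types.h). */
typedef unsigned char u8;
typedef signed int    s32;
#define TRUE  1
#define FALSE 0

/* Ghidra-style output (kept as a comment: primitive types, lowercase bools):
 *
 *   bool FUN_example(int *param_1)
 *   {
 *     if (param_1 != (int *)0x0) {
 *       return *param_1 != 0;
 *     }
 *     return false;
 *   }
 */

/* After switching to the decomp's typedefs and macros:
 * int -> s32, bool -> u8, false/true -> FALSE/TRUE. */
u8 FUN_example(s32 *param_1)
{
    if (param_1 != (s32 *)0x0) {
        return *param_1 != 0;
    }
    return FALSE;
}
```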

The function now compiles successfully, but the compiled assembly does not match the target assembly. In the vast majority of cases, the automated decompiler will not produce matching output. You'll have to read the target assembly and the mismatches, then work out which changes to the C code might produce a match.

Looking at the diff, the target assembly has the following:

  cmp r0, #0
  moveq r0, #0
  bxeq lr

If r0 is 0, moveq sets r0 to 0 as the return value, and bxeq returns from the function.

Meanwhile, the current assembly has the following instead:

  cmp r0, #0
  beq 20
  ...
  20: mov r0, #0

The logic is the same, but the mov r0, #0 instruction is at the end of the function instead of right after the cmp. In other words, the compiler laid out the function's two branches (return *param_1 != 0 and return FALSE) in the opposite order from the target assembly.

One way to change the compiled assembly is to flip the branches in the C code. Instead of this:

  if (param_1 != (s32 *)0x0) {
    return *param_1 != 0;
  }
  return FALSE;

Invert the if statement and swap the branching logic accordingly:

  if (param_1 == (s32 *)0x0) {
    return FALSE;
  }
  return *param_1 != 0;

[Screenshot: the scratch with matching assembly]

That did the trick! The compiled and target assembly are now matching.
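Both versions behave identically; only the order in which the compiler lays out the two branches differs. If you want extra reassurance that a restructuring like this preserves behavior, a small host-side check such as the one below can help. The function names are hypothetical, and a check like this says nothing about the generated ARM assembly, which still has to be compared in decomp.me.

```c
/* Stand-ins for the scratch context (assumed to match nitro/types.h). */
typedef unsigned char u8;
typedef signed int    s32;
#define TRUE  1
#define FALSE 0

/* Ghidra's ordering: non-null case first. */
static u8 OriginalOrder(s32 *param_1)
{
    if (param_1 != (s32 *)0x0) {
        return *param_1 != 0;
    }
    return FALSE;
}

/* Inverted ordering that matched the target assembly. */
static u8 InvertedOrder(s32 *param_1)
{
    if (param_1 == (s32 *)0x0) {
        return FALSE;
    }
    return *param_1 != 0;
}

/* Returns TRUE if both versions agree on a null pointer and a few sample values. */
static u8 VersionsAgree(void)
{
    s32 samples[] = { 0, 1, -1, 42, 0x7FFFFFFF };
    s32 i;

    if (OriginalOrder((s32 *)0x0) != InvertedOrder((s32 *)0x0)) {
        return FALSE;
    }
    for (i = 0; i < (s32)(sizeof(samples) / sizeof(samples[0])); i++) {
        if (OriginalOrder(&samples[i]) != InvertedOrder(&samples[i])) {
            return FALSE;
        }
    }
    return TRUE;
}
```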

Note that not all functions will be this simple to match with automated decompiler output. Automated decompilers struggle more with longer functions and more complicated logic, and such functions will take more tweaks and possibly large refactors to match. Some people prefer to avoid automated decompilers and write functions from scratch; it is up to you to decide which approach you prefer.

Inserting the decompiled function

Now that the function has been decompiled, you'll need to add it into the decomp project and remove the corresponding raw assembly code.

  1. Create a .c file in the src folder and a corresponding .h file in include.
    • Alternatively, if the function is at the beginning or end of the .s file, you may be able to add it to an existing C file. Check main.lsf to see which C (.o) file is right before/after the .s file.
  2. Add the decompiled function to the new .c file, and add its declaration to the .h file.
  3. Search for any externs in other files that reference the newly decompiled function. These externs can be removed and replaced with an #include to the new .h file.
  4. Remove the function's assembly code from the .s file.
  5. Split the .s file in two at the location where the function's assembly code was. The new .s file should be named according to the offset of the first function in the new file (e.g., overlay_29_022E0378.s). If the function you decompiled was at the beginning or end of the file, you can skip this step and step 6.
  6. Split the .inc file (in asm/include) that corresponds to the .s file you split.
  7. Find the split file in main.lsf and add two entries after it: the .o file corresponding to the new .c file, and the newly split-off .s file.
  8. Run make tidy and make to ensure that the project compiles and produces a matching ROM. If the ROM doesn't match, you can compare the mismatched files with the asmdiff tool or a hex editor to troubleshoot the issue.
  9. If you want to decompile more functions, repeat the decomp process by finding a new function and creating a new scratch. If you are done, make a PR to the main pmd-sky repo.
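As a rough illustration of steps 1 to 3, the new header and source file might look like the following. The file names, include guard, and function name are hypothetical; follow the naming and header conventions of existing files in the repo. Both files are shown in one block so the sketch compiles on its own, which is why the typedefs appear inline instead of coming from a project header such as nitro/types.h.

```c
/* ---- include/example.h (hypothetical file name) ---- */
#ifndef PMDSKY_EXAMPLE_H
#define PMDSKY_EXAMPLE_H

/* In the repo, include the project's type header here instead of
 * redefining types; they are inlined only to keep this sketch standalone. */
typedef unsigned char u8;
typedef signed int    s32;
#define FALSE 0

u8 IsValueNonzero(s32 *value);

#endif /* PMDSKY_EXAMPLE_H */

/* ---- src/example.c (hypothetical file name) ---- */
/* In the real file this would start with: #include "example.h" */

u8 IsValueNonzero(s32 *value)
{
    if (value == (s32 *)0x0) {
        return FALSE;
    }
    return *value != 0;
}

/* ---- another .c file that previously declared the function itself ----
 * Before:  extern u8 IsValueNonzero(s32 *value);
 * After:   #include "example.h"
 */
```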