diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000..e69de29 diff --git a/404.html b/404.html new file mode 100644 index 0000000..b8d61aa --- /dev/null +++ b/404.html @@ -0,0 +1,830 @@ + + + +
+ + + + + + + + + + + + + +Info
+This page shows you how to contribute to any documentation page or wiki +based on this template.
+Note
+This theme is forked from my theme for Nexus Docs; +and this page is synced with that.
+Note
+If you are editing the repository with the theme itself on Windows, it might be a good idea to run
+git config core.symlinks true
first to allow git to create symlinks on clone.
You should learn the basics of git; an easy way is to give GitHub Desktop (Tutorial) a go.
+It's only 15 minutes 😀.
Fork this repository:
+ +This will create a copy of the repository on your own user account, which you will be able to edit.
+Clone this repository.
+For example, using GitHub Desktop: +
+Make changes inside the docs
folder.
Consider using a Markdown Cheat Sheet if you are new to markdown.
+I recommend using a markdown editor such as Typora
.
+Personally I just work from inside Rider
.
Commit the changes and push to GitHub.
+Open a Pull Request
.
Opening a Pull Request
will allow us to review your changes before merging them into the official page. If everything's good, we'll hit the merge button and add your changes to the official repository.
If you are working on the wiki locally, you can generate a live preview of the full website.
+Here's a quick guide on how to do it from your command prompt
(cmd).
Install Python 3
+If you have winget
installed, or Windows 11, you can do this from the command prompt.
winget install Python.Python.3
+
pacman -S python-pip # you should already have Python
+
Otherwise download Python 3 from the official website or package manager.
+Install Material for MkDocs and Plugins (Python package)
+# Restart your command prompt before running this command.
+pip install mkdocs-material
+pip install mkdocs-redirects
+
On Linux, there is a chance that python
might be a core part of your OS, meaning
+that you ideally shouldn't touch the system installation.
Use virtual environments instead.
+python -m venv ~/mkdocs # Create the environment in your home folder
+source ~/mkdocs/bin/activate # Enter the environment
+
+pip install mkdocs-material
+pip install mkdocs-redirects
+
Make sure you enter the environment every time before you run mkdocs.
+Open a command prompt in the folder containing mkdocs.yml
, then run the site locally.
+
# Move to project folder.
+cd <Replace this with full path to folder containing `mkdocs.yml`>
+mkdocs serve
+
Copy the address to your web browser and enjoy the live preview; any changes you save will be shown instantly.
+This is the NexusMods theme for Material-MkDocs, inspired by the look of Reloaded-II.
+The overall wiki theme should look fairly close to the actual launcher appearance.
+Add this repository as a submodule at docs/Reloaded.
+Save the following configuration as mkdocs.yml
in your repository root.
site_name: Reloaded MkDocs Theme
+site_url: https://github.com/Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2
+
+repo_name: Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2
+repo_url: https://github.com/Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2
+
+extra:
+  social:
+    - icon: fontawesome/brands/github
+      link: https://github.com/Reloaded-Project
+    - icon: fontawesome/brands/twitter
+      link: https://twitter.com/thesewer56?lang=en-GB
+
+extra_css:
+  - Reloaded/Stylesheets/extra.css
+
+markdown_extensions:
+  - admonition
+  - tables
+  - pymdownx.details
+  - pymdownx.highlight
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
+  - pymdownx.tasklist
+  - def_list
+  - meta
+  - md_in_html
+  - attr_list
+  - footnotes
+  - pymdownx.tabbed:
+      alternate_style: true
+  - pymdownx.emoji:
+      emoji_index: !!python/name:materialx.emoji.twemoji
+      emoji_generator: !!python/name:materialx.emoji.to_svg
+
+theme:
+  name: material
+  palette:
+    scheme: reloaded-slate
+  features:
+    - navigation.instant
+
+plugins:
+  - search
+
+nav:
+  - Home: index.md
+
.github/workflows/DeployMkDocs.yml
.
name: DeployMkDocs
+
+# Controls when the action will run.
+on:
+  # Triggers the workflow on push to the main branch
+  push:
+    branches: [ main ]
+
+  # Allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:
+
+# A workflow run is made up of one or more jobs that can run sequentially or in parallel
+jobs:
+  # This workflow contains a single job called "build"
+  build:
+    # The type of runner that the job will run on
+    runs-on: ubuntu-latest
+
+    # Steps represent a sequence of tasks that will be executed as part of the job
+    steps:
+
+      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
+      - name: Checkout Branch
+        uses: actions/checkout@v2
+        with:
+          submodules: recursive
+
+      # Deploy MkDocs
+      - name: Deploy MkDocs
+        # You may pin to the exact commit or the version.
+        # uses: mhausenblas/mkdocs-deploy-gh-pages@66340182cb2a1a63f8a3783e3e2146b7d151a0bb
+        uses: mhausenblas/mkdocs-deploy-gh-pages@master
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REQUIREMENTS: ./docs/requirements.txt
+
Go to Settings -> Pages in your repo and select the gh-pages branch to enable GitHub Pages. Your page should then be live.
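The workflow above points REQUIREMENTS at ./docs/requirements.txt, but the contents of that file are not shown on this page. A minimal sketch, assuming the site needs only the two packages installed for local previews (mkdocs-material and mkdocs-redirects); extend the list with any other plugins your mkdocs.yml uses:

```shell
# Run from the repository root.
# Assumption: the package list mirrors the pip install commands used for local previews.
mkdir -p docs
printf '%s\n' mkdocs-material mkdocs-redirects > docs/requirements.txt

# Show the result so it can be reviewed before committing.
cat docs/requirements.txt
```

Keeping this list in the repository means the deploy job and local previews can install the same packages with pip install -r docs/requirements.txt.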
+Tip
+Refer to Contributing for instructions on how to locally edit and modify the wiki.
+Note
+For the Reloaded3 theme, use reloaded3-slate instead of reloaded-slate.
Info
+Most documentation pages will also include additional plugins, some of which are used in the pages here.
+Here is a sample complete mkdocs.yml you can copy to your project for reference.
If you have questions/bug reports/etc. feel free to Open an Issue.
+Happy Documenting ❤️
+ + + + + + + + +Most components of the Reloaded Project are governed by the GPLv3 license.
+In some rare scenarios, certain libraries might be licensed under LGPLv3 instead.
+This is a FAQ meant to clarify the licensing choice and its implications. +Please note, though, that the full license text is the final legal authority.
+The primary objective is to prevent closed-source, commercial exploitation of the project.
+We want to ensure that the project isn't used within a proprietary environment for +profit-making purposes such as:
+The Reloaded Project is a labour of love from unpaid hobbyist volunteers.
+Exploiting that work for profit feels fundamentally unfair.
+While the GPLv3 license doesn't prohibit commercial use outright, it does prevent commercial +exploitation by requiring that contributions are given back to the open-source community.
+In that fashion, everyone can benefit from the projects under the Reloaded label.
+You can, as long as the resulting product is also licensed under GPLv3, and thus open source.
+The license terms do not permit this.
+However, if your software is completely non-commercial, meaning it's neither +sold for profit, funded in development, nor hidden behind a paywall (like Patreon), +we will probably just look the other way.
+This often applies to non-professional programmers, learners, or those +with no intent to exploit the project. We believe in understanding and +leniency for those who might not know better.
+GPL v3 exists to protect the project and its contributors. If you're not exploiting the project for commercial +gain, you're not hurting us; and we will not enforce the terms of the GPL.
+If you are interested in obtaining a commercial license, or want an explicit written exemption, +please get in touch with the repository owners.
+Yes, as long as you adhere to the GPLv3 license terms, you're permitted to statically +link Reloaded Libraries into your project, for instance, through the use of NativeAOT or ILMerge.
+We support and encourage the non-commercial use of Reloaded Libraries. +Non-commercial use generally refers to the usage of our libraries for personal projects, +educational purposes, academic research, or use by non-profit organizations.
+You're free to use our libraries for projects that you undertake +for your own learning, hobby or personal enjoyment. This includes creating mods for your +favorite games or building your own applications for personal use.
+Teachers and students are welcome to use our libraries as a learning +resource. You can incorporate them into your teaching materials, student projects, coding +bootcamps, workshops, etc.
+Researchers may use our libraries for academic and scholarly research. +We'd appreciate it if you cite our work in any publications that result from research involving our libraries.
+If you're part of a registered non-profit organization, +you can use our libraries in your projects. However, any derivative work that uses our +libraries must also be released under the GPL.
+Please remember, if your usage of our libraries evolves from non-commercial to commercial, +you must ensure compliance with the terms of the GPL v3 license.
+As the Reloaded Project is a labour of love, done purely out of passion and with an aim to contribute +to the broader community, we highly appreciate your support in providing attribution when using +our libraries.
+While not legally mandatory under GPL v3, it is a simple act that can go a long +way in recognizing the efforts of our contributors and fostering an open and collaborative atmosphere.
+If you choose to provide attribution (and we hope you do!), here are some guidelines:
+Acknowledge the Use of Reloaded Libraries: Mention that your project uses or is based on Reloaded libraries. + This could be in your project's readme, a credits page on a website, a manual, or within the software itself.
+Link to the Project: If possible, provide a link back to the Reloaded Project. + This allows others to explore and potentially benefit from our work.
+Remember, attribution is more than just giving credit; it's a way of saying thank you 👉👈, fostering reciprocal +respect, and acknowledging the power of collaborative open-source development.
+We appreciate your support and look forward to seeing what amazing projects you create using Reloaded libraries!
+In some rare instances, code from more permissively licensed projects, such as those under the
+MIT
or BSD
licenses, may be referenced, incorporated, or slightly modified within the Reloaded Project.
It's important to us to respect the terms and intentions of these permissive licenses, +which often allow their code to be used in a wide variety of contexts, including in GPL-licensed projects like ours.
+In these cases, the Reloaded Project is committed to clearly disclosing the usage of such code:
+Method-Level Disclosure: For individual methods or small code snippets, we use appropriate
+ attribution methods, like programming language attributes. For example, methods borrowed or adapted
+ from MIT-licensed projects might be marked with a [MITLicense]
attribute.
File-Level Disclosure: For larger amounts of code, such as entire files or modules, we'll include + the original license text at the top of the file and clearly indicate which portions of the code originate + from a differently-licensed project.
+Project-Level Disclosure: If an entire library or significant portion of a project under a more permissive + license is used, we will include an acknowledgment in a prominent location, such as the readme file or the + project's license documentation.
+This approach ensures we honor the contributions of the open source community at large, respect the original +licenses, and maintain transparency with our users about where code originates from.
+Any files/methods or snippets marked with those attributes may be consumed using their original license terms.
+For example, if a method is marked with [MITLicense], you may use it under the terms of the MIT license.
We welcome and appreciate contributions to the Reloaded Project! +By contributing, you agree to share your changes under the same GPLv3 license, +helping to make the project better for everyone.
+ + + + + + +Info
+This is a dummy page with various Material MkDocs controls and features scattered throughout for testing.
+Reloaded Admonition
+An admonition featuring a Reloaded logo.
+My source is in Stylesheets/extra.css as Custom 'reloaded' admonition
.
Heart Admonition
+An admonition featuring a heart; because we want to contribute back to the open source community.
+My source is in Stylesheets/extra.css as Custom 'reloaded heart' admonition
.
Nexus Admonition
+An admonition featuring a Nexus logo.
+My source is in Stylesheets/extra.css as Custom 'nexus' admonition
.
Heart Admonition
+An admonition featuring a heart; because we want to contribute back to the open source community.
+My source is in Stylesheets/extra.css as Custom 'nexus heart' admonition
.
Flowchart (Source: Nexus Archive Library):
+flowchart TD
+ subgraph Block 2
+ BigFile1.bin
+ end
+
+ subgraph Block 1
+ BigFile0.bin
+ end
+
+ subgraph Block 0
+ ModConfig.json -.-> Updates.json
+ Updates.json -.-> more["... more .json files"]
+ end
+Sequence Diagram (Source: Reloaded3 Specification):
+sequenceDiagram
+
+    %% Define Items
+    participant Mod Loader
+    participant Virtual FileSystem (VFS)
+    participant CRI CPK Archive Support
+    participant Persona 5 Royal Support
+    participant Joker Costume
+
+    %% Define Actions
+    Mod Loader->>Persona 5 Royal Support: Load Mod
+    Persona 5 Royal Support->>Mod Loader: Request CRI CPK Archive Support API
+    Mod Loader->>Persona 5 Royal Support: Receive CRI CPK Archive Support Instance
+
+    Mod Loader->>Joker Costume: Load Mod
+    Mod Loader-->Persona 5 Royal Support: Notification: 'Loaded Joker Costume'
+    Persona 5 Royal Support->>CRI CPK Archive Support: Add Files from 'Joker Costume' to CPK Archive (via API)
+State Diagram (Source: Mermaid Docs):
+stateDiagram-v2
+ [*] --> Still
+ Still --> [*]
+
+ Still --> Moving
+ Moving --> Still
+ Moving --> Crash
+ Crash --> [*]
+Class Diagram (Arbitrary)
+classDiagram
+ class Animal
+ `NexusMobile™` <|-- Car
+Note
+At the time of writing, the version of Mermaid used here is a bit outdated, and other diagrams might not render correctly +(even on the unmodified theme); thus certain diagrams have been omitted.
+Snippet from C# version of Sewer's Virtual FileSystem (VFS):
+/// <summary>
+///     Tries to get files for a specific folder, assuming the input path is already in upper case.
+/// </summary>
+/// <param name="folderPath">The folder to find. Already uppercase.</param>
+/// <param name="value">The returned folder instance.</param>
+/// <returns>True if found, else false.</returns>
+[MethodImpl(MethodImplOptions.AggressiveInlining)]
+public bool TryGetFolderUpper(ReadOnlySpan<char> folderPath, out SpanOfCharDict<TTarget> value)
+{
+    // Must be O(1)
+    value = default!;
+
+    // Compare equality.
+    // Note to devs: Do not invert branches, we optimise for hot paths here.
+    if (folderPath.StartsWith(Prefix))
+    {
+        // Check for subfolder in branchless way.
+        // In CLR, bool is length 1, so conversion to byte should be safe.
+        // Even suppose it is not; as long as code is little endian; truncating int/4 bytes to byte still results
+        // in correct answer.
+        var hasSubfolder = Prefix.Length != folderPath.Length;
+        var hasSubfolderByte = Unsafe.As<bool, byte>(ref hasSubfolder);
+        var nextFolder = folderPath.SliceFast(Prefix.Length + hasSubfolderByte);
+
+        return SubfolderToFiles.TryGetValue(nextFolder, out value!);
+    }
+
+    return false;
+}
+
Something more number heavy, Fast Inverse Square Root from Quake III Arena (unmodified). +
float Q_rsqrt( float number )
+{
+    long i;
+    float x2, y;
+    const float threehalfs = 1.5F;
+
+    x2 = number * 0.5F;
+    y = number;
+    i = * ( long * ) &y; // evil floating point bit level hacking
+    i = 0x5f3759df - ( i >> 1 ); // what the fuck?
+    y = * ( float * ) &i;
+    y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
+// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
+
+    return y;
+}
+
Note
+Test
+Abstract
+Test
+Info
+Test
+Tip
+Test
+Success
+Test
+Question
+Test
+Warning
+Test
+Failure
+Test
+Danger
+Test
+Bug
+Test
+Example
+Test
+Quote
+Test
| Method | Description |
|---|---|
| GET | Fetch resource |
| PUT | Update resource |
| DELETE | Delete resource |
Please visit the documentation site for usage instructions & more.
The wiki provides details on internals of the library. They may help you when contributing 😉.
+First off, thank you for considering contributing to reloaded-hooks.
+If your contribution is not straightforward, please first discuss the change you +wish to make by creating a new issue before making the change. We might be able to discuss +general design, etc. before you embark on a huge endeavour.
+Before reporting an issue on the +issue tracker, +please check that it has not already been reported by searching for some related +keywords.
+Try to do one pull request per change.
+Reloaded repositories auto-generate changelogs based on commit names.
+When you make git commits, try to stick to the style of Keep a changelog:
+Added
for new features. Changed
for changes in existing functionality. Deprecated
for soon-to-be removed features. Removed
for now removed features. Fixed
for any bug fixes. Security
in case of vulnerabilities. Please use the standard code style cargo fmt
, and run the clippy
linter
+(cargo clippy
), fixing warnings before submitting PRs.
If you are using VSCode, this should be automated (on Save) per this repository's settings.
+This is just a quick reference sheet for developers.
+ARM64 is not currently implemented.
+Register | +ARM64 (System V) | +Volatile/Non-Volatile | +
---|---|---|
x0 -x7 |
+Parameter/Result Registers | +Volatile | +
x8 |
+Indirect result location register | +Volatile | +
x9 -x15 |
+Local Variables | +Volatile | +
x16 -x17 |
+Intra-procedure-call scratch registers | +Volatile | +
x18 |
+Platform register, conventionally the TLS base | +Volatile | +
x19 -x28 |
+Registers saved across function calls | +Non-Volatile | +
x29 |
+Frame pointer | +Non-Volatile | +
x30 |
+Link register | +Volatile | +
sp |
+Stack pointer | +Non-Volatile | +
xzr |
+Zero register, always reads as zero | +N/A | +
x31 |
+Stack pointer or zero register, contextually reads as either sp or xzr |
+N/A | +
For floating point / SIMD registers:
+Register | +ARM64 (System V) | +Volatile/Non-Volatile | +
---|---|---|
v0 -v7 |
+Parameter/Result registers | +Volatile | +
v8 -v15 |
+Registers saved across function calls (lower 64 bits only) | +Non-Volatile | +
v16 -v31 |
+Temporary registers | +Volatile | +
It is recommended that library users manually specify conventions in their hook functions.
+When the calling convention of <your function>
is not specified, wrapper libraries must insert
+the appropriate default convention in their wrappers.
aarch64-unknown-linux-gnu
: SystemVaarch64-pc-windows-msvc
: Windows ARM64Linux ARM64
: SystemVWindows ARM64
: Windows ARM64This page provides a listing of all instructions rewritten as part of the Code Relocation process.
+Purpose:
+The ADR
instruction in ARM architectures computes the address of a label and writes it to the destination register.
Behaviour:
+The ADR(P) instruction is rewritten as one of the following:
+- ADR(P)
+- ADR(P) + ADD
+- MOV (1-4 instructions)
Example:
+From ADRP to ADR: +
// Before: ADRP x0, 0x101000
+// After: ADR x0, 0xFFFFF
+// Parameters: (old_instruction, old_address, new_address)
+rewrite_adr(0x000800B0_u32.to_be(), 0, 4097);
+
Within 4GiB Range with Offset: +
// Before: ADRP x0, 0x101000
+// After:
+// - ADRP x0, 0x102000
+// - ADD x0, x0, 1
+rewrite_adr(0x000800B0_u32.to_be(), 4097, 0);
+
Within 4GiB Range without Offset: +
// Before: ADRP x0, 0x101000
+// After: ADRP x0, 0x102000
+rewrite_adr(0x000800B0_u32.to_be(), 4096, 0);
+
Out of Range: +
// PC = 0x100000000
+
+// Before: ADRP x0, 0x101000
+// After: MOV IMMEDIATE 0x100101000
+rewrite_adr(0x000800B0_u32.to_be(), 0x100000000, 0);
+
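The choice between the forms listed above can be sketched as a simple range check. This is a minimal sketch and not the library's implementation: `AdrStrategy` and `choose_adr_strategy` are hypothetical names, and the thresholds assume the architectural reach of ADR (±1MiB) and ADRP (±4GiB, in 4KiB pages).

```rust
/// Hypothetical strategy names for illustration; not part of reloaded-hooks-rs.
#[derive(Debug, PartialEq)]
enum AdrStrategy {
    Adr,
    Adrp,
    AdrpAdd,
    MovImmediate,
}

/// Picks a rewrite strategy for an instruction that must still produce
/// `target` after being relocated to `new_pc`.
fn choose_adr_strategy(target: u64, new_pc: u64) -> AdrStrategy {
    // Signed distance from the relocated instruction to the target.
    let delta = target.wrapping_sub(new_pc) as i64;
    if (-(1i64 << 20)..(1i64 << 20)).contains(&delta) {
        return AdrStrategy::Adr; // ADR reaches +-1MiB
    }

    // ADRP works on 4KiB pages and reaches +-4GiB.
    let page_delta = (target & !0xFFF).wrapping_sub(new_pc & !0xFFF) as i64;
    if (-(1i64 << 32)..(1i64 << 32)).contains(&page_delta) {
        if target & 0xFFF == 0 {
            AdrStrategy::Adrp
        } else {
            AdrStrategy::AdrpAdd // low 12 bits are added separately
        }
    } else {
        AdrStrategy::MovImmediate
    }
}
```

With the first example above (target `0x101000`, new address `4097`), the distance is `0xFFFFF`, which just fits ADR.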
Purpose:
+The Bcc
instruction in ARM architectures performs a conditional branch based on specific condition flags.
Behaviour:
+The Branch Conditional instruction is rewritten as:
+- BCC
+- BCC <skip> + [B]
+- BCC <skip> + [ADRP + (ADD) + BR]
+- BCC <skip> + [MOV to Register + BR]
<skip>
means, invert the condition, and jump over the code inside [] brackets.
Example:
+Within 1MiB: +
// Before: b.eq #4
+// After: b.eq #-4092
+// Parameters: (old_instruction, old_address, new_address, scratch_register)
+rewrite_bcc(0x20000054_u32.to_be(), 0, 4096, Some(17));
+
Within 128MiB: +
// Before: b.eq #0
+// After:
+// - b.ne #8
+// - b #-0x8000000
+rewrite_bcc(0x00000054_u32.to_be(), 0, 0x8000000 - 4, Some(17));
+
Within 4GiB Range with Address Adjustment: +
// Before: b.eq #512
+// After:
+// - b.ne #16
+// - adrp x17, #0x8000000
+// - add x17, x17, #512
+// - br x17
+rewrite_bcc(0x00100054_u32.to_be(), 0x8000000, 0, Some(17));
+
Within 4GiB Range without Offset: +
// Before: b.eq #512
+// After:
+// - b.ne #12
+// - adrp x17, #-0x8000000
+// - br x17
+rewrite_bcc(0x00100054_u32.to_be(), 0, 0x8000000, Some(17));
+
Last Resort: +
// Before: b.eq #0
+// After:
+// - b.ne #12
+// - movz x17, #0
+// - br x17
+rewrite_bcc(0x00000054_u32.to_be(), 0, 0x100000000, Some(17));
+
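The inverted "skip" branch used in the examples above can be built directly from the original encoding. The following is a minimal sketch under stated assumptions: `invert_bcc_with_skip` is a hypothetical helper (not the library's API), and the input is assumed to be a valid `B.cond` whose condition is not AL/NV.

```rust
/// Builds the inverted `b.cond` that jumps `skip_bytes` ahead.
/// Hypothetical helper; assumes `instruction` is a valid B.cond encoding.
fn invert_bcc_with_skip(instruction: u32, skip_bytes: u32) -> u32 {
    // Conditions come in pairs (EQ/NE, GE/LT, ...), so flipping bit 0 inverts them.
    let inverted_cond = (instruction & 0xF) ^ 1;
    // imm19 counts 4-byte instructions and sits at bits 5..=23.
    let imm19 = (skip_bytes / 4) & 0x7FFFF;
    0x5400_0000 | (imm19 << 5) | inverted_cond
}
```

On a little-endian host, the `0x00000054_u32.to_be()` notation used above evaluates to `0x54000000` (`b.eq #0`), which this sketch turns into `b.ne #8` when given a skip of 8 bytes.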
Including Branch+Link (BL).
+Purpose:
+The B
(or BL
for Branch+Link) instruction in ARM architectures performs a direct branch (or branch with link) to a specified address. When using the BL
variant, the return address (the address of the instruction following the branch) is stored in the link register LR
.
Behaviour:
+The Branch instruction is rewritten as one of the following:
+- B (or BL)
+- ADRP + BR
+- ADRP + ADD + BR
+- MOV
Example:
+Direct Branch within Range: +
// Before: b #4096
+// After: b #8192
+// Parameters: (old_instruction, old_address, new_address, scratch_register, link)
+rewrite_b(0x00040014_u32.to_be(), 8192, 4096, Some(17), false);
+
Within 4GiB with Address Adjustment: +
// Before: b #4096
+// After:
+// - adrp x17, #0x8000000
+// - br x17
+rewrite_b(0x00040014_u32.to_be(), 0x8000000, 0, Some(17), false);
+
Within 4GiB Range with Offset: +
// Before: b #4096
+// After:
+// - adrp x17, #0x8000512
+// - add x17, x17, #512
+// - br x17
+rewrite_b(0x00040014_u32.to_be(), 0x8000512, 0, Some(17), false);
+
Out of Range, Use MOV: +
// Before: b #4096
+// After:
+// - movz x17, #...
+// - ...
+// - br x17
+rewrite_b(0x00040014_u32.to_be(), 0x100000000, 0, Some(17), false);
+
Branch with Link within Range: +
// Before: bl #4096
+// After: bl #8192
+rewrite_b(0x00040094_u32.to_be(), 8192, 4096, Some(17), true);
+
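For the in-range case, re-encoding only requires a new 26-bit immediate. This is a minimal sketch, assuming the standard A64 encoding of `B`/`BL`; `encode_branch` is a hypothetical helper name and not part of the library.

```rust
/// Encodes `b` (or `bl` when `link` is true) for a byte offset relative to
/// the instruction itself. Hypothetical helper for illustration.
/// Returns `None` if the offset is misaligned or outside +-128MiB.
fn encode_branch(offset: i64, link: bool) -> Option<u32> {
    if offset % 4 != 0 || !(-(1i64 << 27)..(1i64 << 27)).contains(&offset) {
        return None;
    }
    // imm26 counts 4-byte instructions; negative offsets are two's complement.
    let imm26 = ((offset >> 2) as u32) & 0x03FF_FFFF;
    let opcode: u32 = if link { 0x9400_0000 } else { 0x1400_0000 };
    Some(opcode | imm26)
}
```

A `None` result corresponds to the cases above where the rewriter has to fall back to `ADRP` or `MOV` based sequences.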
Purpose:
+The CBZ
instruction in ARM architectures performs a conditional branch when the specified register is zero. If the register is not zero and the condition is not met, the next sequential instruction is executed.
Behaviour:
+The CBZ
instruction is rewritten as one of the following:
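+- CBZ
+- CBZ <skip> + [B]
+- CBZ <skip> + [ADRP + BR]
+- CBZ <skip> + [ADRP + ADD + BR]
+- CBZ <skip> + [MOV to Register + Branch Register]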
Here, <skip>
is used to invert the condition and jump over the set of instructions inside the []
brackets if the condition is not met.
Example:
+Within 1MiB Range: +
// Before: cbz x0, #4096
+// After: cbz x0, #8192
+// Parameters: (old_instruction, old_address, new_address, scratch_register)
+rewrite_cbz(0x008000B4_u32.to_be(), 8192, 4096, Some(17));
+
Within 128MiB Range: +
// Before: cbz x0, #4096
+// After:
+// - cbnz x0, #8
+// - b #0x8000000
+rewrite_cbz(0x008000B4_u32.to_be(), 0x8000000, 4096, Some(17));
+
Within 4GiB + 4096 aligned: +
// Before: cbz x0, #4096
+// After:
+// - cbnz x0, <skip 3 instructions>
+// - adrp x17, #0x8000000
+// - br x17
+rewrite_cbz(0x008000B4_u32.to_be(), 0x8000000, 0, Some(17));
+
Within 4GiB with Offset: +
// Before: cbz x0, #4096
+// After:
+// - cbnz x0, <skip 4 instructions>
+// - adrp x17, #0x8000000
+// - add x17, x17, #512
+// - br x17
+rewrite_cbz(0x008000B4_u32.to_be(), 0x8000512, 0, Some(17));
+
Out of Range (Move and Branch): +
// Before: cbz x0, #4096
+// After:
+// - cbnz x0, <skip X instructions>
+// - mov x17, <immediate address>
+// - br x17
+rewrite_cbz(0x008000B4_u32.to_be(), 0x100000000, 0, Some(17));
+
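Unlike `B.cond`, the inversion here is a single opcode bit while the tested register must be preserved. A minimal sketch, assuming the standard A64 encoding of `CBZ`/`CBNZ`; `invert_cbz_with_skip` is a hypothetical helper, not the library's function.

```rust
/// Turns `cbz` into `cbnz` (or the reverse) and retargets it to jump
/// `skip_bytes` ahead, keeping the register and operand size untouched.
/// Hypothetical helper; assumes `instruction` is a valid CBZ/CBNZ encoding.
fn invert_cbz_with_skip(instruction: u32, skip_bytes: u32) -> u32 {
    const IMM19_MASK: u32 = 0x7FFFF << 5; // branch offset field, bits 5..=23
    const OP_BIT: u32 = 1 << 24; // 0 = CBZ, 1 = CBNZ
    let imm19 = ((skip_bytes / 4) << 5) & IMM19_MASK;
    ((instruction & !IMM19_MASK) ^ OP_BIT) | imm19
}
```

For `cbz x0, #4096` (`0xB4008000`) and a skip of 8 bytes, this yields `cbnz x0, #8`, matching the first instruction of the 128MiB example above.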
This includes Prefetch PRFM
which shares opcode with LDR.
Purpose:
+The LDR
instruction in ARM architectures is used to load a value from memory into a register. It can use various addressing modes, but commonly it involves an offset from a base register or the program counter.
Behaviour:
+The LDR
instruction is rewritten as one of the following, depending on the relocation range:
The choice of rewriting strategy is based on the distance between the old address and the new one, with a preference for the most direct form that satisfies the required address range.
+If the instruction is Prefetch PRFM
, it is discarded if it can't be re-encoded as PRFM (literal)
, as prefetching with multiple instructions is probably less efficient than not prefetching at all.
Example:
+Within 1MiB Range: +
// Before: LDR x0, #0
+// After: LDR x0, #4096
+// Parameters: (opcode, new_imm12, rn)
+rewrite_ldr_literal(0x00000058_u32.to_be(), 4096, 0);
+
Within 4GiB + 4096 aligned: +
// Before: LDR x0, #0
+// After:
+// - adrp x0, #0x100000
+// - ldr x0, [x0]
+// Parameters: (opcode, new_address, old_address)
+rewrite_ldr_literal(0x00000058_u32.to_be(), 0x100000, 0);
+
Within 4GiB: +
// Before: LDR x0, #512
+// After:
+// - adrp x0, #0x100000
+// - ldr x0, [x0, #512]
+// Parameters: (opcode, new_address, old_address)
+rewrite_ldr_literal(0x00100058_u32.to_be(), 0x100000, 0);
+
Out of Range (Last Resort): +
// Before: LDR x0, #512
+// After:
+// - movz x0, #0, lsl #16
+// - movk x0, #0x1, lsl #32
+// - ldr x0, [x0, #512]
+// Parameters: (opcode, new_address, old_address)
+rewrite_ldr_literal(0x00100058_u32.to_be(), 0x100000000, 0);
+
Purpose:
+The TBZ
instruction in ARM architectures tests a specified bit in a register and performs a conditional branch if the bit is zero. If the tested bit is not zero, the next sequential instruction is executed.
Behaviour:
+The TBZ
instruction is rewritten based on the distance to the new branch target. It is transformed into one of the following patterns:
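+- TBZ
+- TBZ <skip> + [B]
+- TBZ <skip> + [ADRP + BR]
+- TBZ <skip> + [ADRP + ADD + BR]
+- TBZ <skip> + [MOV to Register + Branch Register]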
Here, <skip>
is used to indicate a conditional skip over a set of instructions if the tested bit is not zero. The specific transformation depends on the offset between the current position and the new branch target.
Safety:
+It is crucial to ensure that the provided instruction
parameter is a valid TBZ
opcode. Incorrect opcodes or assumptions that a different type of instruction is a TBZ
may lead to undefined behaviour.
Functionality:
+The rewrite_tbz
function alters the TBZ
instruction to accommodate a new target address that is outside of its original range. The target address could be within the same 32KiB range or farther, necessitating different rewriting strategies.
Example:
+Within 32KiB Range: +
// Original: tbz x0, #0, #4096
+// Rewritten: tbz x0, #0, #8192
+// Parameters: (old_instruction, old_address, new_address, scratch_reg)
+rewrite_tbz(0x00800036_u32.to_be(), 8192, 4096, Some(17));
+
Within 128MiB Range: +
// Original: tbz x0, #0, #4096
+// Rewritten:
+// - tbnz x0, #0, #8
+// - b #0x8000000
+rewrite_tbz(0x00800036_u32.to_be(), 0x8000000, 4096, Some(17));
+
Within 4GiB Range Aligned to 4096: +
// Original: tbz x0, #0, #4096
+// Rewritten:
+// - tbnz w0, #0, #0xc
+// - adrp x17, #0x8001000
+// - br x17
+rewrite_tbz(0x00800036_u32.to_be(), 0x8000000, 0, Some(17));
+
Within 4GiB Range with Offset: +
// Original: tbz x0, #0, #4096
+// Rewritten:
+// - tbnz w0, #0, #0x10
+// - adrp x17, #0x8001000
+// - add x17, x17, #0x512
+// - br x17
+rewrite_tbz(0x00800036_u32.to_be(), 0x8000512, 0, Some(17));
+
Out of 4GiB Range (Move and Branch): +
// Original: tbz x0, #0, #4096
+// Rewritten:
+// - tbnz w0, #0, #0x14
+// - movz x17, #0x1000
+// - movk x17, #0, lsl #16
+// - movk x17, #0x1, lsl #32
+// - br x17
+rewrite_tbz(0x00800036_u32.to_be(), 0x100000000, 0, Some(17));
+
This page tells you which Operations are currently implemented for each architecture.
+Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++-2GiB | +
x86 | +✅ | ++-2GiB | +
ARM64 (+- 128MiB) | +✅ | ++-128MiB | +
ARM64 (+- 4GiB) | +✅ | +Uses 3 instructions. Used if within range. | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | +Uses scratch register for efficiency. | +
x86 | +✅ | +Uses scratch register for efficiency. | +
ARM64 | +✅ | +Uses scratch register (required) | +
Architecture | +Supported | +Notes | +
---|---|---|
x86 | +✅ | ++ |
x86 | +✅ | ++ |
ARM64 | +❌ | +Variant 0. | +
ARM64 | +✅ | +Variant 1. Replaced with JumpAbsolute, for perf reasons. | +
Architecture | +Register to Register | +Vector to Vector | +
---|---|---|
x64 | +✅ | +✅ | +
x86 | +✅ | +✅ | +
ARM64 | +✅ | +✅ | +
Architecture | +to Register | +to Vector | +
---|---|---|
x64 | +✅ | +✅ | +
x86 | +✅ | +✅ | +
ARM64 | +✅ | +✅ | +
Architecture | +to Register | +to Vector | +
---|---|---|
x64 | +✅ | +✅ | +
x86 | +✅ | +✅ | +
ARM64* | +❌ | +❌ | +
This is not needed for optimal code generation on ARM64, thus was not implemented.
+Architecture | +Register | +Vector | +
---|---|---|
x64 | +✅ | +✅ | +
x86 | +✅ | +✅ | +
ARM64 | +✅ | +✅ | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++ |
x86 | +✅ | ++ |
ARM64 | +✅ | +Will use vector registers when available. | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++ |
x86 | +✅ | ++ |
ARM64 | +✅ | +2-5 instructions, depending on constant length. | +
Architecture | +Supported | +
---|---|
x64 | +✅ | +
x86 | +✅ | +
ARM64 | +✅ | +
Architecture | +to Register | +to Vector | +Notes | +
---|---|---|---|
x64 | +✅ | +✅ | ++ |
x86 | +✅ | +✅ | ++ |
ARM64 | +✅ | +✅ | ++ |
Architecture | +Registers | +Vectors | +Notes | +
---|---|---|---|
x64 | +✅ | +✅ * | +*Requires scratch register | +
x86 | +✅ | +✅ * | +*Requires scratch register | +
ARM64 | +✅ * | +✅ * | +*Requires scratch register | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 (register) | +✅ | +Uses scratch register for efficiency. | +
x86 (register) | +✅ | +Uses scratch register for efficiency. | +
ARM64 (register) | +✅ | +Uses scratch register (required) | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++-2GiB | +
x86 | +✅ | ++-2GiB | +
ARM64 | +✅ | ++-128MiB | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++ |
x86 | +✅ | ++ |
ARM64 | +✅ | +2 instructions if offset > 0. | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++ |
x86 | +❓ | +Unsupported. | +
ARM64 (+- 1MiB) | +✅ | +2 instructions. | +
ARM64 (+- 4GiB) | +✅ | +3 instructions. | +
Architecture | +Supported | +Notes | +
---|---|---|
x64 | +✅ | ++ |
x86 | +❓ | +Unsupported. | +
ARM64 (+- 1MiB) | +✅ | +2 instructions. | +
ARM64 (+- 4GiB) | +✅ | +3 instructions. | +
Architecture | +Supported | +Notes | +
---|---|---|
x64* | +✅ | ++ |
x86* | +✅ | ++ |
ARM64 | +✅ | +Might fall back to single pop/push if mixing register sizes. | +
* Implemented but not used, due to more efficient code generation alternative.
+Architecture | +Supported | +Notes | +
---|---|---|
x64* | +✅ | ++ |
x86* | +✅ | ++ |
ARM64 | +✅ | +Might fall back to single pop/push if mixing register sizes. | +
* Implemented but not used, due to more efficient code generation alternative.
+This page provides a reference for all of the various 'operations' implemented by individual JIT(s).
+For more information about each of the operations, see the source code 😉 (enum Operation<T>
).
Represents jumping to a relative offset from current instruction pointer.
+let jump_rel = JumpRelativeOperation {
+ target_address: 0x200,
+};
+
jmp 0x200 ; Jump to address at current IP + 0x200
+
b 0x200 ; Branch to address at current IP + 0x200
+
adrp x9, #0 ; Load 4K page, relative to PC. (round address down to 4096)
+add x9, x9, #100 ; Add any missing offset.
+br x9 ; Branch to location
+
jmp 0x200 ; Jump to address at current IP + 0x200
+
Represents jumping to an absolute address stored in a register.
+JIT is free to encode this as a relative branch if it's possible.
+let jump_abs = JumpAbsoluteOperation {
+ scratch_register: rax,
+ target_address: 0x123456,
+};
+
mov rax, 0x123456 ; Move target address into rax
+jmp rax ; Jump to address in rax
+
MOVZ x9, #0x3456 ; Set lower bits.
+MOVK x9, #0x12, LSL #16 ; Move upper bits
+br x9 ; Branch to location
+
mov eax, 0x123456 ; Move target address into eax
+jmp eax ; Jump to address in eax
+
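The ARM64 form above loads the address 16 bits at a time. A minimal sketch of how such a sequence could be derived, assuming zero chunks are simply skipped; `mov_immediate_sequence` is a hypothetical helper that returns assembly text rather than encoded instructions.

```rust
/// Splits `value` into 16-bit chunks and emits `movz` for the first
/// non-zero chunk and `movk` for the rest. Hypothetical helper for illustration.
fn mov_immediate_sequence(reg: &str, value: u64) -> Vec<String> {
    let mut out = Vec::new();
    for shift in [0u32, 16, 32, 48] {
        let chunk = (value >> shift) & 0xFFFF;
        if chunk == 0 {
            continue; // movz already zeroes every other chunk
        }
        let op = if out.is_empty() { "movz" } else { "movk" };
        out.push(format!("{op} {reg}, #{chunk:#x}, lsl #{shift}"));
    }
    if out.is_empty() {
        // The value is zero: a single movz is enough.
        out.push(format!("movz {reg}, #0"));
    }
    out
}
```

For `0x123456` this produces the same two instructions as the example above; the number of instructions grows with the number of non-zero 16-bit chunks.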
We prefer this approach to absolute jump
because it is faster.
Represents jumping to an absolute address stored in a memory address.
+let jump_ind = JumpIndirectOperation {
+ target_address: 0x123456,
+};
+
jmp qword [0x123456] ; Jump to address stored at 0x123456
+
jmp dword [0x123456] ; Jump to address stored at 0x123456
+
; Possible on Multiple of 0x10000 with offset 0-4096
+MOVZ x9, #0x123, LSL #16 ; Store upper 16 bits.
+LDR x9, [x9, #0x456] ; Load lower 12 bit offset
+br x9 ; Branch to location
+
; On any address up to 4GiB + 4096
+MOVZ x9, #0x3456 ; Set lower bits.
+MOVK x9, #0x12, LSL #16 ; Move upper bits
+ ; Continue until desired address.
+LDR x9, [x9, #0x0] ; Load from address.
+br x9
+
On macOS, this is not usable, because memory below 2GiB is restricted from access.
+This includes functionality like 'parameter injection'.
+Represents a move operation between two registers.
+let move_op = MovOperation {
+ source: r8,
+ target: r9,
+};
+
mov r9, r8 ; Move r8 into r9
+
mov x9, x8 ; Move x8 into x9
+
mov ebx, eax ; Move eax into ebx
+
Represents a move operation from the stack into a register.
+let move_from_stack = MovFromStackOperation {
+ stack_offset: 8,
+ target: rbx,
+};
+
mov rbx, [rsp + 8] ; Move value at rsp + 8 into rbx
+
ldr x9, [sp, #8] ; Load value at sp + 8 into x9
+
mov ebx, [esp + 8] ; Move value at esp + 8 into ebx
+
Represents moving a register value onto the stack at a user specified offset.
+let mov_to_stack = MovToStackOperation {
+ register: rbx,
+ stack_offset: 16,
+};
+
mov [rsp + 16], rbx ; Move rbx onto the stack 16 bytes above rsp
+
str x9, [sp, #16] ; Store x9 onto the stack 16 bytes above sp
+
mov [esp + 16], ebx ; Move ebx onto the stack 16 bytes above esp
+
Represents pushing a register onto the stack.
+let push = PushOperation {
+ register: r9,
+};
+
push r9 ; Push r9 onto the stack
+
sub sp, sp, #8 ; Decrement stack pointer
+str x9, [sp] ; Store x9 on the stack
+
push ebx ; Push ebx onto the stack
+
Represents pushing a value from the stack to the stack.
+let push_stack = PushStackOperation {
+ offset: 8,
+ item_size: 8,
+};
+
push qword [rsp + 8] ; Push value at rsp + 8 onto the stack
+
ldr x9, [sp, #8] ; Load value at sp + 8 into x9
+sub sp, sp, #8 ; Decrement stack pointer
+str x9, [sp] ; Push x9 onto the stack
+
push dword [esp + 8] ; Push value at esp + 8 onto the stack
+
Represents pushing a constant value onto the stack.
+let push_const = PushConstantOperation {
+ value: 10,
+};
+
push 10 ; Push constant value 10 onto stack
+
sub sp, sp, #8 ; Decrement stack pointer
+mov x9, 10 ; Move constant 10 into x9
+str x9, [sp] ; Store x9 on the stack
+
push 10 ; Push constant value 10 onto stack
+
Represents adjusting the stack pointer.
+let stack_alloc = StackAllocOperation {
+ operand: 8,
+};
+
sub rsp, 8 ; Decrement rsp by 8
+
sub sp, sp, #8 ; Decrement sp by 8
+
sub esp, 8 ; Decrement esp by 8
+
Represents popping a value from the stack into a register.
+let pop = PopOperation {
+ register: rbx,
+};
+
pop rbx ; Pop value from stack into rbx
+
ldr x9, [sp] ; Load stack top into x9
+add sp, sp, #8 ; Increment stack pointer
+
pop ebx ; Pop value from stack into ebx
+
Represents exchanging the contents of two registers.
+On some architectures (e.g. ARM64) this requires a scratch register.
+let xchg = XChgOperation {
+ register1: r9,
+ register2: r8,
+ scratch: None,
+};
+
xchg r8, r9 ; Swap r8 and r9
+
// ARM doesn't have xchg instruction
+mov x10, x8 ; Move x8 into x10 (scratch register)
+mov x8, x9 ; Move x9 into x8
+mov x9, x10 ; Move original x8 (in x10) into x9
+
xchg eax, ebx ; Swap eax and ebx
+
Represents calling an absolute address stored in a register or memory.
+let call_abs = CallAbsoluteOperation {
+ scratch_register: r9,
+ target_address: 0x123456,
+};
+
mov r9, 0x123456 ; Move target address into r9
+call r9 ; Call address in r9
+
adr x9, target_func ; Load address of target function into x9
+blr x9 ; Branch and link to address in x9
+
mov eax, 0x123456 ; Move target address into eax
+call eax ; Call address in eax
+
Represents calling a relative offset from current instruction pointer.
+let call_rel = CallRelativeOperation {
+ target_address: 0x200,
+};
+
call 0x200 ; Call address at current IP + 0x200
+
bl 0x200 ; Branch with link to address at current IP + 0x200
+
call 0x200 ; Call address at current IP + 0x200
+
Represents returning from a function call.
+let ret = ReturnOperation {
+ offset: 4,
+};
+
ret ; Return
+ret 4 ; Return and add 4 to stack pointer
+
ret ; Return
+add sp, sp, #4 ; Add 4 to stack pointer
+ret ; Return
+
ret ; Return
+ret 4 ; Return and add 4 to stack pointer
+
These operations are only available on certain architectures.
+These are non essential, but can improve compatibility/performance.
+Enabled by setting JitCapabilities::CanEncodeIPRelativeCall
and JitCapabilities::CanEncodeIPRelativeJump
in JIT.
Represents calling an IP-relative offset where target address is stored.
+let call_rip_rel = CallIpRelativeOperation {
+ target_address: 0x1000,
+};
+
call qword [rip - 16] ; Address 0x1000 is at RIP-16 and contains raw address to call
+
ldr x9, 4 ; Read item in a multiple of 4 bytes relative to PC
+blr x9 ; Branch call to location
+
adrp x9, #0x0 ; Load 4K page, relative to PC. (round address down to 4096)
+ldr x9, [x9, 1110] ; Read address from offset in 4K page.
+blr x9 ; Branch to location
+
Represents jumping to an IP-relative offset where target address is stored.
+let jump_rip_rel = JumpIpRelativeOperation {
+ target_address: 0x1000,
+};
+
jmp qword [rip - 16] ; Address 0x1000 is at RIP-16 and contains raw address to jump
+
ldr x9, 4 ; Read item in a multiple of 4 bytes relative to PC
+br x9 ; Branch to location
+
adrp x9, #0x0 ; Load 4K page, relative to PC. (round address down to 4096)
+ldr x9, [x9, 1110] ; Read address from offset in 4K page.
+br x9 ; Branch to location
+
Enabled by setting JitCapabilities::CanMultiPush
in JIT.
Represents pushing multiple registers onto the stack.
+Implementations must support push/pop of mixed registers (e.g. Reg+Vector).
+let multi_push = MultiPushOperation {
+ registers: [
+ PushOperation { register: rbx },
+ PushOperation { register: rax },
+ PushOperation { register: rcx },
+ PushOperation { register: rdx },
+ ],
+};
+
push rbx
+push rax
+push rcx
+push rdx ; Push rbx, rax, rcx, rdx onto the stack
+
sub sp, sp, #32 ; Decrement stack pointer by 32 bytes
+stp x9, x8, [sp] ; Store x9 and x8 on the stack
+stp x11, x10, [sp, #16] ; Store x11 and x10 on the stack
+
push ebx
+push eax
+push ecx
+push edx ; Push ebx, eax, ecx, edx onto the stack
+
Represents popping multiple registers from the stack.
+Implementations must support push/pop of mixed registers (e.g. Reg+Vector).
+let multi_pop = MultiPopOperation {
+ registers: [
+ PopOperation { register: rdx },
+ PopOperation { register: rcx },
+ PopOperation { register: rax },
+ PopOperation { register: rbx },
+ ],
+};
+
pop rdx
+pop rcx
+pop rax
+pop rbx ; Pop rdx, rcx, rax, rbx from the stack
+
ldp x11, x10, [sp], #16 ; Load x11 and x10 from stack and update stack pointer
+ldp x9, x8, [sp], #16 ; Load x9 and x8 from stack and update stack pointer
+
pop edx
+pop ecx
+pop eax
+pop ebx ; Pop edx, ecx, eax, ebx from the stack
+
Lists currently supported architectures and their features.
+Lists the currently available library features for different architectures.
+Feature | +x86 & x64 | +ARM64 | +
---|---|---|
Basic Function Hooking | +✅ | +✅ | +
Code Relocation | +✅* | +✅ | +
Hook Stacking | +✅ | +✅ | +
Calling Convention Wrapper Generation | +✅ | +✅ | +
Optimal Wrapper Generation | +✅ | +✅ | +
Length Disassembler | +✅ | +✅ | +
The ability to hook/detour existing application functions.
+Implement a code writer by inheriting the Jit<TRegister>
trait
In the writer, implement at least the following operations:
+Your Platform must also support Permission Change, if it is +applicable to your platform.
+Length disassembly is the ability to determine instruction lengths at a given address.
+A length disassembler determines the minimum number of bytes, made up of whole instructions, that must be copied when hooking +a function.
+/// Disassembles items at `code_address` until the length of instructions
+/// is equal to or greater than `min_length`.
+///
+/// # Returns
+/// Returns length of instructions (in bytes) greater than or equal to min_length
+fn disassemble_length(code_address: usize, min_length: usize) -> usize
+
This is done by disassembling the original instructions at code_address
, incrementing a length for each
+encountered instruction until length >= min_length
, then returning the result.
For hooking functions, it's necessary to inject a jmp
instruction into the existing code.
For example, given this sequence:
+; x86 Assembly
+DoMathWithTwoNumbers:
+ cmp rcx, 0 ; 48 83 F9 00
+ jg skipAdd ; 7F 0E
+
+ mov rax, [rsp + 8] ; 48 8B 44 24 08
+ mov rcx, [rsp + 16] ; 48 8B 4C 24 10
+ add rax, rcx ; 48 01 C8
+ ret ; C3
+
A 5 byte relative jump would overwrite the first instruction and the first byte of the second, creating:
+; x86 Assembly
+DoMathWithTwoNumbers:
+ jmp stub ; E9 XX XX XX XX
+ <INVALID INSTRUCTION> ; 0E
+
+ mov rax, [rsp + 8] ; 48 8B 44 24 08
+ mov rcx, [rsp + 16] ; 48 8B 4C 24 10
+ add rax, rcx ; 48 01 C8
+ ret ; C3
+
When calling the original function again, and thus creating the Reverse Wrapper,
+the original instructions overwritten by the jmp
will need to be executed.
To do this, we must know that the original 2 instructions at DoMathWithTwoNumbers
were 6 bytes, NOT 5 bytes, in total length, so that when we copy the original code to the Reverse Wrapper
+we get
cmp rcx, 0 ; 48 83 F9 00
+jg skipAdd ; 7F 0E
+
and not
+cmp rcx, 0 ; 48 83 F9 00
+<INVALID INSTRUCTION> ; 7F
+
With a length disassembler, we are able to safely copy all the bytes needed.
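The accumulation loop described above can be sketched as follows. This is a minimal sketch, not the library's implementation: `disassemble_length_with` and `toy_instruction_len` are hypothetical names, and the closure stands in for a real instruction decoder (which is assumed to always return a non-zero length).

```rust
/// Sums whole-instruction lengths until at least `min_length` bytes are covered.
/// `instruction_len_at` stands in for a real decoder: given a byte offset,
/// it returns the length of the instruction starting there.
fn disassemble_length_with(
    min_length: usize,
    mut instruction_len_at: impl FnMut(usize) -> usize,
) -> usize {
    let mut length = 0;
    while length < min_length {
        length += instruction_len_at(length);
    }
    length
}

/// Toy decoder mirroring the example above.
fn toy_instruction_len(offset: usize) -> usize {
    match offset {
        0 => 4, // cmp rcx, 0
        4 => 2, // jg skipAdd
        _ => 5, // the 5 byte mov instructions that follow
    }
}
```

Asking for at least 5 bytes (the size of the relative jump) returns 6, which is why both the `cmp` and the `jg` are copied in full.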
+Implement a length disassembler by inheriting the LengthDisassembler
trait.
Use the algorithm described in example.
+Code relocation is the ability to rewrite existing code such that existing instructions using PC/IP relative operands still have valid operands post patching.
+Consider the following x86 code, which was optimised to accept its first parameter in the ecx
register:
int DoMathWithTwoNumbers(int operation@ecx, int a, int b) {
+
+ if (operation <= 0) {
+ return a + b;
+ }
+
+ // Omitted Code Here
+}
+
In this case it's possible that there's a jump in the very beginning of the function:
+DoMathWithTwoNumbers:
+ cmp ecx, 0
+ jg skipAdd ; It's greater than 0
+
+ mov eax, [esp + {wordSize * 1}] ; Left Parameter
+ mov ecx, [esp + {wordSize * 2}] ; Right Parameter
+ add eax, ecx
+ ret
+
+ ; Some Omitted Code Here
+
+skipAdd:
+ ; Omitted Code Here
+
In a scenario like this, the hooking library would overwrite the cmp
and jg
instructions when
+it assembles the hook entry ('enter hook'); when the original
+function is called again by your hook, the 'wrapper' would now contain this jg
instruction.
Because jg
is an instruction relative to the current instruction address, the library must be able
+to patch and 'relocate' the function to a new address.
Basic code relocation support is needed to stack hooks.
+Implement a relocator by inheriting the CodeRewriter
trait.
There is no 'general strategy' for this, however, here are some pieces of advice:
+branch
etc.) The ability to convert between different calling conventions (e.g. cdecl -> stdcall
).
To implement this, you implement a code writer by inheriting the Jit<TRegister>
trait; and
+implement the following operations:
If this is checked, it means the wrappers generate optimal code (to the best of our knowledge).
+While the wrapper generator performs most optimisations itself, in some cases it may be possible to perform additional optimisations on the JIT/Code Writer side.
+For example, the reloaded-hooks
wrapper generator might generate the following sequence of pushes for ARM64:
push x0
+push x1
+
A clever ARM64 compiler however would be able to translate this to:
+stp x0, x1, [sp, #-16]!
+
For some built in optimisations, like this, you can opt into these specialised instructions with JitCapabilities
on your Jit<TRegister>
.
Some others may be implemented at the JIT level instead.
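The push-pairing optimisation described above can be sketched as a pass over the list of registers to push. This is a minimal sketch under stated assumptions: `merge_pushes` is a hypothetical helper that emits assembly text, it follows the register order of the `stp` example above, and the single-register fallback (a 16-byte slot to keep `sp` aligned) is an assumption rather than the library's behaviour.

```rust
/// Pairs up consecutive pushes into `stp` instructions.
/// Hypothetical helper; a leftover register gets its own 16-byte slot.
fn merge_pushes(registers: &[&str]) -> Vec<String> {
    registers
        .chunks(2)
        .map(|pair| match pair {
            [a, b] => format!("stp {a}, {b}, [sp, #-16]!"),
            [a] => format!("str {a}, [sp, #-16]!"),
            // `chunks(2)` only yields slices of length 1 or 2.
            _ => unreachable!(),
        })
        .collect()
}
```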
+Hook stacking is the ability to hook a function multiple times.
+This should work flawlessly out of the box if all of the required elements are implemented.
+This page provides a listing of all instructions rewritten as part of the Code Relocation process for x86 architecture.
+This page provides a comprehensive overview of the instruction rewriting techniques used in the code +relocation process, specifically tailored for the x64 architecture.
+If the new relative branch target is within the encodable range, it is left as relative.
+Original: (EB 02
)
+- jmp +2
Relocated: (E9 FF 0F 00 00
)
+- jmp +4098
// Parameters for test case:
+// - Original Code (Hex)
+// - Original Address
+// - New Address
+// - New Expected Code (Hex)
+#[case::simple_branch("eb02", 4096, 0, "e9ff0f0000")]
+
In x86, any address is reachable from any address
+This is due to integer over/underflow (address wraparound) and 32-bit relative immediates covering ±2GiB. Therefore relocation
+simply involves extending the immediate as needed, i.e. jmp 0x12
to jmp 0x123012
etc.
The rest of the page will therefore leave out relative cases, and only focus on offsets greater +than 2GiB.
+The x64 rewriter is only suitable for rewriting function prologues.
+To be able to perform a lot of actions in a position independent manner, this rewriter uses a dummy +'scratch' register which it will overwrite.
+Scratch register is determined by the following logic:
+Caller Saved Registers
(these restored after function call). Because rewriting a lot of code will lead to register exhaustion, it must be reiterated the rewriter can only be used for small bits of code.
+x64 has over 5000 ‼️ instructions that require rewriting. Only a couple hundred are tested currently
+Instructions such as JMP
, CALL
, etc.
Behaviour:
+If out of range, it is rewritten using a combination of MOV
(move the absolute address into a register) followed by JMP
or CALL
to that register.
Original: (EB 02
)
+- jmp +2
Relocated: (48 B8 04 00 00 80 00 00 00 00 FF E0
)
+- mov rax, 0x80000004
+- jmp rax
// Parameters for test case:
+// - Original Code (Hex)
+// - Original Address
+// - New Address
+// - New Expected Code (Hex)
+#[case::to_abs_jmp_i8("eb02", 0x80000000, 0, "48b80400008000000000ffe0")]
+
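The bytes in the test case above can be reproduced by hand. A minimal sketch, assuming `rax` is the scratch register as in the example; `short_jmp_target` and `encode_jmp_via_rax` are hypothetical helper names, not the library's API.

```rust
/// Absolute target of a 2-byte `jmp rel8` located at `address`.
/// The offset is relative to the end of the instruction.
fn short_jmp_target(address: u64, rel8: i8) -> u64 {
    address.wrapping_add(2).wrapping_add(rel8 as i64 as u64)
}

/// Encodes `mov rax, imm64` followed by `jmp rax`.
fn encode_jmp_via_rax(target: u64) -> Vec<u8> {
    let mut code = vec![0x48, 0xB8]; // REX.W + MOV rax, imm64
    code.extend_from_slice(&target.to_le_bytes());
    code.extend_from_slice(&[0xFF, 0xE0]); // JMP rax
    code
}
```

For `jmp +2` at `0x80000000`, the target is `0x80000004`, and the encoded sequence matches the relocated bytes listed above.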
Instructions such as jne
, jg
etc.
Behaviour:
+MOV
to set the address and a JMP
to that address.Example:
+Original: (70 02
)
+- jo +2
Relocated: (71 0C 48 B8 04 00 00 80 00 00 00 00 FF E0
):
+- jno +12 <skip>
+- mov rax, 0x80000004
+- jmp rax
// Parameters for test case:
+// - Original Code (Hex)
+// - Original Address
+// - New Address
+// - New Expected Code (Hex)
+#[case::jo("7002", 0x80000000, 0, "710c48b80400008000000000ffe0")]
+
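The inverted-condition pattern in this example can be sketched for short (`rel8`) conditional jumps. This is a minimal sketch, assuming `rax` as the scratch register and an opcode in the `0x70..=0x7F` range; `rewrite_short_jcc` is a hypothetical helper, not the library's function.

```rust
/// Rewrites a 2-byte `jcc rel8` at `old_address` into:
/// inverted jcc over 12 bytes, `mov rax, target`, `jmp rax`.
/// Hypothetical helper; assumes `opcode` is a short Jcc (0x70..=0x7F).
fn rewrite_short_jcc(opcode: u8, rel8: i8, old_address: u64) -> Vec<u8> {
    // Target is relative to the end of the original 2-byte instruction.
    let target = old_address.wrapping_add(2).wrapping_add(rel8 as i64 as u64);
    // Short Jcc opcodes come in pairs (JO/JNO, JE/JNE, ...): bit 0 inverts them.
    let mut code = vec![opcode ^ 1, 12]; // skip the 12 bytes that follow
    code.extend_from_slice(&[0x48, 0xB8]); // MOV rax, imm64
    code.extend_from_slice(&target.to_le_bytes());
    code.extend_from_slice(&[0xFF, 0xE0]); // JMP rax
    code
}
```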
Instructions such as LOOP
, LOOPE
, and LOOPNE
.
Behaviour:
+Handled by either:
+ECX
and using a conditional jump based on the zero flag. (i.e. extend 'loop' address to 32-bit) or
+loop
function in the opposite direction. The strategy used depends on the original instruction.
+Original: (E2 FA
)
+- loop -3
Relocated: (50 E2 02 EB 0C 48 B8 FD 0F 00 80 00 00 00 00 FF E0
)
+- push rax
+- loop +2
+- jmp 0x11
+- movabs rax, 0x80000ffd
+- jmp rax
// Parameters for test case:
+// - Original Code (Hex)
+// - Original Address
+// - New Address
+// - New Expected Code (Hex)
+#[case::loop_backward_abs("50e2fa", 0x80001000, 0, "50e202eb0c48b8fd0f008000000000ffe0")]
+
Instructions such as JCXZ
, JECXZ
, JRCXZ
.
Behaviour:
+IMM32
encoding. TEST
instruction followed by a conditional jump. Original: (E3 FA
)
+- jrcxz -3
Relocated: (E3 02 EB 0C 48 B8 FD 0F 00 80 00 00 00 00 FF E0
)
+- jrcxz +5
+- jmp 0x11
+- mov rax, 0x80000ffd
+- jmp rax
// Parameters for test case:
+// - Original Code (Hex)
+// - Original Address
+// - New Address
+// - New Expected Code (Hex)
+#[case::jrcxz_abs("e3fa", 0x80001000, 0, "e302eb0c48b8fd0f008000000000ffe0")]
+
At time of writing, this covers around 2800 ‼️ instructions
+Only around 100 are covered by unit tests though.
+Covers all instructions which have an IP relative operand, i.e. read/write to a memory address which is relative to the address of the next instruction.
+Behaviour:
+Replace the RIP relative operand with a scratch register that holds the originally intended memory address.
+Original: (48 8B 1D 08 00 00 00
)
+- mov rbx, [rip + 8]
Relocated: (48 B8 0F 00 00 00 01 00 00 00 48 8B 18
)
+- mov rax, 0x10000000f
+- mov rbx, [rax]
// Parameters for test case:
+// - Original Code (Hex)
+// - Original Address
+// - New Address
+// - New Expected Code (Hex)
+#[case::mov_rhs("488b1d08000000", 0x100000000, 0, "48b80f00000001000000488b18")]
+
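To make the arithmetic in the example above explicit, here is a minimal sketch of how the absolute address could be recovered and substituted. Both function names are hypothetical, and the encoder only covers the one `mov rbx, [rax]` case from the example rather than the full set of operand permutations.

```rust
/// Computes the absolute address referenced by a RIP relative operand.
/// RIP points at the *next* instruction, so the instruction length is added.
fn rip_relative_target(insn_address: u64, insn_len: u64, disp32: i32) -> u64 {
    insn_address
        .wrapping_add(insn_len)
        .wrapping_add(disp32 as i64 as u64)
}

/// Encodes `mov rax, target` followed by `mov rbx, [rax]`.
/// Assumption: `rax` is free to use as the scratch register.
fn encode_mov_rbx_via_rax(target: u64) -> Vec<u8> {
    let mut code = vec![0x48, 0xB8]; // mov rax, imm64
    code.extend_from_slice(&target.to_le_bytes());
    code.extend_from_slice(&[0x48, 0x8B, 0x18]); // mov rbx, [rax]
    code
}
```

For the example above, the instruction is 7 bytes long at `0x100000000` with a displacement of 8, which gives `0x10000000f`.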
reloaded-hooks-rs
uses the iced library under the hood for
+assembly and disassembly.
In iced, operands can be broken down to 3 main types:
+Name | +Note | +
---|---|
register | +Including Vector Registers | +
memory | +i.e. [rax] or [rip + 4] |
+
imm | +Immediate, 8/16/32/64 | +
Immediates use multiple types, e.g. Immediate8
, Immediate16
etc. but on assembler side you can pass them all as Immediate32, so you can group them.
Each instruction can have 0-5 operands, of which at most 1 can be RIP relative.
+To handle this, a script projects/code-generators/x86/generate_enum_ins_combos.py
was used to dump
+all possible operand permutations from Iced
source. Then I wrote functions to handle each possible permutation.
1 Operand:
+2 Operands:
+3 Operands:
+4 Operands:
+5 Operands:
+If reloaded-hooks-rs
encounters an instruction with RIP relative operand that uses any of the
+following operand permutations, it should successfully patch it.
This is just a quick reference sheet for developers.
+Register | +stdcall (Microsoft x86) | +cdecl | +
---|---|---|
eax |
+Caller-saved, return value | +Caller-saved, return value | +
ebx |
+Callee-saved | +Callee-saved | +
ecx |
+Caller-saved | +Caller-saved | +
edx |
+Caller-saved | +Caller-saved | +
esi |
+Callee-saved | +Callee-saved | +
edi |
+Callee-saved | +Callee-saved | +
ebp |
+Callee-saved | +Callee-saved | +
esp |
+Callee-saved | +Callee-saved | +
For floating point registers:
+Register | +stdcall (Microsoft x86) | +cdecl | +
---|---|---|
st(0) -st(7) |
+Caller-saved, st(0) used for returning floating point values. |
+Caller-saved, st(0) used for returning floating point values. |
+
mm0 -mm7 |
+Caller-saved | +Caller-saved | +
xmm0 -xmm7 |
+Caller-saved | +Caller-saved | +
Both calling conventions pass function parameters on the stack, in right-to-left order, and they
+both return values in eax
. For floating-point values or larger structures, the FPU stack or
+additional conventions are used. The main difference for function calls is that stdcall expects
+the function (callee) to clean up the stack, while cdecl expects the caller to do it.
It is recommended that library users manually specify conventions in their hook functions.
+When the calling convention of <your function>
is not specified, wrapper libraries must insert
+the appropriate default convention in their wrappers.
i686-pc-windows-gnu: cdecl
i686-pc-windows-msvc: cdecl
i686-unknown-linux-gnu: SystemV
Linux x86: SystemV
Windows x86: cdecl
This is just a quick reference sheet for developers.
+The order of the registers is typically as follows for Microsoft x64 ABI: rcx
, rdx
, r8
, r9
,
+then the rest of the parameters are pushed onto the stack in reverse order (right-to-left).
For the System V ABI on x64: rdi
, rsi
, rdx
, rcx
, r8
, r9
, then the rest of the parameters
+are pushed onto the stack in reverse order (right-to-left).
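The register orders above can be expressed as a small lookup. This is only a sketch for integer/pointer parameters; the `Abi` and `ParamLocation` types and the `int_param_location` function are made up for illustration and do not exist in the library.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Abi {
    Microsoft,
    SystemV,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParamLocation {
    Register(&'static str),
    Stack,
}

/// Returns where the integer parameter at `index` (0-based) is passed.
fn int_param_location(abi: Abi, index: usize) -> ParamLocation {
    let regs: &[&'static str] = match abi {
        Abi::Microsoft => &["rcx", "rdx", "r8", "r9"],
        Abi::SystemV => &["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    };

    // Anything past the register list spills onto the stack.
    regs.get(index)
        .copied()
        .map_or(ParamLocation::Stack, ParamLocation::Register)
}
```

Note how `rcx` is the 1st parameter under the Microsoft ABI, but the 4th under SystemV.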
Register | +Microsoft x64 ABI | +SystemV ABI | +
---|---|---|
rax |
+Caller-saved | +Caller-saved | +
rbx |
+Callee-saved | +Callee-saved | +
rcx |
+Caller-saved, 1st parameter | +Caller-saved, 4th parameter | +
rdx |
+Caller-saved, 2nd parameter | +Caller-saved, 3rd parameter | +
rsi |
+Callee-saved | +Caller-saved, 2nd parameter | +
rdi |
+Callee-saved | +Caller-saved, 1st parameter | +
rbp |
+Callee-saved | +Callee-saved | +
rsp |
+Callee-saved | +Callee-saved | +
r8 |
+Caller-saved, 3rd parameter | +Caller-saved, 5th parameter | +
r9 |
+Caller-saved, 4th parameter | +Caller-saved, 6th parameter | +
r10 |
+Caller-saved | +Caller-saved | +
r11 |
+Caller-saved | +Caller-saved | +
r12 |
+Callee-saved | +Callee-saved | +
r13 |
+Callee-saved | +Callee-saved | +
r14 |
+Callee-saved | +Callee-saved | +
r15 |
+Callee-saved | +Callee-saved | +
Floating Point Registers (Microsoft)
+Register | +Microsoft x64 ABI | +
---|---|
st(0) -st(7) |
+Caller-saved | +
mm0 -mm7 |
+Caller-saved | +
xmm0 -xmm5 |
+Caller-saved, used for floating point parameters. | +
ymm0 -ymm5 |
+Caller-saved, used for floating point parameters. | +
zmm0 -zmm5 |
+Caller-saved, used for floating point parameters. | +
xmm6 -xmm15 |
+Callee-saved. | +
ymm6 -ymm15 |
+Callee-saved. Upper half must be preserved by the caller | +
zmm6 -zmm31 |
+Callee-saved. Upper half must be preserved by the caller | +
Floating Point Registers (SystemV)
+Register | +SystemV ABI | +
---|---|
st(0) -st(7) |
+Caller-saved | +
mm0 -mm7 |
+Caller-saved | +
xmm0 -xmm7 |
+Caller-saved, used for floating point parameters | +
ymm0 -ymm7 |
+Caller-saved, used for floating point parameters | +
zmm0 -zmm7 |
+Caller-saved, used for floating point parameters | +
xmm8 -xmm15 |
+Caller-saved | +
ymm8 -ymm15 |
+Caller-saved | +
zmm8 -zmm31 |
+Caller-saved | +
On Linux, syscalls use R10 instead of RCX in SystemV ABI
+Information sourced from Source.
+Future Intel processors are expected to ship with APX, extending the registers to 32 by adding R16-R31.
+These future registers are expected to be caller saved.
+To quote the document:
+++Defining all new state (Intel® APX’s EGPRs) as volatile (caller-saved or scratch)
+
It is recommended that library users manually specify conventions in their hook functions.
+When the calling convention of <your function>
is not specified, wrapper libraries must insert
+the appropriate default convention in their wrappers.
x86_64-pc-windows-gnu: Microsoft
x86_64-pc-windows-msvc: Microsoft
x86_64-unknown-linux-gnu: SystemV
x86_64-apple-darwin: SystemV
Windows x64: Microsoft
Linux x64: SystemV
macOS x64: SystemV
Replacing arbitrary assembly sequences (a.k.a. 'mid function hooks').
+This hook is used to make small changes to existing logic, for example injecting custom logic for existing conditional branches (if
statements).
Limited effectiveness if Code Relocation is not available.
+I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
+This hook works by injecting a jmp
instruction inside the middle of an arbitrary assembly sequence
+to custom code. The person using this hook must be very careful not to break the program
+(corrupt stack, used registers, etc.).
Original Code
: Middle of an arbitrary sequence of assembly instructions where a branch
to custom code is placed. Hook Function
: Contains user code, including original code (depending on user preference). Original Stub
: Original code (used when hook disabled). flowchart TD
+ O[Original Code]
+ HK[Hook Function]
+
+ O -- jump --> HK
+ HK -- jump back --> O
+When the hook is activated, a branch
is placed in the middle of the original assembly instruction
+sequence to your hook code.
Your code (and/or original code) is then executed, then it branches back to original code.
+flowchart TD
+ O[Original Function]
+ HK["Hook Function <Overwritten with Original Code>"]
+
+ O -- jump --> HK
+ HK -- jump back --> O
+When the hook is deactivated, the 'Hook Function' is overwritten in-place with the original instructions and a jump back to the original code.
+Assembly Hooks should allow both Position Independent Code and Position Relative Code
+With that in mind, the following APIs should be possible:
+/// Creates an Assembly Hook given existing position independent assembly code,
+/// and address which to hook.
+/// # Arguments
+/// * `hook_address` - The address of the function or mid-function to hook.
+/// * `asm_code` - The assembly code to execute, precompiled.
+fn from_pos_independent_code_and_function_address(hook_address: usize, asm_code: &[u8]);
+
+/// Creates an Assembly Hook given existing position relative assembly code,
+/// and address which to hook.
+///
+/// # Arguments
+/// * `hook_address` - The address of the function or mid-function to hook.
+/// * `asm_code` - The assembly code to execute, precompiled.
+/// * `code_address` - The original address of asm_code.
+///
+/// # Remarks
+/// Code in `asm_code` will be relocated to new target address.
+fn from_code_and_function_address(hook_address: usize, asm_code: &[u8], code_address: usize);
+
+/// Creates an Assembly Hook given a sequence of assembly instructions,
+/// and address which to hook.
+///
+/// # Arguments
+/// * `hook_address` - The address of the function or mid-function to hook.
+/// * `asm_isns` - The assembly instructions to place at this address.
+///
+/// # Remarks
+/// Code in `asm_isns` will be relocated to new target address.
+fn from_instructions_and_function_address(hook_address: usize, asm_isns: &[Instructions]);
+
Using overloads for clarity, in library all options should live in a struct.
+Code using from_code_and_function_address
is to be preferred for usage, as users will be able to use
+relative branches for improved efficiency. (If they are out of range, hooking library will rewrite them)
For pure assembly code, users are expected to compile code externally using something like FASM
,
+put the code in their program/mod (as byte array) and pass that directly as asm_code
.
For people who want to call their own program/mod(s) from assembly, there will be a wrapper API around
+Jit<TRegister>
and its various Operations. This API will be cross-architecture and
+should contain all the necessary operations required for setting up stack/registers and calling user code.
Programmers are also expected to provide 'max allowed hook length' with each call.
+The expected hook lengths for each architecture
+When using the library, the library will use the most optimal possible jmp
instruction to get to the user hook.
When calling one of the functions to create an assembly hook, the end user should specify their max permissible assembly hook length.
+If a hook cannot be satisfied within that constraint, then the library will throw an error.
+The table below shows common hook lengths, for:
+Relative Jump
(best case) Relative Jump
range. Architecture | +Relative | +TMA | +Worst Case | +
---|---|---|---|
x86[1] | +5 bytes (+- 2GiB) | +5 bytes | +5 bytes | +
x86_64 | +5 bytes (+- 2GiB) | +6 bytes[2] | +13 bytes[3] | +
x86_64 (macOS) | +5 bytes (+- 2GiB) | +13 bytes[4] | +13 bytes[3] | +
ARM64 | +4 bytes (+- 128MiB) | +12 bytes[6] | +20 bytes[5] | +
ARM64 (macOS) | +4 bytes (+- 128MiB) | +12 bytes[6] | +20 bytes[5] | +
[1]: x86 can reach any address from any address with relative branch due to integer overflow/wraparound.
+[2]: jmp [<Address>]
, with <Address> at < 2GiB.
+[3]: mov <reg>, address
+ call <reg>
. +1 if using an extended reg.
+[4]: macOS restricts access to < 2GiB
memory locations, so absolute jump must be used. +1 if using an extended reg.
+[5]: MOVZ + MOVK + LDR + BR.
+[6]: ADRP + ADD + BR.
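As a rough sketch of how the 'max permissible hook length' could interact with the x86_64 row of the table above: the shortest usable jump is picked, and an error is returned if it does not fit. The `HookError` type, the `x64_hook_length` function and the `has_low_pointer_slot` flag (standing in for Targeted Memory Allocation giving us a pointer slot below 2GiB) are all assumptions made for illustration, not the library's API.

```rust
#[derive(Debug, PartialEq, Eq)]
enum HookError {
    /// The shortest usable jump is longer than the caller allows.
    TooLong { required: usize, max_allowed: usize },
}

/// Picks a jump length for x86_64 using the byte counts from the table.
fn x64_hook_length(
    hook_address: u64,
    target: u64,
    has_low_pointer_slot: bool,
    max_allowed: usize,
) -> Result<usize, HookError> {
    // Displacement is relative to the end of the 5 byte `jmp rel32`.
    let delta = target.wrapping_sub(hook_address.wrapping_add(5)) as i64;
    let in_rel32_range = delta >= i32::MIN as i64 && delta <= i32::MAX as i64;

    let required = if in_rel32_range {
        5 // relative jump
    } else if has_low_pointer_slot {
        6 // jmp [<Address>], with <Address> at < 2GiB
    } else {
        13 // worst case: mov <reg>, address + jump via an extended register
    };

    if required <= max_allowed {
        Ok(required)
    } else {
        Err(HookError::TooLong { required, max_allowed })
    }
}
```

A caller who can only spare 7 bytes would therefore get an error whenever neither the relative jump nor the 6 byte indirect jump is usable.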
Common: Thread Safety & Memory Layout
+As reloaded-hooks-rs
intends to replace Reloaded.Hooks
 it must provide certain functionality for backwards compatibility.
Once reloaded-hooks-rs
releases, the legacy Reloaded.Hooks
will be a wrapper around it.
This means a few functionalities must be supported here:
+Setting arbitrary 'Hook Length'.
+Reloaded.Hooks
users create an ASM Hook (with default PreferRelativeJump == false
and HookLength == -1
) the wrapper for legacy API must set 'Hook Length' == 7
 to emulate absolute jump size.
MaxOpcodeSize
 from original API should be sufficient.
Supporting Assembly via FASM.
+Reloaded.Hooks
 wrapper will continue to ship FASM for backwards compatibility, however mods are expected to migrate to the new library in the future.
Assembly hook info is packed by default to save on memory space. By default, the following limits apply:
+Property | +4 Byte Instruction (e.g. ARM64) | +Other (e.g. x86) | +
---|---|---|
Max Orig Code Length | +128KiB | +32KiB | +
Max Hook Code Length | +128KiB | +32KiB | +
These limits may increase in the future if additional functionality warrants extending metadata length.
+Replaces a branch
(call/jump) to an existing method with a new one.
This hook is commonly used when you want to change behaviour of a function, but only for certain callers.
+For example, if you have a method Draw2DElement
that's used to draw an object to the screen, but
+you only want to move a certain element that's rendered by Draw2DElement
, you would use a Branch Hook
+to replace call Draw2DElement
to call YourOwn2DElement
.
Only guaranteed to work on platforms with Targeted Memory Allocation
+Because the library needs to be able to acquire memory in proximity of the original function.
+This is almost always achievable, but cases where Denuvo DRM inflates ARM64 binaries (20MB -> 500MB) may prove problematic, as ARM64 has a +-128MiB range for relative jumps.
+I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
+This hook works by replacing the target of a call
(a.k.a. Branch with Link) instruction with a new target.
A Branch Hook is really a specialised variant of function hook.
+Notably it differs in the following ways:
+There is no Wrapper To Call Original Function as no instructions are stolen.
+You call
the ReverseWrapper instead of jump
ing to it.
Caller Function
: Function which originally called Original Method
. ReverseWrapper
: Translates from original function calling convention to yours. Then calls your function. <Your Function>
: Your Rust/C#/C++/Asm code.Original Method
: Original method to be called. flowchart TD
+ CF[Caller Function]
+ RW[Stub]
+ HK["<Your Function>"]
+ OM[Original Method]
+
+ CF -- "call wrapper" --> RW
+ RW -- jump to your code --> HK
+ HK -. "Calls <Optionally>" .-> OM
+ OM -. "Returns" .-> HK
+'Fast Mode' is an optimisation that inserts the jmp to point directly into your code when possible.
+flowchart TD
+ CF[Caller Function]
+ HK["<Your Function>"]
+ OM[Original Method]
+
+ CF -- "call 'Your Function' instead of original" --> HK
+ HK -. "Calls <Optionally>" .-> OM
+ OM -. "Returns" .-> HK
+This option allows for a small performance improvement, saving 1 instruction and some instruction prefetching load.
+This is on by default (can be disabled), and takes effect when no conversion between calling conventions is needed.
+flowchart TD
+ CF[Caller Function]
+ RW[ReverseWrapper]
+ HK["<Your Function>"]
+ W[Wrapper]
+ OM[Original Method]
+
+ CF -- "call wrapper" --> RW
+ RW -- jump to your code --> HK
+ HK -. "Calls <Optionally>" .-> W
+ W -- "call original (wrapped)" --> OM
+ OM -. "Returns" .-> W
+ W -. "Returns" .-> HK
+flowchart TD
+ CF[Caller Function]
+ SB[Stub]
+ HK[Hook Function]
+ OM[Original Method]
+
+ CF -- jump to stub --> SB
+ SB -- jump to original --> OM
+When the hook is deactivated, the stub is replaced with a direct jump back to the original function.
+By bypassing your code entirely, it is safe for your dynamic library (.dll
/.so
/.dylib
)
+to unload from the process.
Common: Thread Safety & Memory Layout
+The 'branch hook' stub uses the following memory layout:
+- [Branch to Hook Function / Branch to Original Function]
+- Branch to Hook Function
+- Branch to Original Function
+
If calling convention conversion is needed, the layout looks like this:
+- [ReverseWrapper / Branch to Original Function]
+- ReverseWrapper
+- Branch to Original Function
+- Wrapper
+
The library is optimised to not use redundant memory
+For example, in x86 (32-bit), a jmp
instruction can reach any address from any address. In that situation,
+we don't write Branch to Original Function
to the buffer at all, provided a ReverseWrapper
is not needed,
+as it is not necessary.
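The wraparound that makes this possible on 32-bit x86 can be sketched with wrapping arithmetic. The function names below are hypothetical; the snippet only demonstrates that a `jmp rel32` encoded with wrapping subtraction lands on the intended target, even when source and target are almost 4GiB apart.

```rust
/// Encodes a 5 byte `jmp rel32` (opcode E9) placed at `jmp_address`.
fn x86_jmp_rel32(jmp_address: u32, target: u32) -> [u8; 5] {
    // The displacement is relative to the end of the instruction and wraps at 4GiB.
    let next_ip = jmp_address.wrapping_add(5);
    let rel = target.wrapping_sub(next_ip).to_le_bytes();
    [0xE9, rel[0], rel[1], rel[2], rel[3]]
}

/// Resolves where an encoded `jmp rel32` lands, mirroring 32-bit wraparound.
fn x86_jmp_destination(jmp_address: u32, code: [u8; 5]) -> u32 {
    let rel = u32::from_le_bytes([code[1], code[2], code[3], code[4]]);
    jmp_address.wrapping_add(5).wrapping_add(rel)
}
```

For example, a jump placed at `0xFFFFFF00` can still reach `0x1000`, because the addition overflows past the top of the address space.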
Using x86 Assembly.
+originalCaller:
+ ; Some code...
+ call originalFunction
+ ; More code...
+
originalCaller:
+ ; Some code...
+ call userFunction ; To user method
+ ; More code...
+
+userFunction:
+ ; New function implementation...
+ call originalFunction ; Optional.
+
; x86 Assembly
+originalCaller:
+ ; Some code...
+ call stub
+ ; More code...
+
+stub:
+ ; == BranchToHook ==
+ jmp newFunction
+ ; == BranchToHook ==
+
+ ; == BranchToOriginal ==
+ jmp originalFunction
+ ; == BranchToOriginal ==
+
+newFunction:
+ ; New function implementation...
+ call originalFunction ; Optional.
+
; x86 Assembly
+originalCaller:
+ ; Some code...
+ call stub
+ ; More code...
+
+stub:
+ ; == ReverseWrapper ==
+ ; implementation..
+ call userFunction
+ ; ..implementation
+ ; == ReverseWrapper ==
+
+ ; == Wrapper ==
+ ; implementation ..
+ jmp originalFunction
+ ; .. implementation
+ ; == Wrapper ==
+
+ ; == BranchToOriginal ==
+ jmp originalFunction ; Whenever disabled :wink:
+ ; == BranchToOriginal ==
+
+userFunction:
+ ; New function implementation...
+ call wrapper; (See Above)
+
; x86 Assembly
+originalCaller:
+ ; Some code...
+ call stub
+ ; More code...
+
+stub:
+ <jmp to `jmp originalFunction`> ; We disable the hook by branching to instruction that branches to original
+ jmp originalFunction ; Whenever disabled :wink:
+
+newFunction:
+ ; New function implementation...
+ call originalFunction ; Optional.
+
+originalFunction:
+ ; Original function implementation...
+
Design notes common to all hooking strategies.
+Wrappers are stubs which convert from the calling convention of the original function to your calling convention.
+If the calling conventions of the hooked function and your function match, this wrapper is just a single jmp
instruction.
Wrappers are documented in their own page here.
+Stub which converts from your code's calling convention to original function's calling convention
+This is basically Wrapper with source
and destination
swapped around
Hooks in reloaded-hooks-rs
are structured in a very specific way to ensure thread safety.
They sacrifice a bit of memory usage in favour of performance + thread safety.
+Most hooks, regardless of type have a memory layout that looks something like this:
+// Size: 2 registers
+pub struct Hook
+{
+ /// The address of the stub containing bridging code
+ /// between your code and custom code. This is the address
+ /// of the code that will actually be executed at runtime.
+ stub_address: usize,
+
+ /// Address of the 'properties' structure, containing
+ /// the necessary info to manipulate the data at stub_address
+ props: NonNull<StubPackedProps>,
+}
+
Notably, there are two heap allocations. One at stub_address
, which contains the executable code,
+and one at props
, which contains packed info of the stub at stub_address
.
The hooks use a 'swapping' system. Both stub_address
and props
 contain swap space
. When you
+enable or disable a hook, the data in the two 'swap spaces' are swapped around.
In other words, when stub_address
's 'swap space' contains the code for HookFunction
(hook enabled),
+the 'swap space' at props
 contains the code for Original Code
.
Thread safety is ensured by making writes within the stub itself atomic, as well as atomically placing the jump to the stub in the original application code.
+The memory region containing the actual executed code.
+The stub has two possible layouts, if the Swap Space
is small enough such that it can be atomically
+overwritten, it will look like this:
- 'Swap Space' [HookCode / OriginalCode]
+<pad to atomic register size>
+
Otherwise, if Swap Space
cannot be atomically overwritten, it will look like:
- 'Swap Space' [HookCode / OriginalCode]
+- HookCode
+- OriginalCode
+
Some hooks may store extra data after OriginalCode
.
For example, if calling convention conversion is needed, the HookCode
becomes a
+ReverseWrapper, and the stub will also contain a Wrapper.
If calling convention conversion is needed, the layout looks like this:
+- 'Swap Space' [ReverseWrapper / OriginalCode]
+- ReverseWrapper
+- OriginalCode
+- Wrapper
+
Using ARM64 Assembly Hook as an example.
+If the 'OriginalCode' was:
+mov x0, x1
+add x0, x2
+
And the 'HookCode' was:
+add x1, x1
+mov x0, x2
+
The memory would look like this when hook is enabled.
+swap: ; Currently Applied (Hook)
+ add x1, x1
+ mov x0, x2
+ b back_to_code
+
+hook: ; HookCode
+ add x1, x1
+ mov x0, x2
+ b back_to_code
+
+original: ; OriginalCode
+ mov x0, x1
+ add x0, x2
+ b back_to_code
+
(When sizeof(swap)
is larger than biggest possible atomic write.)
Each Assembly Hook contains a pointer to the heap stub (seen above) and a pointer to the heap.
+The heap contains all information required to perform operations on the stub.
+- StubPackedProps
+ - Enabled Flag
+ - IsSwapOnly
+ - SwapSize
+ - HookSize
+- [Hook Function / Original Code]
+
The data in the heap contains a short StubPackedProps struct, detailing the data stored over in the stub.
+The SwapSize
contains the length of the 'swap' info (and also consequently, offset of HookCode
).
+The HookSize
contains the length of the 'hook' instructions (and consequently, offset of OriginalCode
).
If the IsSwapOnly
flag is set, then this data is to be atomically overwritten.
When transitioning between Enabled/Disabled state, we place a temporary branch at entry
, this allows us to manipulate the remaining code safely.
Using ARM64 Assembly Hook as an example.
+We start the 'disable' process with a temporary branch:
+entry: ; Currently Applied (Hook)
+ b original ; Temp branch to original
+ mov x0, x2
+ b back_to_code
+
+hook: ; Backup (Hook)
+ add x1, x1
+ mov x0, x2
+ b back_to_code
+
+original: ; Backup (Original)
+ mov x0, x1
+ add x0, x2
+ b back_to_code
+
Don't forget to clear instruction cache on non-x86 architectures which need it.
+This ensures we can safely overwrite the remaining code...
+Then we overwrite entry
 code with original
code, except the branch:
entry: ; Currently Applied (Hook)
+ b original ; Branch to original
+ add x0, x2 ; overwritten with 'original' code.
+ b back_to_code ; overwritten with 'original' code.
+
+hook: ; Backup (Hook)
+ add x1, x1
+ mov x0, x2
+ b back_to_code
+
+original: ; Backup (Original)
+ mov x0, x1
+ add x0, x2
+ b back_to_code
+
And lastly, overwrite the branch.
+To do this, read the original sizeof(nint)
bytes at entry
, replace branch bytes with original bytes
+and do an atomic write. This way, the remaining instruction is safely replaced.
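The final step can be sketched like this. The stub's first `sizeof(nint)` bytes are modelled as an `AtomicU64` (an assumption for illustration; real code would operate on executable memory and, as noted earlier, clear the instruction cache where needed). The function name is hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Replaces the first 4 byte instruction inside an 8 byte, naturally aligned
/// word with one atomic store, leaving the second instruction untouched.
fn replace_first_instruction(entry: &AtomicU64, new_instruction: u32) {
    // Read the current 8 bytes (temporary branch + following instruction).
    let current = entry.load(Ordering::Acquire);

    // Patch only the first 4 bytes, working in memory byte order.
    let mut bytes = current.to_ne_bytes();
    bytes[..4].copy_from_slice(&new_instruction.to_le_bytes());

    // Publish both instructions with a single atomic write.
    entry.store(u64::from_ne_bytes(bytes), Ordering::Release);
}
```

Because the whole word is written at once, another thread either sees the temporary branch or the fully restored instruction, never a half-written one.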
entry: ; Currently Applied (Original)
+ mov x0, x1 ; 'original' code.
+ add x0, x2 ; 'original' code.
+ b back_to_code ; 'original' code.
+
+original: ; Backup (Original)
+ mov x0, x1
+ add x0, x2
+ b back_to_code
+
+hook: ; Backup (Hook)
+ add x1, x1
+ mov x0, x2
+ b back_to_code
+
This way we achieve zero overhead CPU-wise, at expense of some memory.
+Stub info is packed by default to save on memory space. By default, the following limits apply:
+Property | +4 Byte Instruction (e.g. ARM64) | +Other (e.g. x86) | +
---|---|---|
Max Orig Code Length | +128KiB | +32KiB | +
Max Hook Code Length | +128KiB | +32KiB | +
These limits may increase in the future if additional required functionality warrants extending metadata length.
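One way limits like these could arise is from packing both lengths and the flags into a single 32-bit value. The layout below is purely an assumption for illustration and may not match the real `StubPackedProps`: 2 flag bits plus two 15-bit lengths, where architectures with 4 byte instructions store lengths in units of 4. That yields just under 32KiB for byte granularity and just under 128KiB for 4 byte granularity.

```rust
/// Hypothetical packed layout (an assumption, not the library's actual one):
/// bit 0 = enabled, bit 1 = swap-only, bits 2..17 = swap size, bits 17..32 = hook size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedProps(u32);

const SIZE_BITS: u32 = 15;
const SIZE_MASK: u32 = (1 << SIZE_BITS) - 1;

impl PackedProps {
    /// `unit` is 4 on fixed 4 byte instruction sets (e.g. ARM64) and 1 otherwise.
    fn new(enabled: bool, swap_only: bool, swap_size: u32, hook_size: u32, unit: u32) -> Option<Self> {
        if unit == 0 || swap_size % unit != 0 || hook_size % unit != 0 {
            return None;
        }
        let (swap, hook) = (swap_size / unit, hook_size / unit);
        if swap > SIZE_MASK || hook > SIZE_MASK {
            return None; // exceeds the packed limit
        }
        Some(Self(enabled as u32 | (swap_only as u32) << 1 | swap << 2 | hook << 17))
    }

    fn enabled(self) -> bool {
        self.0 & 1 != 0
    }

    fn swap_size(self, unit: u32) -> u32 {
        ((self.0 >> 2) & SIZE_MASK) * unit
    }

    fn hook_size(self, unit: u32) -> u32 {
        (self.0 >> 17) * unit
    }
}
```

Under this assumed layout, extending either limit would mean growing the metadata beyond 4 bytes.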
+Thread safety is 'theoretically' not guaranteed for every possible x86 processor, however it is satisfied for all modern CPUs.
+The information below is x86 specific but applies to all architectures with a non-fixed instruction size. Architectures with fixed instruction sizes (e.g. ARM) are thread safe in this library by default.
+++If the
+jmp
instruction emplaced when switching state overwrites what originally + were multiple instructions, it is theoretically possible that the placing thejmp
will make the + instruction about to be executed invalid.
For example if the previous instruction sequence was:
+0x0: push ebp
+0x1: mov ebp, esp ; 2 bytes
+
And inserting a jmp produces:
+0x0: jmp disabled ; 2 bytes
+
It's possible that the CPU's Instruction Pointer was at 0x1 at the time of the overwrite, making the mov ebp, esp instruction invalid.
In practice, modern x86 CPUs (1990 onwards) from Intel, AMD and VIA prefetch instructions in batches of 16 bytes. We place the stubs generated by the various hooks on 16-byte boundaries for this (and optimisation) reason.
+So, by the time we change the code, the CPU has already prefetched the instructions we are atomically overwriting.
+In other words, it is simply not possible to perfectly time a write such that a thread at 0x1
+(mov ebp, esp
) would read an invalid instruction, as that instruction was prefetched and is being
+executed from local thread cache.
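The alignment rule above boils down to two small checks, sketched here. The names are hypothetical: `align_up` rounds a stub address up to the next 16-byte boundary, and `fits_in_one_block` checks that an overwrite does not straddle two 16-byte blocks.

```rust
const STUB_ALIGNMENT: usize = 16;

/// Rounds `address` up to the next multiple of `alignment` (a power of two).
fn align_up(address: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (address + alignment - 1) & !(alignment - 1)
}

/// True if a write of `len` bytes at `address` stays inside one 16-byte block.
fn fits_in_one_block(address: usize, len: usize) -> bool {
    len != 0 && address / STUB_ALIGNMENT == (address + len - 1) / STUB_ALIGNMENT
}
```

A stub placed at an aligned address always satisfies the second check for writes of up to 16 bytes.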
Here is a thread safety table for x86, taking the above into account:
+Safe? | +Hook | +Notes | +
---|---|---|
✅ | +Function | +Functions start on multiples of 16 on pretty much all compilers, per Intel Optimisation Guide. | +
✅ | +Branch | +Stubs are 16 aligned. | +
✅ | +Assembly | +Stubs are 16 aligned. | +
✅ | +VTable | +VTable entries are usize aligned, and don't cross cache boundaries. |
+
When a hook is already present, and you wish to stack that hook over the existing hook, certain problems might arise.
+This is notably an issue when a hook entry is composed of more than 1 instruction; i.e. on RISC architectures.
+There is a potential register allocation caveat in this scenario.
+Pretend you have the following ARM64 function:
+ADD x1, #5
+ADD x2, #10
+ADD x0, x1, x2
+ADD x0, x0, x0
+RET
+
x1 = x1 + 5;
+x2 = x2 + 10;
+int x0 = x1 + x2;
+x0 = x0 + x0;
+return x0;
+
And then, a large hook using an absolute jump with register is applied:
+# Original instructions here replaced
+MOVZ x0, A
+MOVK x0, B, LSL #16
+MOVK x0, C, LSL #32
+MOVK x0, D, LSL #48
+BR x0
+# <= branch returns here
+
If you then try to apply a smaller hook after applying the large hook, you might run into the following situation:
+# The 3 instructions here are an absolute jump using pointer.
+adrp x9, [0]
+ldr x9, [x9, 0x200]
+br x9
+# Call to original function returns here, back to then branch to previous hook
+MOVK x0, D, LSL #48
+BR x0
+
This is problematic with respect to register allocation.
+Absolute jumps on some RISC platforms like ARM will always require the use of a scratch register.
+But there is a risk the scratch register used is the same register (x0
) as the register used by the
+previous hook as the scratch register. In which case, the jump target becomes invalid.
mov
+ branch
 combinations for each target architecture.
Only applies to architectures with variable length instructions. (x86)
+Some hooking libraries don't clean up remaining stolen bytes after installing a hook.
+Very notably Steam does this for rendering (overlay) and input (controller support).
+Consider the original function having the following instructions:
+48 8B C4 mov rax, rsp
+48 89 58 08 mov [rax + 08], rbx
+
After Steam hooks, it will leave the function like this
+E9 XX XX XX XX jmp 'somewhere'
+58 08 <invalid instruction. leftover from state before>
+
If you're not able to install a relative hook, e.g. need to use an absolute jump
+FF 25 XX XX XX XX jmp ['addr']
+
The invalid instructions will now become part of the 'stolen' bytes when you call the original, and invalid instructions may be executed.
+This library must do the following:
+relative jump
over absolute jump
) when possible.
There unfortunately isn't much we can do to reliably detect invalid instructions generated by other hooking libraries; the best we can do is try to avoid them by using shorter hooks. Thankfully this is not a common issue, given most people use the 'popular' libraries.
+This feature will not be ported over from legacy Reloaded.Hooks
, until an edge case is found that requires this.
This section explains how Reloaded handles an edge case within an already super rare case.
+This topic is a bit more complex, so we will use x86 as example here.
+For any of this to be necessary, the following conditions must be true:
The probability of this happening, at least on Windows and/or Linux, is insanely low. It cannot be estimated, but if I were to have a guess, maybe 1 in 1 billion. You'd be more likely to die from a shark attack.
+In any case, when this happens, Reloaded performs return address patching.
+Suppose a foreign hooking library hooks a function with the following prologue:
+55 push ebp
+89 e5 mov ebp, esp
+00 00 add [eax], al
+83 ec 20 sub esp, 32
+...
+
After hooking, this code would look like:
+E9 XX XX XX XX jmp 'somewhere'
+<= existing hook jumps back here when calling original (this) function
+83 ec 20 sub esp, 32
+...
+
When the prologue is set up 'just right', such that the existing instructions divide perfectly
+into 5 bytes, and we need to insert a 6 byte absolute jmp FF 25
, Reloaded must patch the return address.
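To see why the 6 byte jump is a problem, consider how many bytes must be stolen so that only whole instructions are overwritten. The helper below is a hypothetical sketch that takes already-decoded instruction lengths; it is not how the library discovers them.

```rust
/// Returns how many bytes must be stolen so that whole instructions cover
/// at least `hook_len` bytes, or `None` if there are not enough instructions.
fn stolen_byte_count(instruction_lengths: &[usize], hook_len: usize) -> Option<usize> {
    let mut total = 0;
    for &len in instruction_lengths {
        if total >= hook_len {
            break;
        }
        total += len;
    }
    (total >= hook_len).then_some(total)
}
```

For the prologue above (instruction lengths 1, 2, 2, 3), a 5 byte jump steals exactly 5 bytes, while a 6 byte jump steals 8. Those 8 bytes include the `sub esp, 32` that the existing hook jumps back to, hence the need to patch the return address.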
Reloaded has a built in patcher for this super rare scenario, which detects and attempts to patch return addresses of the following patterns:
+Where nop* represents 0 or more nops.
+
+1. Relative immediate jumps.
+
+ nop*
+ jmp 0x123456
+ nop*
+
+2. Push + Return
+
+ nop*
+ push 0x612403
+ ret
+ nop*
+
+3. RIP Relative Addressing (X64)
+
+ nop*
+ JMP [RIP+0]
+ nop*
+
This patching mechanism is rather complicated, relies on disassembling code at runtime and thus won't be explained here.
+Different hooking libraries use different logic for storing callbacks. In some cases alignment of code (or rather lack thereof) can also make this operation unreliable, since we rely on disassembling the code at runtime to find jumps back to end of hook. The success rate of this operation is NOT 100%
+While I haven't studied the source code of other hooking libraries before, I've had no issues in the past with the commonly used Detours and minhook libraries.
+Libraries which can safely interoperate with (stack hooks on top of) Reloaded.Hooks must satisfy the following.
+Must be able to patch (re-adjust) relative jumps.
+Must be able to automatically determine number of bytes to steal from original function.
+See: Code Relocation
+Please read the general section first, this contains ARM64 specific stuff.
+In the case of ARM64, padding is usually done with the following sequences:
+- nop
(0xD503201F
, big endian), used by GCC.
+- and x0, x0
(0x00000000
), used by MSVC.
Getting sufficient bytes to make good use of them is less common on ARM64 than on x86.
+Please read the general section first, this contains x86 specific stuff.
+0x90
(GCC) or 0xCC
 (MSVC) are commonly used for padding.
We use x86 in the example for the general section above.
This page contains information regarding interoperability that is common to all platforms.
+Interoperability in this sense means 'stacking hooks on top of other libraries', and how other libraries
+can stack hooks on top of reloaded-hooks-rs
.
This is the general hooking strategy employed by reloaded-hooks
; derived from the facts in the rest of this document.
To ensure maximum compatibility with existing hooking systems, reloaded-hooks
uses
+relative jumps as these are the most popular,
+and thus best supported by other libraries when it comes to hook stacking.
These are the lowest overhead jumps, so are preferable in any case.
+In the very, very unlikely event that using relative jumps is not possible (target is further than
+max relative jump distance
), the strategy below is used.
If no existing hook exists, an absolute jump will be used (if possible).
+- Prefer indirect absolute jump (if possible).
We check for presence of 'existing hook' by catching some common instruction patterns.
If we have any allocated buffer in range, insert relative jump, and inside wrapper/stub use absolute jump if needed.
+Otherwise (if possible), use available free space from function alignment.
+Otherwise use absolute jump.
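The range check behind "prefer relative, fall back to absolute" can be sketched as follows. This is a minimal illustration for the x86 `jmp rel32` encoding only; the function name `rel32_displacement` is hypothetical and is not part of the reloaded-hooks-rs API.

```rust
/// Size in bytes of an x86 `jmp rel32` instruction (opcode 0xE9 + 4 byte displacement).
const JMP_REL32_LEN: i128 = 5;

/// Returns the rel32 displacement for a jump placed at `from` that lands on `to`,
/// or `None` when the target is out of range and another strategy is needed.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn rel32_displacement(from: u64, to: u64) -> Option<i32> {
    // The displacement is measured from the end of the jump instruction.
    let delta = to as i128 - (from as i128 + JMP_REL32_LEN);
    if delta >= i32::MIN as i128 && delta <= i32::MAX as i128 {
        Some(delta as i32)
    } else {
        None
    }
}
```

When this returns `None`, a hooking library would move on to the fallbacks listed above (buffer in range, alignment padding, absolute jump).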
+In order to optimize the code relocation process, reloaded-hooks
,
+will try to find a buffer that's within relative jump range to the original jump target.
If this is not possible, reloaded-hooks
will start rewriting relative jump(s)
+from the original function to absolute jump(s) in the presence
+of recognised patterns; if the code rewriter supports this.
Strategies used for improving interoperability with other hooks.
+This is a strategy for encoding absolute jumps using fewer instructions.
+Processors typically fetch instructions on 16 byte boundaries.
+To optimise for this, compilers pad the space between end of last function and start of next.
+We can exploit this 😉
+If there's sufficient padding before the function, we can:
+- Insert our absolute jump there, and branch to it.
+or
+- Insert jump target there, and branch using that jump target.
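As a rough sketch of how the available padding could be measured on x86, the helper below counts the `0x90`/`0xCC` bytes directly preceding a function. The name `padding_before` and the slice-based interface are assumptions made for illustration; a real implementation reads process memory, and must also consider that these byte values can appear at the tail of genuine instructions.

```rust
/// Counts how many x86 padding bytes (0x90 or 0xCC) sit directly before
/// `func_offset` inside `code`. Returns 0 if the offset is out of bounds.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn padding_before(code: &[u8], func_offset: usize) -> usize {
    code.get(..func_offset).map_or(0, |before| {
        before
            .iter()
            .rev()
            .take_while(|&&b| b == 0x90 || b == 0xCC)
            .count()
    })
}
```

If the returned count is at least the size of the absolute jump (or of the jump target pointer), the padding is a candidate location for it.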
How hooking around entire functions works.
+This hook is used to run a custom callback for a function, modify its parameters or replace a function entirely. It is the most common hook.
+I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
+This hook works by injecting a jmp
instruction at the beginning of a function to a custom
+replacement function, or a stub which will later call that function.
When the original function is called, it is done via a wrapper, which restores the originally
+overwritten instructions that were sacrificed for the jmp
.
Stolen Bytes
: Bytes used by instructions sacrificed in original function to place a 'jmp' to the ReverseWrapper
. ReverseWrapper
: Translates from original function calling convention to yours. Then calls your function. <Your Function>
: Your Rust/C#/C++/Asm code. Wrapper
: Translates from your calling convention to original, then runs the original function. flowchart TD
+ orig[Original Function] -- jump to wrapper --> rev[Reverse Wrapper]
+ rev -- jump to your code --> target["<Your Function>"]
+ target -- "call original via wrapper" --> stub["Wrapper <with stolen bytes + jmp to original>"]
+ stub -- "call original" --> original["Original Function"]
+
+ original -- "return value" --> stub
+ stub -- "return value" --> target
+When the hook is activated, a stub calls into your function; which becomes the 'new original function';
+that is, control will return (ret
) to the original function's caller from this function.
When your function calls the original function, it will be an entirely separate method call.
+Your function can technically not call the original and replace it outright.
+'Fast Mode' is an optimisation that inserts the jmp
to point directly into your code when possible.
flowchart TD
+ orig[Original Function] -- to your code --> target["<Your Function>"]
+ target -- "call original via wrapper" --> stub["Wrapper <with stolen bytes + jmp to original>"]
+ stub -- "call original" --> original["Original Function"]
+
+ original -- "return value" --> stub
+ stub -- "return value" --> target
+This option allows for a small performance improvement, saving 1 instruction and some instruction prefetching load.
+This is on by default (can be disabled), and will take effect when no conversion between calling conventions is needed.
+When conversion is needed, the logic will default back to When Activated.
+When 'Fast Mode' is enabled, you lose the ability to unhook (for compatibility reasons).
+Does not apply to 'Fast Mode'. When in fast mode, deactivation returns error.
+flowchart TD
+ orig[Original Function] -- jump to wrapper --> stub["Stub <stolen bytes + jmp>"]
+ stub -- "jmp original" --> original["Original Function"]
+When you deactivate a hook, the contents of 'Reverse Wrapper' are overwritten with the stolen bytes.
+When 'Reverse Wrapper' is allocated, extra space is reserved for original code.
+By bypassing your code entirely, it is safe for your dynamic library (.dll
/.so
/.dylib
)
+to unload from the process.
It is recommended that library users manually specify conventions in their hook functions.
+When the calling convention of <your function>
is not specified, wrapper libraries must insert
+the appropriate default convention in their wrappers.
On Linux, syscalls use R10 instead of RCX in SystemV ABI
+- i686-pc-windows-gnu: cdecl
+- i686-pc-windows-msvc: cdecl
+- i686-unknown-linux-gnu: SystemV (x86)
+- x86_64-pc-windows-gnu: Microsoft x64
+- x86_64-pc-windows-msvc: Microsoft x64
+- x86_64-unknown-linux-gnu: SystemV (x64)
+- x86_64-apple-darwin: SystemV (x64)
+
+- Windows x86: cdecl
+- Windows x64: Microsoft x64
+- Linux x64: SystemV (x64)
+- Linux x86: SystemV (x86)
+- macOS x64: SystemV (x64)
Wrappers are stubs which convert from the calling convention of the original function to your calling convention.
+If the calling convention of the hooked function and your function matches, this wrapper is simply just 1 jmp
instruction.
Wrappers are documented in their own page here.
+Stub which converts from your code's calling convention to original function's calling convention
+This is basically Wrapper with source
and destination
swapped around
Replaces a pointer inside an array of function pointers with a new pointer.
+This hook is commonly used to hook COM
objects, e.g. Direct3D
.
I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
+Probably the simplest hook out of them all, it's simply replacing one pointer inside an array of function +pointers with a new one.
+VTables are what is used to support polymorphism in C++ and similar languages.
+They are the mechanism that enables calling correct functions in presence of inheritance and virtual functions.
+Basically what drives 'interfaces' in other languages.
+In both GCC and Visual C++, VTables are automatically created for classes that have virtual functions.
+They are located at offset 0x0 of any class, thus if you get a pointer to a class, and dereference offset +0x0, you'll be at the address of the first item in the VTable.
+class Item {
+ virtual void doSomething();
+ int k;
+};
+
class Item
+ void* vTable
+ int k
+
vTable:
+ void* doSomething
+
VTables exist in .rdata
, thus you need to change memory permissions when hooking them.
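The pointer swap itself can be modelled in safe Rust as below. This is only a sketch: the table is a plain mutable slice rather than a real VTable in read-only memory, so the permission change is not shown, and `Method`, `hook_vtable_entry`, `original` and `hooked` are hypothetical names invented for the example.

```rust
/// Signature shared by every entry in this toy table.
pub type Method = fn(i32) -> i32;

/// Replaces the entry at `index` and returns the original pointer,
/// so that the hook can still call the original function.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn hook_vtable_entry(vtable: &mut [Method], index: usize, new_fn: Method) -> Method {
    std::mem::replace(&mut vtable[index], new_fn)
}

/// Stand-in for the original virtual function.
pub fn original(x: i32) -> i32 {
    x + 1
}

/// Stand-in for the replacement function.
pub fn hooked(x: i32) -> i32 {
    x * 100
}
```

Keeping the returned pointer is what allows the replacement to forward to the original, and writing it back into the slot undoes the hook.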
One notable thing about COM is that all interfaces inherit from IUnknown,
+so the first 4 methods will always be the 4 methods of IUnknown
.
Using Direct3D9 as an example
+flowchart LR
+ EndScene --> EndScene_Orig
+ Clear --> Clear_Orig
+ SetTransform --> SetTransform_Orig
+ GetTransform --> GetTransform_Orig
+flowchart LR
+ EndScene --> EndScene_Hook --> Your_Function --> EndScene_Orig
+ Clear --> Clear_Orig
+ SetTransform --> SetTransform_Orig
+ GetTransform --> GetTransform_Orig
+
+
+
+
+
+
+ Describes how stubs for converting between different Calling Conventions (ABIs) are generated.
+This page uses x86 as an example, however the same concepts apply to other architectures.
+These stubs are what allows Reloaded.Hooks-rs
to hook functions which take parameters in custom registers,
+allowing developers to skip writing error prone 'naked'
functions by hand.
Setting frame pointer (ebp
) is not necessary, as our wrapper shouldn't use it
# push LR if present on platform
+push ebp
+push ebx
+push edi
+push esi
+
Setup Function Parameters
+# In a loop
+push dword [ebp + {baseStackOffset}]
+
Reserve Extra Stack Space
+Some calling conventions require extra space reserved up front
+sub esp, {whatever}
+
If target function returns in different register than caller expects, might need to for example mov eax, ecx
.
mov eax, ecx
+
# Restore non-volatile registers
+pop esi
+pop edi
+pop ebx
+pop ebp
+# pop LR if relevant on given platform
+
The general implementation for 64-bit is the same, however the stack must be 16 byte aligned at method entry, and for MSFT convention, 32 bytes reserved on stack before call
+There are also some very minor nuances, which the actual code has to handle, but this is the general gist of it.
+This optimizes CPU instruction fetch, which (on x86) operates on 16 byte boundaries.
+So we align our wrappers to these boundaries.
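For reference, rounding an address up to the next boundary is a one-liner. The sketch below assumes `align` is a power of two (16 in this case); `align_up` is a hypothetical name, not a function from the library.

```rust
/// Rounds `addr` up to the next multiple of `align`.
/// `align` must be a power of two, e.g. 16 for x86 instruction fetch.
/// (Hypothetical helper for illustration; not the library's API.)
pub const fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}
```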
+When there are overlaps in callee saved registers between source and target, we can skip backing up those registers.
+For example, cdecl
and stdcall
use the same callee saved registers, ebp
, ebx
, esi
, edi
. When converting between these two conventions, it is not necessary to backup/restore any of them in the wrapper, because the target function will already take care of that.
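One way to express this rule is as a set difference: the wrapper only needs to back up registers that its caller expects to be preserved but the function it calls does not preserve itself. The sketch below assumes registers are identified by name; `registers_to_preserve` is a hypothetical helper, not the library's actual interface.

```rust
/// Returns the registers the wrapper must back up and restore itself:
/// those callee saved in the caller's convention (`source_saved`) that the
/// called function's convention (`target_saved`) does not already preserve.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn registers_to_preserve<'a>(
    source_saved: &[&'a str],
    target_saved: &[&str],
) -> Vec<&'a str> {
    source_saved
        .iter()
        .copied()
        .filter(|reg| !target_saved.iter().any(|t| t == reg))
        .collect()
}
```

For `cdecl` and `stdcall` both lists are identical, so the result is empty and the callee save/restore block can be dropped entirely.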
Example: cdecl target -> stdcall
wrapper.
# Stack Backup
+push ebp
+mov ebp, esp
+
+# Callee Save
+push ebx
+push edi
+push esi
+
+# Re push parameters
+push dword [ebp + {x}]
+push dword [ebp + {x}]
+
+call {function}
+add esp, 8
+
+# Callee Restore
+pop esi
+pop edi
+pop ebx
+
+# Stack Restore
+pop ebp
+ret 8
+
# Stack Backup
+push ebp
+mov ebp, esp
+
+# Re push parameters
+push dword [ebp + {x}]
+push dword [ebp + {x}]
+
+call {function}
+add esp, 8
+
+# Stack Restore
+pop ebp
+ret 8
+
Pseudocode example. Not verified for accuracy, but it shows the idea
+In the cdecl -> stdcall
example, ebp
is considered a callee saved register too, thus it should be possible to optimise into:
# Re push parameters
+push dword [esp + {x}]
+push dword [esp + {x}]
+
+call {function}
+add esp, 8
+
+ret 8
+
When pushing multiple registers at once, it is possible to remove redundant stack operations.
+Imagine a situation where you need to push 3 float registers onto the stack; if we pass the instructions +from the wrapper generator verbatim [push a, then push b, then push c], we would land with the following:
+; Push XMM registers
+sub rsp, 16
+movdqu [rsp], xmm0
+sub rsp, 16
+movdqu [rsp], xmm1
+sub rsp, 16
+movdqu [rsp], xmm2
+
+; Pop XMM registers
+movdqu xmm2, [rsp]
+add rsp, 16
+movdqu xmm1, [rsp]
+add rsp, 16
+movdqu xmm0, [rsp]
+add rsp, 16
+
This is suboptimal as it can be simplified to:
+# Push Registers to the Stack
+sub rsp, 48
+movdqu [rsp], xmm0
+movdqu [rsp + 16], xmm1
+movdqu [rsp + 32], xmm2
+
+# Pop three XMM registers from the Stack
+movdqu xmm0, [rsp]
+movdqu xmm1, [rsp + 16]
+movdqu xmm2, [rsp + 32]
+add rsp, 48
+
When generating wrappers, the generator must recognise this pattern, and merge multiple +push/pop operations into a single block, wherever possible.
+It is optimal to access memory sequentially from lowest to highest address.
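The merge step can be sketched as computing one total stack adjustment plus an offset per register. This is a simplified model under the assumption that the generator knows the byte size of each pushed register; `merge_pushes` is a hypothetical name.

```rust
/// Merges consecutive pushes into a single stack adjustment.
/// Takes the size in bytes of each pushed register (in push order) and returns
/// the total for one `sub rsp, total`, plus the `[rsp + offset]` of each register.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn merge_pushes(sizes: &[usize]) -> (usize, Vec<usize>) {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut total = 0;
    for &size in sizes {
        // Registers are laid out from the lowest address upwards.
        offsets.push(total);
        total += size;
    }
    (total, offsets)
}
```

For the three XMM registers above this gives a total of 48 with offsets 0, 16 and 32, matching the simplified listing. The matching pop block reuses the same offsets and ends with a single `add rsp, total`.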
+In some cases it's possible to mov between registers, rather than doing an explicit push+pop operation
+Suppose you have a custom target -> stdcall
wrapper. Custom is defined as int@eax FastAdd(int a@eax, int b@ecx)
.
Normally wrapper generation will convert the arguments like this:
+# Re-push STDCALL arguments to stack
+push dword [esp + {x}]
+push dword [esp + {x}]
+
+# Pop into correct registers
+pop eax
+pop ecx
+
+# Call that function
+# ...
+
There are opportunities for optimisation here; notably you can do:
+# Move directly into correct registers
+mov eax, [esp + {x}]
+mov ecx, [esp + {x}]
+
+# Call that function
+# ...
+
Optimising cases where the source/from convention, e.g. custom target -> stdcall
has no register
+parameters is trivial, since you can directly mov into the intended target register. And this is
+the most common use case in x86.
For completeness, it should be noted that in the opposite direction stdcall target -> custom
, such
+as one that would be used in entry point of a hook (ReverseWrapper),
+no optimisation is needed here, as all registers are directly pushed without any extra steps.
In the backend, the wrapper generator keeps track of current stack pointer (assuming start is '0'); and uses that information to match the push and pop operations accordingly 😉
+In x64, and more advanced x86 scenarios where both to/from calling convention have register parameters, mov optimisation is not trivial.
+Suppose you have a function to add 'health' to a character that's in a struct or class. i.e. int AddHealth(Player* this, int amount)
.
+(Note: The 'this' parameter to struct instance is implicit and added during compilation.)
class Player {
+ int mana;
+ int health;
+
+ void AddHealth(int amount) {
+ health += amount;
+ }
+};
+
add dword [rdi+4], esi
+ret
+
add dword [rcx+4], edx
+ret
+
If you were to make a SystemV target -> Microsoft
wrapper; you would have to move the two registers
+from rcx, rdx
to rdi, rsi
.
Therefore, a wrapper might have code that looks something like:
+# Push register parameters of the function being returned (right to left, reverse loop)
+push rdx
+push rcx
+
+# Pop parameters into registers of function being called
+pop rdi
+pop rsi
+
In this case, it is possible to optimise with:
+mov rdi, rcx # last push, first pop
+mov rsi, rdx # second last push, second pop
+
Provided that the wrapper correctly saves and restores callee saved registers for the returned method, i.e.
+backs up RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15
, this is fine.
Or in the case of this wrapper, just RDI, RSI
(due to overlap within the 2 conventions).
The 'strategy' to generate code for this optimisation is keeping track of stack, start between push
and pop
in the ASM and pair the registers in the corresponding push
and pop
operations together, going outwards until there is no push/pop left.
This is just another example.
+Suppose we add 2 more parameters...
+class Player {
+ int mana;
+ int health;
+ int money;
+
+ void AddStats(int health, int mana, int money) {
+ this->health += health;
+ this->mana += mana;
+ this->money += money;
+ }
+};
+
add dword [rdi+4], esi # health
+add dword [rdi], edx # mana
+add dword [rdi+8], ecx # money
+ret
+
add dword [rcx+4], edx # health
+add dword [rcx], r8d # mana
+add dword [rcx+8], r9d # money
+ret
+
There is now an overlap between the registers used.
+Microsoft convention uses:
+- rcx
for self
+- rdx
for health
SystemV uses:
+- rcx
for money
+- rdx
for mana
The wrapper now does the following
+# Push register parameters of the function being returned (right to left, reverse loop)
+push rcx
+push rdx
+push rsi
+push rdi
+
+# Pop parameters into registers of function being called
+pop rcx
+pop rdx
+pop r8
+pop r9
+
mov rcx, rdi
+mov rdx, rsi
+mov r8, rdx
+mov r9, rcx
+
The optimised version of code above contains a bug.
+There is a bug because both conventions have overlapping registers, notably rcx
and rdx
. When
+you try to do mov r8, rdx
, this moves invalid data, as rdx
was already overwritten.
In this specific case, you can reverse the order of operations, and get a correct result:
+# Reversed
+mov r9, rcx
+mov r8, rdx
+mov rdx, rsi
+mov rcx, rdi
+
However, this might not always be the case.
+When generating wrappers, we must perform a validation check to ensure that no source register
+in mov target, source
has already been overwritten by a prior operation.
In the Advanced Case we saw that it's not always possible to perform mov optimisation.
+This problem can be solved with a Directed Acyclic Graph
.
This problem can be solved in O(n)
complexity with a Directed Acyclic Graph
, where each node represents
+a register and an edge (arrow) from Node A to Node B represents a move from register A to register B.
The above (buggy) code would be represented as:
+
flowchart TD
+ RDI --> RCX
+ RSI --> RDX
+ RDX --> R8
+ RCX --> R9
+RDI writes to RCX which writes to R9, which is now invalid.
+We can determine the correct mov
order, by processing them in reverse order of their dependencies
mov r9, rcx
before mov rcx, rdi
mov r8, rdx
before mov rdx, rsi
Exact order encoded depends on algorithm implementation in code; as long as the 2 derived rules are followed.
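A minimal sketch of this ordering follows, with each move written as a `(target, source)` pair. For brevity it rescans the pending list on every step, so it is quadratic rather than the O(n) graph walk described above, and it simply gives up when it finds a cycle. The name `order_moves` is hypothetical.

```rust
/// Orders `(target, source)` register moves so that no source is overwritten
/// before it has been read. Returns `None` if the moves contain a cycle.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn order_moves<'a>(moves: &[(&'a str, &'a str)]) -> Option<Vec<(&'a str, &'a str)>> {
    let mut pending = moves.to_vec();
    let mut ordered = Vec::with_capacity(moves.len());
    while !pending.is_empty() {
        // A move is safe to emit once no other pending move reads its target.
        let pos = pending.iter().enumerate().position(|(i, &(dst, _))| {
            pending
                .iter()
                .enumerate()
                .all(|(j, &(_, src))| i == j || src != dst)
        })?;
        ordered.push(pending.remove(pos));
    }
    Some(ordered)
}
```

Fed the four moves from the buggy listing, this emits `mov r8, rdx` before `mov rdx, rsi` and `mov r9, rcx` before `mov rcx, rdi`, satisfying both derived rules.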
+Suppose we have 2 calling conventions with reverse parameter order. For this example we will define
+convention 🐱call
. 🐱call
uses the reverse register order of Microsoft compiler.
int AddWithShift(int a, int b) {
+ return (a * 16) + b;
+}
+
shl ecx, 4
+lea eax, dword [rdx+rcx]
+ret
+
shl edx, 4
+lea eax, dword [rcx+rdx]
+ret
+
The ASM to do the calling convention transformation becomes:
+# Push register parameters of the function being returned (right to left, reverse loop)
+push rcx
+push rdx
+
+# Pop parameters into registers of function being called
+pop rcx
+pop rdx
+
mov rcx, rdx
+mov rdx, rcx
+
There is now a cycle.
+flowchart TD
+ RCX --> RDX
+ RDX --> RCX
+In this trivial example, you can use xchg
or 3 mov
(s) to swap between the two registers.
xchg rcx, rdx
+
mov {temp}, rdx
+mov rdx, rcx
+mov rcx, {temp}
+
On some Intel architectures, the mov
approach can reportedly be faster, however, it's not possible
+to procure a scratch register in all cases.
I'll welcome any PRs that detect and write the more optimal choice on a given architecture, however this is not planned for main library.
+Adding instructions also means the wrapper might overflow to the next multiple of 16 bytes, causing more instructions to be fetched when that otherwise wouldn't happen with xchg, potentially losing any benefits gained on those architectures.
+The mappings done in Reloaded.Hooks
are a 1:1 bijective mapping. Therefore any cycle of just 2 registers can be resolved by simply swapping the involved registers.
Now imagine doing a mapping which involves 3 registers, r8
- r10
, and all registers need to be mov'd
.
flowchart TD
+ R8 --> R9
+ R9 --> R10
+ R10 --> R8
+mov R9, R8
+mov R10, R9
+mov R8, R10
+
To resolve this, we backup the register at the end of the cycle (in this case R10), disconnect it +from the first register in the cycle and resolve as normal.
+i.e. we solve for
+flowchart TD
+ R8 --> R9
+ R9 --> R10
+Then write original value of R10 into R8 after this code is converted into mov
sequences.
This can be done using the following strategies:
+mov
into scratch register. AsmHook
) prefer callee saved register which is not a parameter. push
+ pop
register. # Move value from end of cycle into caller saved register (scratch)
+mov RAX, R10
+
+# Original (after reorder)
+mov R10, R9
+mov R9, R8
+
+# Move from caller saved register into first in cycle.
+mov R8, RAX
+
# Push value from end of cycle into stack
+push R10
+
+# Original (after reorder)
+mov R10, R9
+mov R9, R8
+
+# Pop into intended place from stack
+pop R8
+
When possible to get scratch register, use mov
, otherwise use push
.
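Both strategies can be sketched with one small emitter. The cycle is given in order (`["R8", "R9", "R10"]` meaning R8 moves to R9, R9 to R10, and R10 back to R8); `break_cycle` and its string output are assumptions made for illustration and do not reflect how the library encodes instructions.

```rust
/// Emits instructions that resolve a register move cycle.
/// Uses `mov` through `scratch` when one is available, otherwise `push`/`pop`.
/// (Hypothetical helper for illustration; not the library's API.)
pub fn break_cycle(cycle: &[&str], scratch: Option<&str>) -> Vec<String> {
    let (first, last) = match (cycle.first(), cycle.last()) {
        (Some(f), Some(l)) if cycle.len() > 1 => (*f, *l),
        _ => return Vec::new(), // nothing to resolve
    };
    let mut asm = Vec::new();
    // Back up the register at the end of the cycle.
    match scratch {
        Some(tmp) => asm.push(format!("mov {}, {}", tmp, last)),
        None => asm.push(format!("push {}", last)),
    }
    // Remaining moves form a chain; emit them in reverse dependency order.
    for pair in cycle.windows(2).rev() {
        asm.push(format!("mov {}, {}", pair[1], pair[0]));
    }
    // Write the backed up value into the first register of the cycle.
    match scratch {
        Some(tmp) => asm.push(format!("mov {}, {}", first, tmp)),
        None => asm.push(format!("pop {}", first)),
    }
    asm
}
```

With `Some("RAX")` this reproduces the scratch register listing above, and with `None` the push/pop listing.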
This is a theoretical idea, not implemented in library.
+Only applies to platforms like x86, which keep return addresses on the stack.
+In some cases, like converting between stdcall
and cdecl
; it might be possible to reuse the same
+parameters from the stack. Take into account the previous example:
# Re push parameters
+push dword [esp + {x}]
+push dword [esp + {x}]
+
+call {function}
+add esp, 8
+
+ret 8
+
Strictly speaking, to convert from stdcall
to cdecl
, you will only need to convert from
+caller stack cleanup to callee stack cleanup i.e. ret 8
.
In this case, re-pushing parameters is redundant, as the pushed parameters from the previous +method call are on stack and can still be re-used.
+What we can instead do, is overwrite the return address and jump to our code.
+# Pop previous return address from stack
+mov [esp], {addressPostJump} # replace return address
+jmp {function} # jump to our function
+add esp, 8 # our function returns here due to changed return address
+ret 8
+
Wrapper generation does not have understanding of any specific ABI, and as such cannot always be 100% correct in edge cases.
+Some ABIs have unconventional rules for handling edge cases.
+For example, consider the following rule used by the RISC-V ABI.
+++When primitive arguments twice the size of a pointer-word are passed on the stack, they are +naturally aligned. When they are passed in the integer registers, they reside in an aligned even-odd +register pair, with the even register holding the least-significant bits. In RV32, for example, the +function void foo(int, long long) is passed its first argument in a0 and its second in a2 and +a3. Nothing is passed in a1.
+
The wrappers cannot know or understand the intricate rules such as this that are imposed by an ABI.
+Optimized code does not suffer from this bug.
+Consider a function which spills a float register xmm0
, and an nint
(native size integer).
+A Push
is basically a sequence of sub
and then mov
.
So (pretend ASM below is valid)
+push xmm0
+push rax
+
Would become
+sub rsp, 16
+mov [rsp], xmm0
+sub rsp, 8
+mov [rsp], rax
+
This is invalid, because the contents of rax will now replace half of the xmm0
register on the stack.
+How ABIs and compilers deal with this isn't always well standardised; some only consider lower bits volatile,
+(Microsoft x64) while others don't preserve the bigger registers at all (SystemV x64).
Our strategy will be to try to rearrange the stack operations to avoid this problem, starting by pushing smaller registers first, and then larger registers, effectively creating:
+sub rsp, 8
+mov [rsp], rax
+sub rsp, 16
+mov [rsp], xmm0
+
Currently with optimizations enabled, this code compiles as:
+sub rsp, 24
+mov [rsp], xmm0
+mov [rsp + 16], rax
+
Which is valid.
+Some calling conventions have rules where larger values (e.g. 128-bit values on x64) are split into 2 registers.
+The wrapper generator cannot generate code for these functions currently.
+ + + + + + +This page provides a list of platform specific functionality required for supporting Reloaded.Hooks-rs
.
Required
means library must have this to function. Recommended
means library may not work on some edge cases. Optional
means library can function without it. To add support for new platforms, supply the necessary function pointers in platform_functions.rs
.
| Feature | Windows | Linux | macOS |
|---|---|---|---|
| Permission Change | ✅ | ✅ | ✅ |
| W^X Disable/Restore | N/A | N/A [1] | ⚠️ [2] |
| Targeted Memory Allocation | ✅ | ✅ | ✅ |
[1] May be present depending on kernel configuration. Have not done adequate research.
+[2] Needed for Apple Silicon only.
Once you're done, submit a PR to add support for your platform.
+The library provides a platform_functions.rs
file which contains all the platform specific functions.
Implement the functions in this file for your platform. Generally you'll only need unprotect_memory
,
+though on some platforms, you may need to implement disable_write_xor_execute
and restore_write_xor_execute
+as well, depending on the platform's security policy.
For optimal performance, you should add support for your platform to reloaded-memory-buffers.
+It's recommended to use reloaded-hooks-rs
alongside reloaded-memory-buffers
. The concept of the buffers
+library is to perform allocations as close to original code as possible, allowing for more efficient code.
This requires walking memory pages. If your OS does not have a way to do this, you can in the meantime use
+the built-in DefaultBufferFactory
. On some platforms you'll also need to adjust DefaultBufferFactory::create_page_as_rx
,
+if your platform does not allow RWX allocations.
For DefaultBufferFactory
, you might need to replace mmap_rs
in get_any_buffer
to use your platform specific page allocation function.
Platform specific functionality is not unit tested as it relies on OS/system state. Instead, integration +tests are used to test the functionality.
+Find the tests for a given hook type (recommend: assembly_hook
tests) and run them on your platform.
If you can't run tests on your platform, copy them to one of your programs manually.
+Many platforms have per-page access permissions; which may prevent certain regions of memory from being modified.
+Notably for the use cases of this library, the .text
section is usually non-writeable, which
+prevents hooking app functions out of the box.
To work around this, the library will call the unprotect
function in platform_functions.rs
before
+making code changes in memory. It will then (for performance reasons) leave the memory unprotected
+for the lifetime of the process (assuming it remains unprotected).
For the common operating systems; the protect
/unprotect
functions map to the following API calls:
VirtualProtect
mprotect
Only required on Apple platforms; opt-in on Linux/Windows, but I haven't seen it used in game software in the wild.
+Info
+Some platforms enforce a security protection called 'Write XOR Execute'; where a memory page may only be marked as writeable +OR executable at any moment in time.
+To work around this, the library will call the disable_write_xor_execute
function in platform_functions.rs
+ahead of every function call. It will then call restore_write_xor_execute
after.
Info
+The process of code relocation might require that new location of the code +is within a certain region of the old code, usually 128MiB, 2GiB or 4GiB (depending on platform).
+In this case, you must walk over the memory pages of a process; and find a suitable place to allocate 😉
+ + + + + + +reloaded-hooks-rs is an enhanced port of the original Reloaded.Hooks (<= 4.3.0) to Rust.
+This library is written as no_std
. Currently support for Windows
, Linux
and macOS
is provided
+out of the box. That said, a lot of functionality is platform & architecture agnostic, hopefully making
+porting easier.
| Platform | x86 | x86_64 |
|---|---|---|
| Windows | ✔️ | ✔️ |
| Linux | ✔️ | ✔️ |
| macOS | N/A * | ✔️ |
The reloaded-hooks-rs
code is not hardwired to any platform. For other platforms you can fill the
+[pending] struct and provide appropriate function pointers; which would possibly make the library work
+even in bare metal or embedded environments.
Lists the currently available library features for different architectures.
| Feature | x86 & x64 | ARM64 |
|---|---|---|
| Basic Function Hooking | ✅ | ✅ |
| Code Relocation | ✅* | ✅ |
| Hook Stacking | ✅ | ✅ |
| Calling Convention Wrapper Generation | ✅ | ✅ |
| Optimal Wrapper Generation | ✅ | ✅ |
| Length Disassembler | ✅ | ✅ |
Bootstrapping a new architecture is not a difficult job!!
+Please see Architecture Support Overview for porting guidance.
__stdcall
-> __fastcall
). IP relocation is a thread safety technique employed by some libraries whereby all process' threads are stopped and any threads that are executing the prolog of the function that is being detoured at the same time have their instruction pointer overwritten to the hook.
+This can only be done on some OSes that expose the relevant APIs.
+For the project author's use case, this is not needed, however the project would happily accept a
+PR for this functionality.
In practice this is very, very rarely a problem.
+This applies to calling convention wrappers generated by the library.
+I've never seen this requirement in the wild, ever; usually for functions with this many parameters, they use standard ABI, but it's technically possible.
+When generating wrappers between different calling conventions; the library preserves entire registers,
+you can't for example specify 'please only preserve the upper 32-bits of register
If you have questions/bug reports/etc. feel free to Open an Issue.
+Happy Documenting ❤️
+ + + + + + + + +reloaded-hooks-rs is an enhanced port of the original Reloaded.Hooks (<= 4.3.0) to Rust.
This library is written as no_std
. Currently support for Windows
, Linux
and macOS
is provided out of the box. That said, a lot of functionality is platform & architecture agnostic, hopefully making porting easier.
The reloaded-hooks-rs
code is not hardwired to any platform. For other platforms you can fill the [pending] struct and provide appropriate function pointers; which would possibly make the library work even in bare metal or embedded environments.
Lists the currently available library features for different architectures.
Feature x86 & x64 ARM64 Basic Function Hooking \u2705 \u2705 Code Relocation \u2705* \u2705 Hook Stacking \u2705 \u2705 Calling Convention Wrapper Generation \u2705 \u2705 Optimal Wrapper Generation \u2705 \u2705 Length Disassembler \u2705 \u2705Bootstrapping a new architecture is not a difficult job!! Please see Architecture Support Overview for porting guidance.
__stdcall
-> __fastcall
). IP relocation is a thread safety technique employed by some libraries whereby all process' threads are stopped and any threads that are executing the prolog of the function that is being detoured at the same time have their instruction pointer overwritten to the hook.
This can only be done on some OSes that expose the relevant APIs. For the project author's use case, this is not needed, however the project would happily accept a PR for this functionality.
In practice this is very, very rarely a problem.
"},{"location":"#caller-saved-registers-always-saved-in-entirety","title":"Caller Saved Registers Always saved in Entirety","text":"This applies to calling convention wrappers generated by the library.
I've never seen this requirement in the wild, ever; usually for functions with this many parameters, they use standard ABI, but it's technically possible.
When generating wrappers between different calling conventions; the library preserves entire registers, you can't for example specify 'please only preserve the upper 32-bits of register '. As is, currently only the whole register can be preserved."},{"location":"#technical-questions","title":"Technical Questions","text":"
If you have questions/bug reports/etc. feel free to Open an Issue.
Happy Documenting \u2764\ufe0f
"},{"location":"contributing/","title":"Contribution Guidelines","text":"The wiki provides details on internals of the library. They may help you when contributing \ud83d\ude09.
First off, thank you for considering contributing to reloaded-hooks.
If your contribution is not straightforward, please first discuss the change you wish to make by creating a new issue before making the change. We might be able to discuss general design, etc. before you embark on a huge endeavour.
"},{"location":"contributing/#reporting-issues","title":"Reporting Issues","text":"Before reporting an issue on the issue tracker, please check that it has not already been reported by searching for some related keywords.
"},{"location":"contributing/#pull-requests","title":"Pull Requests","text":"Try to do one pull request per change.
"},{"location":"contributing/#commit-names","title":"Commit Names","text":"Reloaded repositories auto-generate changelogs based on commit names.
When you make git commits; try to stick to the style of Keep a changelog:
Added
for new features. Changed
for changes in existing functionality. Deprecated
for soon-to-be removed features. Removed
for now removed features. Fixed
for any bug fixes. Security
in case of vulnerabilities. Please use the standard code style cargo fmt
, and run the clippy
linter (cargo clippy
), fixing warnings before submitting PRs.
If you are using VSCode, this should be automated (on Save) per this repository's settings.
"},{"location":"Reloaded/Readme/","title":"Readme","text":"Please visit the documentation site for usage instructions & more.
"},{"location":"Reloaded/Pages/","title":"Index","text":"The Reloaded MkDocs Theme A Theme for MkDocs Material. That resembles the look of Reloaded."},{"location":"Reloaded/Pages/#about","title":"About","text":"This it the NexusMods theme for Material-MkDocs, inspired by the look of Reloaded-II.
The overall wiki theme should look fairly close to the actual launcher appearance.
"},{"location":"Reloaded/Pages/#setup-from-scratch","title":"Setup From Scratch","text":"Add the theme to docs/Reloaded, for example as a git submodule.
Save the following as mkdocs.yml in your repository root.
site_name: Reloaded MkDocs Theme\nsite_url: https://github.com/Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2\n\nrepo_name: Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2\nrepo_url: https://github.com/Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2\n\nextra:\n  social:\n    - icon: fontawesome/brands/github\n      link: https://github.com/Reloaded-Project\n    - icon: fontawesome/brands/twitter\n      link: https://twitter.com/thesewer56?lang=en-GB\n\nextra_css:\n  - Reloaded/Stylesheets/extra.css\n\nmarkdown_extensions:\n  - admonition\n  - tables\n  - pymdownx.details\n  - pymdownx.highlight\n  - pymdownx.superfences:\n      custom_fences:\n        - name: mermaid\n          class: mermaid\n          format: !!python/name:pymdownx.superfences.fence_code_format\n  - pymdownx.tasklist\n  - def_list\n  - meta\n  - md_in_html\n  - attr_list\n  - footnotes\n  - pymdownx.tabbed:\n      alternate_style: true\n  - pymdownx.emoji:\n      emoji_index: !!python/name:materialx.emoji.twemoji\n      emoji_generator: !!python/name:materialx.emoji.to_svg\n\ntheme:\n  name: material\n  palette:\n    scheme: reloaded-slate\n  features:\n    - navigation.instant\n\nplugins:\n  - search\n\nnav:\n  - Home: index.md\n
Add the following GitHub Actions workflow as .github/workflows/DeployMkDocs.yml.
name: DeployMkDocs\n\n# Controls when the action will run.\non:\n  # Triggers the workflow on push to the main branch\n  push:\n    branches: [ main ]\n\n  # Allows you to run this workflow manually from the Actions tab\n  workflow_dispatch:\n\n# A workflow run is made up of one or more jobs that can run sequentially or in parallel\njobs:\n  # This workflow contains a single job called \"build\"\n  build:\n    # The type of runner that the job will run on\n    runs-on: ubuntu-latest\n\n    # Steps represent a sequence of tasks that will be executed as part of the job\n    steps:\n\n      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it\n      - name: Checkout Branch\n        uses: actions/checkout@v2\n        with:\n          submodules: recursive\n\n      # Deploy MkDocs\n      - name: Deploy MkDocs\n        # You may pin to the exact commit or the version.\n        # uses: mhausenblas/mkdocs-deploy-gh-pages@66340182cb2a1a63f8a3783e3e2146b7d151a0bb\n        uses: mhausenblas/mkdocs-deploy-gh-pages@master\n        env:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n          REQUIREMENTS: ./docs/requirements.txt\n
Go to Settings -> Pages in your repo and select the gh-pages branch to enable GitHub Pages. Your page should then be live.
Tip
Refer to Contributing for instructions on how to locally edit and modify the wiki.
Note
For Reloaded3 theme use reloaded3-slate
instead of reloaded-slate
.
Info
Most documentation pages will also include additional plugins, some of which are used in the pages here. Here is a sample complete mkdocs.yml you can copy to your project for reference.
"},{"location":"Reloaded/Pages/#technical-questions","title":"Technical Questions","text":"If you have questions/bug reports/etc. feel free to Open an Issue.
Happy Documenting \u2764\ufe0f
"},{"location":"Reloaded/Pages/contributing/","title":"Contributing to the Wiki: Locally","text":"Info
This page shows you how to contribute to any documentation page or wiki based on this template.
Note
This theme is forked from my theme for Nexus Docs; and this page is synced with that.
"},{"location":"Reloaded/Pages/contributing/#tutorial","title":"Tutorial","text":"Note
If you are editing the repository with the theme itself on Windows, it might be a good idea to run git config core.symlinks true
first to allow git to create symlinks on clone.
You should learn the basics of git; an easy way is to give GitHub Desktop (Tutorial) a go. It only takes 15 minutes \ud83d\ude00.
Fork this repository:
This will create a copy of the repository on your own user account, which you will be able to edit.
Clone this repository.
For example, using GitHub Desktop:
Make changes inside the docs
folder.
Consider using a Markdown Cheat Sheet if you are new to markdown.
I recommend using a markdown editor such as Typora
. Personally I just work from inside Rider
.
Commit the changes and push to GitHub.
Open a Pull Request
.
Opening a Pull Request will allow us to review your changes before merging them into the main official page. If everything's good, we'll hit the merge button and add your changes to the official repository.
If you are working on the wiki locally, you can generate a live preview of the full website. Here's a quick guide to doing it from your command prompt (cmd).
Install Python 3
If you have winget
installed, or Windows 11, you can do this from the command prompt.
winget install Python.Python.3\n
pacman -S python-pip # you should already have Python\n
Otherwise download Python 3 from the official website or package manager.
Install Material for MkDocs and Plugins (Python package)
Windows/OSX:
# Restart your command prompt before running this command.\npip install mkdocs-material\npip install mkdocs-redirects\n
Linux:
On Linux, there is a chance that python might be a core part of your OS, meaning that you ideally shouldn't touch the system installation.
Use virtual environments instead.
python -m venv ~/mkdocs # Create the environment\nsource ~/mkdocs/bin/activate # Enter the environment\n\npip install mkdocs-material\npip install mkdocs-redirects\n
Make sure you enter the environment every time before you run mkdocs.
Open a command prompt in the folder containing mkdocs.yml and run the site locally.
# Move to project folder.\ncd <Replace this with full path to folder containing `mkdocs.yml`>\nmkdocs serve\n
Copy the address to your web browser and enjoy the live preview; any changes you save will be shown instantly.
Most components of the Reloaded Project are governed by the GPLv3 license.
In some rare scenarios, certain libraries might be licensed under LGPLv3 instead.
This is a FAQ meant to clarify the licensing choice and its implications. Please note, though, that the full license text is the final legal authority.
"},{"location":"Reloaded/Pages/license/#why-was-gpl-v3-chosen","title":"Why was GPL v3 chosen?","text":"The primary objective is to prevent closed-source, commercial exploitation of the project.
We want to ensure that the project isn't used within a proprietary environment for profit-making purposes such as:
The Reloaded Project is a labour of love from unpaid hobbyist volunteers.
Exploiting that work for profit feels fundamentally unfair.
While the GPLv3 license doesn't prohibit commercial use outright, it does prevent commercial exploitation by requiring that contributions are given back to the open-source community.
In that fashion, everyone can benefit from the projects under the Reloaded label.
"},{"location":"Reloaded/Pages/license/#can-i-use-reloaded-libraries-commercially","title":"Can I use Reloaded Libraries Commercially?","text":"You can, as long as the resulting product is also licensed under GPLv3, and thus open source.
"},{"location":"Reloaded/Pages/license/#can-i-use-reloaded-libraries-in-a-closed-source-application","title":"Can I use Reloaded Libraries in a closed-source application?","text":"The license terms do not permit this.
However, if your software is completely non-commercial, meaning it's neither sold for profit, funded in development, nor hidden behind a paywall (like Patreon), we will probably just look the other way.
This often applies to non-professional programmers, learners, or those with no intent to exploit the project. We believe in understanding and leniency for those who might not know better.
GPL v3 exists to protect the project and its contributors. If you're not exploiting the project for commercial gain, you're not hurting us; and we will not enforce the terms of the GPL.
If you are interested in obtaining a commercial license, or want an explicit written exemption, please get in touch with the repository owners.
"},{"location":"Reloaded/Pages/license/#can-i-link-reloaded-libraries-staticallydynamically","title":"Can I link Reloaded Libraries statically/dynamically?","text":"Yes, as long as you adhere to the GPLv3 license terms, you're permitted to statically link Reloaded Libraries into your project, for instance, through the use of NativeAOT or ILMerge.
"},{"location":"Reloaded/Pages/license/#guidelines-for-non-commercial-use","title":"Guidelines for Non-Commercial Use","text":"We support and encourage the non-commercial use of Reloaded Libraries. Non-commercial use generally refers to the usage of our libraries for personal projects, educational purposes, academic research, or use by non-profit organizations.
"},{"location":"Reloaded/Pages/license/#personal-projects","title":"Personal Projects","text":"You're free to use our libraries for projects that you undertake for your own learning, hobby or personal enjoyment. This includes creating mods for your favorite games or building your own applications for personal use.
"},{"location":"Reloaded/Pages/license/#educational-use","title":"Educational Use","text":"Teachers and students are welcome to use our libraries as a learning resource. You can incorporate them into your teaching materials, student projects, coding bootcamps, workshops, etc.
"},{"location":"Reloaded/Pages/license/#academic-research","title":"Academic Research","text":"Researchers may use our libraries for academic and scholarly research. We'd appreciate if you cite our work in any publications that result from research involving our libraries.
"},{"location":"Reloaded/Pages/license/#non-profit-organizations","title":"Non-profit Organizations","text":"If you're part of a registered non-profit organization, you can use our libraries in your projects. However, any derivative work that uses our libraries must also be released under the GPL.
Please remember, if your usage of our libraries evolves from non-commercial to commercial, you must ensure compliance with the terms of the GPL v3 license.
"},{"location":"Reloaded/Pages/license/#attribution-requirements","title":"Attribution Requirements","text":"As Reloaded Project is a labor of love, done purely out of passion and with an aim to contribute to the broader community, we highly appreciate your support in providing attribution when using our libraries.
While not legally mandatory under GPL v3, it is a simple act that can go a long way in recognizing the efforts of our contributors and fostering an open and collaborative atmosphere.
If you choose to provide attribution (and we hope you do!), here are some guidelines:
Acknowledge the Use of Reloaded Libraries: Mention that your project uses or is based on Reloaded libraries. This could be in your project's readme, a credits page on a website, a manual, or within the software itself.
Link to the Project: If possible, provide a link back to the Reloaded Project. This allows others to explore and potentially benefit from our work.
Remember, attribution is more than just giving credit; it's a way of saying thank you \ud83d\udc49\ud83d\udc48, fostering reciprocal respect, and acknowledging the power of collaborative open-source development.
We appreciate your support and look forward to seeing what amazing projects you create using Reloaded libraries!
"},{"location":"Reloaded/Pages/license/#code-from-mitbsd-licensed-projects","title":"Code from MIT/BSD Licensed Projects","text":"In some rare instances, code from more permissively licensed projects, such as those under the MIT
or BSD
licenses, may be referenced, incorporated, or slightly modified within the Reloaded Project.
It's important to us to respect the terms and intentions of these permissive licenses, which often allow their code to be used in a wide variety of contexts, including in GPL-licensed projects like ours.
In these cases, the Reloaded Project is committed to clearly disclosing the usage of such code:
Method-Level Disclosure: For individual methods or small code snippets, we use appropriate attribution methods, like programming language attributes. For example, methods borrowed or adapted from MIT-licensed projects might be marked with a [MITLicense]
attribute.
File-Level Disclosure: For larger amounts of code, such as entire files or modules, we'll include the original license text at the top of the file and clearly indicate which portions of the code originate from a differently-licensed project.
Project-Level Disclosure: If an entire library or significant portion of a project under a more permissive license is used, we will include an acknowledgment in a prominent location, such as the readme file or the project's license documentation.
This approach ensures we honor the contributions of the open source community at large, respect the original licenses, and maintain transparency with our users about where code originates from.
Any files/methods or snippets marked with those attributes may be consumed using their original license terms.
i.e. If a method is marked with [MITLicense]
, you may use it under the terms of the MIT license.
We welcome and appreciate contributions to the Reloaded Project! By contributing, you agree to share your changes under the same GPLv3 license, helping to make the project better for everyone.
"},{"location":"Reloaded/Pages/testing-zone/","title":"Testing Zone","text":"Info
This is a dummy page with various Material MkDocs controls and features scattered throughout for testing.
"},{"location":"Reloaded/Pages/testing-zone/#custom-admonitions","title":"Custom Admonitions","text":"Reloaded Admonition
An admonition featuring a Reloaded logo. My source is in Stylesheets/extra.css as Custom 'reloaded' admonition
.
Heart Admonition
An admonition featuring a heart; because we want to contribute back to the open source community. My source is in Stylesheets/extra.css as Custom 'reloaded heart' admonition
.
Nexus Admonition
An admonition featuring a Nexus logo. My source is in Stylesheets/extra.css as Custom 'nexus' admonition
.
Heart Admonition
An admonition featuring a heart; because we want to contribute back to the open source community. My source is in Stylesheets/extra.css as Custom 'nexus heart' admonition
.
Flowchart (Source: Nexus Archive Library):
flowchart TD\n subgraph Block 2\n BigFile1.bin\n end\n\n subgraph Block 1\n BigFile0.bin\n end\n\n subgraph Block 0\n ModConfig.json -.-> Updates.json \n Updates.json -.-> more[\"... more .json files\"] \n end
Sequence Diagram (Source: Reloaded3 Specification):
sequenceDiagram\n\n % Define Items\n participant Mod Loader\n participant Virtual FileSystem (VFS)\n participant CRI CPK Archive Support\n participant Persona 5 Royal Support\n participant Joker Costume\n\n % Define Actions\n Mod Loader->>Persona 5 Royal Support: Load Mod\n Persona 5 Royal Support->>Mod Loader: Request CRI CPK Archive Support API\n Mod Loader->>Persona 5 Royal Support: Receive CRI CPK Archive Support Instance\n\n Mod Loader->>Joker Costume: Load Mod\n Mod Loader-->Persona 5 Royal Support: Notification: 'Loaded Joker Costume'\n Persona 5 Royal Support->>CRI CPK Archive Support: Add Files from 'Joker Costume' to CPK Archive (via API)
State Diagram (Source: Mermaid Docs):
stateDiagram-v2\n [*] --> Still\n Still --> [*]\n\n Still --> Moving\n Moving --> Still\n Moving --> Crash\n Crash --> [*]
Class Diagram (Arbitrary)
classDiagram\n class Animal\n `NexusMobile\u2122` <|-- Car
Note
At the time of writing, the version of Mermaid used here is a bit outdated, and other diagrams might not render correctly (even on the unmodified theme); certain diagrams have therefore been omitted.
"},{"location":"Reloaded/Pages/testing-zone/#code-block","title":"Code Block","text":"Snippet from C# version of Sewer's Virtual FileSystem (VFS):
/// <summary>\n/// Tries to get files for a specific folder, assuming the input path is already in upper case.\n/// </summary>\n/// <param name=\"folderPath\">The folder to find. Already uppercase.</param>\n/// <param name=\"value\">The returned folder instance.</param>\n/// <returns>True if found, else false.</returns>\n[MethodImpl(MethodImplOptions.AggressiveInlining)]\npublic bool TryGetFolderUpper(ReadOnlySpan<char> folderPath, out SpanOfCharDict<TTarget> value)\n{\n    // Must be O(1)\n    value = default!; // Compare equality.\n    // Note to devs: Do not invert branches, we optimise for hot paths here.\n    if (folderPath.StartsWith(Prefix))\n    {\n        // Check for subfolder in branchless way.\n        // In CLR, bool is length 1, so conversion to byte should be safe.\n        // Even suppose it is not; as long as code is little endian; truncating int/4 bytes to byte still results\n        // in correct answer.\n        var hasSubfolder = Prefix.Length != folderPath.Length;\n        var hasSubfolderByte = Unsafe.As<bool, byte>(ref hasSubfolder);\n        var nextFolder = folderPath.SliceFast(Prefix.Length + hasSubfolderByte);\n\n        return SubfolderToFiles.TryGetValue(nextFolder, out value!);\n    }\n\n    return false;\n}\n
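The offset arithmetic in the snippet above can be illustrated in isolation. This is only a rough Python sketch of the idea, not the VFS implementation: the function name `strip_prefix` and the sample paths are hypothetical, and Python gains nothing from avoiding branches; the point is just that a boolean converted to 0 or 1 can be added directly to the slice offset to skip the separator.

```python
from typing import Optional


def strip_prefix(folder_path: str, prefix: str) -> Optional[str]:
    """Return the part of folder_path after prefix and its separator, or None.

    Hypothetical helper mirroring the offset arithmetic of the C# snippet.
    """
    if not folder_path.startswith(prefix):
        return None

    # True when something follows the prefix. int(True) == 1 skips the single
    # separator character, so no second branch is needed for the exact match.
    has_subfolder = len(prefix) != len(folder_path)
    return folder_path[len(prefix) + int(has_subfolder):]


if __name__ == "__main__":
    print(strip_prefix("GAME/DATA", "GAME"))  # sub-folder below the prefix
    print(strip_prefix("GAME", "GAME"))       # exact match, nothing left over
```

When the path equals the prefix, the offset is just the prefix length and the result is empty; otherwise one extra character (the separator) is skipped.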
Something more number heavy, Fast Inverse Square Root from Quake III Arena (unmodified).
float Q_rsqrt( float number )\n{\nlong i;\nfloat x2, y;\nconst float threehalfs = 1.5F;\n\nx2 = number * 0.5F;\ny = number;\ni = * ( long * ) &y; // evil floating point bit level hacking\ni = 0x5f3759df - ( i >> 1 ); // what the fuck? \ny = * ( float * ) &i;\ny = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration\n// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed\n\nreturn y;\n}\n
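For readers who want to experiment with the bit-level trick, here is a minimal Python sketch of the same steps. It is an illustration rather than a port: the name `q_rsqrt` and the use of the standard `struct` module to reinterpret the float's bits are choices made for this example, it assumes a positive finite input, and the Newton step runs in Python's double precision, so results can differ slightly from the C version.

```python
import struct


def q_rsqrt(number: float) -> float:
    """Approximate 1 / sqrt(number) for a positive finite number."""
    x2 = number * 0.5

    # Reinterpret the bits of the 32-bit float as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", number))[0]

    # Magic constant minus half the bit pattern yields a rough first guess.
    i = 0x5F3759DF - (i >> 1)

    # Reinterpret the integer bits as a 32-bit float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]

    # One Newton-Raphson iteration refines the guess.
    return y * (1.5 - x2 * y * y)


if __name__ == "__main__":
    # The result should be close to 0.5, the exact value of 1 / sqrt(4).
    print(q_rsqrt(4.0))
```

Packing and unpacking through `struct` plays the role of the pointer casts in the C code while always using exactly 32 bits for the integer.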
"},{"location":"Reloaded/Pages/testing-zone/#default-admonitions","title":"Default Admonitions","text":"Note
Test
Abstract
Test
Info
Test
Tip
Test
Success
Test
Question
Test
Warning
Test
Failure
Test
Danger
Test
Bug
Test
Example
Test
Quote
Test
"},{"location":"Reloaded/Pages/testing-zone/#tables","title":"Tables","text":"Method | Description
GET | Fetch resource
PUT | Update resource
DELETE | Delete resource"},{"location":"Reloaded/docs/Pages/","title":"Index","text":"The Reloaded MkDocs Theme A theme for MkDocs Material that resembles the look of Reloaded."},{"location":"Reloaded/docs/Pages/#about","title":"About","text":"This is the NexusMods theme for Material-MkDocs, inspired by the look of Reloaded-II.
The overall wiki theme should look fairly close to the actual launcher appearance.
"},{"location":"Reloaded/docs/Pages/#setup-from-scratch","title":"Setup From Scratch","text":"Add the theme to docs/Reloaded, for example as a git submodule.
Save the following as mkdocs.yml in your repository root.
site_name: Reloaded MkDocs Theme\nsite_url: https://github.com/Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2\n\nrepo_name: Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2\nrepo_url: https://github.com/Reloaded-Project/Reloaded.MkDocsMaterial.Themes.R2\n\nextra:\n  social:\n    - icon: fontawesome/brands/github\n      link: https://github.com/Reloaded-Project\n    - icon: fontawesome/brands/twitter\n      link: https://twitter.com/thesewer56?lang=en-GB\n\nextra_css:\n  - Reloaded/Stylesheets/extra.css\n\nmarkdown_extensions:\n  - admonition\n  - tables\n  - pymdownx.details\n  - pymdownx.highlight\n  - pymdownx.superfences:\n      custom_fences:\n        - name: mermaid\n          class: mermaid\n          format: !!python/name:pymdownx.superfences.fence_code_format\n  - pymdownx.tasklist\n  - def_list\n  - meta\n  - md_in_html\n  - attr_list\n  - footnotes\n  - pymdownx.tabbed:\n      alternate_style: true\n  - pymdownx.emoji:\n      emoji_index: !!python/name:materialx.emoji.twemoji\n      emoji_generator: !!python/name:materialx.emoji.to_svg\n\ntheme:\n  name: material\n  palette:\n    scheme: reloaded-slate\n  features:\n    - navigation.instant\n\nplugins:\n  - search\n\nnav:\n  - Home: index.md\n
Add the following GitHub Actions workflow as .github/workflows/DeployMkDocs.yml.
name: DeployMkDocs\n\n# Controls when the action will run.\non:\n  # Triggers the workflow on push to the main branch\n  push:\n    branches: [ main ]\n\n  # Allows you to run this workflow manually from the Actions tab\n  workflow_dispatch:\n\n# A workflow run is made up of one or more jobs that can run sequentially or in parallel\njobs:\n  # This workflow contains a single job called \"build\"\n  build:\n    # The type of runner that the job will run on\n    runs-on: ubuntu-latest\n\n    # Steps represent a sequence of tasks that will be executed as part of the job\n    steps:\n\n      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it\n      - name: Checkout Branch\n        uses: actions/checkout@v2\n        with:\n          submodules: recursive\n\n      # Deploy MkDocs\n      - name: Deploy MkDocs\n        # You may pin to the exact commit or the version.\n        # uses: mhausenblas/mkdocs-deploy-gh-pages@66340182cb2a1a63f8a3783e3e2146b7d151a0bb\n        uses: mhausenblas/mkdocs-deploy-gh-pages@master\n        env:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n          REQUIREMENTS: ./docs/requirements.txt\n
Go to Settings -> Pages in your repo and select the gh-pages branch to enable GitHub Pages. Your page should then be live.
Tip
Refer to Contributing for instructions on how to locally edit and modify the wiki.
Note
For Reloaded3 theme use reloaded3-slate
instead of reloaded-slate
.
Info
Most documentation pages will also include additional plugins, some of which are used in the pages here. Here is a sample complete mkdocs.yml you can copy to your project for reference.
"},{"location":"Reloaded/docs/Pages/#technical-questions","title":"Technical Questions","text":"If you have questions/bug reports/etc. feel free to Open an Issue.
Happy Documenting \u2764\ufe0f
"},{"location":"Reloaded/docs/Pages/contributing/","title":"Contributing to the Wiki: Locally","text":"Info
This page shows you how to contribute to any documentation page or wiki based on this template.
Note
This theme is forked from my theme for Nexus Docs; and this page is synced with that.
"},{"location":"Reloaded/docs/Pages/contributing/#tutorial","title":"Tutorial","text":"Note
If you are editing the repository with the theme itself on Windows, it might be a good idea to run git config core.symlinks true
first to allow git to create symlinks on clone.
You should learn the basics of git; an easy way is to give GitHub Desktop (Tutorial) a go. It only takes 15 minutes \ud83d\ude00.
Fork this repository:
This will create a copy of the repository on your own user account, which you will be able to edit.
Clone this repository.
For example, using GitHub Desktop:
Make changes inside the docs
folder.
Consider using a Markdown Cheat Sheet if you are new to markdown.
I recommend using a markdown editor such as Typora
. Personally I just work from inside Rider
.
Commit the changes and push to GitHub.
Open a Pull Request
.
Opening a Pull Request will allow us to review your changes before merging them into the main official page. If everything's good, we'll hit the merge button and add your changes to the official repository.
If you are working on the wiki locally, you can generate a live preview of the full website. Here's a quick guide to doing it from your command prompt (cmd).
Install Python 3
If you have winget
installed, or Windows 11, you can do this from the command prompt.
winget install Python.Python.3\n
pacman -S python-pip # you should already have Python\n
Otherwise download Python 3 from the official website or package manager.
Install Material for MkDocs and Plugins (Python package)
Windows/OSX:
# Restart your command prompt before running this command.\npip install mkdocs-material\npip install mkdocs-redirects\n
Linux:
On Linux, there is a chance that python might be a core part of your OS, meaning that you ideally shouldn't touch the system installation.
Use virtual environments instead.
python -m venv ~/mkdocs # Create the environment\nsource ~/mkdocs/bin/activate # Enter the environment\n\npip install mkdocs-material\npip install mkdocs-redirects\n
Make sure you enter the environment every time before you run mkdocs.
Open a command prompt in the folder containing mkdocs.yml and run the site locally.
# Move to project folder.\ncd <Replace this with full path to folder containing `mkdocs.yml`>\nmkdocs serve\n
Copy the address to your web browser and enjoy the live preview; any changes you save will be shown instantly.
Most components of the Reloaded Project are governed by the GPLv3 license.
In some rare scenarios, certain libraries might be licensed under LGPLv3 instead.
This is a FAQ meant to clarify the licensing choice and its implications. Please note, though, that the full license text is the final legal authority.
"},{"location":"Reloaded/docs/Pages/license/#why-was-gpl-v3-chosen","title":"Why was GPL v3 chosen?","text":"The primary objective is to prevent closed-source, commercial exploitation of the project.
We want to ensure that the project isn't used within a proprietary environment for profit-making purposes such as:
The Reloaded Project is a labour of love from unpaid hobbyist volunteers.
Exploiting that work for profit feels fundamentally unfair.
While the GPLv3 license doesn't prohibit commercial use outright, it does prevent commercial exploitation by requiring that contributions are given back to the open-source community.
In that fashion, everyone can benefit from the projects under the Reloaded label.
"},{"location":"Reloaded/docs/Pages/license/#can-i-use-reloaded-libraries-commercially","title":"Can I use Reloaded Libraries Commercially?","text":"You can, as long as the resulting product is also licensed under GPLv3, and thus open source.
"},{"location":"Reloaded/docs/Pages/license/#can-i-use-reloaded-libraries-in-a-closed-source-application","title":"Can I use Reloaded Libraries in a closed-source application?","text":"The license terms do not permit this.
However, if your software is completely non-commercial, meaning it's neither sold for profit, funded in development, nor hidden behind a paywall (like Patreon), we will probably just look the other way.
This often applies to non-professional programmers, learners, or those with no intent to exploit the project. We believe in understanding and leniency for those who might not know better.
GPL v3 exists to protect the project and its contributors. If you're not exploiting the project for commercial gain, you're not hurting us; and we will not enforce the terms of the GPL.
If you are interested in obtaining a commercial license, or want an explicit written exemption, please get in touch with the repository owners.
"},{"location":"Reloaded/docs/Pages/license/#can-i-link-reloaded-libraries-staticallydynamically","title":"Can I link Reloaded Libraries statically/dynamically?","text":"Yes, as long as you adhere to the GPLv3 license terms, you're permitted to statically link Reloaded Libraries into your project, for instance, through the use of NativeAOT or ILMerge.
"},{"location":"Reloaded/docs/Pages/license/#guidelines-for-non-commercial-use","title":"Guidelines for Non-Commercial Use","text":"We support and encourage the non-commercial use of Reloaded Libraries. Non-commercial use generally refers to the usage of our libraries for personal projects, educational purposes, academic research, or use by non-profit organizations.
"},{"location":"Reloaded/docs/Pages/license/#personal-projects","title":"Personal Projects","text":"You're free to use our libraries for projects that you undertake for your own learning, hobby or personal enjoyment. This includes creating mods for your favorite games or building your own applications for personal use.
"},{"location":"Reloaded/docs/Pages/license/#educational-use","title":"Educational Use","text":"Teachers and students are welcome to use our libraries as a learning resource. You can incorporate them into your teaching materials, student projects, coding bootcamps, workshops, etc.
"},{"location":"Reloaded/docs/Pages/license/#academic-research","title":"Academic Research","text":"Researchers may use our libraries for academic and scholarly research. We'd appreciate if you cite our work in any publications that result from research involving our libraries.
"},{"location":"Reloaded/docs/Pages/license/#non-profit-organizations","title":"Non-profit Organizations","text":"If you're part of a registered non-profit organization, you can use our libraries in your projects. However, any derivative work that uses our libraries must also be released under the GPL.
Please remember, if your usage of our libraries evolves from non-commercial to commercial, you must ensure compliance with the terms of the GPL v3 license.
"},{"location":"Reloaded/docs/Pages/license/#attribution-requirements","title":"Attribution Requirements","text":"As Reloaded Project is a labor of love, done purely out of passion and with an aim to contribute to the broader community, we highly appreciate your support in providing attribution when using our libraries.
While not legally mandatory under GPL v3, it is a simple act that can go a long way in recognizing the efforts of our contributors and fostering an open and collaborative atmosphere.
If you choose to provide attribution (and we hope you do!), here are some guidelines:
Acknowledge the Use of Reloaded Libraries: Mention that your project uses or is based on Reloaded libraries. This could be in your project's readme, a credits page on a website, a manual, or within the software itself.
Link to the Project: If possible, provide a link back to the Reloaded Project. This allows others to explore and potentially benefit from our work.
Remember, attribution is more than just giving credit; it's a way of saying thank you \ud83d\udc49\ud83d\udc48, fostering reciprocal respect, and acknowledging the power of collaborative open-source development.
We appreciate your support and look forward to seeing what amazing projects you create using Reloaded libraries!
"},{"location":"Reloaded/docs/Pages/license/#code-from-mitbsd-licensed-projects","title":"Code from MIT/BSD Licensed Projects","text":"In some rare instances, code from more permissively licensed projects, such as those under the MIT
or BSD
licenses, may be referenced, incorporated, or slightly modified within the Reloaded Project.
It's important to us to respect the terms and intentions of these permissive licenses, which often allow their code to be used in a wide variety of contexts, including in GPL-licensed projects like ours.
In these cases, the Reloaded Project is committed to clearly disclosing the usage of such code:
Method-Level Disclosure: For individual methods or small code snippets, we use appropriate attribution methods, like programming language attributes. For example, methods borrowed or adapted from MIT-licensed projects might be marked with a [MITLicense]
attribute.
File-Level Disclosure: For larger amounts of code, such as entire files or modules, we'll include the original license text at the top of the file and clearly indicate which portions of the code originate from a differently-licensed project.
Project-Level Disclosure: If an entire library or significant portion of a project under a more permissive license is used, we will include an acknowledgment in a prominent location, such as the readme file or the project's license documentation.
This approach ensures we honor the contributions of the open source community at large, respect the original licenses, and maintain transparency with our users about where code originates from.
Any files/methods or snippets marked with those attributes may be consumed using their original license terms.
i.e. If a method is marked with [MITLicense]
, you may use it under the terms of the MIT license.
We welcome and appreciate contributions to the Reloaded Project! By contributing, you agree to share your changes under the same GPLv3 license, helping to make the project better for everyone.
"},{"location":"Reloaded/docs/Pages/testing-zone/","title":"Testing Zone","text":"Info
This is a dummy page with various Material MkDocs controls and features scattered throughout for testing.
"},{"location":"Reloaded/docs/Pages/testing-zone/#custom-admonitions","title":"Custom Admonitions","text":"Reloaded Admonition
An admonition featuring a Reloaded logo. My source is in Stylesheets/extra.css as Custom 'reloaded' admonition
.
Heart Admonition
An admonition featuring a heart; because we want to contribute back to the open source community. My source is in Stylesheets/extra.css as Custom 'reloaded heart' admonition
.
Nexus Admonition
An admonition featuring a Nexus logo. My source is in Stylesheets/extra.css as Custom 'nexus' admonition
.
Heart Admonition
An admonition featuring a heart; because we want to contribute back to the open source community. My source is in Stylesheets/extra.css as Custom 'nexus heart' admonition
.
Flowchart (Source: Nexus Archive Library):
flowchart TD\n subgraph Block 2\n BigFile1.bin\n end\n\n subgraph Block 1\n BigFile0.bin\n end\n\n subgraph Block 0\n ModConfig.json -.-> Updates.json \n Updates.json -.-> more[\"... more .json files\"] \n end
Sequence Diagram (Source: Reloaded3 Specification):
sequenceDiagram\n\n % Define Items\n participant Mod Loader\n participant Virtual FileSystem (VFS)\n participant CRI CPK Archive Support\n participant Persona 5 Royal Support\n participant Joker Costume\n\n % Define Actions\n Mod Loader->>Persona 5 Royal Support: Load Mod\n Persona 5 Royal Support->>Mod Loader: Request CRI CPK Archive Support API\n Mod Loader->>Persona 5 Royal Support: Receive CRI CPK Archive Support Instance\n\n Mod Loader->>Joker Costume: Load Mod\n Mod Loader-->Persona 5 Royal Support: Notification: 'Loaded Joker Costume'\n Persona 5 Royal Support->>CRI CPK Archive Support: Add Files from 'Joker Costume' to CPK Archive (via API)
State Diagram (Source: Mermaid Docs):
stateDiagram-v2\n [*] --> Still\n Still --> [*]\n\n Still --> Moving\n Moving --> Still\n Moving --> Crash\n Crash --> [*]
Class Diagram (Arbitrary)
classDiagram\n class Animal\n `NexusMobile\u2122` <|-- Car
Note
At the time of writing, the version of Mermaid used here is a bit outdated, and other diagrams might not render correctly (even on an unmodified theme); thus certain diagrams have been omitted.
"},{"location":"Reloaded/docs/Pages/testing-zone/#code-block","title":"Code Block","text":"Snippet from C# version of Sewer's Virtual FileSystem (VFS):
/// <summary>\n/// Tries to get files for a specific folder, assuming the input path is already in upper case.\n/// </summary>\n/// <param name=\"folderPath\">The folder to find. Already lowercase.</param>\n/// <param name=\"value\">The returned folder instance.</param>\n/// <returns>True if found, else false.</returns>\n[MethodImpl(MethodImplOptions.AggressiveInlining)]\npublic bool TryGetFolderUpper(ReadOnlySpan<char> folderPath, out SpanOfCharDict<TTarget> value)\n{\n// Must be O(1)\nvalue = default!; // Compare equality.\n// Note to devs: Do not invert branches, we optimise for hot paths here.\nif (folderPath.StartsWith(Prefix))\n{\n// Check for subfolder in branchless way.\n// In CLR, bool is length 1, so conversion to byte should be safe.\n// Even suppose it is not; as long as code is little endian; truncating int/4 bytes to byte still results \n// in correct answer.\nvar hasSubfolder = Prefix.Length != folderPath.Length;\nvar hasSubfolderByte = Unsafe.As<bool, byte>(ref hasSubfolder);\nvar nextFolder = folderPath.SliceFast(Prefix.Length + hasSubfolderByte);\n\nreturn SubfolderToFiles.TryGetValue(nextFolder, out value!);\n}\n\nreturn false;\n}\n
Something more number heavy, Fast Inverse Square Root from Quake III Arena (unmodified).
float Q_rsqrt( float number )\n{\nlong i;\nfloat x2, y;\nconst float threehalfs = 1.5F;\n\nx2 = number * 0.5F;\ny = number;\ni = * ( long * ) &y; // evil floating point bit level hacking\ni = 0x5f3759df - ( i >> 1 ); // what the fuck? \ny = * ( float * ) &i;\ny = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration\n// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed\n\nreturn y;\n}\n
"},{"location":"Reloaded/docs/Pages/testing-zone/#default-admonitions","title":"Default Admonitions","text":"Note
Test
Abstract
Test
Info
Test
Tip
Test
Success
Test
Question
Test
Warning
Test
Failure
Test
Danger
Test
Bug
Test
Example
Test
Quote
Test
"},{"location":"Reloaded/docs/Pages/testing-zone/#tables","title":"Tables","text":"Method DescriptionGET
Fetch resource PUT
Update resource DELETE
Delete resource"},{"location":"dev/arch/operations-impl/","title":"Operations","text":"This page tells you which Operations are currently implemented for each architecture.
This is not needed for optimal code generation on ARM64, thus was not implemented.
"},{"location":"dev/arch/operations-impl/#push","title":"Push","text":"Architecture Register Vector x64 \u2705 \u2705 x86 \u2705 \u2705 ARM64 \u2705 \u2705"},{"location":"dev/arch/operations-impl/#pushstack","title":"PushStack","text":"Architecture Supported Notes x64 \u2705 x86 \u2705 ARM64 \u2705 Will use vector registers when available."},{"location":"dev/arch/operations-impl/#pushconstant","title":"PushConstant","text":"Architecture Supported Notes x64 \u2705 x86 \u2705 ARM64 \u2705 2-5 instructions, depending on constant length."},{"location":"dev/arch/operations-impl/#stackalloc","title":"StackAlloc","text":"Architecture Supported x64 \u2705 x86 \u2705 ARM64 \u2705"},{"location":"dev/arch/operations-impl/#pop","title":"Pop","text":"Architecture to Register to Vector Notes x64 \u2705 \u2705 x86 \u2705 \u2705 ARM64 \u2705 \u2705"},{"location":"dev/arch/operations-impl/#xchg","title":"XChg","text":"Architecture Registers Vectors Notes x64 \u2705 \u2705 * *Requires scratch register x86 \u2705 \u2705 * *Requires scratch register ARM64 \u2705 * \u2705 * *Requires scratch register"},{"location":"dev/arch/operations-impl/#callabsolute","title":"CallAbsolute","text":"Architecture Supported Notes x64 (register) \u2705 Uses scratch register for efficiency. x86 (register) \u2705 Uses scratch register for efficiency. 
ARM64 (register) \u2705 Uses scratch register (required)"},{"location":"dev/arch/operations-impl/#callrelative","title":"CallRelative","text":"Architecture Supported Notes x64 \u2705 +-2GiB x86 \u2705 +-2GiB ARM64 \u2705 +-128MiB"},{"location":"dev/arch/operations-impl/#return","title":"Return","text":"Architecture Supported Notes x64 \u2705 x86 \u2705 ARM64 \u2705 2 instructions if offset > 0."},{"location":"dev/arch/operations-impl/#architecture-specific-operations","title":"Architecture Specific Operations","text":""},{"location":"dev/arch/operations-impl/#calliprelative","title":"CallIpRelative","text":"Architecture Supported Notes x64 \u2705 x86 \u2753 Unsupported. ARM64 (+- 1MiB) \u2705 2 instructions. ARM64 (+- 4GiB) \u2705 3 instructions."},{"location":"dev/arch/operations-impl/#jumpiprelative","title":"JumpIpRelative","text":"Architecture Supported Notes x64 \u2705 x86 \u2753 Unsupported. ARM64 (+- 1MiB) \u2705 2 instructions. ARM64 (+- 4GiB) \u2705 3 instructions."},{"location":"dev/arch/operations-impl/#optimized-pushpop-operations","title":"Optimized Push/Pop Operations","text":""},{"location":"dev/arch/operations-impl/#multipush","title":"MultiPush","text":"Architecture Supported Notes x64* \u2705 x86* \u2705 ARM64 \u2705 Might fall back to single pop/push if mixing register sizes.* Implemented but not used, due to more efficient code generation alternative.
"},{"location":"dev/arch/operations-impl/#multipop","title":"MultiPop","text":"Architecture Supported Notes x64* \u2705 x86* \u2705 ARM64 \u2705 Might fall back to single pop/push if mixing register sizes.* Implemented but not used, due to more efficient code generation alternative.
"},{"location":"dev/arch/operations/","title":"Operations","text":"This page provides a reference for all of the various 'operations' implemented by individual JIT(s).
For more information about each of the operations, see the source code \ud83d\ude09 (enum Operation<T>
).
Represents jumping to a relative offset from current instruction pointer.
Rustx64 (+- 2GiB)ARM64 (+- 128MiB)ARM64 (+- 4GiB)x86 (+- 2GiB)let jump_rel = JumpRelativeOperation {\ntarget_address: 0x200,\n};\n
jmp 0x200 ; Jump to address at current IP + 0x200\n
b 0x200 ; Branch to address at current IP + 0x200\n
adrp x9, #0 ; Load 4K page, relative to PC. (round address down to 4096)\nadd x9, x9, #100 ; Add any missing offset.\nbr x9 ; Branch to location\n
jmp 0x200 ; Jump to address at current IP + 0x200\n
"},{"location":"dev/arch/operations/#jumpabsolute","title":"JumpAbsolute","text":"Represents jumping to an absolute address stored in a register.
JIT is free to encode this as a relative branch if it's possible.
Rustx64ARM64x86let jump_abs = JumpAbsoluteOperation {\nscratch_register: rax,\ntarget_address: 0x123456,\n};\n
mov rax, 0x123456 ; Move target address into rax\njmp rax ; Jump to address in rax\n
MOVZ x9, #0x3456 ; Set lower bits.\nMOVK x9, #0x12, LSL #16 ; Move upper bits\nbr x9 ; Branch to location\n
mov eax, 0x123456 ; Move target address into eax\njmp eax ; Jump to address in eax\n
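The ARM64 tab above builds the address with one MOVZ followed by MOVK instructions, one per 16-bit chunk. A minimal sketch of that split follows, assuming zero chunks can be skipped; `mov_immediate_chunks` is a hypothetical helper for illustration and is not part of the library.

```rust
/// Splits a 64-bit immediate into (16-bit chunk, left shift) pairs.
/// The first pair would map to MOVZ, every following pair to MOVK.
/// Hypothetical helper; zero chunks are skipped because MOVZ clears them.
fn mov_immediate_chunks(value: u64) -> Vec<(u16, u32)> {
    let mut chunks: Vec<(u16, u32)> = (0..4)
        .map(|i| (((value >> (i * 16)) & 0xFFFF) as u16, i * 16))
        .filter(|&(chunk, _)| chunk != 0)
        .collect();
    if chunks.is_empty() {
        // A value of 0 still needs a single MOVZ.
        chunks.push((0, 0));
    }
    chunks
}
```

For `0x123456` this yields the chunk `0x3456` at shift 0 and the chunk `0x12` at shift 16, which corresponds to the MOVZ and MOVK pair in the ARM64 listing.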
We prefer this approach to absolute jump
because it performs better.
Represents jumping to an absolute address stored in a memory address.
Rustx64 (< 2GiB)x86 (< 2GiB)ARM64 (3 instructions) Variant 0ARM64 (4-6 instructions) Variant 1let jump_ind = JumpIndirectOperation {\ntarget_address: 0x123456,\n};\n
jmp qword [0x123456] ; Jump to address stored at 0x123456\n
jmp dword [0x123456] ; Jump to address stored at 0x123456\n
; Possible on Multiple of 0x10000 with offset 0-4096\nMOVZ x9, #0x123, LSL #16 ; Store upper 16 bits.\nLDR x9, [x9, #0x456] ; Load lower 12 bit offset\nbr x9 ; Branch to location\n
; On any address up to 4GiB + 4096\nMOVZ x9, #0x3456 ; Set lower bits.\nMOVK x9, #0x12, LSL #16 ; Move upper bits\n; Continue until desired address.\nLDR x9, [x9, #0x0] ; Load from address.\nbr x9\n
On macOS, this is not usable, because memory below 2GiB is restricted from access.
"},{"location":"dev/arch/operations/#needed-for-wrapper-generation","title":"Needed for Wrapper Generation","text":"This includes functionality like 'parameter injection'.
"},{"location":"dev/arch/operations/#mov","title":"Mov","text":"Represents a move operation between two registers.
Rustx64ARM64x86let move_op = MovOperation {\nsource: r8,\ntarget: r9, };\n
mov r9, r8 ; Move r8 into r9\n
mov x9, x8 ; Move x8 into x9\n
mov ebx, eax ; Move eax into ebx\n
"},{"location":"dev/arch/operations/#movfromstack","title":"MovFromStack","text":"Represents a move operation from the stack into a register.
Rustx64ARM64x86let move_from_stack = MovFromStackOperation {\nstack_offset: 8,\ntarget: rbx,\n};\n
mov rbx, [rsp + 8] ; Move value at rsp + 8 into rbx\n
ldr x9, [sp, #8] ; Load value at sp + 8 into x9\n
mov ebx, [esp + 8] ; Move value at esp + 8 into ebx\n
"},{"location":"dev/arch/operations/#movtostack","title":"MovToStack","text":"Represents moving a register value onto the stack at a user specified offset.
Rustx64ARM64x86let mov_to_stack = MovToStackOperation {\nregister: rbx,\nstack_offset: 16, };\n
mov [rsp + 16], rbx ; Move rbx onto the stack 16 bytes above rsp \n
str x9, [sp, #16] ; Store x9 onto the stack 16 bytes above sp\n
mov [esp + 16], ebx ; Move ebx onto the stack 16 bytes above esp\n
"},{"location":"dev/arch/operations/#push","title":"Push","text":"Represents pushing a register onto the stack.
Rustx64ARM64x86let push = PushOperation {\nregister: r9,\n};\n
push r9 ; Push r9 onto the stack\n
sub sp, sp, #8 ; Decrement stack pointer\nstr x9, [sp] ; Store x9 on the stack\n
push ebx ; Push ebx onto the stack\n
"},{"location":"dev/arch/operations/#pushstack","title":"PushStack","text":"Represents pushing a value from the stack to the stack.
Rustx64ARM64x86let push_stack = PushStackOperation {\noffset: 8,\nitem_size: 8,\n};\n
push qword [rsp + 8] ; Push value at rsp + 8 onto the stack\n
ldr x9, [sp, #8] ; Load value at sp + 8 into x9\nsub sp, sp, #8 ; Decrement stack pointer\nstr x9, [sp] ; Push x9 onto the stack\n
push [esp + 8] ; Push value at esp + 8 onto the stack\n
"},{"location":"dev/arch/operations/#pushconstant","title":"PushConstant","text":"Represents pushing a constant value onto the stack.
Rustx64ARM64x86let push_const = PushConstantOperation {\nvalue: 10,\n};\n
push 10 ; Push constant value 10 onto stack\n
sub sp, sp, #8 ; Decrement stack pointer\nmov x9, 10 ; Move constant 10 into x9\nstr x9, [sp] ; Store x9 on the stack\n
push 10 ; Push constant value 10 onto stack\n
"},{"location":"dev/arch/operations/#stackalloc","title":"StackAlloc","text":"Represents adjusting the stack pointer.
Rustx64ARM64x86let stack_alloc = StackAllocOperation {\noperand: 8,\n};\n
sub rsp, 8 ; Decrement rsp by 8\n
sub sp, sp, #8 ; Decrement sp by 8\n
sub esp, 8 ; Decrement esp by 8\n
"},{"location":"dev/arch/operations/#pop","title":"Pop","text":"Represents popping a value from the stack into a register.
Rustx64ARM64x86let pop = PopOperation {\nregister: rbx,\n};\n
pop rbx ; Pop value from stack into rbx\n
ldr x9, [sp] ; Load stack top into x9\nadd sp, sp, #8 ; Increment stack pointer\n
pop ebx ; Pop value from stack into ebx\n
"},{"location":"dev/arch/operations/#xchg","title":"XChg","text":"Represents exchanging the contents of two registers.
On some architectures (e.g. ARM64) this requires a scratch register.
Rustx64ARM64x86let xchg = XChgOperation {\nregister1: r9,\nregister2: r8,\nscratch: None,\n};\n
xchg r8, r9 ; Swap r8 and r9\n
// ARM doesn't have xchg instruction\nmov x10, x8 ; Move x8 into x10 (scratch register)\nmov x8, x9 ; Move x9 into x8\nmov x9, x10 ; Move original x8 (in x10) into x9\n
xchg eax, ebx ; Swap eax and ebx\n
"},{"location":"dev/arch/operations/#callabsolute","title":"CallAbsolute","text":"Represents calling an absolute address stored in a register or memory.
Rustx64ARM64x86let call_abs = CallAbsoluteOperation {\nscratch_register: r9,\ntarget_address: 0x123456,\n};\n
mov r9, 0x123456 ; Move target address into r9\ncall r9 ; Call address in r9\n
adr x9, target_func ; Load address of target function into x9\nblr x9 ; Branch and link to address in x9\n
mov eax, 0x123456 ; Move target address into eax\ncall eax ; Call address in eax\n
"},{"location":"dev/arch/operations/#callrelative","title":"CallRelative","text":"Represents calling a relative offset from current instruction pointer.
Rustx64ARM64x86let call_rel = CallRelativeOperation {\ntarget_address: 0x200,\n};\n
call 0x200 ; Call address at current IP + 0x200\n
bl 0x200 ; Branch with link to address at current IP + 0x200\n
call 0x200 ; Call address at current IP + 0x200\n
"},{"location":"dev/arch/operations/#return","title":"Return","text":"Represents returning from a function call.
Rustx64ARM64x86let ret = ReturnOperation {\noffset: 4,\n};\n
ret ; Return\nret 4 ; Return and add 4 to stack pointer\n
ret ; Return\nadd sp, sp, #4 ; Add 4 to stack pointer\nret ; Return\n
ret ; Return\nret 4 ; Return and add 4 to stack pointer\n
"},{"location":"dev/arch/operations/#architecture-specific-operations","title":"Architecture Specific Operations","text":"These operations are only available on certain architectures.
These are non essential, but can improve compatibility/performance.
Enabled by setting JitCapabilities::CanEncodeIPRelativeCall
and JitCapabilities::CanEncodeIPRelativeJump
in JIT.
Represents calling an IP-relative offset where target address is stored.
Rustx64ARM64 (+- 1MB)ARM64 (+- 4GB)let call_rip_rel = CallIpRelativeOperation {\ntarget_address: 0x1000,\n};\n
call qword [rip - 16] ; Address 0x1000 is at RIP-16 and contains raw address to call\n
ldr x9, 4 ; Read item in a multiple of 4 bytes relative to PC\nblr x9 ; Branch call to location\n
adrp x9, #0x0 ; Load 4K page, relative to PC. (round address down to 4096)\nldr x9, [x9, 1110] ; Read address from offset in 4K page.\nblr x9 ; Branch to location\n
"},{"location":"dev/arch/operations/#jumpiprelative","title":"JumpIpRelative","text":"Represents jumping to an IP-relative offset where target address is stored.
Rustx64ARM64 (+- 1MB)ARM64 (+- 4GB)let jump_rip_rel = JumpIpRelativeOperation {\ntarget_address: 0x1000,\n};\n
jmp qword [rip - 16] ; Address 0x1000 is at RIP-16 and contains raw address to jump\n
ldr x9, 4 ; Read item in a multiple of 4 bytes relative to PC\nbr x9 ; Branch to location\n
adrp x9, #0x0 ; Load 4K page, relative to PC. (round address down to 4096)\nldr x9, [x9, 1110] ; Read address from offset in 4K page.\nbr x9 ; Branch to location\n
"},{"location":"dev/arch/operations/#optimized-pushpop-operations","title":"Optimized Push/Pop Operations","text":"Enabled by setting JitCapabilities::CanMultiPush
in JIT.
Represents pushing multiple registers onto the stack.
Implementations must support push/pop of mixed registers (e.g. Reg+Vector).
Rustx64ARM64x86let multi_push = MultiPushOperation {\nregisters: [\nPushOperation { register: rbx },\nPushOperation { register: rax },\nPushOperation { register: rcx },\nPushOperation { register: rdx },\n],\n};\n
push rbx\npush rax\npush rcx\npush rdx ; Push rbx, rax, rcx, rdx onto the stack\n
sub sp, sp, #32 ; Decrement stack pointer by 32 bytes \nstp x9, x8, [sp] ; Store x9 and x8 on the stack\nstp x11, x10, [sp, #16] ; Store x11 and x10 on the stack \n
push ebx\npush eax\npush ecx\npush edx ; Push ebx, eax, ecx, edx onto the stack\n
"},{"location":"dev/arch/operations/#multipop","title":"MultiPop","text":"Represents popping multiple registers from the stack.
Implementations must support push/pop of mixed registers (e.g. Reg+Vector).
Rustx64ARM64x86let multi_pop = MultiPopOperation {\nregisters: [\nPopOperation { register: rdx },\nPopOperation { register: rcx },\nPopOperation { register: rax },\nPopOperation { register: rbx },\n],\n};\n
pop rdx\npop rcx\npop rax\npop rbx ; Pop rdx, rcx, rax, rbx from the stack\n
ldp x11, x10, [sp], #16 ; Load x11 and x10 from stack and update stack pointer\nldp x9, x8, [sp], #16 ; Load x9 and x8 from stack and update stack pointer\n
pop edx\npop ecx\npop eax\npop ebx ; Pop edx, ecx, eax, ebx from the stack\n
"},{"location":"dev/arch/overview/","title":"Architecture Overview","text":"Lists currently supported architectures and their features.
"},{"location":"dev/arch/overview/#feature-support","title":"Feature Support","text":"Lists the currently available library features for different architectures.
Feature x86 & x64 ARM64 Basic Function Hooking \u2705 \u2705 Code Relocation \u2705* \u2705 Hook Stacking \u2705 \u2705 Calling Convention Wrapper Generation \u2705 \u2705 Optimal Wrapper Generation \u2705 \u2705 Length Disassembler \u2705 \u2705The ability to hook/detour existing application functions.
"},{"location":"dev/arch/overview/#how-to-implement","title":"How to Implement","text":"Implement a code writer by inheriting the Jit<TRegister>
trait
In the writer, implement at least the following operations:
Your Platform must also support Permission Change, if it is applicable to your platform.
"},{"location":"dev/arch/overview/#length-disassembler","title":"Length Disassembler","text":"Length disassembly is the ability to determine instruction lengths at a given address.
A length disassembler determines the minimum number of bytes, rounded up to whole instructions, that must be copied when hooking a function.
/// Disassembles items at `code_address` until the length of instructions\n/// is equal to or greater than `min_length`. \n/// \n/// # Returns\n/// Returns length of instructions (in bytes) greater than or equal to min_length\nfn disassemble_length(code_address: usize, min_length: usize) -> usize\n
This is done by disassembling the original instructions at code_address
, incrementing a length for each encountered instruction until length >= min_length
, then returning the result.
For hooking functions, it's necessary to inject a jmp
instruction into the existing code.
For example, given this sequence:
; x64 Assembly\nDoMathWithTwoNumbers:\ncmp rcx, 0 ; 48 83 F9 00\njg skipAdd ; 7F 0E\n\nmov rax, [rsp + 8] ; 48 8B 44 24 08\nmov rcx, [rsp + 16] ; 48 8B 4C 24 10\nadd rax, rcx ; 48 01 C8\nret ; C3\n
A 5 byte relative jump would overwrite the whole first instruction and the first byte of the second, creating:
; x64 Assembly\nDoMathWithTwoNumbers:\njmp stub ; E9 XX XX XX XX\n<INVALID INSTRUCTION> ; 0E\n\nmov rax, [rsp + 8] ; 48 8B 44 24 08\nmov rcx, [rsp + 16] ; 48 8B 4C 24 10\nadd rax, rcx ; 48 01 C8\nret ; C3\n
When calling the original function again, and thus creating the Reverse Wrapper, the original instructions overwritten by the jmp
will need to be executed.
To do this, we must know that the original 2 instructions at DoMathWithTwoNumbers
were 6, not 5, bytes in length in total, so that when we copy the original code to the Reverse Wrapper we get
cmp rcx, 0 ; 48 83 F9 00\njg skipAdd ; 7F 0E\n
and not
cmp rcx, 0 ; 48 83 F9 00\n<INVALID INSTRUCTION> ; 7F\n
With a length disassembler, we are able to safely copy all the bytes needed.
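The counting loop described above can be sketched as follows. This is only an illustration under stated assumptions: `disassemble_length_with` takes a byte slice and a decoder callback instead of a raw address, and `toy_decode_len` is a hypothetical decoder that recognises only the opcodes appearing in the DoMathWithTwoNumbers example. Neither is the library's actual API.

```rust
/// Sums instruction lengths until at least `min_length` bytes are covered.
/// `decode_len` is a stand-in for an architecture specific decoder: given the
/// bytes at the current position, it returns the length of that instruction.
fn disassemble_length_with(
    code: &[u8],
    min_length: usize,
    decode_len: impl Fn(&[u8]) -> usize,
) -> usize {
    let mut length = 0;
    while length < min_length && length < code.len() {
        let len = decode_len(&code[length..]);
        if len == 0 {
            break; // The decoder could not make progress.
        }
        length += len;
    }
    length
}

/// Toy decoder for the opcodes of the example only; not a real x86 decoder.
fn toy_decode_len(bytes: &[u8]) -> usize {
    match bytes {
        [0x48, 0x83, ..] => 4, // cmp r64, imm8
        [0x7F, ..] => 2,       // jg rel8
        [0x48, 0x8B, ..] => 5, // mov r64, [rsp + disp8]
        [0x48, 0x01, ..] => 3, // add r64, r64
        [0xC3, ..] => 1,       // ret
        _ => 0,                // unknown opcode
    }
}
```

With the first bytes of the example (`48 83 F9 00 7F 0E ...`) and a `min_length` of 5, the loop stops after the `jg` and reports 6 bytes, which is the amount that has to be copied into the Reverse Wrapper.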
"},{"location":"dev/arch/overview/#how-to-implement_1","title":"How to Implement","text":"Implement a length disassembler by inheriting the LengthDisassembler
trait.
Use the algorithm described in example.
"},{"location":"dev/arch/overview/#code-relocation","title":"Code Relocation","text":"Code relocation is the ability to rewrite existing code such that existing instructions using PC/IP relative operands still have valid operands post patching.
Suppose the following x86 code, which was optimised to accept its first parameter in the ecx
register:
int DoMathWithTwoNumbers(int operation@ecx, int a, int b) {\n\nif (operation <= 0) {\nreturn a + b;\n}\n\n// Omitted Code Here\n}\n
In this case it's possible that there's a jump in the very beginning of the function:
DoMathWithTwoNumbers:\ncmp ecx, 0\njg skipAdd ; It's greater than 0\n\nmov eax, [esp + {wordSize * 1}] ; Left Parameter\nmov ecx, [esp + {wordSize * 2}] ; Right Parameter\nadd eax, ecx\nret\n\n; Some Omitted Code Here\n\nskipAdd:\n; Omitted Code Here\n
In a scenario like this, the hooking library would overwrite the cmp
and jg
instruction when it assembles the hook entry ('enter hook'); and when the original function is called again by your hook, the 'wrapper' would now contain this jg
instruction.
Because jg
is an instruction relative to the current instruction address, the library must be able to patch and 'relocate' the function to a new address.
Basic code relocation support is needed to stack hooks.
"},{"location":"dev/arch/overview/#how-to-implement_2","title":"How to Implement","text":"Implement a relocator by inheriting the CodeRewriter
trait.
There is no 'general strategy' for this, however, here are some pieces of advice:
branch
etc.) The ability to convert between different calling conventions (e.g. cdecl -> stdcall
).
To implement this, you implement a code writer by inheriting the Jit<TRegister>
trait; and implement the following operations:
If this is checked, it means the wrappers generate optimal code (to best of knowledge).
While the wrapper generator does most optimisations itself, in some cases it may be possible to perform additional optimisations on the JIT/Code Writer side.
For example, the reloaded-hooks
wrapper generator might generate the following sequence of pushes for ARM64:
push x0\npush x1\n
A clever ARM64 compiler however would be able to translate this to:
stp x0, x1, [sp, #-16]!\n
For some built in optimisations, like this, you can opt into these specialised instructions with JitCapabilities
on your Jit<TRegister>
.
Some others may be implemented at the JIT level instead.
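The push pairing mentioned above can be sketched as a simple grouping pass. This is an assumption-laden illustration: `PushGroup` and `pair_pushes` are hypothetical names, registers are plain numbers, and mixed register sizes are not handled.

```rust
/// How a run of pushes could be grouped before encoding (hypothetical type).
#[derive(Debug, PartialEq)]
enum PushGroup {
    Pair(u8, u8), // two registers, candidate for a single STP
    Single(u8),   // leftover register, emitted as an individual push
}

/// Groups consecutive register pushes two at a time.
fn pair_pushes(registers: &[u8]) -> Vec<PushGroup> {
    registers
        .chunks(2)
        .map(|chunk| match chunk {
            [a, b] => PushGroup::Pair(*a, *b),
            [a] => PushGroup::Single(*a),
            _ => unreachable!(), // chunks(2) only yields 1 or 2 items
        })
        .collect()
}
```

Pushing x0, x1 and x2 would therefore produce one pair for x0 and x1, followed by a single push of x2.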
"},{"location":"dev/arch/overview/#misc","title":"Misc","text":""},{"location":"dev/arch/overview/#hook-stacking","title":"Hook Stacking","text":"Hook stacking is the ability to hook a function multiple times.
This should work flawlessly out of the box if all of the required elements are implemented.
"},{"location":"dev/arch/arm64/aarch64/","title":"ARM64","text":"This is just a quick reference sheet for developers.
ARM64 is not currently implemented.
x0
-x7
Parameter/Result Registers Volatile x8
Indirect result location register Volatile x9
-x15
Local Variables Volatile x16
-x17
Intra-procedure-call scratch registers Volatile x18
Platform register, conventionally the TLS base Volatile x19
-x28
Registers saved across function calls Non-Volatile x29
Frame pointer Non-Volatile x30
Link register Volatile sp
Stack pointer Non-Volatile xzr
Zero register, always reads as zero N/A x31
Stack pointer or zero register, contextually reads as either sp
or xzr
N/A For floating point / SIMD registers:
Register ARM64 (System V) Volatile/Non-Volatilev0
-v7
Parameter/Result registers Volatile v8
-v15
Registers saved across function calls (lower 64 bits only) Non-Volatile v16
-v31
Temporary registers Volatile"},{"location":"dev/arch/arm64/aarch64/#calling-convention-inference","title":"Calling Convention Inference","text":"It is recommended that library users manually specify conventions in their hook functions.
When the calling convention of <your function>
is not specified, wrapper libraries must insert the appropriate default convention in their wrappers.
aarch64-unknown-linux-gnu
: SystemVaarch64-pc-windows-msvc
: Windows ARM64Linux ARM64
: SystemVWindows ARM64
: Windows ARM64This page provides a listing of all instructions rewritten as part of the Code Relocation process.
"},{"location":"dev/arch/arm64/code_relocation/#adrp","title":"ADR(P)","text":"Purpose:
The ADR
instruction in ARM architectures computes the address of a label and writes it to the destination register.
Behaviour:
The ADR(P) instruction is rewritten as one of the following: - ADR(P) - ADR(P) + ADD - MOV (1-4 instructions)
Example:
From ADRP to ADR:
// Before: ADRP x0, 0x101000\n// After: ADR x0, 0xFFFFF\n// Parameters: (old_instruction, old_address, new_address)\nrewrite_adr(0x000800B0_u32.to_be(), 0, 4097);\n
Within 4GiB Range with Offset:
// Before: ADRP x0, 0x101000\n// After: \n// - ADRP x0, 0x102000\n// - ADD x0, x0, 1\nrewrite_adr(0x000800B0_u32.to_be(), 4097, 0);\n
Within 4GiB Range without Offset:
// Before: ADRP x0, 0x101000\n// After: ADRP x0, 0x102000\nrewrite_adr(0x000800B0_u32.to_be(), 4096, 0);\n
Out of Range:
// PC = 0x100000000\n\n// Before: ADRP x0, 0x101000\n// After: MOV IMMEDIATE 0x100101000\nrewrite_adr(0x000800B0_u32.to_be(), 0x100000000, 0);\n
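The choice between the rewrites listed under Behaviour can be sketched as a range check on the distance from the new location to the absolute target. This is a minimal sketch, assuming the standard A64 ranges of +-1MiB for ADR and +-4GiB for ADRP; `AdrRewrite` and `choose_adr_rewrite` are hypothetical names, and the real `rewrite_adr` may decide differently in edge cases.

```rust
/// Possible rewrites of a relocated ADR(P) (hypothetical type).
#[derive(Debug, PartialEq)]
enum AdrRewrite {
    Adr,          // target within +-1MiB of the new PC
    Adrp,         // 4K aligned target whose page is within +-4GiB
    AdrpAdd,      // as above, plus an ADD for the offset inside the page
    MovImmediate, // anything further away: MOVZ/MOVK sequence
}

/// Picks the shortest encoding that can still reach `target` from `new_pc`.
fn choose_adr_rewrite(target: u64, new_pc: u64) -> AdrRewrite {
    let delta = target.wrapping_sub(new_pc) as i64;
    if (-(1i64 << 20)..(1i64 << 20)).contains(&delta) {
        return AdrRewrite::Adr;
    }
    // ADRP works on 4K pages, so compare page addresses.
    let page_delta = (target & !0xFFF).wrapping_sub(new_pc & !0xFFF) as i64;
    if (-(1i64 << 32)..(1i64 << 32)).contains(&page_delta) {
        if target & 0xFFF == 0 {
            AdrRewrite::Adrp
        } else {
            AdrRewrite::AdrpAdd
        }
    } else {
        AdrRewrite::MovImmediate
    }
}
```

For instance, a target of `0x101000` seen from a new address of 4097 is `0xFFFFF` bytes away, which still fits a plain ADR, while a target of `0x100101000` seen from address 0 is beyond the ADRP range and needs the immediate move.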
Purpose: The Bcc
instruction in ARM architectures performs a conditional branch based on specific condition flags.
Behaviour: The Branch Conditional instruction is rewritten as: - BCC - BCC + [B] - BCC + [ADRP + ADD + BR] - BCC + [MOV to Register + Branch Register]
<skip>
means, invert the condition, and jump over the code inside [] brackets.
Example:
Within 1MiB:
// Before: b.eq #4\n// After: b.eq #-4092\n// Parameters: (old_instruction, old_address, new_address, scratch_register)\nrewrite_bcc(0x20000054_u32.to_be(), 0, 4096, Some(17));\n
Within 128MiB:
// Before: b.eq #0\n// After: \n// - b.ne #8 \n// - b #-0x8000000\nrewrite_bcc(0x00000054_u32.to_be(), 0, 0x8000000 - 4, Some(17));\n
Within 4GiB Range with Address Adjustment:
// Before: b.eq #512\n// After: \n// - b.ne #16 \n// - adrp x17, #0x8000000\n// - add x17, #512\n// - br x17\nrewrite_bcc(0x00100054_u32.to_be(), 0x8000000, 0, Some(17));\n
Within 4GiB Range without Offset:
// Before: b.eq #512\n// After: \n// - b.ne #12\n// - adrp x17, #-0x8000000 \n// - br x17\nrewrite_bcc(0x00100054_u32.to_be(), 0, 0x8000000, Some(17));\n
Last Resort:
// Before: b.eq #0\n// After: \n// - b.ne #12\n// - movz x17, #0 \n// - br x17\nrewrite_bcc(0x00000054_u32.to_be(), 0, 0x100000000, Some(17));\n
Including Branch+Link (BL).
Purpose: The B
(or BL
for Branch+Link) instruction in ARM architectures performs a direct branch (or branch with link) to a specified address. When using the BL
variant, the return address (the address of the instruction following the branch) is stored in the link register LR
.
Behaviour: The Branch instruction is rewritten as one of the following: - B (or BL) - ADRP + BR - ADRP + ADD + BR - MOV + BR
Example:
Direct Branch within Range:
// Before: b #4096\n// After: b #8192\n// Parameters: (old_instruction, old_address, new_address, scratch_register, link)\nrewrite_b(0x00040014_u32.to_be(), 8192, 4096, Some(17), false);\n
Within 4GiB with Address Adjustment:
// Before: b #4096\n// After: \n// - adrp x17, #0x8000000\n// - br x17\nrewrite_b(0x00040014_u32.to_be(), 0x8000000, 0, Some(17), false);\n
Within 4GiB Range with Offset:
// Before: b #4096\n// After: \n// - adrp x17, #0x8001000\n// - add x17, x17, #0x512\n// - br x17\nrewrite_b(0x00040014_u32.to_be(), 0x8000512, 0, Some(17), false);\n
Out of Range, Use MOV:
// Before: b #4096\n// After: \n// - movz x17, #... \n// - ...\n// - br x17\nrewrite_b(0x00040014_u32.to_be(), 0x100000000, 0, Some(17), false);\n
Branch with Link within Range:
// Before: bl #4096\n// After: bl #8192\nrewrite_b(0x00040094_u32.to_be(), 8192, 4096, Some(17), true);\n
Purpose: The CBZ
instruction in ARM architectures performs a conditional branch when the specified register is zero. If the register is not zero and the condition is not met, the next sequential instruction is executed.
Behaviour: The CBZ
instruction is rewritten as one of the following: - CBZ - CBZ + [B] - CBZ + [ADRP + BR] - CBZ + [ADRP + ADD + BR] - CBZ + [MOV to Register + Branch Register]
Here, <skip>
is used to invert the condition and jump over the set of instructions inside the []
brackets if the condition is not met.
Example:
Within 1MiB Range:
// Before: cbz x0, #4096\n// After: cbz x0, #8192\n// Parameters: (old_instruction, old_address, new_address, scratch_register)\nrewrite_cbz(0x008000B4_u32.to_be(), 8192, 4096, Some(17));\n
Within 128MiB Range:
// Before: cbz x0, #4096\n// After: \n// - cbnz x0, #8\n// - b #0x8000000\nrewrite_cbz(0x008000B4_u32.to_be(), 0x8000000, 4096, Some(17));\n
Within 4GiB + 4096 aligned:
// Before: cbz x0, #4096\n// After: \n// - cbnz x0, <skip 3 instructions> \n// - adrp x17, #0x8000000\n// - br x17\nrewrite_cbz(0x008000B4_u32.to_be(), 0x8000000, 0, Some(17));\n
Within 4GiB with Offset:
// Before: cbz x0, #4096\n// After: \n// - cbnz x0, <skip 4 instructions>\n// - adrp x17, #0x8000000\n// - add x17, #512\n// - br x17\nrewrite_cbz(0x008000B4_u32.to_be(), 0x8000512, 0, Some(17));\n
Out of Range (Move and Branch):
// Before: cbz x0, #4096\n// After: \n// - cbnz x0, <skip X instructions> \n// - mov x17, <immediate address>\n// - br x17\nrewrite_cbz(0x008000B4_u32.to_be(), 0x100000000, 0, Some(17));\n
This includes Prefetch PRFM
which shares opcode with LDR.
Purpose: The LDR
instruction in ARM architectures is used to load a value from memory into a register. It can use various addressing modes, but commonly it involves an offset from a base register or the program counter.
Behaviour: The LDR
instruction is rewritten as one of the following, depending on the relocation range:
The choice of rewriting strategy is based on the distance between the old address and the new one, with a preference for the most direct form that satisfies the required address range.
If the instruction is Prefetch PRFM
, it is discarded if it can't be re-encoded as PRFM (literal)
, as prefetching with multiple instructions is probably less efficient than not prefetching at all.
Example:
Within 1MiB Range:
// Before: LDR x0, #0\n// After: LDR x0, #4096\n// Parameters: (opcode, new_imm12, rn)\nrewrite_ldr_literal(0x00000058_u32.to_be(), 4096, 0);\n
Within 4GiB + 4096 aligned:
// Before: LDR x0, #0\n// After: \n// - adrp x0, #0x100000\n// - ldr x0, [x0]\n// Parameters: (opcode, new_address, old_address)\nrewrite_ldr_literal(0x00000058_u32.to_be(), 0x100000, 0);\n
Within 4GiB:
// Before: LDR x0, #512\n// After: \n// - adrp x0, #0x100000\n// - ldr x0, [x0, #512]\n// Parameters: (opcode, new_address, old_address)\nrewrite_ldr_literal(0x00100058_u32.to_be(), 0x100000, 0);\n
Out of Range (Last Resort):
// Before: LDR x0, #512\n// After: \n// - movz x0, #0, lsl #16\n// - movk x0, #0x1, lsl #32\n// - ldr x0, [x0, #512]\n// Parameters: (opcode, new_address, old_address)\nrewrite_ldr_literal(0x00100058_u32.to_be(), 0x100000000, 0);\n
Purpose: The TBZ
instruction in ARM architectures tests a specified bit in a register and performs a conditional branch if the bit is zero. If the tested bit is not zero, the next sequential instruction is executed.
Behaviour: The TBZ
instruction is rewritten based on the distance to the new branch target. It is transformed into one of the following patterns: - TBZ - TBZ + B - TBZ + ADRP + BR - TBZ + ADRP + ADD + BR - TBZ + MOV to Register + Branch Register
Here, <skip>
is used to indicate a conditional skip over a set of instructions if the tested bit is not zero. The specific transformation depends on the offset between the current position and the new branch target.
Safety: It is crucial to ensure that the provided instruction
parameter is a valid TBZ
opcode. Incorrect opcodes or assumptions that a different type of instruction is a TBZ
may lead to undefined behaviour.
Functionality: The rewrite_tbz
function alters the TBZ
instruction to accommodate a new target address that is outside of its original range. The target address could be within the same 32KiB range or farther, necessitating different rewriting strategies.
Example:
Within 32KiB Range:
// Original: tbz x0, #0, #4096\n// Rewritten: tbz x0, #0, #8192\n// Parameters: (old_instruction, old_address, new_address, scratch_reg)\nrewrite_tbz(0x00800036_u32.to_be(), 8192, 4096, Some(17));\n
Within 128MiB Range:
// Original: tbz x0, #0, #4096\n// Rewritten:\n// - tbnz x0, #0, #8\n// - b #0x8000000\nrewrite_tbz(0x00800036_u32.to_be(), 0x8000000, 4096, Some(17));\n
Within 4GiB Range Aligned to 4096:
// Original: tbz x0, #0, #4096\n// Rewritten:\n// - tbnz w0, #0, #0xc\n// - adrp x17, #0x8001000\n// - br x17\nrewrite_tbz(0x00800036_u32.to_be(), 0x8000000, 0, Some(17));\n
Within 4GiB Range with Offset:
// Original: tbz x0, #0, #4096\n// Rewritten:\n// - tbnz w0, #0, #0x10\n// - adrp x17, #0x8001000\n// - add x17, x17, #0x512\n// - br x17\nrewrite_tbz(0x00800036_u32.to_be(), 0x8000512, 0, Some(17));\n
Out of 4GiB Range (Move and Branch):
// Original: tbz x0, #0, #4096\n// Rewritten:\n// - tbnz w0, #0, #0x14\n// - movz x17, #0x1000\n// - movk x17, #0, lsl #16\n// - movk x17, #0x1, lsl #32\n// - br x17\nrewrite_tbz(0x00800036_u32.to_be(), 0x100000000, 0, Some(17));\n
This page lists the instructions rewritten as part of the Code Relocation process for the x86 architecture, and gives an overview of the rewriting techniques used, with a focus on x64.
"},{"location":"dev/arch/x86/code_relocation/#any-instruction-within-2gib-range","title":"Any Instruction within 2GiB Range","text":"If the new relative branch target is within the encodable range, it is left as relative.
"},{"location":"dev/arch/x86/code_relocation/#example-within-relative-range","title":"Example: Within Relative Range","text":"Original: (EB 02
) - jmp +2
Relocated: (E9 FF 0F 00 00
) - jmp +4098
// Parameters for test case:\n// - Original Code (Hex)\n// - Original Address\n// - New Address\n// - New Expected Code (Hex)\n#[case::simple_branch(\"eb02\", 4096, 0, \"e9ff0f0000\")]\n
In 32-bit x86, any address is reachable from any other address.
This is because relative offsets wrap around on integer overflow/underflow, and 32-bit immediates cover a ±2GiB range. Relocation therefore simply involves extending the immediate as needed, e.g. jmp 0x12
to jmp 0x123012
etc.
The rest of the page will therefore leave out relative cases, and only focus on offsets greater than 2GiB.
"},{"location":"dev/arch/x86/code_relocation/#x64-rewriter-going-beyond-the-2gib-offset","title":"x64 Rewriter: Going Beyond the 2GiB Offset","text":"The x64 rewriter is only suitable for rewriting function prologues.
To be able to perform a lot of actions in a position independent manner, this rewriter uses a dummy 'scratch' register which it will overwrite.
Scratch register is determined by the following logic:
Caller Saved Registers
(these are restored after the function call). Because rewriting a lot of code will lead to register exhaustion, it must be reiterated that the rewriter can only be used for small pieces of code.
x64 has over 5000 \u203c\ufe0f instructions that require rewriting. Only a couple hundred are tested currently
"},{"location":"dev/arch/x86/code_relocation/#relative-branches","title":"Relative Branches","text":"Instructions such as JMP
, CALL
, etc.
Behaviour:
If out of range, it is rewritten using a combination of MOV
(move the absolute address into a register) followed by JMP
or CALL
to that register.
Original: (EB 02
) - jmp +2
Relocated: (48 B8 04 00 00 80 00 00 00 00 FF E0
) - mov rax, 0x80000004
- jmp rax
// Parameters for test case:\n// - Original Code (Hex)\n// - Original Address\n// - New Address\n// - New Expected Code (Hex)\n#[case::to_abs_jmp_i8(\"eb02\", 0x80000000, 0, \"48b80400008000000000ffe0\")]\n
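As a rough sketch of the byte sequence shown above, the following encodes `mov rax, imm64` followed by `jmp rax`. The function name is hypothetical, and the scratch register is hard-coded to `rax` for brevity; the actual rewriter selects a scratch register as described earlier.

```rust
/// Encodes `mov rax, <target>; jmp rax` (12 bytes).
/// Hypothetical helper: the scratch register is fixed to rax here.
fn encode_abs_jmp_rax(target: u64) -> [u8; 12] {
    let mut code = [0u8; 12];
    code[0] = 0x48; // REX.W prefix
    code[1] = 0xB8; // mov rax, imm64
    code[2..10].copy_from_slice(&target.to_le_bytes()); // little-endian immediate
    code[10] = 0xFF; // jmp r/m64
    code[11] = 0xE0; // ModRM selecting rax
    code
}
```

Calling it with `0x80000004` reproduces the relocated bytes from the example.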
"},{"location":"dev/arch/x86/code_relocation/#jump-conditional","title":"Jump Conditional","text":"Instructions such as jne
, jg
etc.
Behaviour:
MOV
to set the address and a JMP
to that address.
Example:
Original: (70 02
) - jo +2
Relocated: (71 0C 48 B8 04 00 00 80 00 00 00 00 FF E0
): - jno +12 <skip>
- mov rax, 0x80000004
- jmp rax
// Parameters for test case:\n// - Original Code (Hex)\n// - Original Address\n// - New Address\n// - New Expected Code (Hex)\n#[case::jo(\"7002\", 0x80000000, 0, \"710c48b80400008000000000ffe0\")]\n
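The inverted-condition pattern from the example can be sketched like this. It is an illustration only: `rewrite_short_jcc` is a hypothetical name, it handles only the short (`70`-`7F`) conditional jump encodings, and it hard-codes `rax` as the scratch register.

```rust
/// Rewrites a short conditional jump into:
///   j<inverted> +12 ; skip the absolute jump when the condition fails
///   mov rax, target
///   jmp rax
/// Hypothetical helper; returns None for opcodes outside 0x70..=0x7F.
fn rewrite_short_jcc(opcode: u8, target: u64) -> Option<Vec<u8>> {
    if !(0x70..=0x7F).contains(&opcode) {
        return None;
    }
    let mut code = Vec::with_capacity(14);
    code.push(opcode ^ 1); // flipping the low bit inverts the condition (jo <-> jno)
    code.push(12); // skip over the 10-byte mov and the 2-byte jmp
    code.extend_from_slice(&[0x48, 0xB8]); // mov rax, imm64
    code.extend_from_slice(&target.to_le_bytes());
    code.extend_from_slice(&[0xFF, 0xE0]); // jmp rax
    Some(code)
}
```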
"},{"location":"dev/arch/x86/code_relocation/#loop-instructions","title":"Loop Instructions","text":"Instructions such as LOOP
, LOOPE
, and LOOPNE
.
Behaviour:
Handled by either:
ECX
and using a conditional jump based on the zero flag. (i.e. extend 'loop' address to 32-bit) or
loop
function in the opposite direction. The strategy used depends on the original instruction.
"},{"location":"dev/arch/x86/code_relocation/#example-branch-in-opposite-direction","title":"Example: Branch in Opposite Direction","text":"Original: (E2 FA
) - loop -3
Relocated: (50 E2 02 EB 0C 48 B8 FD 0F 00 80 00 00 00 00 FF E0
) - push rax
- loop +2
- jmp 0x11
- movabs rax, 0x80000ffd
- jmp rax
// Parameters for test case:\n// - Original Code (Hex)\n// - Original Address\n// - New Address\n// - New Expected Code (Hex)\n#[case::loop_backward_abs(\"50e2fa\", 0x80001000, 0, \"50e202eb0c48b8fd0f008000000000ffe0\")]\n
"},{"location":"dev/arch/x86/code_relocation/#jcx-instructions","title":"JCX Instructions","text":"Instructions such as JCXZ
, JECXZ
, JRCXZ
.
Behaviour:
IMM32
encoding. TEST
instruction followed by a conditional jump. Original: (E3 FA
) - jrcxz -3
Relocated: (E3 02 EB 0C 48 B8 FD 0F 00 80 00 00 00 00 FF E0
) - jrcxz +5
- jmp 0x11
- mov rax, 0x80000ffd
- jmp rax
// Parameters for test case:\n// - Original Code (Hex)\n// - Original Address\n// - New Address\n// - New Expected Code (Hex)\n#[case::jrcxz_abs(\"e3fa\", 0x80001000, 0, \"e302eb0c48b8fd0f008000000000ffe0\")]\n
"},{"location":"dev/arch/x86/code_relocation/#rip-relative-operand","title":"RIP Relative Operand","text":"At time of writing, this covers around 2800 \u203c\ufe0f instructions
Only around 100 are covered by unit tests, though.
Covers all instructions which have an IP relative operand, i.e. read/write to a memory address which is relative to the address of the next instruction.
Behaviour:
Replace the RIP relative operand with a scratch register that holds the originally intended memory address.
"},{"location":"dev/arch/x86/code_relocation/#example_3","title":"Example","text":"Original: (48 8B 1D 08 00 00 00
) - mov rbx, [rip + 8]
Relocated: (48 B8 0F 00 00 00 01 00 00 00 48 8B 18
) - mov rax, 0x10000000f
- mov rbx, [rax]
// Parameters for test case:\n// - Original Code (Hex)\n// - Original Address\n// - New Address\n// - New Expected Code (Hex)\n#[case::mov_rhs(\"488b1d08000000\", 0x100000000, 0, \"48b80f00000001000000488b18\")]\n
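The absolute address loaded into the scratch register in the example above follows from how RIP relative addressing works: the displacement is relative to the address of the next instruction. A small sketch, with a hypothetical function name:

```rust
/// Computes the absolute address referenced by a RIP relative operand.
/// RIP points at the *next* instruction, hence the added instruction length.
fn rip_relative_target(instruction_address: u64, instruction_len: u64, displacement: i32) -> u64 {
    instruction_address
        .wrapping_add(instruction_len)
        .wrapping_add(displacement as i64 as u64) // sign-extend, then wrap
}
```

For the example, the 7-byte `mov rbx, [rip + 8]` at `0x100000000` resolves to `0x10000000f`.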
"},{"location":"dev/arch/x86/code_relocation/#how-this-is-done","title":"How this is Done","text":"reloaded-hooks-rs
uses the iced library under the hood for assembly and disassembly.
In iced, operands can be broken down to 3 main types:
| Name | Note |
| --- | --- |
| register | Including Vector Registers |
| memory | i.e. [rax] or [rip + 4] |
| imm | Immediate, 8/16/32/64 |

Immediates use multiple types, e.g. Immediate8, Immediate16 etc., but on the assembler side you can pass them all as Immediate32, so you can group them.
Each instruction can have 0-5 operands, where there is at max 1 operand which can be RIP relative.
To handle this, a script projects/code-generators/x86/generate_enum_ins_combos.py
was used to dump all possible operand permutations from Iced
source. Then I wrote functions to handle each possible permutation.
1 Operand:
2 Operands:
3 Operands:
4 Operands:
5 Operands:
If reloaded-hooks-rs
encounters an instruction with RIP relative operand that uses any of the following operand permutations, it should successfully patch it.
This is just a quick reference sheet for developers.
| Register | stdcall (Microsoft x86) | cdecl |
| --- | --- | --- |
| eax | Caller-saved, return value | Caller-saved, return value |
| ebx | Callee-saved | Callee-saved |
| ecx | Caller-saved | Caller-saved |
| edx | Caller-saved | Caller-saved |
| esi | Callee-saved | Callee-saved |
| edi | Callee-saved | Callee-saved |
| ebp | Callee-saved | Callee-saved |
| esp | Callee-saved | Callee-saved |

For floating point registers:

| Register | stdcall (Microsoft x86) | cdecl |
| --- | --- | --- |
| st(0)-st(7) | Caller-saved, st(0) used for returning floating point values. | Caller-saved, st(0) used for returning floating point values. |
| mm0-mm7 | Caller-saved | Caller-saved |
| xmm0-xmm7 | Caller-saved | Caller-saved |

Both calling conventions pass function parameters on the stack, in right-to-left order, and they both return values in eax. For floating-point values or larger structures, the FPU stack or additional conventions are used. The main difference for function calls is that stdcall expects the function (callee) to clean up the stack, while cdecl expects the caller to do it.
It is recommended that library users manually specify conventions in their hook functions.
When the calling convention of <your function>
is not specified, wrapper libraries must insert the appropriate default convention in their wrappers.
- i686-pc-windows-gnu: cdecl
- i686-pc-windows-msvc: cdecl
- i686-unknown-linux-gnu: SystemV
- Linux x86: SystemV
- Windows x86: cdecl

This is just a quick reference sheet for developers.
The order of the registers is typically as follows for Microsoft x64 ABI: rcx
, rdx
, r8
, r9
, then the rest of the parameters are pushed onto the stack in reverse order (right-to-left).
For the System V ABI on x64: rdi
, rsi
, rdx
, rcx
, r8
, r9
, then the rest of the parameters are pushed onto the stack in reverse order (right-to-left).
| Register | Microsoft x64 ABI | SystemV ABI |
| --- | --- | --- |
| rax | Caller-saved | Caller-saved |
| rbx | Callee-saved | Callee-saved |
| rcx | Caller-saved, 1st parameter | Caller-saved, 4th parameter |
| rdx | Caller-saved, 2nd parameter | Caller-saved, 3rd parameter |
| rsi | Callee-saved | Caller-saved, 2nd parameter |
| rdi | Callee-saved | Caller-saved, 1st parameter |
| rbp | Callee-saved | Callee-saved |
| rsp | Callee-saved | Callee-saved |
| r8 | Caller-saved, 3rd parameter | Caller-saved, 5th parameter |
| r9 | Caller-saved, 4th parameter | Caller-saved, 6th parameter |
| r10 | Caller-saved | Caller-saved |
| r11 | Caller-saved | Caller-saved |
| r12 | Callee-saved | Callee-saved |
| r13 | Callee-saved | Callee-saved |
| r14 | Callee-saved | Callee-saved |
| r15 | Callee-saved | Callee-saved |

Floating Point Registers (Microsoft)
| Register | Microsoft x64 ABI |
| --- | --- |
| st(0)-st(7) | Caller-saved |
| mm0-mm7 | Caller-saved |
| xmm0-xmm5 | Caller-saved, used for floating point parameters. |
| ymm0-ymm5 | Caller-saved, used for floating point parameters. |
| zmm0-zmm5 | Caller-saved, used for floating point parameters. |
| xmm6-xmm15 | Callee-saved. |
| ymm6-ymm15 | Callee-saved. Upper half must be preserved by the caller |
| zmm6-zmm31 | Callee-saved. Upper half must be preserved by the caller |

Floating Point Registers (SystemV)

| Register | SystemV ABI |
| --- | --- |
| st(0)-st(7) | Caller-saved |
| mm0-mm7 | Caller-saved |
| xmm0-xmm7 | Caller-saved, used for floating point parameters |
| ymm0-ymm7 | Caller-saved, used for floating point parameters |
| zmm0-zmm7 | Caller-saved, used for floating point parameters |
| xmm8-xmm15 | Caller-saved |
| ymm8-ymm15 | Caller-saved, used for floating point parameters |
| zmm8-zmm31 | Caller-saved, used for floating point parameters |

On Linux, syscalls use R10 instead of RCX in SystemV ABI
"},{"location":"dev/arch/x86/x86_64/#intel-apx","title":"Intel APX","text":"Information sourced from Source.
Future Intel processors are expected to ship with APX, extending the registers to 32 by adding R16-R31.
These future registers are expected to be caller saved.
To quote document:
Defining all new state (Intel\u00ae APX\u2019s EGPRs) as volatile (caller-saved or scratch)
"},{"location":"dev/arch/x86/x86_64/#calling-convention-inference","title":"Calling Convention Inference","text":"It is recommended library users manually specify conventions in their hook functions.\"
When the calling convention of <your function>
is not specified, wrapper libraries must insert the appropriate default convention in their wrappers.
- x86_64-pc-windows-gnu: Microsoft
- x86_64-pc-windows-msvc: Microsoft
- x86_64-unknown-linux-gnu: SystemV
- x86_64-apple-darwin: SystemV
- Windows x64: Microsoft
- Linux x64: SystemV
- macOS x64: SystemV

Design notes common to all hooking strategies.
"},{"location":"dev/design/common/#wrappers","title":"Wrappers","text":""},{"location":"dev/design/common/#wrapper","title":"Wrapper","text":"Wrappers are stubs which convert from the calling convention of the original function to your calling convention.
If the calling convention of the hooked function and your function matches, this wrapper is simply just 1 jmp
instruction.
Wrappers are documented in their own page here.
"},{"location":"dev/design/common/#reversewrapper","title":"ReverseWrapper","text":"Stub which converts from your code's calling convention to original function's calling convention
This is basically Wrapper with source
and destination
swapped around
Hooks in reloaded-hooks-rs
are structured in a very specific way to ensure thread safety.
They sacrifice a bit of memory usage in favour of performance + thread safety.
Most hooks, regardless of type have a memory layout that looks something like this:
// Size: 2 registers\npub struct Hook\n{\n/// The address of the stub containing bridging code\n/// between your code and custom code. This is the address\n/// of the code that will actually be executed at runtime.\nstub_address: usize,\n\n/// Address of the 'properties' structure, containing\n/// the necessary info to manipulate the data at stub_address\nprops: NonNull<StubPackedProps>,\n}\n
Notably, there are two heap allocations. One at stub_address
, which contains the executable code, and one at props
, which contains packed info of the stub at stub_address
.
The hooks use a 'swapping' system. Both stub_address
and props
contain swap space
. When you enable or disable a hook, the data in the two 'swap spaces' are swapped around.
In other words, when the 'swap space' at stub_address
contains the code for HookFunction
(hook enabled), the 'swap space' at props
contains the code for Original Code
.
Thread safety is ensured by making writes within the stub itself atomic, as well as making the emplacing of the jump to the stub in the original application code atomic.
"},{"location":"dev/design/common/#stub-layout","title":"Stub Layout","text":"The memory region containing the actual executed code.
The stub has two possible layouts. If the Swap Space
is small enough such that it can be atomically overwritten, it will look like this:
- 'Swap Space' [HookCode / OriginalCode]\n<pad to atomic register size>\n
Otherwise, if Swap Space
cannot be atomically overwritten, it will look like:
- 'Swap Space' [HookCode / OriginalCode]\n- HookCode\n- OriginalCode\n
Some hooks may store extra data after OriginalCode
.
For example, if calling convention conversion is needed, the HookCode
becomes a ReverseWrapper, and the stub will also contain a Wrapper.
If calling convention conversion is needed, the layout looks like this:
- 'Swap Space' [ReverseWrapper / OriginalCode]\n- ReverseWrapper\n- OriginalCode\n- Wrapper\n
"},{"location":"dev/design/common/#example","title":"Example","text":"Using ARM64 Assembly Hook as an example.
If the 'OriginalCode' was:
mov x0, x1\nadd x0, x2\n
And the 'HookCode' was:
add x1, x1\nmov x0, x2\n
The memory would look like this when hook is enabled.
swap: ; Currently Applied (Hook)\nadd x1, x1\nmov x0, x2\nb back_to_code\n\nhook: ; HookCode\nadd x1, x1\nmov x0, x2\nb back_to_code\n\noriginal: ; OriginalCode\nmov x0, x1\nadd x0, x2\nb back_to_code\n
(When sizeof(swap)
is larger than biggest possible atomic write.)
Each Assembly Hook contains a pointer to the heap stub (seen above) and a pointer to the heap.
The heap contains all information required to perform operations on the stub.
- StubPackedProps\n - Enabled Flag\n - IsSwapOnly\n - SwapSize\n - HookSize\n- [Hook Function / Original Code]\n
The data in the heap contains a short StubPackedProps struct, detailing the data stored over in the stub.
The SwapSize
contains the length of the 'swap' info (and also consequently, offset of HookCode
). The HookSize
contains the length of the 'hook' instructions (and consequently, offset of OriginalCode
).
If the IsSwapOnly
flag is set, then this data is to be atomically overwritten.
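The relationship between the sizes and the offsets into the stub can be sketched as below. This is an unpacked illustration under assumptions: the real `StubPackedProps` is bit-packed, and the method names here are hypothetical. It also assumes that a swap-only stub holds no backup copies, per the first stub layout described earlier.

```rust
/// Unpacked stand-in for the packed properties (illustration only).
#[allow(dead_code)]
struct StubProps {
    is_enabled: bool,
    is_swap_only: bool,
    swap_size: usize,
    hook_size: usize,
}

impl StubProps {
    /// Offset of HookCode inside the stub: it directly follows the swap space.
    fn hook_code_offset(&self) -> Option<usize> {
        if self.is_swap_only {
            None // assumed: only the swap space exists in the stub
        } else {
            Some(self.swap_size)
        }
    }

    /// Offset of OriginalCode inside the stub: it follows HookCode.
    fn original_code_offset(&self) -> Option<usize> {
        self.hook_code_offset().map(|hook| hook + self.hook_size)
    }
}
```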
When transitioning between Enabled/Disabled state, we place a temporary branch at entry
, this allows us to manipulate the remaining code safely.
Using ARM64 Assembly Hook as an example.
We start the 'disable' process with a temporary branch:
entry: ; Currently Applied (Hook)\nb original ; Temp branch to original\nmov x0, x2\nb back_to_code\n\nhook: ; Backup (Hook)\nadd x1, x1\nmov x0, x2\nb back_to_code\n\noriginal: ; Backup (Original)\nmov x0, x1\nadd x0, x2\nb back_to_code\n
Don't forget to clear instruction cache on non-x86 architectures which need it.
This ensures we can safely overwrite the remaining code...
Then we overwrite entry
code with original
code, except the branch:
entry: ; Currently Applied (Hook)\nb original ; Branch to original\nadd x0, x2 ; overwritten with 'original' code.\nb back_to_code ; overwritten with 'original' code.\n\nhook: ; Backup (Hook)\nadd x1, x1\nmov x0, x2\nb back_to_code\n\noriginal: ; Backup (Original)\nmov x0, x1\nadd x0, x2\nb back_to_code\n
And lastly, overwrite the branch.
To do this, read the original sizeof(nint)
bytes at entry
, replace branch bytes with original bytes and do an atomic write. This way, the remaining instruction is safely replaced.
entry: ; Currently Applied (Original)\nmov x0, x1 ; 'original' code.\nadd x0, x2 ; 'original' code.\nb back_to_code ; 'original' code.\n\noriginal: ; Backup (Original)\nmov x0, x1\nadd x0, x2\nb back_to_code\n\nhook: ; Backup (Hook)\nadd x1, x1\nmov x0, x2\nb back_to_code\n
This way we achieve zero overhead CPU-wise, at expense of some memory.
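The three steps of the 'disable' transition can be modelled on a plain buffer as follows. This is a simplified simulation, not the library's implementation: each `u32` stands in for one 4-byte ARM64 instruction, the three regions are assumed to have equal length, and the function name is hypothetical. Real code must make the single-word writes atomic and clear the instruction cache where required.

```rust
/// Simulates disabling a hook on a stub laid out as [swap | hook | original],
/// where every region is `len` instructions long.
fn disable_hook(stub: &mut [u32], len: usize, temp_branch: u32) {
    let original_start = 2 * len;

    // Step 1: place a temporary branch to 'original' at the entry.
    stub[0] = temp_branch;

    // Step 2: overwrite everything after the branch with 'original' code.
    for i in 1..len {
        stub[i] = stub[original_start + i];
    }

    // Step 3: replace the temporary branch with the first 'original' instruction.
    stub[0] = stub[original_start];
}
```

Enabling the hook would be the same procedure with the 'hook' region as the source.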
"},{"location":"dev/design/common/#limits","title":"Limits","text":"Stub info is packed by default to save on memory space. By default, the following limits apply:
Property 4 Byte Instruction (e.g. ARM64) Other (e.g. x86) Max Orig Code Length 128KiB 32KiB Max Hook Code Length 128KiB 32KiBThese limits may increase in the future if additional required functionality warrants extending metadata length.
"},{"location":"dev/design/common/#thread-safety-on-x86","title":"Thread Safety on x86","text":"Thread safety is 'theoretically' not guaranteed for every possible x86 processor, however is satisfied for all modern CPUs.
The information below is x86 specific but applies to all architectures with a non-fixed instruction size. Architectures with fixed instruction sizes (e.g. ARM) are thread safe in this library by default.
"},{"location":"dev/design/common/#the-theory","title":"The Theory","text":"If the jmp
instruction emplaced when switching state overwrites what originally were multiple instructions, it is theoretically possible that placing the jmp
will make the instruction about to be executed invalid.
For example if the previous instruction sequence was:
0x0: push ebp\n0x1: mov ebp, esp ; 2 bytes\n
And inserting a jmp produces:
0x0: jmp disabled ; 2 bytes\n
It's possible that the CPU's Instruction Pointer was at 0x1 at the time of the overwrite, making the mov ebp, esp instruction invalid.
In practice, modern x86 CPUs (1990 onwards) from Intel, AMD and VIA prefetch instructions in batches of 16 bytes. For this reason (and for optimisation), we place the stubs generated by the various hooks on 16-byte boundaries.
So, by the time we change the code, the CPU has already prefetched the instructions we are atomically overwriting.
In other words, it is simply not possible to perfectly time a write such that a thread at 0x1
(mov ebp, esp
) would read an invalid instruction, as that instruction was prefetched and is being executed from local thread cache.
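Placing a stub on a 16-byte boundary amounts to rounding its address up to the next multiple of 16. A minimal sketch, with a hypothetical helper name:

```rust
/// Rounds `address` up to the next multiple of `alignment`.
/// `alignment` must be a power of two (e.g. 16 for the stubs described here).
fn align_up(address: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (address + alignment - 1) & !(alignment - 1)
}
```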
Here is a thread safety table for x86, taking the above into account:
| Safe? | Hook | Notes |
| --- | --- | --- |
| \u2705 | Function | Functions start on multiples of 16 on pretty much all compilers, per Intel Optimisation Guide. |
| \u2705 | Branch | Stubs are 16 aligned. |
| \u2705 | Assembly | Stubs are 16 aligned. |
| \u2705 | VTable | VTable entries are usize aligned, and don't cross cache boundaries. |"},{"location":"dev/design/common/#hook-length-mismatch-problem","title":"Hook Length Mismatch Problem","text":"When a hook is already present, and you wish to stack that hook over the existing hook, certain problems might arise.
"},{"location":"dev/design/common/#when-your-hook-is-shorter-than-original","title":"When your hook is shorter than original.","text":"This is notably an issue when a hook entry composes of more than 1 instruction; i.e. on RISC architectures.
There is a potential register allocation caveat in this scenario.
Pretend you have the following ARM64 function:
ARM64
ADD x1, #5\nADD x2, #10\nADD x0, x1, x2\nADD x0, x0, x0\nRET\n
C
x1 = x1 + 5;\nx2 = x2 + 10;\nint x0 = x1 + x2;\nx0 = x0 + x0;\nreturn x0;\n
And then, a large hook using an absolute jump with register is applied:
# Original instructions here replaced\nMOVZ x0, A\nMOVK x0, B, LSL #16\nMOVK x0, C, LSL #32\nMOVK x0, D, LSL #48\nBR x0\n# <= branch returns here\n
If you then try to apply a smaller hook after applying the large hook, you might run into the following situation:
# The 3 instructions here are an absolute jump using pointer.\nadrp x9, [0]\nldr x9, [x9, 0x200]\nbr x9\n# The call to the original function returns here, then branches to the previous hook\nMOVK x0, D, LSL #48\nBR x0\n
This is problematic, with respect to register allocation. Absolute jumps on some RISC platforms like ARM will always require the use of a scratch register.
But there is a risk the scratch register used is the same register (x0
) as the register used by the previous hook as the scratch register. In which case, the jump target becomes invalid.
mov
+ branch
combinations for each target architecture.
Only applies to architectures with variable length instructions (x86).
Some hooking libraries don't clean up remaining stolen bytes after installing a hook.
Very notably Steam does this for rendering (overlay) and input (controller support).
Consider the original function having the following instructions:
48 8B C4 mov rax, rsp\n48 89 58 08 mov [rax + 08], rbx\n
After Steam hooks, it will leave the function like this
E9 XX XX XX XX jmp 'somewhere'\n58 08 <invalid instruction. leftover from state before>\n
If you're not able to install a relative hook, e.g. need to use an absolute jump
FF 25 XX XX XX XX jmp ['addr']\n
The invalid instructions will now become part of the 'stolen' bytes, when you call the original; and invalid instructions may be executed.
"},{"location":"dev/design/common/#resolution-strategy_1","title":"Resolution Strategy","text":"This library must do the following:
relative jump
over absolute jump
) when possible. There unfortunately isn't much we can do to detect invalid instructions generated by other hooking libraries reliably, best we can do is try to avoid it by using shorter hooks. Thankfully this is not a common issue given most people use the 'popular' libraries.
"},{"location":"dev/design/common/#fallback-strategies","title":"Fallback Strategies","text":""},{"location":"dev/design/common/#return-address-patching","title":"Return Address Patching","text":"This feature will not be ported over from legacy Reloaded.Hooks
, until an edge case is found that requires this.
This section explains how Reloaded handles an edge case within an already super rare case.
This topic is a bit more complex, so we will use x86 as example here.
For any of this to be necessary, the following conditions must be true:
The low probability of this happening, at least on Windows and/or Linux is rather insane. It cannot be estimated, but if I were to have a guess, maybe 1 in 1 billion. You'd be more likely to die from a shark attack.
In any case, when this happens, Reloaded performs return address patching.
Suppose a foreign hooking library hooks a function with the following prologue:
55 push ebp\n89 e5 mov ebp, esp\n00 00 add [eax], al\n83 ec 20 sub esp, 32 ...\n
After hooking, this code would look like:
E9 XX XX XX XX jmp 'somewhere'\n<= existing hook jumps back here when calling original (this) function\n83 ec 20 sub esp, 32 ...\n
When the prologue is set up 'just right', such that the existing instructions divide perfectly into 5 bytes, and we need to insert a 6 byte absolute jmp FF 25
, Reloaded must patch the return address.
Reloaded has a built in patcher for this super rare scenario, which detects and attempts to patch return addresses of the following patterns:
Where nop* represents 0 or more nops.\n\n1. Relative immediate jumps. \n\n nop*\n jmp 0x123456\n nop*\n\n2. Push + Return\n\n nop*\n push 0x612403\n ret\n nop*\n\n3. RIP Relative Addressing (X64)\n\n nop*\n JMP [RIP+0]\n nop*\n
This patching mechanism is rather complicated, relies on disassembling code at runtime and thus won't be explained here.
Different hooking libraries use different logic for storing callbacks. In some cases alignment of code (or rather lack thereof) can also make this operation unreliable, since we rely on disassembling the code at runtime to find jumps back to end of hook. The success rate of this operation is NOT 100%
"},{"location":"dev/design/common/#requirements-for-external-libraries-to-interoperate","title":"Requirements for External Libraries to Interoperate","text":"While I haven't studied the source code of other hooking libraries before, I've had no issues in the past with the common Detours and minhook libraries that are commonly used
"},{"location":"dev/design/common/#hooking-over-reloaded-hooks","title":"Hooking Over Reloaded Hooks","text":"Libraries which can safely interoperate (stack hooks ontop) of Reloaded Hooks Hooks' must satisfy the following.
Must be able to patch (re-adjust) relative jumps.
Must be able to automatically determine number of bytes to steal from original function.
See: Code Relocation
"},{"location":"dev/design/wrappers/","title":"Calling Conversion Wrappers","text":"Describes how stubs for converting between different Calling Conventions (ABIs) are generated.
This page uses x86 as an example, however the same concepts apply to other architectures.
These stubs are what allows Reloaded.Hooks-rs
to hook functions which take parameters in custom registers, allowing developers to skip writing error prone 'naked'
functions by hand.
Setting frame pointer (ebp
) is not necessary, as our wrapper shouldn't use it
# push LR if present on platform\npush ebp\npush ebx\npush edi\npush esi\n
Setup Function Parameters
# In a loop\npush dword [ebp + {baseStackOffset}]\n
Reserve Extra Stack Space
Some calling conventions require extra space reserved up front
sub esp, {whatever}\n
If target function returns in different register than caller expects, might need to for example mov eax, ecx
.
mov eax, ecx\n
# Restore non-volatile registers\npop esi\npop edi\npop ebx\npop ebp\n# pop LR if relevant on given platform\n
The general implementation for 64-bit is the same, however the stack must be 16 byte aligned at method entry, and for MSFT convention, 32 bytes reserved on stack before call
There are also some very minor nuances, which the actual code has to handle, but this is the general gist of it.
"},{"location":"dev/design/wrappers/#optimization","title":"Optimization","text":""},{"location":"dev/design/wrappers/#align-wrappers-to-architecture-recommended-alignment","title":"Align Wrappers to Architecture Recommended Alignment","text":"This optimizes CPU instruction fetch, which (on x86) operates on 16 byte boundaries.
So we align our wrappers to these boundaries.
"},{"location":"dev/design/wrappers/#eliminate-callee-saved-registers","title":"Eliminate Callee Saved Registers","text":"When there are overlaps in callee saved registers between source and target, we can skip backing up those registers.
For example, cdecl
and stdcall
use the same callee saved registers, ebp
, ebx
, esi
, edi
. When converting between these two conventions, it is not necessary to backup/restore any of them in the wrapper, because the target function will already take care of that.
Example: cdecl target -> stdcall
wrapper.
# Stack Backup\npush ebp\nmov ebp, esp\n\n# Callee Save\npush ebx\npush edi\npush esi\n\n# Re push parameters\npush dword [ebp + {x}]\npush dword [ebp + {x}]\n\ncall {function}\nadd esp, 8\n\n# Callee Restore\npop esi\npop edi\npop ebx\n\n# Stack Restore\npop ebp\nret 8\n
# Stack Backup\npush ebp\nmov ebp, esp\n\n# Re push parameters\npush dword [ebp + {x}]\npush dword [ebp + {x}]\n\ncall {function}\nadd esp, 8\n\n# Stack Restore\npop ebp\nret 8\n
Pseudocode example. Not verified for accuracy, but it shows the idea
In the cdecl -> stdcall
example, ebp
is considered a callee saved register too, thus it should be possible to optimise into:
# Re push parameters\npush dword [esp + {x}]\npush dword [esp + {x}]\n\ncall {function}\nadd esp, 8\n\nret 8\n
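The elimination rule can be expressed as a set difference: the wrapper only needs to save registers that its caller expects to be preserved but that the target function will not preserve itself. A small sketch, with a hypothetical function name and registers represented as strings:

```rust
/// Returns the registers the wrapper itself must back up and restore.
/// `caller_expects` is the callee-saved set of the convention the wrapper is called with,
/// `target_preserves` is the callee-saved set of the function the wrapper calls.
fn registers_to_save<'a>(caller_expects: &[&'a str], target_preserves: &[&str]) -> Vec<&'a str> {
    caller_expects
        .iter()
        .copied()
        .filter(|register| !target_preserves.contains(register))
        .collect()
}
```

For cdecl and stdcall, which share the same callee saved set, the result is empty, matching the optimised wrapper above.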
"},{"location":"dev/design/wrappers/#combine-pushpop-operations-when-possible","title":"Combine Push/Pop Operations when Possible","text":"When pushing multiple registers at once, it is possible to remove redundant stack operations.
Imagine a situation where you need to push 3 float registers onto the stack; if we pass the instructions from the wrapper generator verbatim [push a, then push b, then push c], we would land with the following:
; Push XMM registers\nsub rsp, 16\nmovdqu [rsp], xmm0\nsub rsp, 16\nmovdqu [rsp], xmm1\nsub rsp, 16\nmovdqu [rsp], xmm2\n\n; Pop XMM registers\nmovdqu xmm2, [rsp]\nadd rsp, 16\nmovdqu xmm1, [rsp]\nadd rsp, 16\nmovdqu xmm0, [rsp]\nadd rsp, 16\n
This is suboptimal, as it can be simplified to:
# Push Registers to the Stack\nsub rsp, 48\nmovdqu [rsp], xmm0\nmovdqu [rsp + 16], xmm1\nmovdqu [rsp + 32], xmm2\n\n# Pop three XMM registers from the Stack\nmovdqu xmm0, [rsp]\nmovdqu xmm1, [rsp + 16]\nmovdqu xmm2, [rsp + 32]\nadd rsp, 48\n
When generating wrappers, the generator must recognise this pattern, and merge multiple push/pop operations into a single block, wherever possible.
It is optimal to access memory sequentially from lowest to highest address.
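The merge step can be sketched as computing one total stack adjustment plus a per-register offset. This is an illustrative model only; `merge_pushes` is a hypothetical name and registers are represented as (name, size in bytes) pairs.

```rust
/// Merges consecutive pushes into a single stack adjustment.
/// Returns the total size to subtract from the stack pointer and,
/// for each register, the offset at which it is stored (lowest address first).
fn merge_pushes<'a>(registers: &[(&'a str, u32)]) -> (u32, Vec<(&'a str, u32)>) {
    let mut offset = 0;
    let mut slots = Vec::with_capacity(registers.len());
    for &(name, size) in registers {
        slots.push((name, offset));
        offset += size;
    }
    (offset, slots)
}
```

For three XMM registers this yields a single `sub rsp, 48` with stores at offsets 0, 16 and 32, as in the simplified listing.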
"},{"location":"dev/design/wrappers/#move-between-registers-instead-of-push-pop","title":"Move Between Registers Instead of Push Pop","text":"In some cases it's possible to mov between registers, rather than doing an explicit push+pop operation
Suppose you have a custom target -> stdcall
wrapper. Custom is defined as int@eax FastAdd(int a@eax, int b@ecx)
.
Normally wrapper generation will convert the arguments like this:
# Re-push STDCALL arguments to stack\npush dword [esp + {x}]\npush dword [esp + {x}]\n\n# Pop into correct registers\npop eax\npop ecx\n\n# Call that function\n# ...\n
There are opportunities for optimisation here; notably you can do:
# Move directly into correct registers\nmov eax, [esp + {x}]\nmov ecx, [esp + {x}]\n\n# Call that function\n# ...\n
Optimising cases where the source/from convention, e.g. custom target -> stdcall
has no register parameters is trivial, since you can directly mov into the intended target register. And this is the most common use case in x86.
For completeness, it should be noted that in the opposite direction stdcall target -> custom
, such as one that would be used in entry point of a hook (ReverseWrapper), no optimisation is needed here, as all registers are directly pushed without any extra steps.
In the backend, the wrapper generator keeps track of current stack pointer (assuming start is '0'); and uses that information to match the push and pop operations accordingly \ud83d\ude09
"},{"location":"dev/design/wrappers/#with-register-to-register","title":"With Register to Register","text":"In x64, and more advanced x86 scenarios where both to/from calling convention have register parameters, mov optimisation is not trivial.
"},{"location":"dev/design/wrappers/#basic-case","title":"Basic Case","text":"Suppose you have a a function to add 'health' to a character that's in a struct or class. i.e. int AddHealth(Player* this, int amount)
. (Note: The 'this' parameter to struct instance is implicit and added during compilation.)
class Player {\nint mana;\nint health;\n\nvoid AddHealth(int amount) {\nhealth += amount;\n}\n};\n
add dword [rdi+4], esi\nret\n
See for yourself.
add dword [rcx+4], edx\nret\n
See for yourself.
If you were to make a SystemV target -> Microsoft
wrapper; you would have to move the two registers from rcx, rdx
to rdi, rsi
.
Therefore, a wrapper might have code that looks something like:
# Push register parameters of the function being returned (right to left, reverse loop)\npush rdx\npush rcx\n\n# Pop parameters into registers of function being called\npop rdi\npop rsi\n
In this case, it is possible to optimise with:
mov rdi, rcx # last push, first pop\nmov rsi, rdx # second last push, second pop\n
Provided that the wrapper correctly saves and restores callee saved registers for the returned method, i.e. backs up RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15
, this is fine.
Or in the case of this wrapper, just RDI, RSI
(due to overlap within the 2 conventions).
The 'strategy' to generate code for this optimisation is to keep track of the stack: start at the boundary between the push and pop sequences in the ASM, pair the innermost push with the innermost pop, then keep pairing outwards until there is no push/pop left.
This is just another example.
Suppose we add 2 more parameters...
C++x64 asm (SystemV)x64 asm (Microsoft)class Player {\nint mana;\nint health;\nint money;\n\nvoid AddStats(int health, int mana, int money) {\nthis->health += health;\nthis->mana += mana;\nthis->money += money;\n}\n};\n
add dword [rdi+4], esi # health\nadd dword [rdi], edx # mana\nadd dword [rdi+8], ecx # money\nret\n
See for yourself.
add dword [rcx+4], edx # health\nadd dword [rcx], r8d # mana\nadd dword [rcx+8], r9d # money\nret\n
See for yourself.
There is now an overlap between the registers used.
Microsoft convention uses: - rcx
for self - rdx
for health
SystemV uses: - rcx
for money - rdx
for mana
The wrapper now does the following
UnoptimisedOptimised (Contains Bug)# Push register parameters of the function being returned (right to left, reverse loop)\npush rcx\npush rdx\npush rsi\npush rdi\n\n# Pop parameters into registers of function being called\npop rcx\npop rdx\npop r8\npop r9\n
mov rcx, rdi\nmov rdx, rsi\nmov r8, rdx\nmov r9, rcx\n
The optimised version of code above contains a bug.
There is a bug because both conventions have overlapping registers, notably rcx
and rdx
. When you try to do mov r8, rdx
, this moves invalid data, as rdx
was already overwritten.
In this specific case, you can reverse the order of operations, and get a correct result:
# Reversed\nmov r9, rcx\nmov r8, rdx\nmov rdx, rsi\nmov rcx, rdi\n
However, this might not always be the case.
When generating wrappers, we must perform a validation check to determine if any source register in mov target, source
hasn't already been overwritten by a prior operation.
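A minimal sketch of such a validation check, assuming each `mov` is modelled as a (destination, source) pair of register names. The `Mov` type and `first_clobbered_read` function are hypothetical names used for illustration; they are not part of the library's API.

```rust
use std::collections::HashSet;

/// A single `mov dst, src`. Registers are identified by name for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mov {
    pub dst: &'static str,
    pub src: &'static str,
}

/// Returns the index of the first `mov` whose source register was already
/// overwritten by an earlier `mov` in the sequence, or `None` if the
/// sequence is safe to emit as-is.
pub fn first_clobbered_read(moves: &[Mov]) -> Option<usize> {
    // Registers written so far by earlier moves.
    let mut written: HashSet<&str> = HashSet::new();
    for (index, m) in moves.iter().enumerate() {
        if written.contains(m.src) {
            return Some(index);
        }
        written.insert(m.dst);
    }
    None
}
```

Run against the buggy sequence above, this flags `mov r8, rdx`; run against the reversed sequence, it reports no problem.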
In the Advanced Case we saw that it's not always possible to perform mov optimisation.
This problem can be solved with a Directed Acyclic Graph
.
This problem can be solved in O(n)
complexity with a Directed Acyclic Graph
, where each node represents a register and an edge (arrow) from Node A to Node B represents a move from register A to register B.
The above (buggy) code would be represented as:
flowchart TD\n RDI --> RCX\n RSI --> RDX\n RDX --> R8\n RCX --> R9
RDI writes to RCX, which is then copied to R9; R9 therefore receives the wrong value. We can determine the correct mov
order, by processing them in reverse order of their dependencies
mov r9, rcx
before mov rcx, rdi
mov r8, rdx
before mov rdx, rsi
The exact order emitted depends on the algorithm's implementation in code; any order is valid as long as the 2 derived rules are followed.
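The ordering described above can be sketched as a single pass over the graph. This is an illustrative implementation, not the library's code: `Mov` and `order_moves` are hypothetical names, and it assumes a 1:1 mapping where each register is written at most once.

```rust
use std::collections::HashMap;

/// A single `mov dst, src`, i.e. an edge `src --> dst` in the graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mov {
    pub dst: &'static str,
    pub src: &'static str,
}

/// Orders moves so that no register is overwritten before it has been read.
/// Returns `None` if the moves contain a cycle (handled separately).
pub fn order_moves(moves: &[Mov]) -> Option<Vec<Mov>> {
    // How many not-yet-emitted moves still need to read each register.
    let mut pending_reads: HashMap<&str, usize> = HashMap::new();
    for m in moves {
        *pending_reads.entry(m.src).or_insert(0) += 1;
    }
    // Assumption: each register is written at most once.
    let by_dst: HashMap<&str, Mov> = moves.iter().map(|m| (m.dst, *m)).collect();

    // A move is ready when nothing still needs to read its destination.
    let mut ready: Vec<Mov> = moves
        .iter()
        .copied()
        .filter(|m| !pending_reads.contains_key(m.dst))
        .collect();

    let mut ordered = Vec::with_capacity(moves.len());
    while let Some(m) = ready.pop() {
        ordered.push(m);
        let remaining = pending_reads
            .get_mut(m.src)
            .expect("every source register was counted above");
        *remaining -= 1;
        // Once the source has no readers left, the move that overwrites it is ready.
        if *remaining == 0 {
            if let Some(next) = by_dst.get(m.src) {
                ready.push(*next);
            }
        }
    }
    // Any move left unemitted is part of a cycle.
    if ordered.len() == moves.len() {
        Some(ordered)
    } else {
        None
    }
}
```

Each move is pushed and popped at most once, so the work is linear in the number of moves.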
"},{"location":"dev/design/wrappers/#handling-cycles-2-node-cycle","title":"Handling Cycles (2 Node Cycle)","text":"Suppose we have 2 calling conventions with reverse parameter order. For this example we will define convention \ud83d\udc31call
. \ud83d\udc31call
uses the reverse register order of Microsoft compiler.
\ud83d\udc31call
) int AddWithShift(int a, int b) {\nreturn (a * 16) + b;\n}\n
shl ecx, 4\nlea eax, dword [rdx+rcx]\nret\n
shl edx, 4\nlea eax, dword [rcx+rdx]\nret\n
The ASM to do the calling convention transformation becomes:
UnoptimisedOptimised (Contains Bug)# Push register parameters of the function being returned (right to left, reverse loop)\npush rcx\npush rdx\n\n# Pop parameters into registers of function being called\npop rcx\npop rdx\n
mov rcx, rdx\nmov rdx, rcx\n
There is now a cycle.
flowchart TD\n RCX --> RDX\n RDX --> RCX
In this trivial example, you can use xchg
or 3 mov
(s) to swap between the two registers.
xchg rcx, rdx\n
mov {temp}, rdx\nmov rdx, rcx\nmov rcx, {temp}\n
On some Intel architectures, the mov
approach can reportedly be faster, however, it's not possible to procure a scratch register in all cases.
I'll welcome any PRs that detect and write the more optimal choice on a given architecture, however this is not planned for main library.
Adding instructions also means the wrapper might overflow to the next multiple of 16 bytes, causing more instructions to be fetched when this otherwise wouldn't happen with xchg, potentially losing any benefits gained on those architectures.
The mappings done in Reloaded.Hooks
are a 1:1 bijective mapping. Therefore any cycle of just 2 registers can be resolved by simply swapping the involved registers.
Now imagine doing a mapping which involves 3 registers, r8
- r10
, and all registers need to be mov'd
.
flowchart TD\n R8 --> R9\n R9 --> R10\n R10 --> R8
mov R9, R8\nmov R10, R9\nmov R8, R10\n
To resolve this, we backup the register at the end of the cycle (in this case R10), disconnect it from the first register in the cycle and resolve as normal.
i.e. we solve for
flowchart TD\n R8 --> R9\n R9 --> R10
Then write original value of R10 into R8 after this code is converted into mov
sequences.
This can be done using the following strategies:
mov
into scratch register. AsmHook
) prefer callee saved register which is not a parameter. push
+ pop
register. # Move value from end of cycle into caller saved register (scratch)\nmov RAX, R10\n\n# Original (after reorder)\nmov R10, R9\nmov R9, R8\n\n# Move from caller saved register into first in cycle.\nmov R8, RAX\n
# Push value from end of cycle into stack\npush R10\n\n# Original (after reorder)\nmov R10, R9\nmov R9, R8\n\n# Pop into intended place from stack\npop R8\n
When possible to get scratch register, use mov
, otherwise use push
.
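The two strategies above can be sketched as one small generator that emits assembly text. `resolve_cycle` is a hypothetical helper written for illustration; it assumes the cycle is given as a list of registers where each one moves into the next and the last moves into the first.

```rust
/// Emits a `mov` sequence for a register cycle such as `["r8", "r9", "r10"]`
/// (r8 -> r9 -> r10 -> r8). If `scratch` is `Some`, the last register is
/// backed up there; otherwise it is backed up on the stack.
pub fn resolve_cycle(cycle: &[&str], scratch: Option<&str>) -> Vec<String> {
    let mut asm = Vec::new();
    let (first, last) = match (cycle.first(), cycle.last()) {
        (Some(f), Some(l)) if cycle.len() >= 2 => (*f, *l),
        _ => return asm, // nothing to resolve for 0 or 1 registers
    };

    // Back up the register at the end of the cycle.
    match scratch {
        Some(reg) => asm.push(format!("mov {reg}, {last}")),
        None => asm.push(format!("push {last}")),
    }

    // With the cycle broken, emit the remaining moves in reverse dependency order.
    for pair in cycle.windows(2).rev() {
        asm.push(format!("mov {}, {}", pair[1], pair[0]));
    }

    // Write the backed up value into the first register of the cycle.
    match scratch {
        Some(reg) => asm.push(format!("mov {first}, {reg}")),
        None => asm.push(format!("pop {first}")),
    }
    asm
}
```

For a 2 register cycle with a scratch register, this degenerates to the 3 `mov` swap shown earlier.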
This is a theoretical idea, not implemented in library.
Only applies to platforms like x86, which store return addresses on the stack.
In some cases, like converting between stdcall
and cdecl
; it might be possible to reuse the same parameters from the stack. Take into account the previous example:
# Re push parameters\npush dword [esp + {x}]\npush dword [esp + {x}]\n\ncall {function}\nadd esp, 8\n\nret 8\n
Strictly speaking, to convert from stdcall
to cdecl
, you will only need to convert from caller stack cleanup to callee stack cleanup i.e. ret 8
.
In this case, re-pushing parameters is redundant, as the pushed parameters from the previous method call are on stack and can still be re-used.
What we can instead do, is overwrite the return address and jump to our code.
# Pop previous return address from stack\nmov [esp], {addressPostJump} # replace return address\njmp {function} # jump to our function\nadd esp, 8 # our function returns here due to changed return address\nret 8\n
"},{"location":"dev/design/wrappers/#technical-limitations","title":"Technical Limitations","text":"Wrapper generation does not have understanding of any specific ABI, and as such cannot always be 100% correct in edge cases.
"},{"location":"dev/design/wrappers/#wrappers-dont-understand-abi-specific-rules","title":"Wrappers Don't understand ABI Specific Rules","text":"Some ABIs have unconventional rules for handling edge cases.
For example, consider the following rule used by the RISC-V ABI.
When primitive arguments twice the size of a pointer-word are passed on the stack, they are naturally aligned. When they are passed in the integer registers, they reside in an aligned even-odd register pair, with the even register holding the least-significant bits. In RV32, for example, the function void foo(int, long long) is passed its first argument in a0 and its second in a2 and a3. Nothing is passed in a1.
The wrappers cannot know or understand the intricate rules such as this that are imposed by an ABI.
"},{"location":"dev/design/wrappers/#allocating-mixed-size-registers-is-tricky","title":"Allocating Mixed Size Registers is Tricky.","text":"Optimized code does not suffer from this bug.
"},{"location":"dev/design/wrappers/#the-problem","title":"The Problem","text":"Consider a function which spills a float register xmm0
, and an nint
(native size integer). A Push
is basically a sequence of sub
and then mov
.
So (pretend ASM below is valid)
push xmm0\npush rax\n
Would become
sub rsp, 16\nmov [rsp], xmm0\nsub rsp, 8\nmov [rsp], rax\n
This is invalid, because the contents of rax will now replace half of the xmm0
register on the stack. How ABIs and compilers deal with this isn't always well standardised; some only consider lower bits volatile, (Microsoft x64) while others don't preserve the bigger registers at all (SystemV x64).
Our strategy will be to try to rearrange the stack operations to avoid this problem, starting by pushing smaller registers first, and then larger registers, effectively creating:
sub rsp, 8\nmov [rsp], rax\nsub rsp, 16\nmov [rsp], xmm0\n
"},{"location":"dev/design/wrappers/#when-using-optimized-code","title":"When using Optimized Code","text":"Currently with optimizations enabled, this code compiles as:
sub rsp, 24\nmov [rsp], xmm0\nmov [rsp + 16], rax\n
Which is valid.
"},{"location":"dev/design/wrappers/#wrappers-currently-dont-understand-how-to-split-larger-registers","title":"Wrappers (Currently) Don't understand how to split larger registers.","text":"Some calling conventions have rules where larger values (e.g. 128-bit values on x64) are split into 2 registers.
The wrapper generator cannot generate code for these functions currently.
"},{"location":"dev/design/assembly-hooks/overview/","title":"Assembly Hooks","text":"Replacing arbitrary assembly sequences (a.k.a. 'mid function hooks').
This hook is used to make small changes to existing logic, for example injecting custom logic for existing conditional branches (if
statements).
Limited effectiveness if Code Relocation is not available.
I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
This hook works by injecting a jmp
instruction inside the middle of an arbitrary assembly sequence to custom code. The person using this hook must be very careful not to break the program (corrupt stack, used registers, etc.).
Original Code
: Middle of an arbitrary sequence of assembly instructions where a branch
to custom code is placed. Hook Function
: Contains user code, including original code (depending on user preference). Original Stub
: Original code (used when hook disabled). flowchart TD\n O[Original Code]\n HK[Hook Function]\n\n O -- jump --> HK\n HK -- jump back --> O
When the hook is activated, a branch
is placed in the middle of the original assembly instruction sequence to your hook code.
Your code (and/or original code) is then executed, then it branches back to original code.
"},{"location":"dev/design/assembly-hooks/overview/#when-deactivated","title":"When Deactivated","text":"flowchart TD\n O[Original Function]\n HK[\"Hook Function <Overwritten with Original Code>\"]\n\n O -- jump --> HK\n HK -- jump back --> O
When the hook is deactivated, the 'Hook Function' is overwritten in-place with the original instructions and a jump back to the original code.
"},{"location":"dev/design/assembly-hooks/overview/#usage-notes","title":"Usage Notes","text":"Assembly Hooks should allow both Position Independent Code and Position Relative Code
With that in mind, the following APIs should be possible:
/// Creates an Assembly Hook given existing position independent assembly code,\n/// and the address to hook.\n/// # Arguments\n/// * `hook_address` - The address of the function or mid-function to hook.\n/// * `asm_code` - The assembly code to execute, precompiled.\nfn from_pos_independent_code_and_function_address(hook_address: usize, asm_code: &[u8]);\n\n/// Creates an Assembly Hook given existing position relative assembly code,\n/// and the address to hook.\n/// \n/// # Arguments\n/// * `hook_address` - The address of the function or mid-function to hook.\n/// * `asm_code` - The assembly code to execute, precompiled.\n/// * `code_address` - The original address of asm_code. \n/// \n/// # Remarks\n/// Code in `asm_code` will be relocated to new target address. \nfn from_code_and_function_address(hook_address: usize, asm_code: &[u8], code_address: usize);\n\n/// Creates an Assembly Hook given a sequence of assembly instructions,\n/// and the address to hook.\n/// \n/// # Arguments\n/// * `hook_address` - The address of the function or mid-function to hook.\n/// * `asm_isns` - The assembly instructions to place at this address.\n/// \n/// # Remarks\n/// Code in `asm_isns` will be relocated to new target address. \nfn from_instructions_and_function_address(hook_address: usize, asm_isns: &[Instructions]);\n
Separate functions are shown here for clarity; in the library, all options should live in a struct.
Code using from_code_and_function_address
is to be preferred for usage, as users will be able to use relative branches for improved efficiency. (If they are out of range, hooking library will rewrite them)
For pure assembly code, users are expected to compile code externally using something like FASM
, put the code in their program/mod (as byte array) and pass that directly as asm_code
.
For people who want to call their own program/mod(s) from assembly, there will be a wrapper API around Jit<TRegister>
and its various Operations. This API will be cross-architecture and should contain all the necessary operations required for setting up stack/registers and calling user code.
Programmers are also expected to provide 'max allowed hook length' with each call.
"},{"location":"dev/design/assembly-hooks/overview/#hook-lengths","title":"Hook Lengths","text":"The expected hook lengths for each architecture
When using the library, the library will use the most optimal possible jmp
instruction to get to the user hook.
When calling one of the functions to create an assembly hook, the end user should specify their max permissible assembly hook length.
If a hook cannot be satisfied within that constraint, then library will throw an error.
The following table below shows common hook lengths, for:
Relative Jump
(best case) Relative Jump
range. [1]: x86 can reach any address from any address with relative branch due to integer overflow/wraparound. [2]: jmp [<Address>]
, with <Address> at < 2GiB. [3]: mov <reg>, address
+ call <reg>
. +1 if using an extended reg. [4]: macOS restricts access to < 2GiB
memory locations, so absolute jump must be used. +1 if using an extended reg. [5]: MOVZ + MOVK + LDR + BR. [6]: ADRP + ADD + BR.
Common: Thread Safety & Memory Layout
"},{"location":"dev/design/assembly-hooks/overview/#legacy-compatibility-considerations","title":"Legacy Compatibility Considerations","text":"As reloaded-hooks-rs
intends to replace Reloaded.Hooks
, it must provide certain functionality for backwards compatibility.
Once reloaded-hooks-rs
releases, the legacy Reloaded.Hooks
will be a wrapper around it.
This means a few functionalities must be supported here:
Setting arbitrary 'Hook Length'.
Reloaded.Hooks
users create an ASM Hook (with default PreferRelativeJump == false
and HookLength == -1
) the wrapper for legacy API must set 'Hook Length' == 7
to emulate absolute jump size.MaxOpcodeSize
from original API should be sufficient.Supporting Assembly via FASM.
Reloaded.Hooks
wrapper will continue to ship FASM for backwards compatibility, however mods are expected to migrate to the new library in the future.Assembly hook info is packed by default to save on memory space. By default, the following limits apply:
Property 4 Byte Instruction (e.g. ARM64) Other (e.g. x86) Max Orig Code Length 128KiB 32KiB Max Hook Code Length 128KiB 32KiBThese limits may increase in the future if additional functionality warrants extending metadata length.
"},{"location":"dev/design/branch-hooks/overview/","title":"Branch Hooks","text":"Replaces a branch
(call/jump) to an existing method with a new one.
This hook is commonly used when you want to change behaviour of a function, but only for certain callers.
For example, if you have a method Draw2DElement
that's used to draw an object to the screen, but you only want to move a certain element that's rendered by Draw2DElement
, you would use a Branch Hook to replace call Draw2DElement
to call YourOwn2DElement
.
Only guaranteed to work on platforms with Targeted Memory Allocation
Because the library needs to be able to acquire memory in proximity of the original function.
This is almost always achievable, but cases where Denuvo DRM inflates ARM64 binaries (20MB -> 500MB) may prove problematic as ARM64 has +-128MiB range for relative jumps.
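To make the range limit concrete, here is a small sketch of a reachability check for an ARM64 relative branch (`B`/`BL`), which encodes a signed 26-bit immediate scaled by 4. The function and constant names are hypothetical and exist only for this example.

```rust
/// Furthest forward offset encodable by an ARM64 `B`/`BL` instruction.
const ARM64_BRANCH_MAX_FORWARD: i128 = (1 << 27) - 4;
/// Furthest backward offset encodable by an ARM64 `B`/`BL` instruction.
const ARM64_BRANCH_MAX_BACKWARD: i128 = -(1 << 27);

/// Returns true if a relative branch placed at `branch_address`
/// can reach `target` directly.
pub fn arm64_branch_in_range(branch_address: u64, target: u64) -> bool {
    // i128 is used so that the subtraction cannot overflow for any pair of u64 addresses.
    let offset = target as i128 - branch_address as i128;
    offset % 4 == 0
        && (ARM64_BRANCH_MAX_BACKWARD..=ARM64_BRANCH_MAX_FORWARD).contains(&offset)
}
```

If the nearest buffer the allocator can provide fails this check, a branch hook on ARM64 cannot be written as a single relative branch.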
I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
This hook works by replacing the target of a call
(a.k.a. Branch with Link) instruction with a new target.
A Branch Hook is really a specialised variant of function hook.
Notably it differs in the following ways:
There is no Wrapper To Call Original Function as no instructions are stolen.
You call
the ReverseWrapper instead of jump
ing to it.
Caller Function
: Function which originally called Original Method
. ReverseWrapper
: Translates from original function calling convention to yours. Then calls your function. <Your Function>
: Your Rust/C#/C++/Asm code.Original Method
: Original method to be called. flowchart TD\n CF[Caller Function]\n RW[Stub]\n HK[\"<Your Function>\"]\n OM[Original Method]\n\n CF -- \"call wrapper\" --> RW\n RW -- jump to your code --> HK\n HK -. \"Calls <Optionally>\" .-> OM\n OM -. \"Returns\" .-> HK
"},{"location":"dev/design/branch-hooks/overview/#when-activated-in-fast-mode","title":"When Activated in 'Fast Mode'","text":"'Fast Mode' is an optimisation that inserts the jmp to point directly into your code when possible.
flowchart TD\n CF[Caller Function]\n HK[\"<Your Function>\"]\n OM[Original Method]\n\n CF -- \"call 'Your Function' instead of original\" --> HK\n HK -. \"Calls <Optionally>\" .-> OM\n OM -. \"Returns\" .-> HK
This option allows for a small performance improvement, saving 1 instruction and some instruction prefetching load.
This is on by default (can be disabled), and will take into effect when no conversion between calling conventions is needed.
"},{"location":"dev/design/branch-hooks/overview/#when-activated-with-calling-convention-conversion","title":"When Activated (with Calling Convention Conversion)","text":"flowchart TD\n CF[Caller Function]\n RW[ReverseWrapper]\n HK[\"<Your Function>\"]\n W[Wrapper]\n OM[Original Method]\n\n CF -- \"call wrapper\" --> RW\n RW -- jump to your code --> HK\n HK -. \"Calls <Optionally>\" .-> W\n W -- \"call original (wrapped)\" --> OM\n OM -. \"Returns\" .-> W\n W -. \"Returns\" .-> HK
"},{"location":"dev/design/branch-hooks/overview/#when-deactivated","title":"When Deactivated","text":"flowchart TD\n CF[Caller Function]\n SB[Stub]\n HK[Hook Function]\n OM[Original Method]\n\n CF -- jump to stub --> SB\n SB -- jump to original --> OM
When the hook is deactivated, the stub is replaced with a direct jump back to the original function.
By bypassing your code entirely, it is safe for your dynamic library (.dll
/.so
/.dylib
) to unload from the process.
Common: Thread Safety & Memory Layout
"},{"location":"dev/design/branch-hooks/overview/#stub-memory-layout","title":"Stub Memory Layout","text":"The 'branch hook' stub uses the following memory layout:
- [Branch to Hook Function / Branch to Original Function]\n- Branch to Hook Function\n- Branch to Original Function\n
If calling convention conversion is needed, the layout looks like this:
- [ReverseWrapper / Branch to Original Function]\n- ReverseWrapper\n- Branch to Original Function\n- Wrapper\n
The library is optimised to not use redundant memory
For example, in x86 (32-bit), a jmp
instruction can reach any address from any address. In that situation, we don't write Branch to Original Function
to the buffer at all, provided a ReverseWrapper
is not needed, as it is not necessary.
Using x86 Assembly.
"},{"location":"dev/design/branch-hooks/overview/#before","title":"Before","text":"originalCaller:\n; Some code...\ncall originalFunction\n; More code...\n
"},{"location":"dev/design/branch-hooks/overview/#after-fast-mode","title":"After (Fast Mode)","text":"originalCaller:\n; Some code...\ncall userFunction ; To user method\n; More code...\n\nuserFunction:\n; New function implementation...\ncall originalFunction ; Optional.\n
"},{"location":"dev/design/branch-hooks/overview/#after","title":"After","text":"; x86 Assembly\noriginalCaller:\n; Some code...\ncall stub\n; More code...\n\nstub:\n; == BranchToHook ==\njmp newFunction\n; == BranchToHook ==\n\n; == BranchToOriginal ==\njmp originalFunction\n; == BranchToOriginal ==\n\nnewFunction:\n; New function implementation...\ncall originalFunction ; Optional.\n
"},{"location":"dev/design/branch-hooks/overview/#after-with-calling-convention-conversion","title":"After (with Calling Convention Conversion)","text":"; x86 Assembly\noriginalCaller:\n; Some code...\ncall stub\n; More code...\n\nstub:\n; == ReverseWrapper ==\n; implementation..\ncall userFunction\n; ..implementation\n; == ReverseWrapper ==\n\n; == Wrapper ==\n; implementation ..\njmp originalFunction\n; .. implementation\n; == Wrapper ==\n\n; == BranchToOriginal ==\njmp originalFunction ; Whenever disabled :wink:\n; == BranchToOriginal ==\n\nuserFunction:\n; New function implementation...\ncall wrapper; (See Above)\n
"},{"location":"dev/design/branch-hooks/overview/#after-disabled","title":"After (Disabled)","text":"; x86 Assembly\noriginalCaller:\n; Some code...\ncall stub\n; More code...\n\nstub:\n<jmp to `jmp originalFunction`> ; We disable the hook by branching to instruction that branches to original\njmp originalFunction ; Whenever disabled :wink:\n\nnewFunction:\n; New function implementation...\ncall originalFunction ; Optional.\n\noriginalFunction:\n; Original function implementation...\n
"},{"location":"dev/design/function-hooks/hooking-strategy-arm64/","title":"Interoperability (ARM64)","text":"Please read the general section first, this contains ARM64 specific stuff.
"},{"location":"dev/design/function-hooks/hooking-strategy-arm64/#fallback-strategy-free-space-from-function-alignment","title":"Fallback Strategy: Free Space from Function Alignment","text":"See General Section Notes.
In the case of ARM64, padding is usually done with the following sequences: - nop
(0xD503201F
, big endian), used by GCC. - and x0, x0
(0x00000000
), used by MSVC.
Getting sufficient bytes to make good use of them is less common on ARM64 than on x86.
"},{"location":"dev/design/function-hooks/hooking-strategy-x86/","title":"Interoperability (x86)","text":"Please read the general section first, this contains x86 specific stuff.
"},{"location":"dev/design/function-hooks/hooking-strategy-x86/#fallback-strategy-free-space-from-function-alignment","title":"Fallback Strategy: Free Space from Function Alignment","text":"See General Section Notes.
0x90
(GCC) or 0xCC
(MSVC) are commonly used for padding.See General Section Notes.
We use x86 in the example for general section above.
"},{"location":"dev/design/function-hooks/hooking-strategy/","title":"Interoperability (General)","text":"This page contains information regarding interoperability that is common to all platforms.
Interoperability in this sense means 'stacking hooks on top of other libraries', and how other libraries can stack hooks on top of reloaded-hooks-rs
.
This is the general hooking strategy employed by reloaded-hooks
; derived from the facts in the rest of this document.
To ensure maximum compatibility with existing hooking systems, reloaded-hooks
uses relative jumps as these are the most popular, and thus best supported by other libraries when it comes to hook stacking.
These are the lowest overhead jumps, so are preferable in any case.
"},{"location":"dev/design/function-hooks/hooking-strategy/#if-relative-jump-is-not-possible","title":"If Relative Jump is Not Possible","text":"In the very unlikely event that a relative jump cannot be used (target is further than max relative jump distance
), the following strategy is used.
If no existing hook exists, an absolute jump will be used (if possible). - Prefer indirect absolute jump (if possible).
We check for presence of 'existing hook' by catching some common instruction patterns.
"},{"location":"dev/design/function-hooks/hooking-strategy/#existing-hook","title":"Existing Hook","text":"If we have any allocated buffer in range, insert relative jump, and inside wrapper/stub use absolute jump if needed.
Otherwise (if possible), use available free space from function alignment.
Otherwise use absolute jump.
In order to optimize the code relocation process, reloaded-hooks
, will try to find a buffer that's within relative jump range to the original jump target.
If this is not possible, reloaded-hooks
will start rewriting relative jump(s) from the original function to absolute jump(s) in the presence of recognised patterns; if the code rewriter supports this.
Strategies used for improving interoperability with other hooks.
"},{"location":"dev/design/function-hooks/hooking-strategy/#free-space-from-function-alignment","title":"Free Space from Function Alignment","text":"This is a strategy for encoding absolute jumps using fewer instructions.
Processors typically fetch instructions in blocks aligned to 16 byte boundaries.
To optimise for this, compilers pad the space between end of last function and start of next.
We can exploit this \ud83d\ude09
If there's sufficient padding before the function, we can: - Insert our absolute jump there, and branch to it. or - Insert jump target there, and branch using that jump target.
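As a rough illustration, the amount of padding available before a function could be measured like this. The sketch assumes x86 padding bytes (`0x90` and `0xCC`, as noted in the x86 specific page) and works on a plain byte slice; `padding_before` is a hypothetical name, not a library function.

```rust
/// Bytes commonly used as padding between functions on x86.
/// Assumption for this sketch: only these two values are treated as padding.
const PADDING_BYTES: [u8; 2] = [0x90, 0xCC];

/// Counts consecutive padding bytes immediately before `function_offset`
/// inside `code`, scanning backwards from the start of the function.
pub fn padding_before(code: &[u8], function_offset: usize) -> usize {
    code[..function_offset.min(code.len())]
        .iter()
        .rev()
        .take_while(|byte| PADDING_BYTES.contains(*byte))
        .count()
}
```

The returned count can then be compared against the size of the jump, or jump target, you want to place there.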
"},{"location":"dev/design/function-hooks/overview/","title":"Function Hooks","text":"How hooking around entire functions works.
This hook is used to run custom callback for a function, modify its parameters or replace a function entirely. It is the most common hook.
I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
This hook works by injecting a jmp
instruction at the beginning of a function to a custom replacement function, or a stub which will later call that function.
When the original function is called, it is done via a wrapper, which restores the originally overwritten instructions that were sacrificed for the jmp
.
Stolen Bytes
: Bytes used by instructions sacrificed in original function to place a 'jmp' to the ReverseWrapper
. ReverseWrapper
: Translates from original function calling convention to yours. Then calls your function. <Your Function>
: Your Rust/C#/C++/Asm code. Wrapper
: Translates from your calling convention to original, then runs the original function. flowchart TD\n orig[Original Function] -- jump to wrapper --> rev[Reverse Wrapper]\n rev -- jump to your code --> target[\"<Your Function>\"]\n target -- \"call original via wrapper\" --> stub[\"Wrapper <with stolen bytes + jmp to original>\"]\n stub -- \"call original\" --> original[\"Original Function\"]\n\n original -- \"return value\" --> stub\n stub -- \"return value\" --> target
When the hook is activated, a stub calls into your function; which becomes the 'new original function'; that is, control will return (ret
) to the original function's caller from this function.
When your function calls the original function, it will be an entirely separate method call.
Your function can technically not call the original and replace it outright.
"},{"location":"dev/design/function-hooks/overview/#when-activated-in-fast-mode","title":"When Activated in 'Fast Mode'","text":"'Fast Mode' is an optimisation that inserts the jmp
to point directly into your code when possible.
flowchart TD\n orig[Original Function] -- to your code --> target[\"<Your Function>\"]\n target -- \"call original via wrapper\" --> stub[\"Wrapper <with stolen bytes + jmp to original>\"]\n stub -- \"call original\" --> original[\"Original Function\"]\n\n original -- \"return value\" --> stub\n stub -- \"return value\" --> target
This option allows for a small performance improvement, saving 1 instruction and some instruction prefetching load.
This is on by default (can be disabled), and will take into effect when no conversion between calling conventions is needed.
When conversion is needed, the logic will default back to When Activated.
When 'Fast Mode' is enabled, you lose the ability to unhook (for compatibility reasons).
"},{"location":"dev/design/function-hooks/overview/#when-deactivated","title":"When Deactivated","text":"Does not apply to 'Fast Mode'. When in fast mode, deactivation returns error.
flowchart TD\n orig[Original Function] -- jump to wrapper --> stub[\"Stub <stolen bytes + jmp>\"]\n stub -- \"jmp original\" --> original[\"Original Function\"]
When you deactivate a hook, the contents of 'Reverse Wrapper' are overwritten with the stolen bytes.
When 'Reverse Wrapper' is allocated, extra space is reserved for original code.
By bypassing your code entirely, it is safe for your dynamic library (.dll
/.so
/.dylib
) to unload from the process.
It is recommended library users manually specify conventions in their hook functions.
When the calling convention of <your function>
is not specified, wrapper libraries must insert the appropriate default convention in their wrappers.
On Linux, syscalls use R10 instead of RCX in SystemV ABI
"},{"location":"dev/design/function-hooks/overview/#rust","title":"Rust","text":"i686-pc-windows-gnu
: cdecli686-pc-windows-msvc
: cdecli686-unknown-linux-gnu
: SystemV (x86)
x86_64-pc-windows-gnu
: Microsoft x64
x86_64-pc-windows-msvc
: Microsoft x64x86_64-unknown-linux-gnu
: SystemV (x64)x86_64-apple-darwin
: SystemV (x64)Windows x86
: cdeclWindows x64
: Microsoft x64
Linux x64
: SystemV (x64)
Linux x86
: SystemV (x86)
macOS x64
: SystemV (x64)
Wrappers are stubs which convert from your calling convention to the calling convention of the original function.
If the calling convention of the hooked function and your function matches, this wrapper is simply just 1 jmp
instruction.
Wrappers are documented in their own page here.
"},{"location":"dev/design/function-hooks/overview/#reversewrappers","title":"ReverseWrapper(s)","text":"Stub which converts from the original function's calling convention to your code's calling convention
This is basically Wrapper with source
and destination
swapped around
Replaces a pointer inside an array of function pointers with a new pointer.
This hook is commonly used to hook COM
objects, e.g. Direct3D
.
I'm not a security person/researcher. I just make full stack game modding tools, mods and libraries. Naming in these design docs might be unconventional.
Probably the simplest hook out of them all, it's simply replacing one pointer inside an array of function pointers with a new one.
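The idea can be sketched with a plain slice of function pointers standing in for a VTable. Everything here (`Method`, `hook_vtable_entry`, and the sample functions) is hypothetical and purely illustrative; a real VTable lives in the target process's memory and needs its memory permissions changed before being written to.

```rust
/// Signature shared by every entry in this illustrative table.
pub type Method = fn(i32) -> i32;

/// Replaces the entry at `index` with `replacement` and returns the original
/// pointer, so the hook can still call the original function.
/// Returns `None` if `index` is out of bounds.
pub fn hook_vtable_entry(
    vtable: &mut [Method],
    index: usize,
    replacement: Method,
) -> Option<Method> {
    let slot = vtable.get_mut(index)?;
    Some(std::mem::replace(slot, replacement))
}

/// Sample 'original' methods.
pub fn double(x: i32) -> i32 {
    x * 2
}
pub fn add_one(x: i32) -> i32 {
    x + 1
}

/// Sample replacement method.
pub fn triple(x: i32) -> i32 {
    x * 3
}
```

After hooking slot 0 of `[double, add_one]` with `triple`, calls through slot 0 reach `triple`, while the returned pointer still reaches `double`.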
"},{"location":"dev/design/vtable-hooks/overview/#about-vtables","title":"About VTables","text":"VTables are used to support polymorphism in C++ and similar languages.
They are the mechanism that enables calling correct functions in presence of inheritance and virtual functions.
Basically what drives 'interfaces' in other languages.
"},{"location":"dev/design/vtable-hooks/overview/#vtables-in-msvc-gcc","title":"VTables in MSVC & GCC","text":"In both GCC and Visual C++, VTables are automatically created for classes that have virtual functions.
A pointer to the VTable is stored at offset 0x0 of any class instance, thus if you get a pointer to an instance, and dereference offset 0x0, you'll be at the address of the first item in the VTable.
C++Memory Layoutclass Item {\nvirtual void doSomething();\nint k;\n};\n
class Item\n void* vTable\n int k\n
vTable:\n void* doSomething\n
VTables exist in .rdata
, thus you need to change memory permissions when hooking them.
One notable thing about COM is that all interfaces inherit from IUnknown, so the first 3 methods will always be the 3 methods of IUnknown
.
Using Direct3D9 as an example
"},{"location":"dev/design/vtable-hooks/overview/#before","title":"Before","text":"flowchart LR\n EndScene --> EndScene_Orig \n Clear --> Clear_Orig\n SetTransform --> SetTransform_Orig\n GetTransform --> GetTransform_Orig
"},{"location":"dev/design/vtable-hooks/overview/#after","title":"After","text":"flowchart LR\n EndScene --> EndScene_Hook --> Your_Function --> EndScene_Orig\n Clear --> Clear_Orig\n SetTransform --> SetTransform_Orig\n GetTransform --> GetTransform_Orig
"},{"location":"dev/platform/overview/","title":"Platform Overview","text":"This page provides a list of platform specific functionality required for supporting Reloaded.Hooks-rs
.
Required means the library must have this to function.
Recommended means the library may not work in some edge cases without it.
Optional means the library can function without it.
To add support for new platforms, supply the necessary function pointers in platform_functions.rs.
[1] May be present depending on kernel configuration. Have not done adequate research. [2] Needed for Apple Silicon only.
"},{"location":"dev/platform/overview/#how-to-implement-support","title":"How to Implement Support","text":"Once you're done, submit a PR to add support for your platform.
"},{"location":"dev/platform/overview/#platform-functions","title":"Platform Functions","text":"The library provides a platform_functions.rs
file which contains all the platform-specific functions.
Implement the functions in this file for your platform. Generally you'll only need unprotect_memory, though on some platforms you may also need to implement disable_write_xor_execute and restore_write_xor_execute, depending on the platform's security policy.
For optimal performance, you should add support for your platform to reloaded-memory-buffers.
It's recommended to use reloaded-hooks-rs alongside reloaded-memory-buffers. The buffers library performs allocations as close to the original code as possible, allowing for more efficient code.
This requires walking memory pages. If your OS does not have a way to do this, you can in the meantime use the built-in DefaultBufferFactory. If your platform does not allow RWX allocations, you'll also need to adjust DefaultBufferFactory::create_page_as_rx.
For DefaultBufferFactory, you might need to replace mmap_rs in get_any_buffer with your platform-specific page allocation function.
Platform specific functionality is not unit tested as it relies on OS/system state. Instead, integration tests are used to test the functionality.
Find the tests for a given hook type (recommended: assembly_hook tests) and run them on your platform.
If you can't run tests on your platform, copy them to one of your programs manually.
"},{"location":"dev/platform/overview/#required-permission-change","title":"(Required) Permission Change","text":"Many platforms have per-page access permissions, which may prevent certain regions of memory from being modified.
Notably for the use cases of this library, the .text section is usually non-writable, which prevents hooking app functions out of the box.
To work around this, the library will call the unprotect function in platform_functions.rs before making code changes in memory. It will then (for performance reasons) leave the memory unprotected for the lifetime of the process (assuming it remains unprotected).
For the common operating systems, the protect/unprotect functions map to the following API calls:
Windows: VirtualProtect
Linux/macOS: mprotect
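Because mprotect only accepts page-aligned addresses, a platform implementation would typically round the requested range outward to page boundaries before changing permissions. The sketch below shows just that arithmetic. The function name `page_aligned_range` is hypothetical and not part of the library, and the page size is passed in rather than queried from the OS.

```rust
/// Returns the (start, length) of the smallest page-aligned range that
/// covers `len` bytes beginning at `address`.
/// Assumes `page_size` is a power of two (e.g. 0x1000).
fn page_aligned_range(address: usize, len: usize, page_size: usize) -> (usize, usize) {
    debug_assert!(page_size.is_power_of_two());
    let mask = !(page_size - 1);
    // Round the start down and the end up to the nearest page boundary.
    let start = address & mask;
    let end = (address + len + page_size - 1) & mask;
    (start, end - start)
}
```

A write that straddles a page boundary yields a two-page range, so both pages would be unprotected in a single call.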
Only required on Apple platforms; opt-in on Linux/Windows, but I haven't seen it used in game software in the wild.
Info
Some platforms enforce a security protection called 'Write XOR Execute', where a memory page may be marked as writable or executable at any moment in time, but not both.
To work around this, the library will call the disable_write_xor_execute function in platform_functions.rs ahead of every function call. It will then call restore_write_xor_execute after.
Info
The process of code relocation might require that the new location of the code is within a certain range of the old code, usually 128MiB, 2GiB or 4GiB (depending on platform).
In this case, you must walk over the memory pages of a process and find a suitable place to allocate \ud83d\ude09
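The search for a suitable place can be sketched as follows, assuming the OS page walk has already produced a list of free regions. `FreeRegion` and `find_allocation_address` are hypothetical names for illustration only; this is not how reloaded-memory-buffers is implemented. The function returns the first address at which `size` bytes fit within `max_distance` of `target`.

```rust
/// A free span of address space, as an OS page walk might report it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FreeRegion {
    start: usize,
    len: usize,
}

/// Returns the first address inside a free region where `size` bytes fit
/// entirely within `max_distance` of `target`, or None if no region works.
fn find_allocation_address(
    regions: &[FreeRegion],
    target: usize,
    size: usize,
    max_distance: usize,
) -> Option<usize> {
    // Reachable window around the target, clamped to the address space.
    let lowest = target.saturating_sub(max_distance);
    let highest = target.saturating_add(max_distance);
    regions.iter().find_map(|region| {
        let region_end = region.start.checked_add(region.len)?;
        // Keep only the part of the region that lies inside the window.
        let start = region.start.max(lowest);
        let end = region_end.min(highest);
        (end > start && end - start >= size).then_some(start)
    })
}
```

A production version would likely also align the result to the allocation granularity and prefer the region closest to the target. Both are left out here to keep the sketch short.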
"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 0000000..46e50a6 --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,148 @@ + +