Commonize x86 Opcode and Operand downwards across the three processor modes #21
base: no-gods-no-
hmm, i was thinking of a bit of a different approach: as you've noted,
const MNEMONICS: &[&'static str] = &[
"add",
"sub",
"aaa", // only 16- and 32-bit `Opcode` reference this
"aas", // only 16- and 32-bit `Opcode` reference this
"movsx", // only 64-bit `Opcode` references this
"mov", // used in all modes
];
mod real_mode {
enum Opcode {
ADD = 0,
SUB = 1,
AAA = 2,
AAS = 3,
MOV = 5,
...
}
}
mod long_mode {
enum Opcode {
ADD = 0,
SUB = 1,
MOVSX = 4,
MOV = 5,
...
}
}
mod quasi_x86_name_pending {
// note that _this_ `Opcode` has the same integer values for each variant, so a conversion to this opcode can be just a transmute
enum Opcode {
ADD = 0,
SUB = 1,
AAA = 2,
AAS = 3,
MOVSX = 4,
MOV = 5,
...
}
}
where this could get generated from a table like
spitballing, i really haven't thought about the table layout in particular. this could let us generate the
this is trickier for
mod quasi_x86_name_pending {
/// an "arch" for a pseudo-x86 - a best-effort superset of 16-, 32-, and 64-bit x86
pub struct Arch;
impl yaxpeax_arch::Arch for Arch {
// same idea as other modes, but with the superset versions of `Opcode` and `Operand`
}
struct SupersetDecoderNamePending {
x86_16: yaxpeax_x86::real_mode::InstDecoder,
x86_32: yaxpeax_x86::protected_mode::InstDecoder,
x86_64: yaxpeax_x86::long_mode::InstDecoder,
current_mode: EnumToSelectWhichDecoder
}
impl Decoder<Arch> for SupersetDecoderNamePending {
fn decode<...>(&self, words: ...) -> Result<Instruction, DecodeError> {
match self.current_mode {
x86_16 => self.x86_16.decode(words).map(|inst| inst.into_superset_form())
...
}
}
}
}
this would require functions to transform an arch-specific instruction into the common-x86 form, but that fills almost the same niche as your
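A minimal sketch of the dispatching idea above, assuming toy stand-ins for the per-mode decoders. Everything here (`Mode`, `SupersetDecoder`, the two-variant enums, the one-byte `decode` functions) is hypothetical and is not the yaxpeax_x86 API; it only shows a wrapper that picks a mode-specific decoder and lifts its result into the superset `Opcode`. The lift uses a safe `match` rather than a transmute.

```rust
/// Toy stand-in for a 16-bit decoder (hypothetical; not the real yaxpeax_x86 API).
pub mod real_mode {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Opcode { ADD = 0, AAA = 2 }

    /// Looks only at the first byte; a real decoder reads prefixes, ModRM, etc.
    pub fn decode(bytes: &[u8]) -> Option<Opcode> {
        match bytes.first().copied() {
            Some(0x00) => Some(Opcode::ADD),
            Some(0x37) => Some(Opcode::AAA),
            _ => None,
        }
    }
}

/// Toy stand-in for a 64-bit decoder (hypothetical).
pub mod long_mode {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Opcode { ADD = 0, MOVSX = 4 }

    /// 0x37 (AAA) is not a valid opcode in 64-bit mode, so it is rejected here.
    pub fn decode(bytes: &[u8]) -> Option<Opcode> {
        match bytes.first().copied() {
            Some(0x00) => Some(Opcode::ADD),
            _ => None,
        }
    }
}

/// Superset opcode: same integer value per variant as the per-mode enums.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Opcode { ADD = 0, AAA = 2, MOVSX = 4 }

impl From<real_mode::Opcode> for Opcode {
    fn from(op: real_mode::Opcode) -> Self {
        match op {
            real_mode::Opcode::ADD => Opcode::ADD,
            real_mode::Opcode::AAA => Opcode::AAA,
        }
    }
}

impl From<long_mode::Opcode> for Opcode {
    fn from(op: long_mode::Opcode) -> Self {
        match op {
            long_mode::Opcode::ADD => Opcode::ADD,
            long_mode::Opcode::MOVSX => Opcode::MOVSX,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecodeError;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode { Real, Long }

/// Selects a mode-specific decoder and converts its output to the superset form.
pub struct SupersetDecoder { pub mode: Mode }

impl SupersetDecoder {
    pub fn decode(&self, bytes: &[u8]) -> Result<Opcode, DecodeError> {
        let decoded = match self.mode {
            Mode::Real => real_mode::decode(bytes).map(Opcode::from),
            Mode::Long => long_mode::decode(bytes).map(Opcode::from),
        };
        decoded.ok_or(DecodeError)
    }
}
```

With this shape, the same byte can decode differently depending on the selected mode, while callers only ever see the superset type.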
If we are going to do some codegen, I'd highly recommend using a standard format like json (if possible) so others can use that data as well.
Hmm - such a table that links
I do like the idea of having all
Let me think some more and see if I can't work my way towards what you've suggested.
Just to write this idea down to save for later - we could probably implement
#[repr(usize)] // Required to define the layout
enum Opcode {
ADD,
AAA,
AAS,
SUB,
MOV,
MOVSX,
}
mod long_mode {
#[superset="super::Opcode"]
#[repr(usize)] // Required to define the layout
enum Opcode {
ADD,
SUB,
MOV,
MOVSX,
INC, // ERROR: Enum variant is not specified in superset enum (as an example)
}
// Generated by proc-macro
impl Opcode {
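// Takes `self` by value: transmuting `&Self` into `super::Opcode` would not compile.
pub fn to_superset(self) -> super::Opcode {
// SAFETY: Guaranteed to be safe, as superset implements all variants of this subset.
unsafe { core::mem::transmute(self) }
}
// `enum` is a reserved keyword, so the parameter needs another name.
pub fn from_superset(opcode: super::Opcode) -> Option<Self> {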
todo!()
}
}
}
Such a macro would define subset variants to be equivalent to their superset variants (for 1:1 conversion or direct casting in the case of going from a subset to a superset).
ah! i was wondering if you'd made progress on this or put it aside. is there already a proc macro for
my thought was to list out the whole deal in a table (json like @i509VCB mentioned would make sense) and generate off of that, with the light benefit that we wouldn't have ~6k lines of enum variants anymore 😎
anyway, if you're planning on putting this down, i might give that idea a try in the next few weeks.
I made more progress - but in the interest of expediency, I've only made progress that directly impacts my project (changes here).
#[superset="super::Opcode"]
#[repr(usize)]
enum Opcode {
ADD = super::Opcode::ADD as usize, // a discriminant must be an integer, hence the cast
// ...
}
Done implicitly by the macro, of course.
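For reference, a hand-expanded sketch of what such a `superset` macro might generate. No such proc macro is confirmed to exist in this thread, so this is only an illustration under that assumption: the subset's discriminants are copied from the superset, which is what makes the transmute in `to_superset` sound, and `from_superset` is written as a plain `match`.

```rust
#[repr(usize)] // Required to define the layout
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Opcode {
    ADD,
    AAA,
    AAS,
    SUB,
    MOV,
    MOVSX,
}

pub mod long_mode {
    #[repr(usize)] // Must match the superset's representation
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Opcode {
        // Each discriminant is taken from the superset variant of the same name.
        ADD = super::Opcode::ADD as usize,
        SUB = super::Opcode::SUB as usize,
        MOV = super::Opcode::MOV as usize,
        MOVSX = super::Opcode::MOVSX as usize,
    }

    impl Opcode {
        /// Subset -> superset is infallible.
        pub fn to_superset(self) -> super::Opcode {
            // SAFETY: both enums are `repr(usize)`, and every discriminant of
            // this enum is defined as the discriminant of a superset variant.
            unsafe { core::mem::transmute::<Self, super::Opcode>(self) }
        }

        /// Superset -> subset fails for variants this mode does not have.
        pub fn from_superset(opcode: super::Opcode) -> Option<Self> {
            match opcode {
                super::Opcode::ADD => Some(Opcode::ADD),
                super::Opcode::SUB => Some(Opcode::SUB),
                super::Opcode::MOV => Some(Opcode::MOV),
                super::Opcode::MOVSX => Some(Opcode::MOVSX),
                super::Opcode::AAA | super::Opcode::AAS => None,
            }
        }
    }
}
```

Because the `match` in `from_superset` is exhaustive, adding a variant to the superset without deciding how each subset handles it becomes a compile error rather than a silent mismatch.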
in case you're still watching this, i did finally give this a shot - 354df90 is the current (still not a full change set) approach. this adds a new

(then there's a fair question of "why generate it with python instead of a proc macro or build.rs?", and the answer is a moral opposition to build-time codegen if it's not necessary. debugging a proc macro is really annoying and i don't like asking people to run build.rs scripts. so, generate when it's updated and commit it. very gopher brain of me. sorry to the rustaceans.)

there's a bit more on top of this commit that i've yet to get to a point i want to push, but i'm convinced that this gets us to a point where

i also have a sneaking suspicion that even with the extra source lines, this might reduce the total resulting size of the compiled crate with more than one architecture included. with
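To make the "generate from a table and commit the result" idea concrete, here is a rough sketch. The generator referenced above is written in Python; this version is in Rust only to match the rest of the thread, and the `Row` layout, `TABLE` contents, and `render_enum` function are all hypothetical. The row index doubles as the discriminant, so a variant has the same integer value in every mode that includes it.

```rust
/// One row of a hypothetical opcode table: the variant name and which
/// processor modes reference it.
pub struct Row {
    pub name: &'static str,
    pub real: bool,
    pub protected: bool,
    pub long: bool,
}

/// Illustrative table; the order fixes each variant's discriminant.
pub const TABLE: &[Row] = &[
    Row { name: "ADD", real: true, protected: true, long: true },
    Row { name: "SUB", real: true, protected: true, long: true },
    Row { name: "AAA", real: true, protected: true, long: false },
    Row { name: "AAS", real: true, protected: true, long: false },
    Row { name: "MOVSX", real: false, protected: false, long: true },
    Row { name: "MOV", real: true, protected: true, long: true },
];

/// Renders Rust source for one module's `Opcode` enum, keeping only the rows
/// selected by `keep` and using the table index as the discriminant.
pub fn render_enum(module: &str, keep: impl Fn(&Row) -> bool) -> String {
    let mut out = format!("pub mod {} {{\n    pub enum Opcode {{\n", module);
    for (index, row) in TABLE.iter().enumerate() {
        if keep(row) {
            out.push_str(&format!("        {} = {},\n", row.name, index));
        }
    }
    out.push_str("    }\n}\n");
    out
}
```

Calling `render_enum("long_mode", |row| row.long)` yields the source text for the long-mode enum, which would then be written to a file and committed rather than produced at build time.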
This is a follow-up to #19.
These enums and structures are mostly identical across all three processor modes, and it is useful to combine these for writing code that is generic to all three modes.
In order to access these common fields, a new trait `X86Instruction` (open for naming suggestions) has been added to provide access to these fields. The trait is kind of janky to use as of now: you must declare the bound with a `where` clause: