GCS ACLE #260
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1646,6 +1646,15 @@ mechanisms such as function attributes. | |
Pointer Authentication extension (FEAT_PAuth_LR) are available on the target. | ||
It is undefined otherwise. | ||
|
||
### Guarded Control Stack | ||
|
||
`__ARM_FEATURE_GCS_DEFAULT` is defined to `1` if the code generation is | ||
compatible with enabling the Guarded Control Stack (GCS) extension-based | ||
protection. It is undefined otherwise. | ||
|
||
`__ARM_FEATURE_GCS` is defined to `1` if the Guarded Control Stack (GCS) | ||
extension is available on the target. It is undefined otherwise. | ||
|
||
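A minimal sketch of how code might consume the two macros at compile time. Only the macro names come from the text above; the helper names `gcs_codegen_compatible` and `gcs_target_available` are hypothetical and not part of ACLE.

```c
#include <stdbool.h>

/* Hypothetical helper: true if this translation unit was compiled so that
   it is compatible with GCS protection being enabled. */
static inline bool gcs_codegen_compatible(void)
{
#ifdef __ARM_FEATURE_GCS_DEFAULT
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: true if the compiler may assume that the GCS
   extension is available on the target. */
static inline bool gcs_target_available(void)
{
#ifdef __ARM_FEATURE_GCS
    return true;
#else
    return false;
#endif
}
```

Neither helper says anything about whether GCS protection is enabled when the program runs; that needs a runtime check.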
### Large System Extensions | ||
|
||
`__ARM_FEATURE_ATOMICS` is defined if the Large System Extensions introduced in | ||
|
@@ -2341,6 +2350,8 @@ be found in [[BA]](#BA). | |
| [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma) | Floating-point fused multiply-accumulate | 1 | | ||
| [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension) | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 | | ||
| [`__ARM_FEATURE_FRINT`](#availability-of-armv8.5-a-floating-point-rounding-intrinsics) | Floating-point rounding extension (Arm v8.5-A) | 1 | | ||
| [`__ARM_FEATURE_GCS`](#guarded-control-stack) | Guarded Control Stack | 1 | | ||
| [`__ARM_FEATURE_GCS_DEFAULT`](#guarded-control-stack) | Guarded Control Stack protection can be enabled | 1 | | ||
| [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 | | ||
| [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 | | ||
| [`__ARM_FEATURE_LDREX`](#ldrexstrex) *(Deprecated)* | Load/store exclusive instructions | 0x0F | | ||
|
@@ -3104,6 +3115,19 @@ inclusive. See implementation documentation for the effect (if any) of | |
this instruction and the meaning of the argument. This is available only | ||
when compiling for AArch32. | ||
|
||
``` c | ||
uint64_t __chkfeat(uint64_t); | ||
``` | ||
|
||
Checks for hardware features at runtime using the CHKFEAT hint instruction. | ||
`__chkfeat` returns a bitmask where a bit is set if the same bit in the | ||
input argument is set and the corresponding feature is enabled. (Note: for | ||
usability reasons the return value differs from how the CHKFEAT instruction | ||
sets X16.) It can be used with predefined macros: | ||
|
||
| **Macro name** | **Value** | **Meaning** | | ||
| -------------- | --------- | ----------- | | ||
| ``_CHKFEAT_GCS`` | 1 | Guarded Control Stack (GCS) protection is enabled. | | ||
|
||
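The difference between the instruction and the intrinsic can be illustrated with a small software model. This is a sketch only: the function names and the `MODEL_CHKFEAT_GCS` constant are hypothetical, the `enabled` argument stands in for hardware state, and the XOR formulation is one possible way to derive the intrinsic result, not necessarily how a compiler implements it.

```c
#include <stdint.h>

/* Value from the macro table above, redefined under a hypothetical name so
   that the sketch builds without arm_acle.h. */
#define MODEL_CHKFEAT_GCS UINT64_C(1)

/* Model of the CHKFEAT instruction: clear each queried bit whose feature is
   enabled and leave the other bits unchanged. With enabled == 0 the value
   is returned untouched, which is also what a NOP would do. */
static uint64_t chkfeat_insn_model(uint64_t x16, uint64_t enabled)
{
    return x16 & ~enabled;
}

/* Model of the __chkfeat intrinsic: a bit is set in the result if it was
   set in the query and the corresponding feature is enabled. */
static uint64_t chkfeat_intrinsic_model(uint64_t query, uint64_t enabled)
{
    return query ^ chkfeat_insn_model(query, enabled);
}
```

With this model, a nonzero result from `chkfeat_intrinsic_model(MODEL_CHKFEAT_GCS, enabled)` means the feature is enabled, whereas the instruction model returns zero in the same situation.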
## Swap | ||
|
||
`__swp` is available for all targets. This intrinsic expands to a | ||
|
@@ -4650,6 +4674,54 @@ two pointers, ignoring the tags. | |
The return value is the sign-extended result of the computation. | ||
The tag bits in the input pointers are ignored for this operation. | ||
|
||
# Guarded Control Stack intrinsics | ||
|
||
## Introduction | ||
|
||
This section describes the intrinsics for the instructions of the | ||
Guarded Control Stack (GCS) extension. The GCS instructions are present | ||
in the AArch64 execution state only. | ||
|
||
When GCS protection is enabled, function calls save the return | ||
address to a separate stack, the GCS, and this saved address is checked | ||
against the actual return address when the function returns. GCS | ||
protection can be disabled at runtime, in which case calls and returns | ||
do not access the GCS. The GCS grows down and the GCS pointer points to | ||
the last entry of the GCS. | ||
Each thread has a separate GCS and GCS pointer. | ||
|
||
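The behaviour described above can be sketched as a toy software model. Everything here is hypothetical (the `gcs_model` type, its functions, and the fixed capacity); real hardware reports a mismatch as a fault rather than through a return value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GCS_ENTRIES 16

/* Hypothetical software model of one thread's GCS. */
struct gcs_model {
    uint64_t entries[MODEL_GCS_ENTRIES];
    size_t top;    /* index of the last entry; MODEL_GCS_ENTRIES when empty */
    bool enabled;  /* whether GCS protection is enabled */
};

static void gcs_model_init(struct gcs_model *g, bool enabled)
{
    g->top = MODEL_GCS_ENTRIES;  /* empty: the stack grows down from here */
    g->enabled = enabled;
}

/* Function call: save the return address. Returns false if the model's
   stack is full. */
static bool gcs_model_call(struct gcs_model *g, uint64_t return_address)
{
    if (!g->enabled)
        return true;             /* disabled: calls do not access the GCS */
    if (g->top == 0)
        return false;
    g->entries[--g->top] = return_address;
    return true;
}

/* Function return: pop the last entry and compare it with the actual
   return address. Returns true if the return is allowed. */
static bool gcs_model_return(struct gcs_model *g, uint64_t return_address)
{
    if (!g->enabled)
        return true;             /* disabled: returns do not access the GCS */
    if (g->top == MODEL_GCS_ENTRIES)
        return false;            /* nothing to pop */
    return g->entries[g->top++] == return_address;
}
```

In the model, `top` plays the role of the GCS pointer: it decreases on a call and increases on a return.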
To use the intrinsics, `arm_acle.h` needs to be included. | ||
|
||
These intrinsics are available when GCS instructions are supported. | ||
The `__chkfeat` intrinsic with `_CHKFEAT_GCS` can be used to check | ||
at runtime whether GCS protection is enabled. GCS protection is only | ||
enabled at runtime if the code is GCS compatible and the GCS | ||
instructions are supported. | ||
|
||
## Intrinsics | ||
|
||
|
||
``` c | ||
const void *__gcspr(void); | ||
``` | ||
Review comment: I plan to change this back to non-const so that the intrinsics support the case where GCS write access is enabled (even if that's not expected to be a common setting).
|
||
Returns the GCS pointer of the current thread. | ||
|
||
``` c | ||
uint64_t __gcspopm(void); | ||
``` | ||
|
||
Reads and returns the last entry on the GCS of the current thread and | ||
updates the GCS pointer to point to the previous entry. If GCS | ||
protection is disabled then it has no side effect and returns `0`. | ||
Review comment: The `gcspopm` instruction does nothing when GCS is disabled (i.e. the value in the destination register is unchanged). Is the intent that the intrinsic sets the destination register to zero before executing the instruction?
Reply: Yes, I think this is the only way the intrinsic is usable in the "asynchronously disabled GCS" scenario (which we don't know yet if Linux will support; currently it does not want to).
||
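The zero-initialisation discussed in the review thread above can be sketched with a software model. The function names and the explicit `enabled` flag are hypothetical; the model only illustrates why presetting the destination to zero gives a well-defined result in the disabled case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the instruction: when GCS is disabled it does nothing, so the
   destination value is returned unchanged. When enabled it reads the last
   entry and moves the GCS pointer to the previous entry. */
static uint64_t gcspopm_insn_model(bool enabled, const uint64_t **gcspr,
                                   uint64_t dest)
{
    if (!enabled)
        return dest;
    uint64_t entry = **gcspr;
    *gcspr += 1;  /* the GCS grows down, so popping moves the pointer up */
    return entry;
}

/* Model of the intrinsic: preset the destination to zero so that the
   disabled case returns 0. */
static uint64_t gcspopm_intrinsic_model(bool enabled, const uint64_t **gcspr)
{
    return gcspopm_insn_model(enabled, gcspr, 0);
}
```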
|
||
``` c | ||
const void *__gcsss(const void *); | ||
``` | ||
|
||
Switches the GCS of the current thread, where the argument is the new | ||
GCS pointer, and returns the old GCS pointer. If GCS protection is | ||
disabled then it has no side effect and returns `NULL`. | ||
|
||
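A hedged usage sketch that reads the GCS pointer only behind a runtime check. The wrapper names `gcs_enabled` and `current_gcs_pointer` are hypothetical. The sketch assumes that a toolchain defining `__ARM_FEATURE_GCS` also provides these intrinsics in `arm_acle.h`; on any other toolchain or target it falls back to stubs so that it still builds.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_GCS)
#include <arm_acle.h>

/* Nonzero if GCS protection is currently enabled. */
static inline int gcs_enabled(void)
{
    return __chkfeat(_CHKFEAT_GCS) != 0;
}

/* GCS pointer of the current thread, or NULL when protection is disabled. */
static inline const void *current_gcs_pointer(void)
{
    return gcs_enabled() ? __gcspr() : NULL;
}
#else
/* Fallback stubs (assumption: no GCS intrinsics are available here). */
static inline int gcs_enabled(void) { return 0; }
static inline const void *current_gcs_pointer(void) { return NULL; }
#endif
```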
# State management | ||
|
||
The specification for SME is in | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# Design Document for GCS | ||
|
||
## Feature test | ||
|
||
GCS support has three levels: | ||
|
||
(1) Code generation is GCS compatible. (Compile time decision.) | ||
|
||
(2) HW supports GCS instructions. (Might be known at compile time, | ||
but this is a runtime feature.) | ||
|
||
(3) GCS is enabled at runtime. (Only known at runtime.) | ||
|
||
Where (3) implies (1) and (2). In principle a user may decide to | ||
enable GCS even if (1) was false at compile time, but this is | ||
a user error. The runtime system is responsible for enabling GCS | ||
when (1) and (2) hold and GCS protection was requested for the | ||
program. | ||
|
||
(1) and (2) need feature test macros since they can be known at | ||
compile time. | ||
|
||
(3) can be detected using `__chkfeat(_CHKFEAT_GCS)` which is | ||
available without GCS support. | ||
|
||
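The relationship between the three levels can be written down as two small predicates. This is a sketch with hypothetical names; it models the stated rules and does not describe how any particular runtime system is implemented.

```c
#include <stdbool.h>

/* Rule from the text: the runtime system enables GCS when (1) and (2) hold
   and GCS protection was requested for the program. */
static bool runtime_enables_gcs(bool codegen_compatible, bool hw_supported,
                                bool requested)
{
    return codegen_compatible && hw_supported && requested;
}

/* Invariant from the text: (3) implies (1) and (2). A state that violates
   it corresponds to the user error described above. */
static bool gcs_levels_consistent(bool level1, bool level2, bool level3)
{
    return !level3 || (level1 && level2);
}
```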
## Intrinsics | ||
|
||
Alternative designs for the support levels at which the intrinsics | ||
are well defined: | ||
|
||
(A) require (3), | ||
|
||
(B) require (1) and (2) but not (3), | ||
|
||
(C) require (2) only. | ||
|
||
(A) is the simplest, but it does not allow asynchronously disabling GCS. | ||
For that, at least (B) is needed, since the intrinsics must do something | ||
reasonable if GCS is disabled. Asynchronous disable is needed, for example, | ||
to allow disabling GCS at dlopen time in a multi-threaded process when | ||
the loaded module is not GCS compatible. | ||
|
||
(C) is similar to (B) but allows using the intrinsics even if GCS is | ||
guaranteed to be disabled. The intrinsics are expected to be used | ||
behind a runtime check for (3) since they don't do anything useful | ||
otherwise and thus (1) and (2) are true when the intrinsics are used | ||
either way. With (B) it is possible to only expose the intrinsics | ||
at compile time if (1) is true which can be feature tested. With (C) | ||
there is no obvious feature test for the presence of the intrinsics. | ||
|
||
The future direction is to make intrinsics available unconditionally | ||
and rely on runtime checks (e.g. via function multi-versioning). So | ||
it makes sense to go with (C), have separate semantics defined for | ||
the enabled and disabled case and let user code deal with the runtime | ||
checks. | ||
|
||
The type of the intrinsics is based on `const void *` GCS pointer | ||
type and `uint64_t` GCS entry type. The GCS pointer could be | ||
`const uint64_t *`, but void is more general in that it allows | ||
different access to the GCS (e.g. accessing entries as pointers or | ||
bytes). A GCS entry is usually a code pointer, but the architecture | ||
requires it to be 8 bytes (even with ILP32) and it may be a special | ||
token that requires bit operations to detect, so a fixed-width | ||
unsigned integer type is the most appropriate. | ||
|
||
The const qualifier is justified for GCS even if GCS stores are | ||
enabled because normal stores cannot modify the GCS; only specific | ||
instructions can. |
Review comment: The behaviour of this intrinsic is the opposite of that of the instruction (the `chkfeat` instruction clears the bit if the feature is enabled), which I think is unique among intrinsics. It would be worthwhile to note this specifically.
Also, "feature is available" should be "feature is enabled" (as the `chkfeat` instruction will leave the bit alone if the feature is available but disabled).
Review comment: I'd prefer to write/read this code in applications:
if (__chkfeat(_CHKFEAT_GCS)) { /* do GCS stuff */ }
chkfeat must clear the bit if the feature is present because it is a NOP on old CPUs, where it won't change the register content.
That logic twist is not needed for app developers.
+1 for the