GCS ACLE #260

Open: 4 commits to be merged into base branch `main`.

72 changes: 72 additions & 0 deletions main/acle.md
@@ -1646,6 +1646,15 @@ mechanisms such as function attributes.
Pointer Authentication extension (FEAT_PAuth_LR) are available on the target.
It is undefined otherwise.

### Guarded Control Stack

`__ARM_FEATURE_GCS_DEFAULT` is defined to `1` if the generated code is
compatible with enabling protection based on the Guarded Control Stack
(GCS) extension. It is undefined otherwise.

`__ARM_FEATURE_GCS` is defined to `1` if the Guarded Control Stack (GCS)
extension is available on the target. It is undefined otherwise.
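
As an illustration only, the two macros could be tested at compile time as in the sketch below. The function name `gcs_build_summary` and the `GCS_BUILD_*` constants are hypothetical and not part of ACLE; on a toolchain that defines neither macro the function returns `0`.

```c
/* Hypothetical bit values used only by this sketch; not defined by ACLE. */
#define GCS_BUILD_COMPATIBLE 1u /* __ARM_FEATURE_GCS_DEFAULT was defined */
#define GCS_BUILD_HAS_INSNS  2u /* __ARM_FEATURE_GCS was defined */

/* Report what the compiler knew about GCS when this file was built. */
unsigned gcs_build_summary(void)
{
    unsigned summary = 0;
#ifdef __ARM_FEATURE_GCS_DEFAULT
    /* Code generation is compatible with enabling GCS protection. */
    summary |= GCS_BUILD_COMPATIBLE;
#endif
#ifdef __ARM_FEATURE_GCS
    /* The GCS extension is available on the compilation target. */
    summary |= GCS_BUILD_HAS_INSNS;
#endif
    return summary;
}
```

Neither macro says whether GCS protection is enabled in the running process; that is a runtime property.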

### Large System Extensions

`__ARM_FEATURE_ATOMICS` is defined if the Large System Extensions introduced in
@@ -2341,6 +2350,8 @@ be found in [[BA]](#BA).
| [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma) | Floating-point fused multiply-accumulate | 1 |
| [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension) | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 |
| [`__ARM_FEATURE_FRINT`](#availability-of-armv8.5-a-floating-point-rounding-intrinsics) | Floating-point rounding extension (Arm v8.5-A) | 1 |
| [`__ARM_FEATURE_GCS`](#guarded-control-stack) | Guarded Control Stack | 1 |
| [`__ARM_FEATURE_GCS_DEFAULT`](#guarded-control-stack) | Guarded Control Stack protection can be enabled | 1 |
| [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 |
| [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 |
| [`__ARM_FEATURE_LDREX`](#ldrexstrex) *(Deprecated)* | Load/store exclusive instructions | 0x0F |
@@ -3104,6 +3115,19 @@ inclusive. See implementation documentation for the effect (if any) of
this instruction and the meaning of the argument. This is available only
when compiling for AArch32.

``` c
uint64_t __chkfeat(uint64_t);
```

Review comment: The behaviour of this intrinsic is the opposite of that of the instruction (the CHKFEAT instruction clears the bit if the feature is enabled), which I think is unique among intrinsics. It would be worthwhile to note this specifically.

Also, "feature is available" should be "feature is enabled" (the CHKFEAT instruction leaves the bit alone if the feature is available but disabled).

Reply (Contributor): I'd prefer to write and read this code in applications:
`if (__chkfeat(_CHKFEAT_GCS)) { /* do GCS stuff */ }`
CHKFEAT must clear the bit if the feature is present because it is a NOP on old CPUs, where it won't change the register content.
Application developers should not have to deal with that logic twist.

+1 for: Also "feature is available" should be "feature is enabled".

Checks for hardware features at runtime using the CHKFEAT hint instruction.
`__chkfeat` returns a bitmask where a bit is set if the same bit in the
input argument is set and the corresponding feature is enabled. (Note: for
usability reasons the return value differs from how the CHKFEAT instruction
sets X16.) It can be used with predefined macros:

| **Macro name** | **Value** | **Meaning** |
| -------------- | --------- | ----------- |
| ``_CHKFEAT_GCS`` | 1 | Guarded Control Stack (GCS) protection is enabled. |
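
The relationship between the instruction's effect on X16 and the intrinsic's return value can be sketched with a small software model. Everything named `model_*` or `MODEL_*` below is hypothetical, and the `enabled` parameter stands in for hardware state; this is not how any particular compiler implements `__chkfeat`.

```c
#include <stdint.h>

/* Hypothetical stand-in for _CHKFEAT_GCS (value 1 in the table above). */
#define MODEL_CHKFEAT_GCS UINT64_C(1)

/* Model of the CHKFEAT instruction: bits of X16 that correspond to an
   enabled feature are cleared, all other bits are left unchanged.
   A CPU that treats the hint as a NOP behaves like enabled == 0. */
uint64_t model_chkfeat_insn(uint64_t x16, uint64_t enabled)
{
    return x16 & ~enabled;
}

/* Model of the intrinsic: a bit is set in the result if it was set in
   the argument and the corresponding feature is enabled. XOR with the
   argument recovers exactly the bits the instruction cleared. */
uint64_t model_chkfeat(uint64_t feat, uint64_t enabled)
{
    return model_chkfeat_insn(feat, enabled) ^ feat;
}
```

With this model, `model_chkfeat(MODEL_CHKFEAT_GCS, enabled)` is nonzero exactly when the GCS bit is set in `enabled`, which matches the `if (__chkfeat(_CHKFEAT_GCS))` usage style.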

## Swap

`__swp` is available for all targets. This intrinsic expands to a
@@ -4650,6 +4674,54 @@ two pointers, ignoring the tags.
The return value is the sign-extended result of the computation.
The tag bits in the input pointers are ignored for this operation.

# Guarded Control Stack intrinsics

## Introduction

This section describes the intrinsics for the instructions of the
Guarded Control Stack (GCS) extension. The GCS instructions are present
in the AArch64 execution state only.

When GCS protection is enabled, function calls save the return
address to a separate stack, the GCS, and that entry is checked against
the actual return address when the function returns. GCS protection can
be disabled at runtime, in which case calls and returns do not access
the GCS. The GCS grows down and the GCS pointer points to the last entry
of the GCS. Each thread has a separate GCS and GCS pointer.

To use the intrinsics, `arm_acle.h` needs to be included.

These intrinsics are available when GCS instructions are supported.
The `__chkfeat` intrinsic with `_CHKFEAT_GCS` can be used to check
at runtime whether GCS protection is enabled. GCS protection is only
enabled at runtime if the code is GCS compatible and the GCS
instructions are supported.

## Intrinsics


``` c
const void *__gcspr(void);
```

Review comment (Author): I plan to change this back to non-const so that the intrinsics support the case where GCS write access is enabled (even if that is not expected to be a common setting).

Returns the GCS pointer of the current thread.

``` c
uint64_t __gcspopm(void);
```

Reads and returns the last entry on the GCS of the current thread and
updates the GCS pointer to point to the previous entry. If GCS
protection is disabled then it has no side effect and returns `0`.

Review comment: The GCSPOPM instruction does nothing when GCS is disabled (i.e. the value in the destination register is unchanged). Is the intent that the intrinsic sets the destination register to zero before executing the instruction?

Reply (Author): Yes, I think this is the only way the intrinsic is usable in the "asynchronously disabled GCS" scenario (which we don't yet know whether Linux will support; currently it does not want to).


``` c
const void *__gcsss(const void *);
```

Switches the GCS of the current thread, where the argument is the new
GCS pointer, and returns the old GCS pointer. If GCS protection is
disabled then it has no side effect and returns `NULL`.
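
The semantics described above for the enabled and disabled cases can be sketched with a toy software model. The `gcs_model` type and the `model_*` functions are hypothetical names for illustration; the model keeps only a pointer and an enabled flag, and ignores any architectural checks that the real stack switch performs on the new stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state for this sketch. */
typedef struct {
    const uint64_t *gcspr; /* points to the last (most recent) entry */
    bool enabled;          /* is GCS protection enabled at runtime? */
} gcs_model;

/* Models __gcspr: return the current GCS pointer. */
const void *model_gcspr(const gcs_model *t)
{
    return t->gcspr;
}

/* Models __gcspopm: read the last entry and step to the previous one.
   The GCS grows down, so the previous entry is at a higher address.
   When disabled there is no side effect and the result is 0. */
uint64_t model_gcspopm(gcs_model *t)
{
    if (!t->enabled)
        return 0;
    uint64_t entry = *t->gcspr;
    t->gcspr += 1;
    return entry;
}

/* Models __gcsss: install a new GCS pointer and return the old one.
   When disabled there is no side effect and the result is NULL. */
const void *model_gcsss(gcs_model *t, const void *new_gcspr)
{
    if (!t->enabled)
        return NULL;
    const void *old = t->gcspr;
    t->gcspr = new_gcspr;
    return old;
}
```

Returning `0` and `NULL` in the disabled case gives callers a well-defined value to test even if protection is switched off between the runtime check and the use of the intrinsic.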

# State management

The specification for SME is in
68 changes: 68 additions & 0 deletions main/design_documents/gcs.md
@@ -0,0 +1,68 @@
# Design Document for GCS

## Feature test

GCS support has three levels:

(1) Code generation is GCS compatible. (Compile time decision.)

(2) HW supports GCS instructions. (Might be known at compile time,
but this is a runtime feature.)

(3) GCS is enabled at runtime. (Only known at runtime.)

Level (3) implies (1) and (2). In principle a user may decide to
enable GCS even if (1) was false at compile time, but this is
a user error. The runtime system is responsible for enabling GCS
when (1) and (2) hold and GCS protection was requested for the
program.

(1) and (2) need feature test macros since they can be known at
compile time.

(3) can be detected using `__chkfeat(_CHKFEAT_GCS)`, which is
available without GCS support.

## Intrinsics

Alternative designs for the support levels at which the intrinsics
are well defined:

(A) require (3),

(B) require (1) and (2) but not (3),

(C) require (2) only.

(A) is the simplest, but it does not allow asynchronously disabling GCS.
For that at least (B) is needed, since the intrinsics must do something
reasonable if GCS is disabled. Asynchronous disable is needed, for
example, to allow disabling GCS at dlopen time in a multi-threaded
process when the loaded module is not GCS compatible.

(C) is similar to (B) but allows using the intrinsics even if GCS is
guaranteed to be disabled. The intrinsics are expected to be used
behind a runtime check for (3), since they don't do anything useful
otherwise, and thus (1) and (2) are true when the intrinsics are used
either way. With (B) it is possible to expose the intrinsics at compile
time only if (1) is true, which can be feature tested. With (C)
there is no obvious feature test for the presence of the intrinsics.
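
To make the comparison concrete, the three alternatives can be written as a predicate over the three support levels. This is only a sketch of the reasoning above; the type and function names are hypothetical.

```c
#include <stdbool.h>

/* The three support levels from the "Feature test" section. */
typedef struct {
    bool codegen_compatible; /* (1) code generation is GCS compatible */
    bool hw_supports_insns;  /* (2) HW supports GCS instructions */
    bool enabled_at_runtime; /* (3) GCS is enabled at runtime */
} gcs_levels;

/* The alternative designs (A), (B) and (C). */
typedef enum { DESIGN_A, DESIGN_B, DESIGN_C } gcs_design;

/* Are the intrinsics well defined under design `d` at levels `l`? */
bool intrinsics_well_defined(gcs_design d, gcs_levels l)
{
    switch (d) {
    case DESIGN_A: /* require (3) */
        return l.enabled_at_runtime;
    case DESIGN_B: /* require (1) and (2) but not (3) */
        return l.codegen_compatible && l.hw_supports_insns;
    case DESIGN_C: /* require (2) only */
        return l.hw_supports_insns;
    }
    return false;
}
```

For a process where GCS was disabled asynchronously, that is levels `{ true, true, false }`, the predicate is false for (A) and true for (B) and (C), which is the case that rules out (A).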

The future direction is to make intrinsics available unconditionally
and rely on runtime checks (e.g. via function multi-versioning). So
it makes sense to go with (C), have separate semantics defined for
the enabled and disabled case and let user code deal with the runtime
checks.

The types of the intrinsics are based on a `const void *` GCS pointer
type and a `uint64_t` GCS entry type. The GCS pointer could be
`const uint64_t *`, but `void` is more general in that it allows
different kinds of access to the GCS (e.g. accessing entries as
pointers or bytes). A GCS entry is usually a code pointer, but the
architecture requires it to be 8 bytes (even with ILP32) and it may be
a special token that requires bit operations to detect, so a
fixed-width unsigned integer type is the most appropriate.

The const qualifier is justified for GCS even if GCS stores are
enabled, because normal stores cannot modify the GCS; only specific
instructions can.