Skip to content

Commit

Permalink
feat: RFD8 for describing policy expr internals refactor
Browse files Browse the repository at this point in the history
  • Loading branch information
j-lanson committed Oct 21, 2024
1 parent 30bde41 commit 7596377
Showing 1 changed file with 235 additions and 0 deletions.
235 changes: 235 additions & 0 deletions site/content/rfds/0008-policy-expr-typing-refactor.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,235 @@
---
title: "Policy Expression Typing"
---

## The Current Situation

Expressions in the policy expression language used by Hipcheck do not get
type-checked until they are evaluated, similar to an interpreted language.
Hipcheck analysis takes multiple orders of magnitude longer to complete than it
takes to evaluate a policy expression. Because of this, it can be frustrating
for people writing policy expressions to wait until after an analysis has run to
find out they had a type error in their expression. It would be nice to be able
to vet the expressions in a policy file as much as possible before running
analysis to avoid these types of situations.

The current design of policy expressions does not have a robust type system, as
evidenced by the inability to do pre-evaluation type checking, and the inability
to run-time cast between primitive types, which lead to [Issue #449](https://github.com/mitre/hipcheck/issues/449).

In this bug, if Hipcheck analysis query happened to return a float that
ended in `.0` and was therefore a whole number, when serialized then
deserialized and inserted into a policy expression, it would be treated as an
integer, since JSON does not distinguish between floats and ints. The binary
arithmetic functions (e.g. `add`) in the policy expression standard environment
expect two like-typed primitives (float + float, int + int), and so would throw
a runtime type error when a would-be float operand was late-bound as an integer.

```
(lte $ 0.2) , "0.0" --> (lte 0 0.2) --> Error::BadType
```

This particular issue was solved by introducing an `upcast` function that could
turn a `Primitive::Int` into a `Primitive::Float`, and then changing the
behavior of `binary_primitive_op()` to detect a "one int, one float" situation,
and dynamically promote the integer operand.

However, it would be nicer and cleaner to be able to type-check and upcast
primitives across the entire expression before evaluation. For example, allowing
`[0 1 2.0 3]` by upcasting all non-float elements to floats.

## The Goal

Briefly, the goal of this RFD is to describe a refactor of the Policy Expression
system that enables "compile"-time type checking as much as possible, and makes
it in general to write Expr-manipulating functionality using the `Visitor`
pattern.

### What Would Need To Happen

1. Introduce first-class types and a uniform way to get the type of a given
`Expr/Primitive`
2. Represent the intention to cast a primitive in the Expr struct ecosystem.
3. Type information associated with functions/variables added to an `Env`
instance.
4. `Env` instance available when type-checking to grab information about a
function/variable.
5. Implement `TypeChecker` as a struct that implements a new `ExprVisitor` trait
6. Re-implement expression evaluation as a `impl ExprVisitor`

## Proposal

### Extracting `Expr` Variants into Types

First, we move out variants of `Expr` with fields into their own structs. E.g.

```rust
Expr::Array(Vec<Expr>)

// Becomes

struct Array {
elts: Vec<Expr>
}
Expr::Array(Array)
```

The benefit of this is that we can implement traits on each `Expr` variant, and
add different `Visitor` functions for each. Without the above example change to
`Array`, you can't write a `fn visit_array(array: ?)` that is different from `fn
visit_expr(expr: Expr)` because there is no internal type to unwrap and
therefore distinguish it.

We do this same change to `Function`, and `Lambda`.

### Adding Typing

We then add a `enum Type` to capture the types of all expressions/primitives as
follows.

```rust
type PrimitiveType = std::mem::Discriminant<Primitive>;
type ArrayType = Option<std::mem::Discriminant<Primitive>>;
struct FunctionType {
pub args: Vec<Box<Type>>,
pub output: Box<Type>,
}
enum Type {
Primitive(PrimitiveType),
Array(ArrayType),
Function(FunctionType),
...
}
```

For primitives, we use the `Discriminant` of the primitive enum. As arrays can
only contain primitives, they also use the `Primitive` discriminant, but since
we can't know at compile time the type of an empty array or one whose only
element is a `JSONPointer`, it is an `Option`.

A `FunctionType` is a combination of the array of types of its inputs and
outputs. This will be somewhat difficult to resolve with the existing `Env`
system, as many functions do implicit overloading (e.g. the same `add` function
can handle int, float, and span types), so the output type is dependent on the
input type. We should consider dynamically retrieving the type of a function by
passing a `&Env` reference.

With this `Type` information represented, we can now add and implement the
`Typed` trait:

```rust
trait Typed {
fn get_type(&self) -> Type;
}
impl Typed for Primitive {
fn get_type(&self) -> Type {
Type::Primitive(std::mem::discriminant(self))
}
}
impl Typed for Expr {
...
}
```

Notably because of an overloaded function's need to check its own arguments
against the function implementation, we'll either need to augment
`Expr::Function` to contain some reference to the actual underlying function in
`Env` or to be able to query `Env` about that function when `Typed::get_type()`
is being executed.

### Adding `Cast` Type

Now to be able to represent a cast operation in the `Expr` ecosystem. Following
the above rules about creating distinct struct, we add the following to the
`Expr` enum:

```rust
struct Cast {
target: PrimitiveType,
expr: Box<Expr>,
}

Expr::Cast(Cast)

impl Typed for Cast {
fn get_type(&self) -> Type {
Type::Primitive(self.target)
}
}

```

With this tool, when we perform type checking / upcasting as a stage of
expression compilation, `Function` and `Array` instances can insert `Cast`
nodes to "wrap" improperly typed primitives. This would allow `(lte 0 0.2)` to
be evaluated properly, as the function replaces its first operand with
`Cast::new(op2.get_type(), op1)`. There will need to be some `try_upcast() ->
Result<Expr>` function to determine whether a cast can be done according to the
semantics of the language.

### The `ExprVisitor` Trait

To re-organize computation/tranformation of the `Expr` tree generated by a
policy expression program, we can use the `Visitor` pattern. We define a trait
as such:

```rust
trait ExprVisitor<T> {
fn visit(&self, expr: &mut Expr) -> T;
fn visit_array(&self, arr: &mut Array) -> T;
fn visit_function(&self, f: &mut Function) -> T;
fn visit_primitive(&self, p: &mut Primitive) -> T;
fn visit_cast(&self, c: &mut Cast) -> T;
...
}
```

Note that the separation of different functions to handle different `Expr`
variants is enabled by splitting them out into distinct types wrapped by their
`Expr` variant.

We can write multiple structs that implement `ExprVisitor`.

1. `struct InsertCast` which inserts `Cast` nodes where applicable
2. `struct TypeCheck` which returns `Result<()>` to report a type error.
3. `struct JsonInjector` which takes `context` and replaces all `Expr::JsonPointer` accordingly.
4. `struct Executor`. Re-impl the existing `Executor` struct to obey this pattern.

This would allow us to organize our transformation steps more effectively. When
we first parse a policy expression from file, we can call `InsertCast,
TypeCheck` to make sure everything pre-context-injection is sound, with some
ambiguity allowed around the JSON pointers themselves. Then once analysis
completes we can `JsonInjector/InsertCast/TypeCheck` again, then finally call
`Executor`'s `visit()` function to return the evaluated `Expr`.

### Checking Function Return and Argument Types

When you call `Typed::get_type()` on a function, it will return the "function
type," which is a combination of information about the (actual) argument types
and the number of arguments it expects. However, sometimes we want to get the
"evaluated" type of the function, namely what type it returns when executed.
We also want to be able to tell whether the "evaluated" types passed to a
function match the expected types. We need the "evaluated" types of the passed
expressions because they themselves might be functions instead of primitives
or arrays.

We therefore need a "type checker" function for each function that will run
first on the types of the function arguments to determine if there is a type
error or, if not, what will be returned. This is especially important for
overloaded functions whose return-type may be dictated by the types of the
passed arguments, or determined to be unknown if not enough information is
provided.

A type checking function has the signature

```rust
fn(args: &[Type]) -> Result<ReturnableType>
```

Where `ReturnableType` is the "evaluated" type of the function. The
`ReturnableType` must be a separate enum, since only "unknown", primitives, and
arrays are valid return types in the current policy expression language.

Now that we have the concept of a "checker" function, every function registered
in the `Env` must be associated with a checker function, which will be executed
during type checking and at run-time.

0 comments on commit 7596377

Please sign in to comment.