The query engine's job is to take a `QueryRequest`, which contains information about the query a user would like to run, translate it to PostgreSQL SQL, execute it against the database, and return the results as a `QueryResponse`.
One place in particular that uses the Query Engine is the `/query` endpoint (defined in the ndc-hub repository).

The `/query` endpoint receives a `QueryRequest`, and calls the `translate` function from the Query Engine with it, along with the information about the tables tracked in the metadata, to receive an `ExecutionPlan`. It then calls the `execute` function from the Query Engine with that `ExecutionPlan` (which runs it against PostgreSQL) and gets back a `QueryResponse`, which it can then return to the caller.
API:

```rust
pub fn translate(
    tables_info: &metadata::TablesInfo,
    query_request: models::QueryRequest,
) -> Result<ExecutionPlan, Error>

pub async fn execute(
    pool: sqlx::PgPool,
    plan: translation::ExecutionPlan,
) -> Result<models::QueryResponse, sqlx::Error>
```
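A minimal sketch of how an endpoint might glue these two calls together (the handler signature and error handling here are illustrative assumptions, not the actual ndc-hub code):

```rust
// Hypothetical glue code: translate the request, then execute the plan.
// Error types and their conversion are simplified for illustration.
async fn handle_query(
    pool: sqlx::PgPool,
    tables_info: &metadata::TablesInfo,
    query_request: models::QueryRequest,
) -> Result<models::QueryResponse, String> {
    // Side-effect free: only builds an ExecutionPlan.
    let plan = translate(tables_info, query_request).map_err(|e| format!("{e:?}"))?;
    // Runs the plan against the database.
    execute(pool, plan).await.map_err(|e| e.to_string())
}
```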
The translation step is essentially side-effect free: we use information from the request, as well as the table information from the metadata, to translate the query request into steps to run against the database.

This process currently lives in the phases/translation.rs file, and the API is the following function:
```rust
pub fn translate(
    tables_info: &metadata::TablesInfo,
    query_request: models::QueryRequest,
) -> Result<ExecutionPlan, Error>
```
The `translate` function returns an `ExecutionPlan`.
```rust
pub struct ExecutionPlan {
    pub root_field: String,
    /// Run before the query. Should be a sql::ast in the future.
    pub pre: Vec<sql::string::DDL>,
    /// The query.
    pub query: sql::ast::Select,
    /// Run after the query. Should be a sql::ast in the future.
    pub post: Vec<sql::string::DDL>,
}
```
Right now we don't expect `pre` and `post` to be populated, but they could be used for things like stored procedures.
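For a plain query today, a translated plan looks roughly like this (a sketch based on the struct above; the helper name is made up for illustration):

```rust
// Illustrative only: assembling a plan with no pre/post steps, which is the
// common case today. The real construction happens inside translate.
fn plan_without_setup(root_field: String, query: sql::ast::Select) -> ExecutionPlan {
    ExecutionPlan {
        root_field,
        pre: vec![],  // no DDL to run before the query
        query,        // the single SELECT produced by translation
        post: vec![], // no DDL to run after the query
    }
}
```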
We maintain a SQL AST represented as Rust data types in phases/translation/sql/ast.rs.
We implement our own representation because we want more control over this core component of our application, and we want to implement exactly what we need, no more and no less. External libraries such as `sqlparser` do not share our goals, and using them would force compromises that affect our codebase's complexity and development velocity.
We have a few guidelines for the SQL AST:

- The SQL AST should look like a subset of PostgreSQL SQL, and should not contain any high-level abstractions or try to abstract over multiple SQL dialects. We should implement exactly what we need, and be precise about it (a sketch of what this means follows this list).
- Should we need a higher level of abstraction, an additional IR should be constructed that sits before the SQL AST.
- The SQL AST should contain structures we actively use, and not structures we don't. One such example is window functions: we don't need to include them in the AST currently because no feature we expose from GraphQL uses them.
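As a rough illustration (this is not the actual ast.rs), the AST is plain Rust data that mirrors the small slice of PostgreSQL we actually emit:

```rust
// Sketch only: a few simplified AST nodes in the spirit of the real
// phases/translation/sql/ast.rs. Names and shapes here are assumptions.
pub struct ColumnAlias(pub String);
pub struct TableName(pub String);

pub enum Expression {
    ColumnReference(String),
    StringLiteral(String),
    And(Box<Expression>, Box<Expression>),
}

pub struct Select {
    pub select_list: Vec<(ColumnAlias, Expression)>,
    pub from: TableName,
    pub where_: Option<Expression>,
}
```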
Sometimes we'd like a shorthand to build specific repeating patterns, such as `SELECT coalesce(json_agg(row_to_json(<table_alias>)), '[]') AS <column_alias> FROM <query> AS <table_alias>`. The phases/translation/sql/helpers.rs module can come in handy to codify certain SQL AST generation patterns. If you find yourself repeating a long pattern in multiple places, it might be a good candidate for a `helpers` function.
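For example, a helper for the json_agg pattern above might look like this (the function and type names are illustrative; check helpers.rs for the real constructors):

```rust
// Hypothetical helper in the spirit of phases/translation/sql/helpers.rs:
// wrap an inner SELECT so its rows come back as a single JSON array, i.e.
//   SELECT coalesce(json_agg(row_to_json(<table_alias>)), '[]') AS <column_alias>
//   FROM (<query>) AS <table_alias>
pub fn select_rows_as_json(
    inner_select: sql::ast::Select,
    column_alias: sql::ast::ColumnAlias,
    table_alias: sql::ast::TableAlias,
) -> sql::ast::Select {
    // Build the wrapping SELECT from the AST constructors; elided here
    // because the exact constructors depend on the real ast.rs.
    todo!()
}
```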
The SQL string is a stringified representation of the SQL AST. It can be found in phases/translation/sql/string.rs.
We separate the SQL AST from its string representation so we can write transformations and optimizations on the SQL AST.
The SQL string representation should be generated from the SQL AST by pretty printing. Converting a SQL AST to a string (phases/translation/sql/convert.rs) should produce a query string that can be run against PostgreSQL as a parameterized query, together with the parameters supplied by the user.
Please use the API provided by the `SQL` type. It provides functions for constructing SQL strings in an easy way, such as appending syntax (like keywords and punctuation), identifiers, and params. Don't use `append_syntax` for things that are not syntax.
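To make that concrete, a conversion step might drive the builder roughly like this (only `append_syntax` is named above; the identifier and parameter methods here are assumed names):

```rust
// Rough sketch of how a convert function could use the SQL string builder.
// Identifiers are quoted via the builder, and user-supplied values become
// numbered parameters rather than being spliced into the string.
fn where_equals_to_sql(sql: &mut SQL, column: &str, value: String) {
    sql.append_syntax("WHERE ");    // keywords and punctuation only
    sql.append_identifier(column);  // assumed method: quotes the identifier
    sql.append_syntax(" = ");
    sql.append_param(value);        // assumed method: records a $N parameter
}
```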
The query execution receives a pool and a plan and executes the plan against PostgreSQL. It then returns the results of the query part back to the caller of the function. The code can be found in phases/execution.rs.
```rust
pub async fn execute(
    pool: sqlx::PgPool,
    plan: translation::ExecutionPlan,
) -> Result<models::QueryResponse, sqlx::Error>
```
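The shape of that execution is roughly the following (a sketch, not the actual execution.rs; the conversion entry point and the single-JSON-column result are assumptions based on the sections above):

```rust
use sqlx::Row;

// Sketch: convert the plan's SELECT to a parameterized string, run it with
// sqlx, and read the JSON result back. Mapping into QueryResponse is elided.
async fn run_plan_query(
    pool: &sqlx::PgPool,
    plan: translation::ExecutionPlan,
) -> Result<serde_json::Value, sqlx::Error> {
    // Assumed entry point; the real one lives in phases/translation/sql/convert.rs.
    let (statement, parameters) = convert::to_parameterized_sql(&plan.query);

    let mut query = sqlx::query(&statement);
    for parameter in parameters {
        query = query.bind(parameter); // values travel as query parameters
    }

    let row = query.fetch_one(pool).await?;
    row.try_get(0) // the query is expected to return a single JSON column
}
```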
Here are a few ideas we want to maintain and why:
One of the most important tools we have for figuring out how a program behaves is to run it and examine the way data changes throughout the program. To do so we want to make sure the flow of the program is easy to follow, and that we can examine the relevant data at any given point. Here are a few suggestions:
- Prefer functions over abstract trait methods: it makes jump-to-definition easier.
- Annotate data types with `Debug` so that they can be traced, and prefer data shapes that can be `Debug`ged over those that can't (see the sketch after this list).
- Prefer to encode information in a way that is inspectable at runtime.
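A small illustration of the `Debug` point:

```rust
// With Debug derived on the core data types, any intermediate value can be
// dumped mid-flow while investigating a problem.
#[derive(Debug)]
pub struct ExecutionPlan {
    // fields as shown earlier
}

// e.g. while debugging translation:
// let plan = translate(tables_info, query_request)?;
// dbg!(&plan);
```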
Abstractions should come up from usage pain points rather than theoretical ideas. Often we don't know exactly what a good solution looks like in terms of code, and it is better to do a simple thing first and even duplicate code when in doubt. Abstracting or extracting duplicate code to a single place too early may make a codebase rigid before we understand the trade-offs, so if there's no real necessity (one that arises from usage) it might not be worth "fixing".
A possible rule of thumb is that creating an abstraction to build a new thing might raise an alarm bell, and building an abstraction in the middle of building yet another thing using the same pattern might be more appropriate. Though still, You Might Not Need It.
At this point in time we've settled on a fairly sweet spot for testing, where infrastructure code is very limited and contained in one place.
Since testing can very easily become complex and "another project to maintain", we want to deliberately put limitations on how wildly it can grow, and contain all infrastructure code to a single module, tests/common/mod.rs (in each relevant crate), with no infra code in the tests themselves.
We do this so that we will notice when it grows too large and requires some explicit and calculated attention.
We would also like, in general, to keep infrastructure code to the required minimum, and not add things we'll need to maintain before using them.
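In practice, a test under this convention might look like the following (the helper names are assumptions; the point is only that the test body carries no infrastructure of its own):

```rust
// Hypothetical integration test: everything that talks to the database or
// spins up the connector lives in tests/common/mod.rs.
mod common;

#[tokio::test]
async fn select_all_albums() {
    // Assumed helper names, not necessarily the real ones.
    let response = common::run_query("select_all_albums.json").await;
    common::assert_matches_expected("select_all_albums", response);
}
```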