The query engine's job is to take a `QueryRequest`, which contains information about the query a user would like to run, translate it to PostgreSQL SQL, execute it against the database, and return the results as a `QueryResponse`.
One place in particular that uses the Query Engine is the `/query` endpoint (defined in the ndc-hub repository).

The `/query` endpoint receives a `QueryRequest`, and calls the `translate` function from the Query Engine with it, along with the information about the tables tracked in the metadata, to receive an `ExecutionPlan`. It then calls the `execute` function from the Query Engine with that `ExecutionPlan` (which runs it against PostgreSQL) and gets back a `QueryResponse`, which it can then return to the caller.
API:

```rust
pub fn translate(
    tables_info: &metadata::TablesInfo,
    query_request: models::QueryRequest,
) -> Result<ExecutionPlan, Error>

pub async fn execute(
    pool: sqlx::PgPool,
    plan: translation::ExecutionPlan,
) -> Result<models::QueryResponse, sqlx::Error>
```
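A minimal sketch of how an endpoint might glue these two calls together (the handler signature and error handling here are illustrative assumptions, not the actual ndc-hub code):

```rust
// Hypothetical glue code: translate the request, then execute the plan.
// Error types and their conversion are simplified for illustration.
async fn handle_query(
    pool: sqlx::PgPool,
    tables_info: &metadata::TablesInfo,
    query_request: models::QueryRequest,
) -> Result<models::QueryResponse, String> {
    // Side-effect free: only builds an ExecutionPlan.
    let plan = translate(tables_info, query_request).map_err(|e| format!("{e:?}"))?;
    // Runs the plan against the database.
    execute(pool, plan).await.map_err(|e| e.to_string())
}
```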
The translation step is essentially side-effect free: we use information from the request, as well as the table information from the metadata, to translate the query request into steps to run against the database.

This process currently lives in the phases/translation.rs file, and the API is the following function:
```rust
pub fn translate(
    tables_info: &metadata::TablesInfo,
    query_request: models::QueryRequest,
) -> Result<ExecutionPlan, Error>
```
The `translate` function returns an `ExecutionPlan`.
```rust
pub struct ExecutionPlan {
    pub root_field: String,
    /// Run before the query. Should be a sql::ast in the future.
    pub pre: Vec<sql::string::DDL>,
    /// The query.
    pub query: sql::ast::Select,
    /// Run after the query. Should be a sql::ast in the future.
    pub post: Vec<sql::string::DDL>,
}
```
Right now we don't expect `pre` and `post` to be populated, but they could be used for things like stored procedures.
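For a plain query today, a translated plan looks roughly like this (a sketch based on the struct above; the helper name is made up for illustration):

```rust
// Illustrative only: assembling a plan with no pre/post steps, which is the
// common case today. The real construction happens inside translate.
fn plan_without_setup(root_field: String, query: sql::ast::Select) -> ExecutionPlan {
    ExecutionPlan {
        root_field,
        pre: vec![],  // no DDL to run before the query
        query,        // the single SELECT produced by translation
        post: vec![], // no DDL to run after the query
    }
}
```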
We maintain a SQL AST represented as Rust data types in phases/translation/sql/ast.rs.
We implement our own representation because we want more control over this core component of our application, and we want to implement exactly what we need, no more and no less. External libraries such as `sqlparser` do not share our goals, and using them would force compromises that affect our codebase's complexity and development velocity.
We have a few guidelines for the SQL AST:

- The SQL AST should look like a subset of PostgreSQL SQL, and should not contain any high-level abstractions or try to abstract over multiple SQL dialects. We should implement exactly what we need, and be precise about it (a sketch of what this means follows this list).
- Should we need a higher level of abstraction, an additional IR should be constructed that sits before the SQL AST.
- The SQL AST should contain structures we actively use, and not structures we don't. One such example is window functions: we don't need to include them in the AST currently because no feature we expose from GraphQL uses them.
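As a rough illustration (this is not the actual ast.rs), the AST is plain Rust data that mirrors the small slice of PostgreSQL we actually emit:

```rust
// Sketch only: a few simplified AST nodes in the spirit of the real
// phases/translation/sql/ast.rs. Names and shapes here are assumptions.
pub struct ColumnAlias(pub String);
pub struct TableName(pub String);

pub enum Expression {
    ColumnReference(String),
    StringLiteral(String),
    And(Box<Expression>, Box<Expression>),
}

pub struct Select {
    pub select_list: Vec<(ColumnAlias, Expression)>,
    pub from: TableName,
    pub where_: Option<Expression>,
}
```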
Sometimes we'd like a shorthand to build specific repeating patterns, such as `SELECT coalesce(json_agg(row_to_json(<table_alias>)), '[]') AS <column_alias> FROM <query> AS <table_alias>`. The phases/translation/sql/helpers.rs module can come in handy to codify certain SQL AST generation patterns. If you find yourself repeating a long pattern in multiple places, it might be a good candidate for a `helpers` function.
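For example, a helper for the json_agg pattern above might look like this (the function and type names are illustrative; check helpers.rs for the real constructors):

```rust
// Hypothetical helper in the spirit of phases/translation/sql/helpers.rs:
// wrap an inner SELECT so its rows come back as a single JSON array, i.e.
//   SELECT coalesce(json_agg(row_to_json(<table_alias>)), '[]') AS <column_alias>
//   FROM (<query>) AS <table_alias>
pub fn select_rows_as_json(
    inner_select: sql::ast::Select,
    column_alias: sql::ast::ColumnAlias,
    table_alias: sql::ast::TableAlias,
) -> sql::ast::Select {
    // Build the wrapping SELECT from the AST constructors; elided here
    // because the exact constructors depend on the real ast.rs.
    todo!()
}
```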
The SQL string is a stringified representation of the SQL AST. It can be found in phases/translation/sql/string.rs.
We separate the SQL AST from its string representation so we can write transformations and optimizations on the SQL AST.
The SQL string representation should be generated from the SQL AST by pretty printing. Converting a SQL AST to a string (phases/translation/sql/convert.rs) should produce a query string that can be run against PostgreSQL as a parameterized query, together with the parameters supplied by the user.
Please use the API provided by the `SQL` type. It provides functions for constructing SQL strings in an easy way, such as appending syntax (like keywords and punctuation), identifiers, and params. Don't use `append_syntax` for things that are not syntax.
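To make that concrete, a conversion step might drive the builder roughly like this (only `append_syntax` is named above; the identifier and parameter methods here are assumed names):

```rust
// Rough sketch of how a convert function could use the SQL string builder.
// Identifiers are quoted via the builder, and user-supplied values become
// numbered parameters rather than being spliced into the string.
fn where_equals_to_sql(sql: &mut SQL, column: &str, value: String) {
    sql.append_syntax("WHERE ");    // keywords and punctuation only
    sql.append_identifier(column);  // assumed method: quotes the identifier
    sql.append_syntax(" = ");
    sql.append_param(value);        // assumed method: records a $N parameter
}
```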
The query execution receives a pool and a plan and executes the plan against PostgreSQL. It then returns the results of the query part back to the caller of the function. The code can be found in phases/execution.rs.
```rust
pub async fn execute(
    pool: sqlx::PgPool,
    plan: translation::ExecutionPlan,
) -> Result<models::QueryResponse, sqlx::Error>
```
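The shape of that execution is roughly the following (a sketch, not the actual execution.rs; the conversion entry point and the single-JSON-column result are assumptions based on the sections above):

```rust
use sqlx::Row;

// Sketch: convert the plan's SELECT to a parameterized string, run it with
// sqlx, and read the JSON result back. Mapping into QueryResponse is elided.
async fn run_plan_query(
    pool: &sqlx::PgPool,
    plan: translation::ExecutionPlan,
) -> Result<serde_json::Value, sqlx::Error> {
    // Assumed entry point; the real one lives in phases/translation/sql/convert.rs.
    let (statement, parameters) = convert::to_parameterized_sql(&plan.query);

    let mut query = sqlx::query(&statement);
    for parameter in parameters {
        query = query.bind(parameter); // values travel as query parameters
    }

    let row = query.fetch_one(pool).await?;
    row.try_get(0) // the query is expected to return a single JSON column
}
```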
Here are a few ideas we want to maintain and why:
One of the most important tools we have for figuring out how a program behaves is to run it and examine the way data changes throughout the program. To do so we want to make sure the flow of the program is easy to follow, and that we can examine the relevant data at any given point. Here are a few suggestions:
- Prefer functions over abstract trait methods: it makes jump-to-definition easier.
- Annotate data types with `Debug` so that they can be traced, and prefer data shapes that can be `Debug`ged over those that can't (see the sketch after this list).
- Prefer to encode information in a way that is inspectable at runtime.
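A small illustration of the `Debug` point:

```rust
// With Debug derived on the core data types, any intermediate value can be
// dumped mid-flow while investigating a problem.
#[derive(Debug)]
pub struct ExecutionPlan {
    // fields as shown earlier
}

// e.g. while debugging translation:
// let plan = translate(tables_info, query_request)?;
// dbg!(&plan);
```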
Abstractions should come up from usage pain points rather than theoretical ideas. Often we don't know exactly what a good solution looks like in terms of code, and it is better to do a simple thing first and even duplicate code when in doubt. Abstracting or extracting duplicate code to a single place too early may make a codebase rigid before we understand the trade-offs, so if there's no real necessity (one that arises from usage) it might not be worth "fixing".
A possible rule of thumb is that creating an abstraction to build a new thing might raise an alarm bell, and building an abstraction in the middle of building yet another thing using the same pattern might be more appropriate. Though still, You Might Not Need It.
At this point in time we've settled on a fairly sweet spot for testing, where infrastructure code is very limited and contained in one place.
Since testing can very easily become complex and "another project to maintain", we want to deliberately put limitations on how wildly it can grow, and contain all infrastructure code to a single module, tests/common/mod.rs (in each relevant crate), with no infra code in the tests themselves.
We do this so that we will notice when it grows too large and requires some explicit and calculated attention.
We would also like, in general, to keep infrastructure code to the required minimum, and not add things we'll need to maintain before using them.
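In practice, a test under this convention might look like the following (the helper names are assumptions; the point is only that the test body carries no infrastructure of its own):

```rust
// Hypothetical integration test: everything that talks to the database or
// spins up the connector lives in tests/common/mod.rs.
mod common;

#[tokio::test]
async fn select_all_albums() {
    // Assumed helper names, not necessarily the real ones.
    let response = common::run_query("select_all_albums.json").await;
    common::assert_matches_expected("select_all_albums", response);
}
```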