Support for using profiles in dbt models (#31)
* Add debug logging to get_profile

* Change all macros to accept either a relation or relation_name arg

* Change get_profile to return a SQL query

* Add get_profile_table macro that returns an agate.Table

* Add get_relation macro

* Change select_from_information_schema_columns to take only relation as an arg

* Update README

* Update README

* Update workflow

* Remove a test

* Fix a test

Co-authored-by: Simo Tumelius <[email protected]>
stumelius and datamie-simo authored Nov 25, 2021
1 parent aaa0de3 commit 7fc757b
Showing 13 changed files with 167 additions and 72 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/main.yml
@@ -55,6 +55,7 @@ jobs:
- name: Run tests
  run: |
    dbt seed --target postgres
    dbt run --target postgres
    dbt test --target postgres
- name: Run print_profile macro
@@ -98,4 +99,5 @@ jobs:
- name: Run tests
  run: |
    dbt seed --target bigquery
    dbt run --target bigquery
    dbt test --target bigquery
84 changes: 56 additions & 28 deletions README.md
@@ -84,23 +84,66 @@ One of the advantages of the `doc` approach over the `meta` approach is that it
❌ Presto

# Contents
* [get_profile](#get_profile-source)
* [get_profile_table](#get_profile_table-source)
* [print_profile](#print_profile-source)
* [print_profile_schema](#print_profile_schema-source)
* [print_profile_docs](#print_profile_docs-source)
* [get_profile](#get_profile-source)



# Macros

## get_profile ([source](macros/get_profile.sql))

This macro returns a relation profile as a SQL query that can be used in a dbt model. This is handy for previewing relation profiles in dbt Cloud.

### Arguments
* `relation` (required): [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) object

### Usage

Use this macro in a dbt model:

```sql
{{ dbt_profiler.get_profile(relation=ref("customers")) }}
```

To configure the macro to be called only when dbt is in [execute](https://docs.getdbt.com/reference/dbt-jinja-functions/execute) mode:

```sql
-- depends_on: {{ ref("customers") }}
{% if execute %}
{{ dbt_profiler.get_profile(relation=ref("customers")) }}
{% endif %}
```

## get_profile_table ([source](macros/get_profile_table.sql))

This macro returns a relation profile as an [agate.Table](https://agate.readthedocs.io/en/1.6.1/api/table.html#module-agate.table). The macro does not print anything to `stdout` and therefore is not meant to be used as a standalone [operation](https://docs.getdbt.com/docs/using-operations).

### Arguments
* `relation` (either `relation` or `relation_name` is required): Relation object
* `relation_name` (either `relation` or `relation_name` is required): Relation name
* `schema` (optional): Schema where `relation_name` exists (default: `none` i.e., target schema)
* `database` (optional): Database where `relation_name` exists (default: `none` i.e., target database)

### Usage

Call this macro from another macro or dbt model:

```sql
{% set table = dbt_profiler.get_profile_table(relation_name="customers") %}
```

## print_profile ([source](macros/print_profile.sql))

This macro prints a relation profile as a Markdown table to `stdout`.

### Arguments
* `relation_name` (required): Relation name
* `schema` (optional): Relation schema name (default: `none` i.e., target schema)
* `database` (optional): Relation database name (default: `none` i.e., target database)
* `relation` (either `relation` or `relation_name` is required): Relation object
* `relation_name` (either `relation` or `relation_name` is required): Relation name
* `schema` (optional): Schema where `relation_name` exists (default: `none` i.e., target schema)
* `database` (optional): Database where `relation_name` exists (default: `none` i.e., target database)
* `max_rows` (optional): The maximum number of rows to display before truncating the data (default: `none` i.e., not truncated)
* `max_columns` (optional): The maximum number of columns to display before truncating the data (default: `7`)
* `max_column_width` (optional): Truncate all columns to at most this width (default: `30`)
@@ -128,9 +171,10 @@ dbt run-operation print_profile --args '{"relation_name": "customers"}'
This macro prints a relation schema YAML to `stdout` containing all columns and their profiles.

### Arguments
* `relation_name` (required): Relation name
* `schema` (optional): Relation schema name (default: `none` i.e., target schema)
* `database` (optional): Relation database name (default: `none` i.e., target database)
* `relation` (either `relation` or `relation_name` is required): Relation object
* `relation_name` (either `relation` or `relation_name` is required): Relation name
* `schema` (optional): Schema where `relation_name` exists (default: `none` i.e., target schema)
* `database` (optional): Database where `relation_name` exists (default: `none` i.e., target database)
* `model_description` (optional): Model description included in the schema (default: `""`)
* `column_description` (optional): Column descriptions included in the schema (default: `""`)

@@ -206,10 +250,11 @@ This what the profile looks like on the dbt docs site:
This macro prints a relation profile as a Markdown table wrapped in a Jinja `docs` macro to `stdout`.

### Arguments
* `relation_name` (required): Relation name
* `relation` (either `relation` or `relation_name` is required): Relation object
* `relation_name` (either `relation` or `relation_name` is required): Relation name
* `schema` (optional): Schema where `relation_name` exists (default: `none` i.e., target schema)
* `database` (optional): Database where `relation_name` exists (default: `none` i.e., target database)
* `docs_name` (optional): `docs` macro name (default: `dbt_profiler__{{ relation_name }}`)
* `schema` (optional): Relation schema name (default: `none` i.e., target schema)
* `database` (optional): Relation database name (default: `none` i.e., target database)
* `max_rows` (optional): The maximum number of rows to display before truncating the data (default: `none` i.e., not truncated)
* `max_columns` (optional): The maximum number of columns to display before truncating the data (default: `7`)
* `max_column_width` (optional): Truncate all columns to at most this width (default: `30`)
@@ -235,20 +280,3 @@ dbt run-operation print_profile_docs --args '{"relation_name": "customers"}'
| customer_lifetime_value | bigint | 0.62 | 0.35 | 35 | False | 2021-04-28 11:36:59.431462+00 |
{% enddocs %}
```
## get_profile ([source](macros/get_profile.sql))
This macro returns a relation profile as an [agate.Table](https://agate.readthedocs.io/en/1.6.1/api/table.html#module-agate.table). The macro does not print anything to `stdout` and therefore is not meant to be used as a standalone [operation](https://docs.getdbt.com/docs/using-operations).
### Arguments
* `relation_name` (required): Relation name
* `schema` (optional): Relation schema name (default: `none` i.e., target schema)
* `database` (optional): Relation database name (default: `none` i.e., target database)
### Usage
Call this macro from another macro or dbt model:
```bash
{{ get_profile(relation_name="customers") }}
```
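The README's sample profile row for `customer_lifetime_value` (0.62, 0.35, 35, False) fits simple ratios over the row count: if the columns follow the order used in `profile.yml`, 35 distinct values at a proportion of 0.35 implies a 100-row table. The Python sketch below assumes these metric definitions, which are our reading of the column names rather than the SQL the package generates: `not_null_proportion` is non-null values over rows, `distinct_proportion` is distinct non-null values over rows, and `is_unique` is true when the distinct count equals the row count. `sqlite3` stands in for the warehouse; the table and data are hypothetical.

```python
import sqlite3


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Profile one column using the assumed metric definitions."""
    # Naive identifier quoting; acceptable for trusted names in a sketch.
    row_count, not_null_count, distinct_count = conn.execute(
        f'select count(*), count("{column}"), count(distinct "{column}") '
        f'from "{table}"'
    ).fetchone()
    if row_count == 0:
        not_null_proportion = distinct_proportion = None
    else:
        not_null_proportion = not_null_count / row_count
        distinct_proportion = distinct_count / row_count
    return {
        "column_name": column,
        "not_null_proportion": not_null_proportion,
        "distinct_proportion": distinct_proportion,
        "distinct_count": distinct_count,
        # count(distinct ...) skips NULLs, so any NULL makes a column non-unique.
        "is_unique": distinct_count == row_count,
    }


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer, lifetime_value integer)")
# Four rows: lifetime_value has one NULL and one repeated value.
conn.executemany(
    "insert into customers values (?, ?)",
    [(1, 10), (2, 10), (3, 35), (4, None)],
)
print(profile_column(conn, "customers", "lifetime_value"))
```

With these four rows, `lifetime_value` profiles as 0.75 not null, 0.5 distinct (2 distinct values), and not unique.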
4 changes: 4 additions & 0 deletions integration_tests/models/profile.sql
@@ -0,0 +1,4 @@
-- depends_on: {{ ref("test_data") }}
{% if execute %}
{{ dbt_profiler.get_profile(relation=ref("test_data")) }}
{% endif %}
36 changes: 36 additions & 0 deletions integration_tests/models/profile.yml
@@ -0,0 +1,36 @@
version: 2

models:
  - name: profile
    columns:
      - name: column_name
        tests:
          - not_null
          - unique

      - name: data_type
        tests:
          - not_null

      - name: not_null_proportion
        tests:
          - not_null

      - name: distinct_proportion
        tests:
          - not_null

      - name: distinct_count
        tests:
          - not_null

      - name: is_unique
        tests:
          - not_null
          - accepted_values:
              quote: false
              values: ["TRUE", "FALSE"]

      - name: profiled_at
        tests:
          - not_null
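The new `profile.yml` asserts the shape of the profile model's output. As a rough Python illustration (simplified stand-ins, not the SQL dbt compiles these generic tests to), each test can be read as a query that returns the offending rows, where an empty result means a pass. `quote: false` means the accepted values are emitted as bare literals, so here they become SQLite's `TRUE`/`FALSE` keywords (SQLite 3.23 or later) rather than the strings `'TRUE'`/`'FALSE'`. The table and rows are hypothetical.

```python
import sqlite3

# Each check returns the rows that violate it; an empty result means "pass".
# Simplified stand-ins for dbt's generic tests, not dbt's compiled SQL.
CHECKS = {
    "not_null(column_name)": "select * from profile where column_name is null",
    "unique(column_name)": (
        "select column_name from profile "
        "group by column_name having count(*) > 1"
    ),
    # quote: false -> values are bare boolean literals, not quoted strings.
    "accepted_values(is_unique)": (
        "select * from profile where is_unique not in (TRUE, FALSE)"
    ),
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Return {check_name: number_of_failing_rows}."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}


conn = sqlite3.connect(":memory:")
conn.execute("create table profile (column_name text, data_type text, is_unique boolean)")
conn.executemany(
    "insert into profile values (?, ?, ?)",
    [("id", "integer", True), ("amount", "numeric", False)],
)
print(run_checks(conn))  # every check reports 0 failing rows
```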
@@ -1,6 +1,6 @@
{% if execute %}
{%
set actual_profile = dbt_profiler.get_profile(
set actual_profile = dbt_profiler.get_profile_table(
relation_name="test_data"
)
%}
@@ -1,6 +1,6 @@
{% if execute %}
{%
set actual_profile = dbt_profiler.get_profile(
set actual_profile = dbt_profiler.get_profile_table(
relation_name="test_data"
).exclude(["profiled_at", "data_type"])
%}
16 changes: 8 additions & 8 deletions macros/cross_db_utils.sql
@@ -30,19 +30,19 @@

{# select_from_information_schema_columns ------------------------------------------------- #}

{%- macro select_from_information_schema_columns(relation, schema, relation_name) -%}
{{ return(adapter.dispatch("select_from_information_schema_columns", macro_namespace="dbt_profiler")(relation, schema, relation_name)) }}
{%- macro select_from_information_schema_columns(relation) -%}
{{ return(adapter.dispatch("select_from_information_schema_columns", macro_namespace="dbt_profiler")(relation)) }}
{%- endmacro -%}

{%- macro default__select_from_information_schema_columns(relation, schema, relation_name) -%}
{%- macro default__select_from_information_schema_columns(relation) -%}
select
*
from {{ dbt_profiler.information_schema(relation) }}.COLUMNS
where lower(table_schema) = lower('{{ schema }}')
and lower(table_name) = lower('{{ relation_name }}')
where lower(table_schema) = lower('{{ relation.schema }}')
and lower(table_name) = lower('{{ relation.identifier }}')
{%- endmacro -%}

{%- macro redshift__select_from_information_schema_columns(relation, schema, relation_name) -%}
{%- macro redshift__select_from_information_schema_columns(relation) -%}
select
attr.attname::varchar as column_name,
type.typname::varchar as data_type,
@@ -52,7 +52,7 @@
join pg_catalog.pg_type as type on (attr.atttypid = type.oid)
join pg_catalog.pg_class as class on (attr.attrelid = class.oid)
join pg_catalog.pg_namespace as namespace on (class.relnamespace = namespace.oid)
where lower(table_schema) = lower('{{ schema }}')
and lower(table_name) = lower('{{ relation_name }}')
where lower(table_schema) = lower('{{ relation.schema }}')
and lower(table_name) = lower('{{ relation.identifier }}')
and attr.attnum > 0
{%- endmacro -%}
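After this change the column-metadata query takes the schema and table name from the relation object (`relation.schema`, `relation.identifier`) instead of from separate arguments. The Python sketch below has the same single-argument shape. It assumes a minimal, hypothetical `Relation` dataclass and uses SQLite's `pragma_table_info` table-valued function in place of `information_schema.columns`, since SQLite has no information schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a dbt Relation: it carries its own location."""

    database: Optional[str]
    schema: str
    identifier: str


def select_columns(conn: sqlite3.Connection, relation: Relation) -> list:
    """Return (column_name, data_type) pairs for the relation.

    Everything needed to locate the table comes from the relation object,
    mirroring the refactored select_from_information_schema_columns(relation).
    """
    # pragma_table_info(table, schema) is SQLite's closest analogue to
    # information_schema.columns; the schema argument is the attached db name.
    rows = conn.execute(
        "select name, lower(type) from pragma_table_info(?, ?) order by cid",
        (relation.identifier, relation.schema),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer, first_name text)")
print(select_columns(conn, Relation(None, "main", "customers")))
```

Bundling the location into one object removes the chance of passing a `schema` that does not match the relation.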
34 changes: 6 additions & 28 deletions macros/get_profile.sql
@@ -1,29 +1,9 @@
{% macro get_profile(relation_name, schema=none, database=none) %}

{% if schema is none %}
{% set schema = target.schema %}
{% endif %}

{% if database is none %}
{% set database = target.database %}
{% endif %}

{%-
set relation = adapter.get_relation(
database=database,
schema=schema,
identifier=relation_name
)
-%}

{% if relation is none %}
{{ exceptions.raise_compiler_error("Relation " ~ adapter.quote(relation_name) ~ " does not exist or not authorized.") }}
{% endif %}
{% macro get_profile(relation=none) %}

{{ log("Get columns in relation %s" | format(relation.include()), info=False) }}
{%- set columns = adapter.get_columns_in_relation(relation) -%}
{%- set column_names = columns | map(attribute="name") -%}


{%- set column_names = columns | map(attribute="name") | list -%}
{{ log("Columns: " ~ column_names | join(', '), info=False) }}

{% set profile_sql %}
with column_profiles as (
@@ -42,7 +22,7 @@ set relation = adapter.get_relation(
),

columns as (
{{ dbt_profiler.select_from_information_schema_columns(relation, schema, relation_name) }}
{{ dbt_profiler.select_from_information_schema_columns(relation) }}
)

select
@@ -57,8 +37,6 @@ set relation = adapter.get_relation(
left join columns on (lower(columns.column_name) = lower(column_profiles.column_name))
{% endset %}

{% set results = run_query(profile_sql) %}
{% set results = results.rename(results.column_names | map('lower')) %}
{% do return(results) %}
{% do return(profile_sql) %}

{% endmacro %}
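`get_profile` now only builds and returns the profiling SQL and no longer calls `run_query`. Returning text is what lets the macro be rendered inside a model such as `integration_tests/models/profile.sql`, where dbt's materialization wraps the compiled select. A Python sketch of that build-then-materialize split, with hypothetical `build_profile_sql` and `materialize_view` helpers; the one-branch-per-column `union all` shape is an assumption (most of the macro's query is collapsed in this diff), and `sqlite3` stands in for the warehouse.

```python
import sqlite3
from typing import List


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_profile_sql(table: str, column_names: List[str]) -> str:
    """Return, but do not run, a select with one row per column."""
    if not column_names:
        raise ValueError("relation has no columns")
    branches = [
        f"select {quote_literal(c)} as column_name, "
        f"count(distinct {quote_ident(c)}) as distinct_count "
        f"from {quote_ident(table)}"
        for c in column_names
    ]
    return "\nunion all\n".join(branches)


def materialize_view(conn: sqlite3.Connection, name: str, select_sql: str) -> None:
    """Roughly what a view materialization does with a model's compiled SQL."""
    conn.execute(f"create view {quote_ident(name)} as {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer, first_name text)")
conn.executemany(
    "insert into customers values (?, ?)", [(1, "Ann"), (2, "Ann"), (3, "Bo")]
)
sql = build_profile_sql("customers", ["customer_id", "first_name"])
materialize_view(conn, "profile", sql)
print(conn.execute("select * from profile order by column_name").fetchall())
# [('customer_id', 3), ('first_name', 2)]
```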
15 changes: 15 additions & 0 deletions macros/get_profile_table.sql
@@ -0,0 +1,15 @@
{% macro get_profile_table(relation=none, relation_name=none, schema=none, database=none) %}

{%- set relation = dbt_profiler.get_relation(
relation=relation,
relation_name=relation_name,
schema=schema,
database=database
) -%}
{%- set profile_sql = dbt_profiler.get_profile(relation=relation) -%}
{{ log(profile_sql, info=False) }}
{% set results = run_query(profile_sql) %}
{% set results = results.rename(results.column_names | map('lower')) %}
{% do return(results) %}

{% endmacro %}
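`get_profile_table` is now the layer that executes. It resolves the relation, asks `get_profile` for SQL, runs it with `run_query`, and lower-cases the result's column names, presumably because some warehouses (Snowflake, for example) return unquoted identifiers in upper case. A Python sketch of the execute-and-normalize step, assuming `sqlite3` in place of `run_query` and a list of dicts in place of an `agate.Table`; the function name mirrors the macro but is a hypothetical analogue.

```python
import sqlite3


def get_profile_table(conn: sqlite3.Connection, profile_sql: str) -> list:
    """Run profile SQL and return rows keyed by lower-cased column names."""
    cursor = conn.execute(profile_sql)
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    names = [desc[0].lower() for desc in cursor.description]
    # Only the column names are normalized; values are left untouched.
    return [dict(zip(names, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
# Upper-case aliases imitate a warehouse that returns identifiers in upper case.
rows = get_profile_table(conn, "select 'ID' as COLUMN_NAME, 1.0 as NOT_NULL_PROPORTION")
print(rows)  # [{'column_name': 'ID', 'not_null_proportion': 1.0}]
```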
32 changes: 32 additions & 0 deletions macros/get_relation.sql
@@ -0,0 +1,32 @@
{% macro get_relation(relation=none, relation_name=none, schema=none, database=none) %}

{% if relation is none and relation_name is none %}
{{ exceptions.raise_compiler_error("Either relation or relation_name must be specified.") }}
{% endif %}

{% if relation is none %}
{% if schema is none %}
{% set schema = target.schema %}
{% endif %}

{% if database is none %}
{% set database = target.database %}
{% endif %}

{{ log("Get relation %s (database=%s, schema=%s)" | format(adapter.quote(relation_name), adapter.quote(database), adapter.quote(schema)), info=False) }}

{%-
set relation = adapter.get_relation(
database=database,
schema=schema,
identifier=relation_name
)
-%}
{% if relation is none %}
{{ exceptions.raise_compiler_error("Relation " ~ adapter.quote(relation_name) ~ " does not exist or not authorized.") }}
{% endif %}
{% endif %}

{% do return(relation) %}

{% endmacro %}
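The rules in `get_relation` are: an explicit `relation` is returned unchanged; otherwise `relation_name` is required, `schema` and `database` fall back to the target's, and a lookup that finds nothing raises a compiler error. A Python sketch of those rules, where a hypothetical `Target` dataclass and a set of known `(database, schema, identifier)` tuples stand in for dbt's `target` and `adapter.get_relation`.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# (database, schema, identifier); a hypothetical stand-in for a Relation object.
Key = Tuple[str, str, str]


@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for dbt's `target` (default database/schema)."""

    database: str
    schema: str


def get_relation(
    existing: Set[Key],
    target: Target,
    relation: Optional[Key] = None,
    relation_name: Optional[str] = None,
    schema: Optional[str] = None,
    database: Optional[str] = None,
) -> Key:
    """Resolve a relation following the macro's branches."""
    if relation is None and relation_name is None:
        raise ValueError("Either relation or relation_name must be specified.")
    if relation is not None:
        # An explicit relation is returned as-is; schema/database are ignored.
        return relation
    key = (
        database if database is not None else target.database,
        schema if schema is not None else target.schema,
        relation_name,
    )
    # `existing` plays the role of adapter.get_relation's catalog lookup.
    if key not in existing:
        raise LookupError(f"Relation {relation_name!r} does not exist or not authorized.")
    return key


tables = {("analytics", "dbt_dev", "customers")}
target = Target(database="analytics", schema="dbt_dev")
print(get_relation(tables, target, relation_name="customers"))
# ('analytics', 'dbt_dev', 'customers')
```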
4 changes: 2 additions & 2 deletions macros/print_profile.sql
@@ -1,6 +1,6 @@
{% macro print_profile(relation_name, schema=none, database=none, max_rows=none, max_columns=7, max_column_width=30, max_precision=none) %}
{% macro print_profile(relation=none, relation_name=none, schema=none, database=none, max_rows=none, max_columns=7, max_column_width=30, max_precision=none) %}

{%- set results = dbt_profiler.get_profile(relation_name, schema=schema, database=database) -%}
{%- set results = dbt_profiler.get_profile_table(relation=relation, relation_name=relation_name, schema=schema, database=database) -%}

{% if execute %}
{% do results.print_table(max_rows=max_rows, max_columns=max_columns, max_column_width=max_column_width, max_precision=max_precision) %}
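`print_profile` hands rendering to agate's `Table.print_table`, passing `max_rows`, `max_columns` and `max_column_width` through. To make those parameters concrete, here is a small Python sketch of the truncation the README describes. It is written from the README's parameter descriptions, not from agate's code, so the `render` function, its `...` markers and its pipe layout are our own choices and agate's actual output differs.

```python
from typing import List, Optional, Sequence


def render(
    header: Sequence[str],
    rows: Sequence[Sequence[object]],
    max_rows: Optional[int] = None,
    max_columns: int = 7,
    max_column_width: int = 30,
) -> List[str]:
    """Render a pipe-delimited table, truncating rows, columns and cell width."""

    def cell(value: object) -> str:
        text = str(value)
        if len(text) > max_column_width:
            # Keep room for the "..." marker inside the width limit.
            text = text[: max(max_column_width - 3, 0)] + "..."
        return text

    cols = list(header)[:max_columns]
    shown = rows if max_rows is None else rows[:max_rows]
    lines = ["| " + " | ".join(cell(h) for h in cols) + " |"]
    lines.append("| " + " | ".join("-" * len(cell(h)) for h in cols) + " |")
    for row in shown:
        lines.append("| " + " | ".join(cell(v) for v in row[:max_columns]) + " |")
    if max_rows is not None and len(rows) > max_rows:
        lines.append("| ... |")  # signal that rows were dropped
    return lines


header = ["column_name", "data_type", "not_null_proportion"]
rows = [["customer_lifetime_value", "bigint", 0.62], ["first_name", "text", 1.0]]
for line in render(header, rows, max_rows=1, max_columns=2, max_column_width=10):
    print(line)
```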
4 changes: 2 additions & 2 deletions macros/print_profile_docs.sql
@@ -1,6 +1,6 @@
{% macro print_profile_docs(relation_name, docs_name=none, schema=none, database=none, max_rows=none, max_columns=7, max_column_width=30, max_precision=none) %}
{% macro print_profile_docs(relation=none, relation_name=none, docs_name=none, schema=none, database=none, max_rows=none, max_columns=7, max_column_width=30, max_precision=none) %}

{%- set results = dbt_profiler.get_profile(relation_name, schema=schema, database=database) -%}
{%- set results = dbt_profiler.get_profile_table(relation=relation, relation_name=relation_name, schema=schema, database=database) -%}

{% if docs_name is none %}
{% set docs_name = 'dbt_profiler__' + relation_name %}
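As this hunk reads, the default `docs_name` is still built as `'dbt_profiler__' + relation_name`. With the new signature a caller may pass only `relation`, which leaves `relation_name` as `none`, and in that case the concatenation would appear to fail unless `docs_name` is given explicitly. One possible fallback is to use the relation's identifier. It is sketched below in Python with a hypothetical `default_docs_name` helper; this is a suggestion, not what the commit implements.

```python
from typing import Optional


def default_docs_name(
    docs_name: Optional[str] = None,
    relation_name: Optional[str] = None,
    relation_identifier: Optional[str] = None,
) -> str:
    """Pick a docs block name, falling back to the relation's identifier.

    Hypothetical helper: the macro hunk only consults relation_name.
    """
    if docs_name is not None:
        return docs_name
    name = relation_name if relation_name is not None else relation_identifier
    if name is None:
        raise ValueError("cannot derive docs_name: no relation name or identifier")
    return "dbt_profiler__" + name


print(default_docs_name(relation_identifier="customers"))  # dbt_profiler__customers
```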
4 changes: 2 additions & 2 deletions macros/print_profile_schema.sql
@@ -1,7 +1,7 @@
{% macro print_profile_schema(relation_name, schema=none, database=none, model_description="", column_description="") %}
{% macro print_profile_schema(relation=none, relation_name=none, schema=none, database=none, model_description="", column_description="") %}

{%- set column_dicts = [] -%}
{%- set results = dbt_profiler.get_profile(relation_name, schema=schema, database=database) -%}
{%- set results = dbt_profiler.get_profile_table(relation=relation, relation_name=relation_name, schema=schema, database=database) -%}

{% if execute %}
{% for row in results.rows %}
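`print_profile_schema` turns each profile row into a column entry of a dbt schema file. A Python sketch of that transformation follows. It assumes the profile metrics go under each column's `meta` key, which is our assumption because the loop body is collapsed in this diff. The `profile_schema` helper is hypothetical, and it emits JSON because JSON is also valid YAML 1.2 and needs only the standard library.

```python
import json
from typing import Dict, List


def profile_schema(
    model_name: str,
    profile_rows: List[Dict[str, object]],
    model_description: str = "",
    column_description: str = "",
) -> dict:
    """Build a dbt-style schema dict with one column entry per profile row."""
    columns = []
    for row in profile_rows:
        # Everything except the name goes under meta (assumed layout).
        meta = {k: v for k, v in row.items() if k != "column_name"}
        columns.append(
            {"name": row["column_name"], "description": column_description, "meta": meta}
        )
    return {
        "version": 2,
        "models": [
            {"name": model_name, "description": model_description, "columns": columns}
        ],
    }


rows = [{"column_name": "customer_id", "data_type": "integer", "is_unique": True}]
print(json.dumps(profile_schema("customers", rows), indent=2))
```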
