Improve profiler performance in Snowflake by reducing table scans using a single CTE from which column profiles are calculated (#48)

Co-authored-by: Simo Tumelius <[email protected]>
stumelius and datamie-simo authored May 17, 2022
1 parent fa5efc5 commit 03036db
Showing 2 changed files with 9 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
target/
dbt_modules/
dbt_packages/
logs/

venv/
10 changes: 8 additions & 2 deletions macros/get_profile.sql
@@ -50,7 +50,13 @@
{{ log("Column data types: " ~ data_type_map, info=False) }}

{% set profile_sql %}
-with column_profiles as (
+with source_data as (
+select
+*
+from {{ relation }}
+),
+
+column_profiles as (
{% for column_name in profile_column_names %}
{% set data_type = data_type_map.get(column_name.lower(), "") %}
select
@@ -88,7 +94,7 @@
{%- endif %}
cast(current_timestamp as {{ dbt_profiler.type_string() }}) as profiled_at,
{{ loop.index }} as _column_position
-from {{ relation }}
+from source_data

{% if not loop.last %}union all{% endif %}
{% endfor %}
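
The change above can be illustrated with a minimal sketch of the query shape the macro compiles to after this commit. Every name here is an assumption made for illustration: `build_profile_sql`, the `orders` table, and the `row_count` and `distinct_count` measures are hypothetical stand-ins, because the macro's real measures are collapsed in this diff. The sketch runs against SQLite through Python's standard library, so it shows only the structure of the query, not Snowflake's scan behaviour.

```python
import sqlite3


def build_profile_sql(relation, column_names):
    """Build a profile query shaped like the macro output after this commit.

    Hypothetical helper: the relation is read once in the source_data CTE,
    and each per-column select reads from that CTE instead of the relation.
    """
    selects = []
    for position, column_name in enumerate(column_names, start=1):
        selects.append(
            f"select '{column_name}' as column_name, "
            f"count(*) as row_count, "
            f"count(distinct {column_name}) as distinct_count, "
            f"{position} as _column_position "
            f"from source_data"
        )
    return (
        f"with source_data as (select * from {relation}),\n"
        "column_profiles as (\n"
        + "\nunion all\n".join(selects)
        + "\n)\n"
        "select * from column_profiles order by _column_position"
    )


# Small in-memory table standing in for the profiled relation.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, status text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, "new"), (2, "new"), (3, "shipped")],
)

profile_sql = build_profile_sql("orders", ["id", "status"])
rows = conn.execute(profile_sql).fetchall()
print(rows)  # [('id', 3, 3, 1), ('status', 3, 2, 2)]
```

In the generated SQL the relation name appears once, inside `source_data`, while each `union all` branch selects from the CTE. That mirrors the two replaced lines in `macros/get_profile.sql`.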