Dedup transactions (#277)
* initial transaction dedup concept

* first test

* working copy

* performance optimization

* docs

* code cleanup
dgitis authored Oct 19, 2023
1 parent f2f03a2 commit 3a7fb3c
Showing 2 changed files with 38 additions and 1 deletion.
12 changes: 11 additions & 1 deletion models/staging/recommended_events/README.md
@@ -27,4 +27,14 @@ models:
+enabled: true
```

Not all recommended events have been implemented. If you need a specific event, please consider creating a pull request with the model that you need in the [dbt-ga4 GitHub repository](https://github.com/Velir/dbt-ga4).

## Purchase Event Transaction Deduplication

The `stg_ga4__event_purchase_deduplicated` model builds on the `stg_ga4__event_purchase` model. It is disabled by default, so you must enable it along with the `stg_ga4__event_purchase` model.
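
As a rough sketch, enabling both models in your project's `dbt_project.yml` might look like the following. It assumes the package is installed under the project name `ga4` and that the config keys follow the package's `models/staging/recommended_events` directory path; adjust the keys if your setup differs.

```yaml
models:
  ga4: # assumed package project name
    staging:
      recommended_events:
        stg_ga4__event_purchase:
          +enabled: true
        stg_ga4__event_purchase_deduplicated:
          +enabled: true
```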

Unless dbt is run with `--full-refresh`, the model only processes purchase events that fall within the window defined by `static_incremental_days`. It can only be relied on to deduplicate purchase events that occur on the same day.
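
For illustration, the window could be widened by setting `static_incremental_days` in `dbt_project.yml`. This sketch assumes the variable is scoped to a package named `ga4`, and the value `3` is only an example. With this setting, the model would scan the current date plus the three preceding days on runs without `--full-refresh`.

```yaml
vars:
  ga4: # assumed package project name
    static_incremental_days: 3 # example value; the model falls back to 1 when unset
```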

The model is a high-performance minimum viable product (MVP) for this feature: for each `transaction_id`, it returns only the earliest purchase event (by `event_timestamp`) within the processing window.

You are encouraged to copy this model to your project and customize it there should this MVP be insufficient for your needs.
@@ -0,0 +1,27 @@
{# Build the list of date partitions to scan: today plus the previous `static_incremental_days` days. #}
{% if not flags.FULL_REFRESH %}
    {% set partitions_to_query = ['current_date'] %}
    {% for i in range(var('static_incremental_days', 1)) %}
        {% do partitions_to_query.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
    {% endfor %}
{% endif %}

{{
    config(
        enabled = false,
    )
}}
with purch as (
    select
        *
    from {{ref('stg_ga4__event_purchase')}}
    {% if not flags.FULL_REFRESH %}
    where event_date_dt in ({{ partitions_to_query | join(',') }})
    {% endif %}
    qualify row_number() over(
        partition by transaction_id
        order by event_timestamp
    ) = 1
)
select
    *
from purch
