This project implements four different marketing attribution models using ClickHouse SQL. The models included are:
- First Touch Attribution
- Last Touch Attribution
- Linear Attribution
- U-Shaped Attribution
data-generator.ipynb
: Jupyter Notebook used to create dummy data.ma_first_touch.sql
: First touch attribution sample.ma_last_touch.sql
: Last touch attribution sample.ma_linear.sql
: Linear attribution sample.ma_u_shaped.sql
: U-shaped attribution sample.marketing_dummy_data.csv
: Sample marketing data used to test the SQL scripts.
Data generator was designed with following in mind:
- +- 90 - 100k rows,
- should be similar to event stream of user sessions with URLs (with UTMs)
- for simplicity sake every single URL is supposed to be ad click, which might convert to purchase or not
- should containt different sources, mediums, campaigns, contents, terms (for further usability, even though models use only source from UTMs)
- each event inside session should be consecutive
- session shouldn't be longer than 1 day
- 20% for random purchase within session
- only 1 conversion event within session
- For simplicity's sake attribution window is taken as a single session. More widely used 1, 7 day windows can be assigned by creating window session_id
- 4 most popular models were taken: First touch, Last touch, Linear, U-shaped
- SQL models assume that either your data is already loaded into Clickhouse
- Actual final results of revenue attribution are way too evenly split due to data being randomly generated
- SQL queries were designed to resemble actual dbt layers
- Query formatting follows dbt best practices (excluding comment sections)
- Flavor of choice - Clickhouse SQL
Attributes 100% of the revenue to the first touchpoint in the customer session.
Attributes 100% of the revenue to the last touchpoint in the customer session that led to purchase. Events after purchase are excluded from calculation.
Attributes the revenue evenly among all events in session. Events after purchase are included in calculation, since our attribution window is limited to a single session, which is rather short.
Attributes the revenue unevenly among events in session: 40% to first touchpoint, 40% to purchase event, 20% split among events in between. Events after purchase are excluded from calculation. Potential edge cases:
- if session length is single event, then it attributes 100% of revenue
- if session length is 2 events, then revenue is split 50/50