Skip to content

Latest commit

 

History

History
41 lines (26 loc) · 2.46 KB

README.md

File metadata and controls

41 lines (26 loc) · 2.46 KB

Project Overview

The goal of this project is to migrate an on-premise MySQL database to a cloud data warehouse to serve business' analytical needs, by creating an ELT pipeline that implements the medallion data architecture.

Architecture

Architecture

Data Ingestion & Loading

  • The on-premise host is connected to the cloud using Microsoft Runtime Integration.

image

  • An Azure Data Factory pipeline then connects to the host, copies the data from all the tables, and loads them into a bronze container within Azure Data Lake in parquet format.

image image

Data Transformation

  • A second step in the Azure Data Factory (ADF) pipeline connects to a Databricks cluster to invoke the first phase of transformation (Bronze to silver), by executing a notebook that filters and formats the data, which is then stored in Delta format, in the silver container within Azure Data Lake.

image image

  • The last phase of transformations (Silver to Gold) is then performed in the same manner, placing the cleaned data in the gold container.

Data Modelling

Using Azure Synapse serverless SQL database as an analytics engine to directly query the data in the gold container, I created different views to reflect the different facts and dimension tables.

Synapse pipeline

schema

Analytics & Reporting

Done using PowerBI, which connects to the Azure Synapse SQL server, and loads the data.

Dashboard

image

Final thoughts

  • The sole goal of the project is to learn and get exposed to data engineering in Azure cloud, thus the use of some technologies and services is overkill for such a low amount of data.
  • Since the amount of transformations performed is light, sticking with Azure Data Factory is a more efficient and cost-effective approach.