Add dbt for data modeling #248
Open
SprinTech wants to merge 8 commits into dataforgoodfr:main from SprinTech:feature/dbt
Commits
1bcfcf3
Add dbt for data modeling
SprinTech 37d191e
Add dbt for data modeling
SprinTech 64fa5d5
Remove dbt examples
1f0f27d
Add metadata for keywords table
8546566
Add pgadmin and update dbt run command
44d23c7
Define dbt models materialization mode
a358722
Add example of staging model for keyword table
43c915d
Add example of intermediate model for keyword table
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
|
||
target/ | ||
dbt_packages/ | ||
logs/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
Welcome to your new dbt project! | ||
|
||
### Using the starter project | ||
|
||
Try running the following commands: | ||
- dbt run | ||
- dbt test | ||
|
||
|
||
### Resources: | ||
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction) | ||
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers | ||
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support | ||
- Find [dbt events](https://events.getdbt.com) near you | ||
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
|
||
# Name your project! Project names should contain only lowercase characters | ||
# and underscores. A good package name should reflect your organization's | ||
# name or the intended use of these models | ||
name: '_dbt' | ||
version: '1.0.0' | ||
|
||
# This setting configures which "profile" dbt uses for this project. | ||
profile: '_dbt' | ||
|
||
# These configurations specify where dbt should look for different types of files. | ||
# The `model-paths` config, for example, states that models in this project can be | ||
# found in the "models/" directory. You probably won't need to change these! | ||
model-paths: ["models"] | ||
analysis-paths: ["analyses"] | ||
test-paths: ["tests"] | ||
seed-paths: ["seeds"] | ||
macro-paths: ["macros"] | ||
snapshot-paths: ["snapshots"] | ||
|
||
clean-targets: # directories to be removed by `dbt clean` | ||
- "target" | ||
- "dbt_packages" | ||
|
||
|
||
# Configuring models | ||
# Full documentation: https://docs.getdbt.com/docs/configuring-models | ||
|
||
# In this example config, we tell dbt to build all models in the example/ | ||
# directory as views. These settings can be overridden in the individual model | ||
# files using the `{{ config(...) }}` macro. | ||
models: | ||
_dbt: | ||
# Config indicated by + and applies to all files under models/example/ | ||
staging: | ||
+materialized: view | ||
intermediate: | ||
+materialized: view | ||
This config lets _dbt/models/intermediate/int_keywords_aggregated_by_days_and_channel.sql become a materialized view, I imagine? |
Empty file.
19 changes: 19 additions & 0 deletions
_dbt/models/intermediate/int_keywords_aggregated_by_days_and_channel.sql
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
/* The 'intermediate' step is used to perform more advanced transformations (join, group by, ...): | ||
https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate */ | ||
|
||
with keywords as ( | ||
select | ||
* | ||
from {{ ref('stg_keywords') }} | ||
), | ||
|
||
keywords_grouped_by_days_and_channels as ( | ||
select | ||
channel_title, | ||
start::date as start, | ||
count(*) | ||
from keywords | ||
group by 1, 2 | ||
) | ||
|
||
select * from keywords_grouped_by_days_and_channels |
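For readers less familiar with SQL, the aggregation in this intermediate model can be sketched in plain Python. This is only an illustration of the `group by channel_title, start::date` logic, not code from the PR; the function name and the sample rows (`channel_a`, `channel_b`) are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_keywords_by_channel_and_day(rows):
    """Mirror `select channel_title, start::date, count(*) ... group by 1, 2`."""
    # Each key is (channel_title, calendar day); the value is the row count.
    counts = Counter((row["channel_title"], row["start"].date()) for row in rows)
    return dict(counts)


# Hypothetical sample rows standing in for the stg_keywords model.
sample_rows = [
    {"channel_title": "channel_a", "start": datetime(2024, 1, 1, 8, 0)},
    {"channel_title": "channel_a", "start": datetime(2024, 1, 1, 20, 0)},
    {"channel_title": "channel_b", "start": datetime(2024, 1, 1, 13, 0)},
    {"channel_title": "channel_a", "start": datetime(2024, 1, 2, 8, 0)},
]
daily_counts = count_keywords_by_channel_and_day(sample_rows)
```

Note that the SQL leaves `count(*)` without an alias, so in Postgres the resulting column is simply named `count`.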
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
version: 2 | ||
|
||
models: | ||
- name: stg_keywords | ||
description: List of keywords said during channel_program | ||
columns: | ||
- name: keyword_id | ||
description: "The primary key for this table" | ||
data_tests: | ||
- unique | ||
- not_null | ||
|
||
- name: updated_at | ||
description: "Last date when keywords have been updated" | ||
# data_tests: | ||
# - not_null | ||
|
||
- name: channel_title | ||
description: "Title of the channel" | ||
# data_tests: # Uncomment to trigger data test error | ||
# - not_null |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
version: 2 | ||
|
||
sources: | ||
- name: quotaclimat | ||
database: barometre | ||
schema: public | ||
tables: | ||
- name: keywords |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
-- The 'staging' step is used to perform basic cleaning operations: https://docs.getdbt.com/best-practices/how-we-structure/2-staging | ||
|
||
with keywords as ( | ||
select * from {{ source('quotaclimat', 'keywords') }} | ||
), | ||
|
||
renamed as ( | ||
select | ||
id as keyword_id, | ||
case | ||
when channel_name = '' then null | ||
else channel_name | ||
end as channel_name, | ||
case | ||
when channel_title = '' then null | ||
else channel_title | ||
end as channel_title, | ||
case | ||
when channel_program = '' then null | ||
else channel_program | ||
end as channel_program, | ||
case | ||
when channel_program_type = '' then null | ||
else channel_program_type | ||
end as channel_program_type, | ||
start, | ||
TRIM(REPLACE(plaintext, '<unk>', '')) as plain_text, | ||
theme, | ||
case | ||
when theme::text like '%changement_climatique_constat_indirectes%' then TRUE | ||
else FALSE | ||
end as is_climatic_change_subject, | ||
case | ||
when theme::text like '%biodiversite_concepts_generaux_indirectes%' then TRUE | ||
else FALSE | ||
end as is_biodiversity_general_indirect_concept, | ||
created_at, | ||
updated_at::timestamp with time zone as updated_at, | ||
keywords_with_timestamp->1->>'keywords' as first_keyword, | ||
to_timestamp((keywords_with_timestamp->0->>'timestamp')::bigint / 1000) AS first_keyword_date, | ||
json_array_length(keywords_with_timestamp) as keywords_count | ||
from keywords | ||
) | ||
|
||
select * from renamed |
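The cleaning rules in this staging model can be sketched in Python to make their behaviour easier to check by hand. This is a rough equivalent under stated assumptions, not part of the PR: the helper names are hypothetical, `theme` is assumed to already be available as text, and the timestamp is assumed to be epoch milliseconds (as the division by 1000 suggests).

```python
from datetime import datetime, timezone


def blank_to_none(value):
    """Mirror `case when col = '' then null else col end`."""
    return None if value == "" else value


def clean_plaintext(text):
    """Mirror `TRIM(REPLACE(plaintext, '<unk>', ''))`."""
    if text is None:
        return None
    # Postgres TRIM strips spaces only by default, hence strip(" ").
    return text.replace("<unk>", "").strip(" ")


def has_theme(theme_text, needle):
    """Approximate `theme::text like '%needle%'`.

    LIKE treats `_` as a single-character wildcard, so this plain
    substring check is slightly stricter than the SQL version.
    """
    return theme_text is not None and needle in theme_text


def first_keyword_date(keywords_with_timestamp):
    """Mirror `to_timestamp((...->0->>'timestamp')::bigint / 1000)`."""
    millis = int(keywords_with_timestamp[0]["timestamp"])
    # Integer division, like bigint / integer in Postgres.
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
```

One detail worth a second look when reviewing the SQL: `first_keyword` is read from index 1 of `keywords_with_timestamp`, while `first_keyword_date` is read from index 0.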
Empty file.
Empty file.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,8 @@ services: | |
- ./pyproject.toml:/app/pyproject.toml | ||
- ./alembic:/app/alembic | ||
- ./alembic.ini:/app/alembic.ini | ||
- ./_dbt:/app/_dbt | ||
- ./profiles.yml:/app/profiles.yml | ||
depends_on: | ||
nginxtest: | ||
condition: service_healthy | ||
|
@@ -107,6 +109,31 @@ services: | |
postgres_db: | ||
condition: service_healthy | ||
|
||
dbt_runner: | ||
build: | ||
context: ./ | ||
dockerfile: Dockerfile | ||
entrypoint: [ "poetry", "run", "dbt", "build", "--project-dir", "/app/_dbt" ] | ||
environment: | ||
ENV: docker | ||
PYTHONPATH: /app | ||
POSTGRES_USER: user | ||
POSTGRES_DB: barometre | ||
POSTGRES_PASSWORD: password | ||
POSTGRES_HOST: postgres_db | ||
POSTGRES_PORT: 5432 | ||
tty: true | ||
volumes: | ||
- ./quotaclimat/:/app/quotaclimat/ | ||
- ./postgres/:/app/postgres/ | ||
- ./_dbt:/app/_dbt | ||
- ./profiles.yml:/app/profiles.yml | ||
networks: | ||
- db_network | ||
depends_on: | ||
postgres_db: | ||
condition: service_healthy | ||
|
||
postgres_db: | ||
image: postgres:15 | ||
ports: | ||
|
@@ -124,6 +151,21 @@ services: | |
POSTGRES_PASSWORD: password | ||
logging: # no logs for postgres container | ||
driver: none | ||
networks: | ||
- db_network | ||
|
||
pgadmin: | ||
image: dpage/pgadmin4 | ||
container_name: pgadmin | ||
environment: | ||
PGADMIN_DEFAULT_EMAIL: [email protected] | ||
PGADMIN_DEFAULT_PASSWORD: admin_password | ||
ports: | ||
- "8080:80" | ||
networks: | ||
- db_network | ||
depends_on: | ||
- postgres_db | ||
|
||
mediatree: | ||
ports: | ||
|
@@ -196,6 +238,10 @@ services: | |
postgres_db: | ||
condition: service_healthy | ||
|
||
networks: | ||
db_network: | ||
driver: bridge | ||
|
||
secrets: # https://docs.docker.com/compose/use-secrets/ | ||
pwd_api: | ||
file: secrets/pwd_api.txt | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
_dbt: | ||
target: dev | ||
outputs: | ||
dev: | ||
type: postgres | ||
host: postgres_db | ||
user: user | ||
password: password | ||
port: 5432 | ||
dbname: barometre | ||
schema: public | ||
threads: 1 |
For production, I imagine we will need to add `dbt run` to the Docker image's launch script: https://github.com/dataforgoodfr/quotaclimat/blob/main/docker-entrypoint.sh#L5
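One possible way to trigger dbt from the application side, rather than from the bash entrypoint, is a small Python wrapper around the same command the `dbt_runner` service uses. This is only a sketch: the function names are hypothetical, and it assumes `poetry` and `dbt` are on the path inside the image and that `profiles.yml` is discoverable from the working directory.

```python
import subprocess


def build_dbt_command(action="run", project_dir="/app/_dbt"):
    """Build the dbt invocation, mirroring the dbt_runner entrypoint."""
    return ["poetry", "run", "dbt", action, "--project-dir", project_dir]


def run_dbt(action="run", project_dir="/app/_dbt"):
    """Run dbt and raise CalledProcessError if it exits with a non-zero code."""
    subprocess.run(build_dbt_command(action, project_dir), check=True)
```

Calling `run_dbt("build")` would reproduce what the `dbt_runner` service does in docker-compose, while `run_dbt()` matches the `dbt run` suggested in the comment above.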