feat: Refactor migrations (#2732)
* add sha256sum helper

* fmt dataset files

* add sha256 to all existing datasets

* check file signature upon download

* fix asserts and async functions

* [wip] wrap dataset downloads with cache

* archive banatic sources and configure datasets

* switch IGN to https

* checksum local download if exists

* define 7z path in flake.nix

* add tests

* rename MIRROR_URL to ETL_MIRROR_URL

* pass ETL_MIRROR_URL to e2e docker image

* configure e2e to use mirror

* pass env in workflow file

* fix IGN 2019-2020 SQL schemas

* fix datasets

* fix CEREMA datasets

* fix CEREMA and BANATIC

* fix INSEE MvtCom 2024

* flash geo migrations. broken sql migrations ;(

* refresh all migrations

* split data flashing

* split seed/migrate/source commands

* rm ETL skipped integration tests

* hide logging with verbose arg

* fix test

* fix Migrator test

* fix providers

* move stuff around

* fix test

* remove postgresjs unused dependency

* doc

* doc

* track flash migrations in db
jonathanfallon authored Jan 21, 2025
1 parent 3b1dbd1 commit 8131bbf
Showing 246 changed files with 9,714 additions and 7,835 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/analysis.yml
@@ -165,5 +165,7 @@ jobs:
KC_BOT_CLIENT_SECRET: so_secret_000
KC_ADMIN_CLIENT: pdc-manager
KC_ADMIN_CLIENT_SECRET: so_secret_000
ETL_MIRROR_URL: https://geo-datasets-mirror.s3.fr-par.scw.cloud
ETL_ARCHIVES_URL: https://geo-datasets-archives.s3.fr-par.scw.cloud
run: just ci_test_integration
working-directory: api
283 changes: 283 additions & 0 deletions api/deno.lock

Large diffs are not rendered by default.

34 changes: 27 additions & 7 deletions api/justfile
@@ -77,8 +77,32 @@ generate_certs skip=path_exists(join(cert_dir,"cert.key")):
openssl x509 -req -in {{ cert_dir }}/cert.csr -CA {{ cert_dir }}/localCA.pem -CAkey {{ cert_dir }}/localCA.key -CAcreateserial -out {{ cert_dir }}/cert.crt -days 500 -sha256
fi
# Run migrations
# Migrate the schemas and flash production data
migrate:
deno run \
--allow-net \
--allow-env \
--allow-read \
--allow-write \
--allow-sys \
--allow-ffi \
--allow-run \
src/db/cmd-migrate.ts

# Migrate the schemas and seed test data
seed:
deno run \
--allow-net \
--allow-env \
--allow-read \
--allow-write \
--allow-sys \
--allow-ffi \
--allow-run \
src/db/cmd-seed.ts

# Source the geo.perimeters data
source:
deno run \
--allow-net \
--allow-env \
@@ -88,7 +112,7 @@ migrate:
--allow-ffi \
--allow-run \
--v8-flags=--max-old-space-size=4096 \
src/db/main.ts
src/db/cmd-source.ts

# Run external data migrations
external_data_migrate:
@@ -100,11 +124,7 @@ external_data_migrate:
--allow-sys \
--allow-ffi \
--allow-import \
src/external_data/index.ts

# Seed data
seed:
just api seed
src/db/external_data/index.ts

# Create a new bucket
[private]
6 changes: 0 additions & 6 deletions api/src/db/.env.example

This file was deleted.

70 changes: 55 additions & 15 deletions api/src/db/README.md
@@ -1,22 +1,62 @@
# Installation
# Migrations

```
npm install -g db-migrate db-migrate-pg
```
## Requirements

### Geo
- `pg_dump`
- `pg_restore`
- `7z`
- `sha256sum`
- Access to the S3 bucket (Scaleway: geo-datasets-archives, public read)

7zip must be installed in order to create the geographic reference table
## Available commands

```
npm run geo:import
- `just migrate`: run all migrations and flash data from the cache
- `just seed`: run all migrations and seed test data from `providers/migration/seeds`
- `just source`: import datasets from `db/geo` to `geo.perimeters`
- `just external_data_migrate`: import external datasets

## Migrations

Migrations are ordered by name in the `src/db/migrations` folder (see the example listing after this list).

- `000`: initial migrations (manual), e.g. `extensions.sql`
- `050`: geo schema
- `100`: application
- `200`: fraud
- `400`: observatory
- `500`: cee
- `600`: stats
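
As a sketch of what this convention can look like on disk (the filenames below are hypothetical; only the numeric prefixes and `extensions.sql` come from the list above):

```shell
$ ls src/db/migrations
000_extensions.sql
050_geo_schema.sql
100_application_tables.sql
200_fraud_schema.sql
400_observatory_views.sql
500_cee_tables.sql
600_stats_views.sql
```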

## Dump all schemas

```shell
pg_dump --no-owner --no-acl --no-comments -s -n geo > geo.sql
pg_dump --no-owner --no-acl --no-comments -s -n cee > cee.sql
pg_dump --no-owner --no-acl --no-comments -s \
-n dashboard_stats -n geo_stats -n observatoire_stats -n observatory \
> observatory.sql

pg_dump --no-owner --no-acl --no-comments -s -n anomaly -n fraud -n fraudcheck > fraud.sql

pg_dump --no-owner --no-acl --no-comments -s \
-n application -n auth -n carpool_v2 -n certificate -n common -n company \
-n export -n honor -n operator -n policy -n territory \
> application.sql
```

# Usage
## Dump data for flashing

The `geo.perimeters` table can be sourced (see below) or flashed from a data dump.

```shell
# dump geo data for flashing
DUMP_FILE=$(date +%F)_data.sql.7z
pg_dump -Fc -xO -a -n geo | 7z a -si $DUMP_FILE
sha256sum $DUMP_FILE | tee $DUMP_FILE.sha
```

- basic
`DATABASE_URL=postgres://postgres:postgres@postgres:5432/local db-migrate up`
- with migrations dir
`DATABASE_URL=postgres://test:test@localhost:5432/test db-migrate up -m /path/to/migrations`
- verbose
`DATABASE_URL=postgres://test:test@localhost:5432/test db-migrate up -v`
1. Upload the archive alongside the sha256sum file to the cache bucket
(geo-datasets-archives) using the web interface
2. Set the visibility of both files to public
3. Update the cache configuration in the `api/src/db/cmd-migrate.ts` file
with the public URL and the SHA256 checksum.
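
Before updating the config, a quick sanity check of the archive and its public URL can help — a minimal sketch, reusing `DUMP_FILE` from the dump step above (the bucket path is assumed from the `ETL_ARCHIVES_URL` used in CI):

```shell
# verify the archive against the checksum file written by the dump step
sha256sum -c "$DUMP_FILE.sha"

# after upload, confirm the archive is publicly readable
curl -fsI "https://geo-datasets-archives.s3.fr-par.scw.cloud/$DUMP_FILE" >/dev/null \
  && echo "archive is public"
```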
20 changes: 20 additions & 0 deletions api/src/db/cmd-migrate.ts
@@ -0,0 +1,20 @@
import { env_or_fail, env_or_false } from "@/lib/env/index.ts";
import { Migrator } from "@/pdc/providers/migration/Migrator.ts";

/**
* Migrate command.
*
* Run SQL migrations from the migrations folder.
* Flash data from remote cache.
*/
const migrator = new Migrator(env_or_fail("APP_POSTGRES_URL"), false);
await migrator.up();
await migrator.migrate({
skip: env_or_false("MIGRATIONS_SKIP_ALL"),
flash: !env_or_false("MIGRATIONS_SKIP_FLASH"),
cache: {
url: "https://geo-datasets-archives.s3.fr-par.scw.cloud/20250120_data.pgsql.7z",
sha: "9fbbd21fe84b77bac0536c270a2365c9a721ab067f8c9ccc1103e4b51a0432bf",
},
});
await migrator.down();
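
As a usage sketch, the two environment toggles read above pair with the `just migrate` recipe from the justfile (assumed invocations; both variables are treated as false when unset):

```shell
# run the SQL migrations but skip flashing the cached geo dump
MIGRATIONS_SKIP_FLASH=true just migrate

# skip the migrations entirely
MIGRATIONS_SKIP_ALL=true just migrate
```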
14 changes: 14 additions & 0 deletions api/src/db/cmd-seed.ts
@@ -0,0 +1,14 @@
import { env_or_fail } from "@/lib/env/index.ts";
import { Migrator } from "@/pdc/providers/migration/Migrator.ts";

/**
* Seed command.
*
* Run all SQL migrations from the `migrations` directory.
* Seed test data from `providers/migration/seeds` directory.
*/
const migrator = new Migrator(env_or_fail("APP_POSTGRES_URL"), false);
await migrator.up();
await migrator.migrate({ skip: false, flash: false });
await migrator.seed();
await migrator.down();
18 changes: 18 additions & 0 deletions api/src/db/cmd-source.ts
@@ -0,0 +1,18 @@
import { buildMigrator } from "@/db/geo/buildMigrator.ts";
import { env_or_fail } from "@/lib/env/index.ts";

/**
* Source geo perimeters command
*
* Download and import geo perimeters from external datasets defined
* in the ./geo directory.
*/
const connectionString = env_or_fail("APP_POSTGRES_URL");
const migrator = buildMigrator({
pool: { connectionString },
app: { targetSchema: "geo", datasets: new Set() },
});

await migrator.prepare();
await migrator.run();
await migrator.cleanup();
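
A usage sketch for this command through the `just source` recipe (the variable name comes from the code above; the connection string is a placeholder):

```shell
# import geo perimeters into the geo schema of a local database
APP_POSTGRES_URL=postgres://postgres:postgres@localhost:5432/local just source
```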
@@ -1,4 +1,4 @@
import { StaticAbstractDataset, StaticMigrable } from "@/etl/index.ts";
import { StaticAbstractDataset, StaticMigrable } from "../geo/index.ts";
import { AiresCovoiturage } from "./datasets/AiresCovoiturage.ts";
import { IncentiveCampaigns } from "./datasets/IncentiveCampaigns.ts";
import { CreateAiresCovoiturageTable } from "./datastructures/CreateAiresCovoiturageTable.ts";
Expand All @@ -12,15 +12,10 @@ export const datastructures: Set<StaticMigrable> = new Set([

export const datasets = async () => {
// add Aires migration
const AiresUrl =
"https://transport.data.gouv.fr/api/datasets/5d6eaffc8b4c417cdc452ac3";
const AiresUrl = "https://transport.data.gouv.fr/api/datasets/5d6eaffc8b4c417cdc452ac3";
const url = await getAiresLastUrl(AiresUrl);
const datasets: Set<StaticAbstractDataset> = new Set([]);
datasets.add(AiresCovoiturage(url));
datasets.add(
IncentiveCampaigns(
"https://www.data.gouv.fr/fr/datasets/r/08f58ee3-7b3e-43d8-9e55-3c82bf406190",
),
);
datasets.add(IncentiveCampaigns("https://www.data.gouv.fr/fr/datasets/r/08f58ee3-7b3e-43d8-9e55-3c82bf406190"));
return datasets;
};
@@ -1,9 +1,4 @@
import {
AbstractDataset,
ArchiveFileTypeEnum,
FileTypeEnum,
StaticAbstractDataset,
} from "@/etl/index.ts";
import { AbstractDataset, ArchiveFileTypeEnum, FileTypeEnum, StaticAbstractDataset } from "../../geo/index.ts";

export function AiresCovoiturage(url: string): StaticAbstractDataset {
return class extends AbstractDataset {
@@ -1,9 +1,4 @@
import {
AbstractDataset,
ArchiveFileTypeEnum,
FileTypeEnum,
StaticAbstractDataset,
} from "@/etl/index.ts";
import { AbstractDataset, ArchiveFileTypeEnum, FileTypeEnum, StaticAbstractDataset } from "../../geo/index.ts";

export function IncentiveCampaigns(url: string): StaticAbstractDataset {
return class extends AbstractDataset {
@@ -1,4 +1,4 @@
import { AbstractDatastructure } from "@/etl/index.ts";
import { AbstractDatastructure } from "../../geo/index.ts";

export class CreateAiresCovoiturageTable extends AbstractDatastructure {
static uuid = "create_aires_covoiturage_table";
@@ -1,4 +1,4 @@
import { AbstractDatastructure } from "@/etl/index.ts";
import { AbstractDatastructure } from "../../geo/index.ts";

export class CreateIncentiveCampaignsTable extends AbstractDatastructure {
static uuid = "create_incentive_campaigns_table";
File renamed without changes.
@@ -1,6 +1,6 @@
import { buildMigrator } from "@/etl/index.ts";
import { datasets, datastructures } from "@/db/external_data/datasets.ts";
import { buildMigrator } from "@/db/geo/index.ts";
import { env_or_fail } from "@/lib/env/index.ts";
import { datasets, datastructures } from "./datasets.ts";

export async function migrate(conn: string, schema: string) {
const migrator = buildMigrator({
@@ -14,12 +14,12 @@ export async function migrate(conn: string, schema: string) {
},
});
try {
console.debug("[etl] prepare migrator");
console.debug("[xdata] prepare migrator");
await migrator.prepare();
console.debug("[etl] run migrator");
console.debug("[xdata] run migrator");
await migrator.run();
await migrator.pool.end();
console.debug("[etl] done!");
console.debug("[xdata] done!");
} catch (e) {
await migrator.pool.end();
throw e;
