Transform data quality steps into a proper sdlf stage (sdlf-stage-dataquality) #244

cnfait · 2024-01-17T23:48:40Z

Issue #, if available:
#157

Description of changes:
Run Glue Data Quality recommendations and ruleset evaluation directly from the Step Functions state machine instead of from inside a Glue job.

Glue Data Quality stores recommendations and rulesets itself, so the dedicated DynamoDB table is retired. This change also piggybacks on sdlf-dataset's pPipelineDetails to provide the list of Glue tables to run the data quality stage against.
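For reviewers who want a picture of what "directly from the state machine" could look like, here is a minimal sketch, not the actual template from this PR. It builds an Amazon States Language Map state that, for each table, starts a Glue Data Quality recommendation run through the Step Functions AWS SDK integration, polls until it finishes, then starts a ruleset evaluation run. The input shape (`$.tables` items with `database` and `table` keys), the ruleset naming scheme, the 30-second wait and the role ARN are all assumptions made for illustration.

```python
import json

GLUE_SDK = "arn:aws:states:::aws-sdk:glue:"


def build_data_quality_map(role_arn: str) -> dict:
    """Return a Map state running Glue Data Quality for each input table."""
    # Assumed item shape: {"database": "...", "table": "..."} (hypothetical).
    data_source = {
        "GlueTable": {"DatabaseName.$": "$.database", "TableName.$": "$.table"}
    }
    # Hypothetical naming scheme for the ruleset created by the recommendation run.
    ruleset_name = "States.Format('{}_{}_ruleset', $.database, $.table)"
    status_path = "$.recommendationStatus.Status"
    failed = [
        {"Variable": status_path, "StringEquals": s, "Next": "RecommendationFailed"}
        for s in ["FAILED", "STOPPED", "TIMEOUT"]
    ]
    states = {
        "StartRecommendationRun": {
            "Type": "Task",
            "Resource": GLUE_SDK + "startDataQualityRuleRecommendationRun",
            "Parameters": {
                "DataSource": data_source,
                "Role": role_arn,
                "CreatedRulesetName.$": ruleset_name,
            },
            "ResultPath": "$.recommendation",
            "Next": "WaitForRecommendation",
        },
        # SDK integrations are request/response, so completion is polled.
        "WaitForRecommendation": {
            "Type": "Wait",
            "Seconds": 30,
            "Next": "GetRecommendationRun",
        },
        "GetRecommendationRun": {
            "Type": "Task",
            "Resource": GLUE_SDK + "getDataQualityRuleRecommendationRun",
            "Parameters": {"RunId.$": "$.recommendation.RunId"},
            "ResultSelector": {"Status.$": "$.Status"},
            "ResultPath": "$.recommendationStatus",
            "Next": "RecommendationDone?",
        },
        "RecommendationDone?": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": status_path,
                    "StringEquals": "SUCCEEDED",
                    "Next": "StartEvaluationRun",
                }
            ]
            + failed,
            "Default": "WaitForRecommendation",
        },
        "RecommendationFailed": {"Type": "Fail", "Error": "RecommendationRunFailed"},
        "StartEvaluationRun": {
            "Type": "Task",
            "Resource": GLUE_SDK + "startDataQualityRulesetEvaluationRun",
            "Parameters": {
                "DataSource": data_source,
                "Role": role_arn,
                "RulesetNames.$": f"States.Array({ruleset_name})",
            },
            "ResultPath": "$.evaluation",
            "End": True,
        },
    }
    return {
        "Type": "Map",
        "ItemsPath": "$.tables",
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "INLINE"},
            "StartAt": "StartRecommendationRun",
            "States": states,
        },
        "End": True,
    }


if __name__ == "__main__":
    # Placeholder role ARN, for illustration only.
    role = "arn:aws:iam::111122223333:role/example-data-quality-role"
    print(json.dumps(build_data_quality_map(role), indent=2))
```

The sketch only starts the evaluation run; waiting for its result would need a second polling loop of the same shape.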

With the work done in #235 by @mureddy19, this closes #157.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- remove data-quality-controller job, run suggestions and verification jobs directly from the state machine
- remove crawl-data lambda, run glue crawler directly from the state machine
- remove check-job lambda, the state-machine handles it with glue .sync integration
- better lambda error handling
- remove dedicated data quality bucket, use central bucket/stage bucket instead
- remove dedicated glue crawler, use crawler defined in sdlf-dataset instead
- optional vpc support: specify boto3 client endpoint, vpc config for lambda functions
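To illustrate how a state machine can replace the crawl-data and check-job Lambdas described above, here is a hedged sketch rather than the definition shipped in this PR. It starts a Glue crawler through the AWS SDK integration, polls `getCrawler` until the crawler is back in the `READY` state, then runs a Glue job with the `.sync` integration so Step Functions waits for the job to finish. The crawler name, job name, 15-second wait and state names are hypothetical.

```python
import json


def build_crawl_and_job_states(crawler_name: str, job_name: str) -> dict:
    """Return a state machine definition: crawl, wait for READY, then run a job."""
    return {
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": None,  # discard the empty response, keep the input
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {"Type": "Wait", "Seconds": 15, "Next": "GetCrawler"},
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultSelector": {"State.$": "$.Crawler.State"},
                "ResultPath": "$.crawler",
                "Next": "CrawlerReady?",
            },
            "CrawlerReady?": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.crawler.State",
                        "StringEquals": "READY",
                        "Next": "RunJob",
                    }
                ],
                # RUNNING or STOPPING: keep polling.
                "Default": "WaitForCrawler",
            },
            "RunJob": {
                "Type": "Task",
                # .sync makes Step Functions wait for the job run to complete,
                # so no separate job-status Lambda is needed.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "JobFailed",
                    }
                ],
                "End": True,
            },
            "JobFailed": {"Type": "Fail", "Error": "GlueJobFailed"},
        },
    }


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    definition = build_crawl_and_job_states("example-crawler", "example-job")
    print(json.dumps(definition, indent=2))
```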

- run glue data quality recommendations and ruleset evaluation directly from the step functions instead of inside a glue job
- glue data quality stores recommendations and rulesets - retire dedicated dynamodb table
- piggyback on sdlf-dataset pPipelineDetails to provide list of glue tables to run data quality stage against

cnfait added 2 commits January 15, 2024 17:47

cnfait self-assigned this Jan 17, 2024

cnfait merged commit dac1db7 into main Jan 17, 2024
3 checks passed

cnfait deleted the sdlf-stage-dataquality branch January 17, 2024 23:50
