Where do I build/store my 'data ingestion' code/modules? #27
BradBender asked this question in Q&A (Unanswered)
Replies: 1 comment
Hi Brad - thank you for this question! As you mentioned, data ingestion falls outside the SDLF. The main reason is that there are too many ingestion patterns for us to support them all (streaming with Kinesis, CDC with DMS, API data, and so on). Instead, we are building a library of examples, which I believe you have already discovered in the sdlf-utils directory. We are looking to add more in the future and welcome further contributions from the community! The CICD component is indeed missing from that example, but it should follow the same principles used for the other components of the framework. Hope this helps!
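For the DataSync case specifically, a minimal sketch of what such an ingestion template might look like is below. This is illustrative only, not an official SDLF component: the parameter names (pTeamName, pDatasetName, pRawBucketName, pSourceLocationArn) and resource names are hypothetical, and it assumes an existing DataSync source location and an existing SDLF raw bucket.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: >
  Illustrative ingestion stack: a daily DataSync task landing files in
  the SDLF raw bucket under {team}/{dataset}. Not an official SDLF template.

Parameters:
  pTeamName:
    Type: String
    Description: SDLF team owning this ingestion
  pDatasetName:
    Type: String
    Description: Target dataset prefix
  pRawBucketName:
    Type: String
    Description: Name of the existing SDLF raw bucket
  pSourceLocationArn:
    Type: String
    Description: ARN of an existing DataSync source location (e.g. on-prem NFS/SMB)

Resources:
  # IAM role DataSync assumes to write into the raw bucket
  rDataSyncRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: datasync.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: raw-bucket-write
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:ListBucket
                  - s3:GetBucketLocation
                Resource:
                  - !Sub arn:aws:s3:::${pRawBucketName}
                  - !Sub arn:aws:s3:::${pRawBucketName}/*

  # Destination location pointing at raw-bucket/{team}/{dataset}
  rDestinationLocation:
    Type: AWS::DataSync::LocationS3
    Properties:
      S3BucketArn: !Sub arn:aws:s3:::${pRawBucketName}
      Subdirectory: !Sub /${pTeamName}/${pDatasetName}
      S3Config:
        BucketAccessRoleArn: !GetAtt rDataSyncRole.Arn

  # Transfer task scheduled to run daily
  rIngestionTask:
    Type: AWS::DataSync::Task
    Properties:
      SourceLocationArn: !Ref pSourceLocationArn
      DestinationLocationArn: !Ref rDestinationLocation
      Schedule:
        ScheduleExpression: rate(1 day)
```

A stack like this could live in a team repository and be deployed through the same CodePipeline/CodeBuild CICD used for the other SDLF components, rather than through a standalone deploy.sh.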
Original question:
As an SDLF team member, where do I build/store my 'data ingestion' code/modules? Data ingestion seems to fall outside the SDLF, and the assumption is that data has already been deposited in s3://raw-bucket/{team}/{dataset}. I'm looking for where I should be building my YAML templates for ingestions. I found the sdlf-utils folder with a few example ingestions, but there's no CI/CD pipeline, just a deploy.sh. Is there a better strategy?
Specifically, I'm looking to use DataSync to pull files daily into the raw bucket under {team}/{dataset}, but I can't find how/where I would do that without going outside the SDLF entirely.
Thanks!
--Brad Bender