Where do I build/store my 'data ingestion' code/modules? #27
BradBender asked this question in Q&A (Unanswered)
Replies: 1 comment
Hi Brad - thank you for this question! As you mentioned, data ingestion falls outside the SDLF. The main reason is that there are too many ingestion patterns for us to support them all (streaming with Kinesis, CDC with DMS, API data, and so on). Instead, we are building a library of examples, which I believe you have already discovered in the sdlf-utils directory. We are looking to add more in the future and welcome further contributions from the community! The CICD component is indeed missing from that example, but it should follow the same principles used for the other components of the framework. Hope this helps!
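For the DataSync case specifically, a minimal sketch of what such an ingestion template might look like is below. This is illustrative only, not an official SDLF component: the parameter names (pTeamName, pDatasetName, pRawBucketName, pSourceLocationArn) and resource names are hypothetical, and it assumes an existing DataSync source location and an existing SDLF raw bucket.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: >
  Illustrative ingestion stack: a daily DataSync task landing files in
  the SDLF raw bucket under {team}/{dataset}. Not an official SDLF template.

Parameters:
  pTeamName:
    Type: String
    Description: SDLF team owning this ingestion
  pDatasetName:
    Type: String
    Description: Target dataset prefix
  pRawBucketName:
    Type: String
    Description: Name of the existing SDLF raw bucket
  pSourceLocationArn:
    Type: String
    Description: ARN of an existing DataSync source location (e.g. on-prem NFS/SMB)

Resources:
  # IAM role DataSync assumes to write into the raw bucket
  rDataSyncRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: datasync.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: raw-bucket-write
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:ListBucket
                  - s3:GetBucketLocation
                Resource:
                  - !Sub arn:aws:s3:::${pRawBucketName}
                  - !Sub arn:aws:s3:::${pRawBucketName}/*

  # Destination location pointing at raw-bucket/{team}/{dataset}
  rDestinationLocation:
    Type: AWS::DataSync::LocationS3
    Properties:
      S3BucketArn: !Sub arn:aws:s3:::${pRawBucketName}
      Subdirectory: !Sub /${pTeamName}/${pDatasetName}
      S3Config:
        BucketAccessRoleArn: !GetAtt rDataSyncRole.Arn

  # Transfer task scheduled to run daily
  rIngestionTask:
    Type: AWS::DataSync::Task
    Properties:
      SourceLocationArn: !Ref pSourceLocationArn
      DestinationLocationArn: !Ref rDestinationLocation
      Schedule:
        ScheduleExpression: rate(1 day)
```

A stack like this could live in a team repository and be deployed through the same CodePipeline/CodeBuild CICD used for the other SDLF components, rather than through a standalone deploy.sh.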
Original question:
As an SDLF team member, where do I build/store my 'data ingestion' code/modules? Data ingestion seems to fall outside the SDLF, and the assumption is that data has already been deposited in s3://raw-bucket/{team}/{dataset}. I'm looking for where I should be building my YAML templates for ingestions. I found the sdlf-utils folder with a few example ingestions, but there's no CI/CD pipeline, just a deploy.sh. Is there a better strategy?
Specifically, I'm looking to use DataSync to pull files daily into the raw bucket under {team}/{dataset}, but I can't find how/where I would do that without going outside the SDLF entirely.
Thanks!
--Brad Bender