Vision: Serve as a reference for how the local modern data stack (LMDS) can be used in practice. Over time, add more production-grade features and deployment modes so that the project becomes a go-to example that helps spread software engineering best practices in the LMDS.
Items will be checked once they are implemented:
- to be done
- done
the traditional transformations everyone is using
Python dependencies with https://github.com/basnijholt/unidep
- dagster
- duckdb
- dbt-duckdb
- excel
- s3/minio
- delta lake
- cube.dev
- secrets in sops with age
- quality checks
add
- ray.io, dagster-pipes
- simple tabular AI sample
- stateful quality checks (anomaly detection)
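
To illustrate what "stateful" means for the anomaly detection item above, here is a minimal sketch in plain Python. It is not tied to Dagster or to any code in this repository: the class name `RowCountCheck`, the z-score rule, and the default thresholds are all assumptions chosen for illustration. The check keeps the row counts of earlier runs as state and flags a new run whose row count deviates strongly from that history.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class RowCountCheck:
    """Hypothetical stateful check: remembers past row counts between runs."""

    threshold: float = 3.0  # flag values more than this many standard deviations away
    min_history: int = 5    # do not judge until enough runs have been observed
    history: list[float] = field(default_factory=list)

    def observe(self, row_count: float) -> bool:
        """Return True if row_count looks anomalous, then record it as state."""
        anomalous = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma == 0:
                # All previous runs were identical: any change is suspicious.
                anomalous = row_count != mu
            else:
                anomalous = abs(row_count - mu) / sigma > self.threshold
        # Simplification: every observation is kept, including anomalous ones.
        self.history.append(row_count)
        return anomalous


if __name__ == "__main__":
    check = RowCountCheck()
    for n in [1000, 1010, 990, 1005, 995, 1002, 100]:
        print(n, "anomalous" if check.observe(n) else "ok")
```

In a real pipeline the history would have to be persisted between runs (for example in a small DuckDB table) rather than held in memory, and a more robust statistic than the z-score may be preferable.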
add
- LLM case https://github.com/imartinez/privateGPT
- Qdrant
- Postgres with pgvector
- perhaps https://pathway.com/solutions/llm-app/
- translation case https://ai.meta.com/resources/models-and-libraries/seamless-communication-models/
- streaming translation case
- dagster
- duckdb
- dbt-duckdb
- s3/minio
- kafka / redpanda
- risingwave / starrocks
- secrets in sops with age
- some streaming dashboard solution
to be discussed if we want to include them
- support multiple deployment modes, e.g. local, docker, local k8s via µ-k8s (e.g. in rancher desktop), cloud
- how do lambda functions fit in? In particular ones like: https://modal.com/
- anomaly detection with dagster stateful asset checks
- full-blown metrics samples based on the E2E company tooling mentioned in, e.g., https://github.com/geoHeil/awesome-tools/blob/master/startup_company.md
- including commercial PaaS?
- https://georgheiler.com/2023/12/11/dagster-dbt-duckdb-as-new-local-mds/
- https://juhache.substack.com/p/moving-from-bi-to-data-apps-part?r=l9wvi
- https://www.nintoracaudio.dev/data-eng,music,dx7/2023/12/15/raddd-stack.html
- https://www.infoq.com/news/2023/12/expedia-websockets-kafka-query/
- https://www.youtube.com/watch?v=Nd64jpi-fTE
- https://georgheiler.com/2023/12/01/securing-secrets-with-mozilla-sops-and-age-a-powerful-combo/
- https://georgheiler.com/2022/04/02/making-bigdata-small-again-and-green/
- https://georgheiler.com/2022/04/01/comparing-sql-based-streaming-approaches/
- https://ibrahimhkoyuncu.medium.com/dagster-complete-guide-to-deploy-multiple-data-pipelines-to-dagster-on-kubernetes-environment-b0f83a54fce2
- https://www.youtube.com/watch?v=6BPN7TnORIc
In alphabetical order: