- automated EDA
- https://charticulator.com/
- https://www.datawrapper.de/
- https://flourish.studio/
- https://rawgraphs.io/
- high-dimensional data: https://github.com/facebookresearch/hiplot
- https://github.com/ibis-project/ibis
- elasticsearch: https://github.com/elastic/eland
- dask
- ray/modin
- RAPIDS
- vaex
- https://ibis-project.org/
- https://github.com/pola-rs/polars
MLOps overview: https://github.com/visenger/awesome-mlops
- MLflow
- https://docs.bentoml.org
- https://neptune.ai/blog/experiment-management
- https://neptune.ai/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective
- https://towardsdatascience.com/machine-learning-experiment-tracking-93b796e501b0
- https://neptune.ai/blog/machine-learning-model-management
- frameworks
- Stan
- pymc
- pyro
- https://botorch.org/
- https://arviz-devs.github.io/arviz/
- https://bayesiancomputationbook.com/welcome.html
-
books
-
a really useful link collection on my blog: https://georgheiler.com/2020/12/03/intersting-links-about-deep-learning/
-
automated/easy DL
-
tabular deep learning:
- https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html
- https://github.com/spotify/annoy
- https://github.com/flemmerich/pysubgroup
- https://github.com/imoscovitz/wittgenstein
- https://github.com/scikit-learn-contrib/skope-rules
- let end users explore the model in a simple web UI, to get a feel for it or to give feedback: https://github.com/gradio-app/gradio
- https://www.youtube.com/watch?v=WfcCXlxkBb0, https://github.com/koaning/scikit-lego
- https://koaning.io/posts/the-future-is-past/
- https://github.com/linkedin/LiFT
- improve Jupyter and git integration: http://timstaley.co.uk/posts/making-git-and-jupyter-notebooks-play-nice/
- https://elyra.readthedocs.io/en/latest/
- https://github.com/alibaba/alink
- https://github.com/scikit-multiflow/scikit-multiflow
- https://riverml.xyz/latest/
- feature calculation
- jupyter
- bokeh on streams in jupyter https://medium.com/swlh/streaming-time-series-with-jupyter-and-influxdb-fec33803c460
- hopsworks
- Behaviors compose better than states
- perhaps a simple Python library with some transformations, combined with dbt and Dagster, will do the job very well, with https://metriql.com/ for metrics
- https://feast.dev/
- https://www.youtube.com/watch?v=jo12CWZ00Lo
- https://github.com/linkedin/greykite
- https://facebook.github.io/prophet/
- https://github.com/winedarksea/AutoTS
- https://github.com/timeseriesAI/tsai
- https://github.com/AIStream-Peelout/flow-forecast
- https://khiva.readthedocs.io/en/latest/
- cleaning https://towardsdatascience.com/a-common-mistake-to-avoid-when-working-with-time-series-data-eedf60a8b4c1
basically SQL-based sync & incremental updates of data to many systems and reports without sending a lot of CSV files around
- https://github.com/whylabs/whylogs
- clean machine learning code https://leanpub.com/cleanmachinelearningcode
- code refactoring example https://github.com/moutai/tdml-example/tree/master/with_tests/tests
- cheap training using spot instances https://www.spotml.io/
entity/identity resolution
- https://github.com/SylphAI-Inc/LLM-engineer-handbook
- https://github.com/stanfordnlp/dspy
- https://github.com/SylphAI-Inc/AdalFlow
- https://github.com/Eladlev/AutoPrompt
- https://github.com/comet-ml/opik
- https://github.com/huggingface/lighteval
- https://github.com/explodinggradients/ragas
- https://github.com/unslothai/unsloth
- https://github.com/Lightning-AI/litgpt
- embeddings