-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Cookbook improvement] Working PR #31
base: main
Are you sure you want to change the base?
Changes from 19 commits
e2318fa
95de0a0
eb0dea0
eacc990
2be2631
aa4d641
58973ec
ade5c26
946cbc8
720ab5f
51d2965
02fead4
618a35c
972d758
80520d2
02905fb
fb06252
c3424e3
72930bc
099082a
dc05d52
f7ec5b1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
# Databricks notebook source | ||
# MAGIC %md | ||
# MAGIC ## Shared configuration | ||
# MAGIC | ||
# MAGIC This notebook initializes a `AgentCookbookConfig` Pydantic class to define parameters that are shared between the cookbook notebooks: | ||
# MAGIC - Unity Catalog schema | ||
# MAGIC - MLflow Experiment to track versions of the Agent and their associated quality/cost/latency evaluation results | ||
# MAGIC - Unity Catalog model that stores versions of the Agent's code/config that are deployed | ||
# MAGIC - Evaluation Set Delta Table | ||
# MAGIC | ||
# MAGIC This notebook does the following: | ||
# MAGIC 1. Creates the UC catalog/schema if they don't exist | ||
# MAGIC 2. Serializes the configuration to `config/cookbook_config.yaml` so other notebooks can load it | ||
# MAGIC | ||
# MAGIC **Important: We suggest starting from a fresh clone of the cookbook if you need to change this configuration.** | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %md | ||
# MAGIC **Important note:** Throughout this notebook, we indicate which cells you: | ||
# MAGIC - ✅✏️ *should* customize - these cells contain config settings to change | ||
# MAGIC - 🚫✏️ *typically will not* customize - these cells contain boilerplate code required to validate / save the configuration | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not put the things that shouldn't be customized in python files? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can move more into the python files, but i tried to strike a balance of not hiding code that the user should be aware of. for example, i made a cell with mlflow logging boilerplate that shouldn't be modified but i also didnt want to hide it from the user since we got feedback before that hiding the details of how model logging works was confusing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ack - probably out of scope for this PR but I'd prefer some better logging, output rendering and maybe the ability to change the config inline using an input Ipywidget or similar. Fewer notebook cells, less to think about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I kinda agree with @FMurray here, as a reader these notes are quite helpful but they seem hard for us to maintain. I wonder if we could recommend folks to use IDEs to view the cookbook (that's what I'm doing right now as I review & try out changes to this PR), which has better jump-to-definition support for things like easily reading the code in Python modules, if readers are interested - that way we can try to keep maintainability but avoid the "hidden magic code" issue There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TBH though, there are other things that I think affect maintainability more, so fine to try this to start, it is quite helpful |
||
# MAGIC | ||
# MAGIC *Cells that don't require customization still need to be run!* | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %md | ||
# MAGIC ### 🚫✏️ Install Python libraries | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %pip install -qqqq -U -r requirements.txt | ||
# MAGIC dbutils.library.restartPython() | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %md | ||
# MAGIC ### 🚫✏️ Get user info to set default values | ||
|
||
# COMMAND ---------- | ||
|
||
from databricks.sdk import WorkspaceClient | ||
|
||
# Get current user's name & email | ||
w = WorkspaceClient() | ||
user_email = w.current_user.me().user_name | ||
user_name = user_email.split("@")[0].replace(".", "_") | ||
|
||
# Get the workspace default UC catalog | ||
default_catalog = spark.sql("select current_catalog() as cur_catalog").collect()[0]['cur_catalog'] | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %md | ||
# MAGIC ### ✅✏️ Configure this instance of the cookbook | ||
|
||
# COMMAND ---------- | ||
|
||
from cookbook_utils.cookbook_config import AgentCookbookConfig | ||
|
||
shared_config = AgentCookbookConfig( | ||
uc_catalog_name=f"{default_catalog}", | ||
uc_schema_name=f"{user_name}_agents", | ||
uc_asset_prefix="agent_app_name", # Prefix to every created UC asset, typically the Agent's name | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unclear how to scale this to multiple models - IMO schema should be the “agent app name” (where agent apps contain one or more models/data sources) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would simplify the confusing idea of us_asset_prefix...and avoid long names for each asset There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is that the common pattern you see @FMurray - one agent == one schema? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By agent app - I mean one use case, which might have several agent models, like for example 3 distinct RAG models and an supervisor model. I see customers who have less permissive UC setups typically apply grants on the schema (occasionally on catalog), so I wouldn't want to have one schema per model, because that would slow them down. |
||
) | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %md | ||
# MAGIC ### 🚫✏️ Save the configuration for use by other notebooks | ||
|
||
# COMMAND ---------- | ||
|
||
# Save configuration | ||
shared_config.dump_to_yaml('./configs/cookbook_config.yaml') | ||
|
||
# Print configuration | ||
shared_config.pretty_print() | ||
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %md | ||
# MAGIC ### 🚫✏️ Validate the storage locations exist, create if they don't exist | ||
|
||
# COMMAND ---------- | ||
|
||
if not shared_config.validate_or_create_uc_catalog(): | ||
raise Exception("UC Catalog is not valid, fix per the console notes above.") | ||
|
||
if not shared_config.validate_or_create_uc_schema(): | ||
raise Exception("UC Schema is not valid, fix per the console notes above.") | ||
|
||
if not shared_config.validate_or_create_mlflow_experiment(): | ||
raise Exception("MLflow experiment name is not valid, fix per the console notes above.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer "lazy" config where validation happens when the config actually is used vs needing to explicitly run this notebook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the tradeoff/feedback I think the current logic is trying to address is failing fast and surface all permissions issues/blockers early on, since folks found it confusing to tab back and forth between notebooks to make fixes