Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cookbook improvement] Working PR #31

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
131 changes: 0 additions & 131 deletions agent_app_sample_code/00_global_config.py

This file was deleted.

95 changes: 95 additions & 0 deletions agent_app_sample_code/00_shared_config.py
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer "lazy" config where validation happens when the config actually is used vs needing to explicitly run this notebook

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the tradeoff/feedback I think the current logic is trying to address is failing fast and surface all permissions issues/blockers early on, since folks found it confusing to tab back and forth between notebooks to make fixes

Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Databricks notebook source
# MAGIC %md
# MAGIC ## Shared configuration
# MAGIC
# MAGIC This notebook initializes a `AgentCookbookConfig` Pydantic class to define parameters that are shared between the cookbook notebooks:
# MAGIC - Unity Catalog schema
# MAGIC - MLflow Experiment to track versions of the Agent and their associated quality/cost/latency evaluation results
# MAGIC - Unity Catalog model that stores versions of the Agent's code/config that are deployed
# MAGIC - Evaluation Set Delta Table
# MAGIC
# MAGIC This notebook does the following:
# MAGIC 1. Creates the UC catalog/schema if they don't exist
# MAGIC 2. Serializes the configuration to `config/cookbook_config.yaml` so other notebooks can load it
# MAGIC
# MAGIC **Important: We suggest starting from a fresh clone of the cookbook if you need to change this configuration.**

# COMMAND ----------

# MAGIC %md
# MAGIC **Important note:** Throughout this notebook, we indicate which cells you:
# MAGIC - ✅✏️ *should* customize - these cells contain config settings to change
# MAGIC - 🚫✏️ *typically will not* customize - these cells contain boilerplate code required to validate / save the configuration
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the things that shouldn't be customized in python files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move more into the python files, but i tried to strike a balance of not hiding code that the user should be aware of. for example, i made a cell with mlflow logging boilerplate that shouldn't be modified but i also didnt want to hide it from the user since we got feedback before that hiding the details of how model logging works was confusing

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack - probably out of scope for this PR but I'd prefer some better logging, output rendering and maybe the ability to change the config inline using an input Ipywidget or similar. Fewer notebook cells, less to think about

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda agree with @FMurray here, as a reader these notes are quite helpful but they seem hard for us to maintain. I wonder if we could recommend folks to use IDEs to view the cookbook (that's what I'm doing right now as I review & try out changes to this PR), which has better jump-to-definition support for things like easily reading the code in Python modules, if readers are interested - that way we can try to keep maintainability but avoid the "hidden magic code" issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH though, there are other things that I think affect maintainability more, so fine to try this to start, it is quite helpful

# MAGIC
# MAGIC *Cells that don't require customization still need to be run!*

# COMMAND ----------

# MAGIC %md
# MAGIC ### 🚫✏️ Install Python libraries

# COMMAND ----------

# MAGIC %pip install -qqqq -U -r requirements.txt
# MAGIC dbutils.library.restartPython()

# COMMAND ----------

# MAGIC %md
# MAGIC ### 🚫✏️ Get user info to set default values

# COMMAND ----------

from databricks.sdk import WorkspaceClient

# Get current user's name & email
w = WorkspaceClient()
user_email = w.current_user.me().user_name
user_name = user_email.split("@")[0].replace(".", "_")

# Get the workspace default UC catalog
default_catalog = spark.sql("select current_catalog() as cur_catalog").collect()[0]['cur_catalog']

# COMMAND ----------

# MAGIC %md
# MAGIC ### ✅✏️ Configure this instance of the cookbook

# COMMAND ----------

from cookbook_utils.cookbook_config import AgentCookbookConfig

shared_config = AgentCookbookConfig(
uc_catalog_name=f"{default_catalog}",
uc_schema_name=f"{user_name}_agents",
uc_asset_prefix="agent_app_name", # Prefix to every created UC asset, typically the Agent's name
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear how to scale this to multiple models - IMO schema should be the “agent app name” (where agent apps contain one or more models/data sources)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would simplify the confusing idea of us_asset_prefix...and avoid long names for each asset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the common pattern you see @FMurray - one agent == one schema?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By agent app - I mean one use case, which might have several agent models, like for example 3 distinct RAG models and an supervisor model.

I see customers who have less permissive UC setups typically apply grants on the schema (occasionally on catalog), so I wouldn't want to have one schema per model, because that would slow them down.

)

# COMMAND ----------

# MAGIC %md
# MAGIC ### 🚫✏️ Save the configuration for use by other notebooks

# COMMAND ----------

# Save configuration
shared_config.dump_to_yaml('./configs/cookbook_config.yaml')

# Print configuration
shared_config.pretty_print()

# COMMAND ----------

# MAGIC %md
# MAGIC ### 🚫✏️ Validate the storage locations exist, create if they don't exist

# COMMAND ----------

if not shared_config.validate_or_create_uc_catalog():
raise Exception("UC Catalog is not valid, fix per the console notes above.")

if not shared_config.validate_or_create_uc_schema():
raise Exception("UC Schema is not valid, fix per the console notes above.")

if not shared_config.validate_or_create_mlflow_experiment():
raise Exception("MLflow experiment name is not valid, fix per the console notes above.")
Loading