feat: Add client credential oauth integration support + related databricks helpers to SDK #348
Conversation
Here's a code snippet that illustrates the ergonomics of using this code in published content (this is pulled from a Quarto doc I have running locally):

```python
from posit.connect.external.databricks import PositContentCredentialsStrategy

import pandas as pd
import requests
from databricks import sql
from databricks.sdk.core import ApiClient, Config, databricks_cli
from databricks.sdk.service.iam import CurrentUserAPI

# env vars
DATABRICKS_HOST = "<REDACTED>"
DATABRICKS_HOST_URL = f"https://{DATABRICKS_HOST}"
SQL_HTTP_PATH = "<REDACTED>"

posit_strategy = PositContentCredentialsStrategy(local_strategy=databricks_cli)
cfg = Config(host=DATABRICKS_HOST_URL, credentials_strategy=posit_strategy)

databricks_user_info = CurrentUserAPI(ApiClient(cfg)).me()
print(f"Hello, {databricks_user_info.display_name}!")

query = "SELECT * FROM samples.nyctaxi.trips LIMIT 10;"

with sql.connect(
    server_hostname=DATABRICKS_HOST,
    http_path=SQL_HTTP_PATH,
    credentials_provider=posit_strategy.sql_credentials_provider(cfg),
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(query)
        rows = cursor.fetchall()
        print(pd.DataFrame([row.asDict() for row in rows]))
```
Odd ergonomics with the passing around of that config object! But I read through the convo on the databricks repo and it seems to be the best option for now! LGTM
Co-authored-by: Barret Schloerke <[email protected]>
```python
DATABRICKS_HOST_URL = f"https://{DATABRICKS_HOST}"
SQL_HTTP_PATH = "<REDACTED>"

posit_strategy = PositContentCredentialsStrategy(local_strategy=databricks_cli)
```
The other credentials strategy docstring uses a sample py-shiny app so that the user can (hopefully) see that the location where the initialization happens is really important. That context is lost in this example. I think it would be worthwhile to show both and call out explicitly why the initialization is different for service accounts than for viewer auth.
yeah, I guess the question ultimately is how much of this documentation should exist in the docstrings vs in our docs pages (both the cookbook examples themselves, and prose discussions explaining things).
I think this might be a question for @schloerke - I don't know exactly what expectations would be for SDK consumers.
I think docs strings should show usage examples.
I think the current Connect recipes are turning into small usage examples and should be moved to the doc strings. (And the recipes should be larger/longer goal-oriented stories combining many small usage examples.)
tl;dr: More usage examples in doc strings, please!
This example needs to be updated to use the PositLocalContentCredentialsStrategy
```python
import requests

from databricks import sql
from databricks.sdk.core import ApiClient, Config, databricks_cli
```
@zackverham did you ever figure out a workaround for getting local service account tokens via the cli fallback?
I have not. Databricks labeled my ticket in the CLI repo as a bug, but that's where I'm stuck currently.
I'm not totally sure what we want to do with that. Options seem to be:
- remove the local option from our implementation and re-introduce it when databricks supports it
- leave the option to provide a local strategy, plus code comments explaining that the databricks cli doesn't support it. A particularly motivated user could probably fork the databricks cli and implement a token command that follows the pattern established in other cli operations to get it to work. This isn't super likely and obviously pretty rough, but the local strategy here doesn't explicitly state it has to be provided by an official vendor.
Happy to have more usage examples if you'd like. Either way, feel free to merge when ready.
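On that last point: since nothing requires the local strategy to come from an official vendor, a hand-rolled one is conceivable. Here's a minimal sketch, assuming the Databricks SDK's convention that a credentials strategy is roughly a callable taking a Config and returning a zero-argument header factory; `static_token_strategy` and its exact shape are illustrative assumptions, not the SDK's API, and the interface may differ across SDK versions:

```python
def static_token_strategy(token: str):
    """Hypothetical stand-in for databricks_cli as a local strategy.

    Sketch only: a real implementation would mint and refresh tokens
    rather than replaying a static one. Verify the strategy interface
    against the SDK version you actually depend on.
    """
    def strategy(cfg):  # cfg would be a databricks.sdk.core.Config
        def header_factory() -> dict:
            return {"Authorization": f"Bearer {token}"}
        return header_factory
    return strategy
```

A user could then pass `static_token_strategy(...)` wherever `databricks_cli` is accepted as the local strategy, at their own risk.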
```python
# HTTP session headers are available in this context.
session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token")
posit_strategy = PositCredentialsStrategy(
    local_strategy=databricks_cli, user_session_token=session_token
)
```
should this now be the patched local strategy?
It's sort of ambiguous, imo. The user could still use databricks_cli as the local strategy if, say, their databricks admin doesn't want to hand out service principal credentials. They can still use their own identity for local development with the databricks cli - that was my thinking for keeping this as-is.
Added some comments to the docstring to clarify the various use cases.
ah good point!
Ah, I myself got confused - to clarify, this strategy supports viewer auth, which has no issues. The local strategy here works fine. It's just the content strategies that need the patch. (I still added docstring updates in the right places.)
```
See Also
--------
* https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html#manually-generate-a-workspace-level-access-token
```
Seeing this example as well: https://docs.databricks.com/en/dev-tools/python-sql-connector.html#oauth-machine-to-machine-m2m-authentication
Is the problem with using their version of this that they have omitted the `client_credentials` scope?
https://github.com/databricks/databricks-sdk-py/blob/980166030c688e08a52d0f55389abd766f1d928d/databricks/sdk/credentials_provider.py#L164-L174
Good question - this is somewhat confusing, because the development of their CLI/SDK tooling and the sql connector are happening on independent threads and don't always sync up.
Here's the credentials provider we care about - it should fall back to whatever the CLI does to acquire a token:
https://github.com/databricks/databricks-sdk-py/blob/980166030c688e08a52d0f55389abd766f1d928d/databricks/sdk/credentials_provider.py#L630
Which calls this:
https://github.com/databricks/databricks-sdk-py/blob/980166030c688e08a52d0f55389abd766f1d928d/databricks/sdk/credentials_provider.py#L578
Which executes the `databricks auth token` command to acquire a token.
The real crux of the problem is in the `auth token` command - their token command isn't written to support anything aside from auth code flows:
https://github.com/databricks/cli/blob/main/cmd/auth/token.go#L45
So you can test this even outside of our code - just running `databricks auth token` with an oauth-m2m profile doesn't work. It ends up returning your own token and ignoring the profile.
Thanks for clarifying!
```python
local_strategy = PositLocalContentCredentialsStrategy(
    token_endpoint_url=TOKEN_ENDPOINT_URL,
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
)
```
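For context, the client credentials grant this strategy performs against the token endpoint boils down to a single form-encoded POST (RFC 6749, section 4.4). A hedged sketch of roughly what such a request looks like - the helper name, the HTTP Basic client authentication, and the `all-apis` scope default are illustrative assumptions drawn from Databricks' M2M docs, not this strategy's actual internals:

```python
import base64

def build_client_credentials_request(token_endpoint_url: str,
                                     client_id: str,
                                     client_secret: str,
                                     scope: str = "all-apis") -> dict:
    """Assemble an OAuth 2.0 client credentials token request (sketch).

    The client id/secret are sent via HTTP Basic auth; the body asks for
    the client_credentials grant. Illustrative only.
    """
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {
        "url": token_endpoint_url,
        "headers": {
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "data": {"grant_type": "client_credentials", "scope": scope},
    }

# Usage sketch: requests.post(**build_client_credentials_request(...)),
# then read "access_token" out of the JSON response.
```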
will it be on the user to make sure they have the proper env vars wired up? Curious if it makes sense to pull the values from their .databrickscfg file for them?
Maybe just a nice to have for now.
It's a good thought, for sure - I'm hesitant to add that, since (a) this is really just supposed to be a band-aid to get people unstuck until the databricks CLI is working again, and (b) we'd then have to implement a parser that understands the .databrickscfg file (profile settings, etc.).
That's fair!
This PR adds a new function to the oauth tooling. It knows how to craft a request against the credential exchange endpoint that can interface with client credential flows. This supports service-account-like oauth interactions.
Connect sets an environment variable called `CONNECT_CONTENT_SESSION_TOKEN`, which can be used by the executing content when making the request. This means that the syntax for using this new credential exchange function is slightly different.
This PR also adds the databricks extensions needed to use this new oauth exchange flow when interfacing with databricks resources.
This feature won't land in Connect until the upcoming release gets published - I don't know how we typically queue up work like this in sync with product releases.
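To make the description above concrete, here is a sketch of how executing content might pick up the session token Connect injects and shape an exchange request. Only the environment variable name comes from this PR; the token-exchange grant type and the token-type URN are assumptions modeled on OAuth 2.0 token exchange (RFC 8693), not the actual payload the new function sends:

```python
import os

def build_content_exchange_payload() -> dict:
    """Hypothetical credential-exchange request body (sketch only)."""
    # Per the PR description, Connect sets this for executing content;
    # it is absent during local development.
    session_token = os.environ.get("CONNECT_CONTENT_SESSION_TOKEN")
    if session_token is None:
        raise RuntimeError(
            "CONNECT_CONTENT_SESSION_TOKEN is unset; not running on Connect?"
        )
    # Assumed token-exchange-style body; the real endpoint's schema may differ.
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": session_token,
        "subject_token_type": "urn:posit:connect:content-session-token",
    }
```

The hard failure when the variable is missing is deliberate in this sketch: content strategies are meant to run on Connect, and a missing token usually means the code is running locally and should fall back to a local strategy instead.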