Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add client credential oauth integration support + related databricks helpers to SDK #348

Merged
merged 22 commits into from
Dec 6, 2024

Conversation

zackverham
Copy link
Collaborator

@zackverham zackverham commented Dec 2, 2024

This PR adds a new function to the oauth tooling. It knows how to craft a request against the credential exchange endpoint that can interface with client credential flows. This supports service-account-like oauth interactions.

Connect sets an environment variable called CONNECT_CONTENT_SESSION_TOKEN which can be used by the executing content when making the request. This means that the syntax for using this new credential exchange function is slightly different.

This PR also adds the databricks extensions needed to use this new oauth exchange flow when interfacing with databricks resources.

This feature won't land in Connect until the upcoming release gets published - I don't know how we typically queue up work like this in sync with product releases.

@zackverham zackverham requested a review from tdstein as a code owner December 2, 2024 20:54
Copy link

github-actions bot commented Dec 2, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
1950 1808 93% 0% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
src/posit/connect/content.py 96% 🟢
src/posit/connect/external/databricks.py 91% 🟢
src/posit/connect/external/snowflake.py 91% 🟢
src/posit/connect/oauth/_init_.py 100% 🟢
src/posit/connect/oauth/oauth.py 100% 🟢
TOTAL 96% 🟢

updated for commit: 59862c4 by action🐍

@zackverham zackverham self-assigned this Dec 2, 2024
@zackverham zackverham changed the title Add client credential oauth integration support + related databricks helpers to SDK feat: Add client credential oauth integration support + related databricks helpers to SDK Dec 2, 2024
@zackverham
Copy link
Collaborator Author

zackverham commented Dec 2, 2024

Here's a code snippet that illustrates the ergonomics of using this code in published content (this is pulled from a Quarto doc I have running locally):

from posit.connect.external.databricks import PositContentCredentialsStrategy

import requests

import pandas as pd
from databricks import sql
from databricks.sdk.core import ApiClient, Config, databricks_cli
from databricks.sdk.service.iam import CurrentUserAPI

# env vars
DATABRICKS_HOST = "<REDACTED>"
DATABRICKS_HOST_URL = f"https://{DATABRICKS_HOST}"
SQL_HTTP_PATH = "<REDACTED>"

posit_strategy = PositContentCredentialsStrategy(local_strategy=databricks_cli)

cfg = Config(host=DATABRICKS_HOST_URL, credentials_strategy=posit_strategy)

databricks_user_info = CurrentUserAPI(ApiClient(cfg)).me()
print(f"Hello, {databricks_user_info.display_name}!")

query = "SELECT * FROM samples.nyctaxi.trips LIMIT 10;"
with sql.connect(
    server_hostname=DATABRICKS_HOST,
    http_path=SQL_HTTP_PATH,
    credentials_provider=posit_strategy.sql_credentials_provider(cfg),
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(query)
        rows = cursor.fetchall()
        print(pd.DataFrame([row.asDict() for row in rows]))

src/posit/connect/external/databricks.py Outdated Show resolved Hide resolved
src/posit/connect/external/databricks.py Outdated Show resolved Hide resolved
src/posit/connect/external/databricks.py Outdated Show resolved Hide resolved
src/posit/connect/external/databricks.py Outdated Show resolved Hide resolved
@mconflitti-pbc
Copy link
Collaborator

Odd ergonomics with the passing around of that config object! But read through the convo on the databricks repo and it seems to be the best option for now! LGTM

DATABRICKS_HOST_URL = f"https://{DATABRICKS_HOST}"
SQL_HTTP_PATH = "<REDACTED>"

posit_strategy = PositContentCredentialsStrategy(local_strategy=databricks_cli)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other credentialsStrategy docstring uses a sample py-shiny app so that the user can (hopefully) see that the location that the initialization happens is really important. That context is lost in this example. I think it would be worthwhile to show both and call out explicitly why the initialization is different for service accounts than for viewer auth

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I guess the question ultimately is how much of this documentation should exist in the docstrings vs in our docs pages (both the cookbook examples themselves, and prose discussions explaining things).

I think this might be a question for @schloerke - I don't know exactly what expectations would be for SDK consumers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think docs strings should show usage examples.

I think the current Connect recipes are turning into small usage examples and should be moved to the doc strings. (And the recipes should be larger/longer goal-oriented stories combining many small usage examples.)


tl/dr; More usage examples in doc strings, please!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example needs to be updated to use the PositLocalContentCredentialsStrategy

import requests

from databricks import sql
from databricks.sdk.core import ApiClient, Config, databricks_cli
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zackverham did you ever figure out a workaround for getting local service accounts tokens via the cli fallback?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not. Databricks labeled my ticket in the CLI repo as a bug, but that's where I'm stuck currently:

databricks/cli#1939

I'm not totally sure what we want to do with that. Options seem to be:

  1. remove the local option from our implementation and re-introduce it when databricks supports it
  2. leave the option to provide a local strategy + code comments explaining the databricks cli doesn't support it. A particularly motivated user could probably fork the databricks cli and implement a token command that follows the pattern established in other cli operations to get it to work. This isn't super likely and obviously pretty rough, but the local strategy here doesn't explicitly state it has to be provided by an official vendor.

@schloerke
Copy link
Collaborator

Happy to have more usage examples if you'd like. Either way, feel free to merge when ready.

# HTTP session headers are available in this context.
session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token")
posit_strategy = PositCredentialsStrategy(
local_strategy=databricks_cli, user_session_token=session_token
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this now be the patched local strategy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its sort of ambiguous imo. the user could still use the databricks_cli as the local strategy, if say for example their databricks admin doesn't want to hand out service principal credentials. They can still use their own identity for local development with the databricks cli - that was my thinking for keeping this as-is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments to the docstring to clarify the various use cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good point!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I myself got confused - to clarify, this strategy is supporting viewer auth, which has no issues. The local strategy here works fine. Its just the content strategies that need the patch. (I still added docstring updates in the right places)


See Also
--------
* https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html#manually-generate-a-workspace-level-access-token
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - this is somewhat confusing, because the development of their CLI/SDK tooling and the sql connector are happening on independent threads and don't always sync up.

Here's the credentials provider we care about - it should fall back to whatever the CLI does to acquire a token:
https://github.com/databricks/databricks-sdk-py/blob/980166030c688e08a52d0f55389abd766f1d928d/databricks/sdk/credentials_provider.py#L630

Which calls this:
https://github.com/databricks/databricks-sdk-py/blob/980166030c688e08a52d0f55389abd766f1d928d/databricks/sdk/credentials_provider.py#L578

Which executes the databricks auth token command to acquire a token.

The real crux of the problem is in the auth token command - their token command isn't written to support anything aside from auth code flows:

https://github.com/databricks/cli/blob/main/cmd/auth/token.go#L45

So you can test this even outside of our code - just running databricks auth token with an oauth-m2m profile doesn't work. It ends up returning your own token and ignoring the profile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying!

Comment on lines 195 to 199
local_strategy = PositLocalContentCredentialsStrategy(
token_endpoint_url=TOKEN_ENDPOINT_URL,
client_id=CLIENT_ID,
client_secret=CLIENT_SECRET,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it be on the user to make sure they have the proper env vars wired up? Curious if it makes sense to pull the values from their .databrickscfg file for them?

https://docs.databricks.com/en/dev-tools/cli/authentication.html#oauth-machine-to-machine-m2m-authentication

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just a nice to have for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its a good thought, for sure - I'm hesitant to add that, since (a) this is really just supposed to be a band-aid to get people unstuck until the databricks CLI is working again, and (b) we'd then have to implement a parser that understands the databrickscfg file (profile settings, etc).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats fair!

@zackverham zackverham merged commit 6b82ba3 into main Dec 6, 2024
35 checks passed
@zackverham zackverham deleted the zack-client-creds branch December 6, 2024 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants