Support GCP Auth #444

Merged
merged 3 commits into from
Nov 22, 2023

3 changes: 2 additions & 1 deletion .codegen/__init__.py.tmpl
@@ -9,7 +9,8 @@ from databricks.sdk.service.{{.Package.Name}} import {{.PascalName}}API{{end}}

{{$args := list "host" "account_id" "username" "password" "client_id" "client_secret"
"token" "profile" "config_file" "azure_workspace_resource_id" "azure_client_secret"
-"azure_client_id" "azure_tenant_id" "azure_environment" "auth_type" "cluster_id"}}
+"azure_client_id" "azure_tenant_id" "azure_environment" "auth_type" "cluster_id"
+"google_credentials" "google_service_account" }}

{{- define "api" -}}
{{- $mixins := dict "ClustersAPI" "ClustersExt" "DbfsAPI" "DbfsExt" "WorkspaceAPI" "WorkspaceExt" -}}
4 changes: 4 additions & 0 deletions NOTICE
@@ -8,6 +8,10 @@ psf/requests - https://github.com/psf/requests
Copyright 2019 Kenneth Reitz
License - https://github.com/psf/requests/blob/main/LICENSE

googleapis/google-auth-library-python - https://github.com/googleapis/google-auth-library-python/tree/main
Copyright google-auth-library-python authors
License - https://github.com/googleapis/google-auth-library-python/blob/main/LICENSE

This software contains code from the following open source projects, licensed under the BSD (3-clause) license.

x/oauth2 - https://cs.opensource.google/go/x/oauth2/+/master:oauth2.go
25 changes: 25 additions & 0 deletions README.md
@@ -204,6 +204,31 @@ w = WorkspaceClient(host=input('Databricks Workspace URL: '),
azure_client_secret=input('AAD Client Secret: '))
```

### Google Cloud Platform native authentication

By default, the Databricks SDK for Python first tries GCP credentials authentication (`AuthType: "google-credentials"` in `*databricks.Config`). If the SDK is unsuccessful, it then tries Google Cloud Platform (GCP) ID authentication (`AuthType: "google-id"` in `*databricks.Config`).

The Databricks SDK for Python picks up an OAuth token in the scope of the Google Application Default Credentials (ADC) flow. This means that if you have run `gcloud auth application-default login` on your development machine, or you launch the application on compute that is allowed to impersonate the Google Cloud service account specified in `GoogleServiceAccount`, authentication should work out of the box. See [Creating and managing service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts).

To authenticate as a Google Cloud service account, you must provide one of the following:

- `Host` and `GoogleCredentials`; or their environment variable or `.databrickscfg` file field equivalents.
- `Host` and `GoogleServiceAccount`; or their environment variable or `.databrickscfg` file field equivalents.

| `*databricks.Config` argument | Description | Environment variable / `.databrickscfg` file field |
|-------------------------------|-------------|----------------------------------------------------|
| `GoogleCredentials`| _(String)_ GCP Service Account Credentials JSON or the location of these credentials on the local filesystem. | `GOOGLE_CREDENTIALS` / `google_credentials` |
| `GoogleServiceAccount`| _(String)_ The Google Cloud Platform (GCP) service account e-mail used for impersonation in the Application Default Credentials flow, which does not require a password. | `DATABRICKS_GOOGLE_SERVICE_ACCOUNT` / `google_service_account` |
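
As a rough illustration of the `.databrickscfg` route, the sketch below parses a profile that carries the `host` and `google_service_account` fields from the table, using only the standard library's `configparser`. The host URL, the service account e-mail, and the `read_profile` helper are placeholders invented for this example; this is not how the SDK itself loads the file.

```python
import configparser

# Hypothetical profile contents; the host and e-mail are placeholders.
SAMPLE_CFG = """
[DEFAULT]
host = https://example.gcp.databricks.com
google_service_account = my-sa@my-project.iam.gserviceaccount.com
"""


def read_profile(text: str, profile: str = "DEFAULT") -> dict:
    """Return the key/value pairs of one profile as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[profile])


profile = read_profile(SAMPLE_CFG)
print(profile["host"])
print(profile["google_service_account"])
```

With both fields present, the `google-id` provider's declared requirements (`host` and `google_service_account`) would be satisfied without passing arguments in code.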

For example, to use Google ID authentication:

```python
from databricks.sdk import WorkspaceClient
w = WorkspaceClient(host=input('Databricks Workspace URL: '),
google_service_account=input('Google Service Account: '))

```

Contributor

@edwardfeng-db Could you take a second pass at this doc? The field names mentioned are the Go SDK field names, not the Python SDK field names (PascalCase vs snake_case).

Contributor Author

Yeah I see, sorry about this, let me update that. Some copy pasta issues
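
To make the naming mismatch raised in this thread concrete, here is a minimal sketch that maps the PascalCase names used in the README draft to the snake_case names the Python SDK accepts as keyword arguments. The `pascal_to_snake` helper is hypothetical and only handles simple names; an acronym such as `GCPAuth` would not convert cleanly.

```python
import re


def pascal_to_snake(name: str) -> str:
    """Convert a simple PascalCase field name to snake_case."""
    # Insert an underscore before every capital letter that is not at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for go_name in ("Host", "GoogleCredentials", "GoogleServiceAccount", "AuthType"):
    print(go_name, "->", pascal_to_snake(go_name))
```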

Please see more examples in [this document](./docs/azure-ad.md).

### Overriding `.databrickscfg`
8 changes: 8 additions & 0 deletions databricks/sdk/__init__.py

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

75 changes: 73 additions & 2 deletions databricks/sdk/core.py
@@ -3,6 +3,7 @@
import configparser
import copy
import functools
import io
import json
import logging
import os
@@ -18,8 +19,11 @@
from typing import (Any, BinaryIO, Callable, Dict, Iterable, Iterator, List,
Optional, Type, Union)

import google.auth
import requests
import requests.auth
from google.auth import impersonated_credentials
from google.auth.transport.requests import Request
from google.oauth2 import service_account
from requests.adapters import HTTPAdapter

from .azure import (ARM_DATABRICKS_RESOURCE_ID, ENVIRONMENTS, AzureEnvironment,
@@ -36,6 +40,8 @@

HeaderFactory = Callable[[], Dict[str, str]]

GcpScopes = ["https://www.googleapis.com/auth/cloud-platform", "https://www.googleapis.com/auth/compute"]


class CredentialsProvider(abc.ABC):
""" CredentialsProvider is the protocol (call-side interface)
@@ -265,6 +271,70 @@ def refreshed_headers() -> Dict[str, str]:
    return refreshed_headers


@credentials_provider('google-credentials', ['host', 'google_credentials'])
Contributor

I guess this is the same as with the Go library. But in theory, google_credentials does not even need to be specified, because there is also a default directory that the google-auth library looks in, if I'm not mistaken. We might be able to remove this and allow users to auto-login with their google credentials set up via the default app credentials pathway.

Contributor Author

Yeah, I thought so as well at the beginning, but after checking more I think we do need to point it to the service account JSON file. It is different from the default auth JSON file: the service account one contains more info about the keys and secrets for signing the JWT, etc. So it seems like we need this to make it work.
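
The distinction discussed in this thread can be sketched with the standard library alone. A service account key file carries `"type": "service_account"` together with a `private_key` and `client_email`, while the file written by `gcloud auth application-default login` carries `"type": "authorized_user"` and a refresh token instead of a signing key. The `describe_credentials` helper and the sample values below are made up for illustration and are not part of the SDK.

```python
import json


def describe_credentials(raw_json: str) -> str:
    """Classify a Google credentials JSON document by its "type" field."""
    info = json.loads(raw_json)
    kind = info.get("type")
    if kind == "service_account":
        # A usable key file needs the signing key and the account e-mail.
        has_key = "private_key" in info and "client_email" in info
        return "service account key" if has_key else "incomplete service account key"
    if kind == "authorized_user":
        return "user credentials (application default login)"
    return "unknown"


# Placeholder documents; real files contain more fields and real secrets.
sa_json = json.dumps({"type": "service_account",
                      "client_email": "sa@example.iam.gserviceaccount.com",
                      "private_key": "<redacted>"})
user_json = json.dumps({"type": "authorized_user", "refresh_token": "<redacted>"})

print(describe_credentials(sa_json))
print(describe_credentials(user_json))
```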

def google_credentials(cfg: 'Config') -> Optional[HeaderFactory]:
    if not cfg.is_gcp:
        return None
    # Reads credentials as JSON. Credentials can be either a path to JSON file, or actual JSON string.
    # Obtain the id token by providing the json file path and target audience.
    if (os.path.isfile(cfg.google_credentials)):
        with io.open(cfg.google_credentials, "r", encoding="utf-8") as json_file:
            account_info = json.load(json_file)
    else:
        # If the file doesn't exist, assume that the config is the actual JSON content.
        account_info = json.loads(cfg.google_credentials)

    credentials = service_account.IDTokenCredentials.from_service_account_info(info=account_info,
                                                                               target_audience=cfg.host)

    request = Request()

    gcp_credentials = service_account.Credentials.from_service_account_info(info=account_info,
                                                                            scopes=GcpScopes)

    def refreshed_headers() -> Dict[str, str]:
        credentials.refresh(request)
        headers = {'Authorization': f'Bearer {credentials.token}'}
        if cfg.is_account_client:
            gcp_credentials.refresh(request)
            headers["X-Databricks-GCP-SA-Access-Token"] = gcp_credentials.token
        return headers

    return refreshed_headers
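
The "path or inline JSON" behaviour of `google_credentials` can be exercised in isolation. The sketch below mirrors only that branch with the standard library; `load_account_info` is a hypothetical helper name and the sample key content is a placeholder, not a real credential.

```python
import json
import os
import tempfile


def load_account_info(value: str) -> dict:
    """Treat value as a file path if such a file exists, otherwise as JSON text."""
    if os.path.isfile(value):
        with open(value, "r", encoding="utf-8") as json_file:
            return json.load(json_file)
    # No such file: assume the value is the JSON content itself.
    return json.loads(value)


inline = '{"type": "service_account", "client_email": "sa@example.iam.gserviceaccount.com"}'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "key.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(inline)
    from_file = load_account_info(path)

from_inline = load_account_info(inline)
print(from_file == from_inline)
```

One consequence of this design is that a mistyped path is parsed as JSON and surfaces as a `json.JSONDecodeError` rather than a file-not-found error.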


@credentials_provider('google-id', ['host', 'google_service_account'])
def google_id(cfg: 'Config') -> Optional[HeaderFactory]:
    if not cfg.is_gcp:
        return None
    credentials, _project_id = google.auth.default()

    # Create the impersonated credential.
    target_credentials = impersonated_credentials.Credentials(source_credentials=credentials,
                                                              target_principal=cfg.google_service_account,
                                                              target_scopes=[])

    # Set the impersonated credential, target audience and token options.
    id_creds = impersonated_credentials.IDTokenCredentials(target_credentials,
                                                           target_audience=cfg.host,
                                                           include_email=True)

    gcp_impersonated_credentials = impersonated_credentials.Credentials(
        source_credentials=credentials, target_principal=cfg.google_service_account, target_scopes=GcpScopes)

    request = Request()

    def refreshed_headers() -> Dict[str, str]:
        id_creds.refresh(request)
        headers = {'Authorization': f'Bearer {id_creds.token}'}
        if cfg.is_account_client:
            gcp_impersonated_credentials.refresh(request)
            headers["X-Databricks-GCP-SA-Access-Token"] = gcp_impersonated_credentials.token
        return headers

    return refreshed_headers
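
Both providers return a closure that refreshes the ID token on every call and adds the `X-Databricks-GCP-SA-Access-Token` header only for account-level clients. The sketch below reproduces that pattern without any network access; `FakeCredentials` and `make_header_factory` are stand-ins invented for this example and are not google-auth or SDK classes.

```python
from typing import Callable, Dict


class FakeCredentials:
    """Stand-in for a google-auth credentials object (hypothetical)."""

    def __init__(self, prefix: str):
        self._prefix = prefix
        self._count = 0
        self.token = None

    def refresh(self, request) -> None:
        # A real object would call Google here; this one just mints a new label.
        self._count += 1
        self.token = f"{self._prefix}-{self._count}"


def make_header_factory(id_creds, access_creds,
                        is_account_client: bool) -> Callable[[], Dict[str, str]]:
    def refreshed_headers() -> Dict[str, str]:
        id_creds.refresh(None)
        headers = {"Authorization": f"Bearer {id_creds.token}"}
        if is_account_client:
            # Account-level calls also carry a GCP access token.
            access_creds.refresh(None)
            headers["X-Databricks-GCP-SA-Access-Token"] = access_creds.token
        return headers

    return refreshed_headers


workspace_headers = make_header_factory(FakeCredentials("id"), FakeCredentials("access"), False)()
account_headers = make_header_factory(FakeCredentials("id"), FakeCredentials("access"), True)()
print(workspace_headers)
print(account_headers)
```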


class CliTokenSource(Refreshable):

    def __init__(self, cmd: List[str], token_type_field: str, access_token_field: str, expiry_field: str):
@@ -531,7 +601,8 @@ def auth_type(self) -> str:
     def __call__(self, cfg: 'Config') -> HeaderFactory:
         auth_providers = [
             pat_auth, basic_auth, metadata_service, oauth_service_principal, azure_service_principal,
-            github_oidc_azure, azure_cli, external_browser, databricks_cli, runtime_native_auth
+            github_oidc_azure, azure_cli, external_browser, databricks_cli, runtime_native_auth,
+            google_credentials, google_id
         ]
         for provider in auth_providers:
             auth_type = provider.auth_type()
2 changes: 1 addition & 1 deletion setup.py
@@ -12,7 +12,7 @@
version=version_data['__version__'],
packages=find_packages(exclude=["tests", "*tests.*", "*tests"]),
python_requires=">=3.7",
-install_requires=["requests>=2.28.1,<3"],
+install_requires=["requests>=2.28.1,<3", "google-auth~=2.0"],
extras_require={"dev": ["pytest", "pytest-cov", "pytest-xdist", "pytest-mock",
"yapf", "pycodestyle", "autoflake", "isort", "wheel",
"ipython", "ipywidgets", "requests-mock", "pyfakefs"],
3 changes: 0 additions & 3 deletions tests/integration/test_files.py
@@ -222,9 +222,6 @@ def test_files_api_upload_download(ucws, random):
target_file = f'/Volumes/main/{schema}/{volume}/filesit-{random()}.txt'
w.files.upload(target_file, f)

-res = w.files.get_status(target_file)
-assert not res.is_dir

with w.files.download(target_file).contents as f:
assert f.read() == b"some text data"
