Support partners in SDK (#648)
## Changes
Ports databricks/databricks-sdk-go#925 to the
Python SDK.

Partners of Databricks need a mechanism to register themselves in
libraries or applications that they write. In this way, requests made by
users of those libraries will include sufficient information to link
those requests to the original users.

This PR adds a new `useragent` module with functions to manipulate the
user agent.
* `product()`: returns the globally configured product & version.
* `with_product(product: str, product_version: str)`: configure the
global product & version.
* `extra()`: returns the globally configured extra user agent metadata.
* `with_extra(key: str, value: str)`: add an extra entry to the global
extra user agent metadata.
* `with_partner(partner: str)`: add a partner to the global extra user
agent metadata.
* `to_string(alternate_product_info: Optional[Tuple[str, str]] = None,
other_info: Optional[List[Tuple[str, str]]] = None) -> str`: return the
User-Agent header as a string.
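
As a rough illustration of how these functions fit together, the sketch below re-implements the header assembly in plain Python instead of importing the SDK. `SDK_VERSION` is a hypothetical placeholder (the real module reads `__version__`), input validation is left out, and the upstream and runtime entries that the real `to_string()` appends are omitted for brevity.

```python
import platform
from typing import List, Optional, Tuple

# Hypothetical placeholder; the real module imports __version__ from the SDK.
SDK_VERSION = "0.0.0"

_product: Tuple[str, str] = ("unknown", "0.0.0")
_extra: List[Tuple[str, str]] = []


def with_product(name: str, version: str) -> None:
    """Replace the global product entry (validation omitted in this sketch)."""
    global _product
    _product = (name, version)


def with_extra(key: str, value: str) -> None:
    """Append an extra entry; duplicate keys are allowed."""
    _extra.append((key, value))


def to_string(alternate_product_info: Optional[Tuple[str, str]] = None,
              other_info: Optional[List[Tuple[str, str]]] = None) -> str:
    """Join product, SDK, Python, OS, per-call and global extras into one header."""
    base = [alternate_product_info or _product]
    base.extend([("databricks-sdk-py", SDK_VERSION),
                 ("python", platform.python_version()),
                 ("os", platform.uname().system.lower())])
    if other_info:
        base.extend(other_info)
    base.extend(_extra)
    return " ".join(f"{k}/{v}" for k, v in base)


with_product("example-product", "1.2.0")
with_extra("partner", "partner-abc")
header = to_string(other_info=[("auth", "pat")])
print(header)
```

The ordering matters for anyone parsing the header: the product comes first, then the SDK, Python, and OS entries, then per-call information, and finally the global extras.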


One new function here is `with_partner`, which can be used by a partner
to add partner information to the User-Agent header for requests made by
the SDK. The new header entry has the form `partner/<partner id>`. The
partner identifier is opaque to the SDK, but it must be alphanumeric
(the characters `.`, `-`, `_`, and `+` are also accepted).

This PR also removes the requirement that a user agent entry contain
only a single copy of each key. This allows multiple partners to
register in the same library or application.

In this PR, I've also refactored the user agent library to be more
static, aligning it with the Go and Java SDKs. This makes it easier to
maintain and ensures similar behavior between all 3 SDKs. Note that this
SDK has extra functionality that doesn't exist in the Go and Java SDKs,
namely config-level user agent info; that is preserved here.
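
The config-level behaviour can be sketched as follows. `resolve_product_info` and `GLOBAL_PRODUCT` are hypothetical names used only for illustration; the logic mirrors the `_init_product` helper added to `Config` in this PR, where a config that sets neither value defers to the global product and a partially specified override falls back to the global value for the missing half.

```python
from typing import Optional, Tuple

# Stand-in for the value returned by useragent.product() in the real SDK.
GLOBAL_PRODUCT = ("unknown", "0.0.0")


def resolve_product_info(product: Optional[str],
                         product_version: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return a per-config override, or None to defer to the global product."""
    if product is None and product_version is None:
        return None
    default_name, default_version = GLOBAL_PRODUCT
    return (product or default_name, product_version or default_version)


print(resolve_product_info(None, None))      # None
print(resolve_product_info("foo", "1.2.3"))  # ('foo', '1.2.3')
print(resolve_product_info("foo", None))     # ('foo', '0.0.0')
```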

## Tests
Unit tests were added to verify that the user agent contains all
expected parts and supports multiple partners.

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
mgyucht authored Jul 4, 2024
1 parent 9931e68 commit 7a49922
Showing 6 changed files with 262 additions and 79 deletions.
24 changes: 24 additions & 0 deletions README.md
@@ -30,6 +30,7 @@ The SDK's internal HTTP client is robust and handles failures on different level
- [Long-running operations](#long-running-operations)
- [Paginated responses](#paginated-responses)
- [Single-sign-on with OAuth](#single-sign-on-sso-with-oauth)
- [User Agent Request Attribution](#user-agent-request-attribution)
- [Error handling](#error-handling)
- [Logging](#logging)
- [Integration with `dbutils`](#interaction-with-dbutils)
@@ -508,6 +509,29 @@ logging.info(f'Created new custom app: '
f'--client_secret {custom_app.client_secret}')
```

## User Agent Request Attribution<a id="user-agent-request-attribution"></a>

The Databricks SDK for Python uses the `User-Agent` header to include request metadata along with each request. By default, this includes the version of the Python SDK, the version of the Python language used by your application, and the underlying operating system. To statically add additional metadata, you can use the `with_partner()` and `with_product()` functions in the `databricks.sdk.useragent` module. `with_partner()` can be used by partners to indicate that code using the Databricks SDK for Python should be attributed to a specific partner. Multiple partners can be registered at once. Partner names can contain any letter, digit, `.`, `-`, `_` or `+`.

```python
from databricks.sdk import useragent
useragent.with_partner("partner-abc")
useragent.with_partner("partner-xyz")
```

`with_product()` can be used to define the name and version of the product that is built with the Databricks SDK for Python. The product name has the same restrictions as the partner name above, and the product version must be a valid [SemVer](https://semver.org/). Subsequent calls to `with_product()` replace the original product with the new user-specified one.

```python
from databricks.sdk import useragent
useragent.with_product("databricks-example-product", "1.2.0")
```

If both the `DATABRICKS_SDK_UPSTREAM` and `DATABRICKS_SDK_UPSTREAM_VERSION` environment variables are defined, these will also be included in the `User-Agent` header.

If additional metadata needs to be specified that isn't already supported by the above interfaces, you can use the `with_user_agent_extra()` function to register arbitrary key-value pairs to include in the user agent. Multiple values associated with the same key are allowed. Keys have the same restrictions as the partner name above. Values must be either as described above or SemVer strings.

Additional `User-Agent` information can be associated with different instances of `Config`. To add metadata to a specific instance of `Config`, use the `with_user_agent_extra()` method.

## Error handling<a id="error-handling"></a>

The Databricks SDK for Python provides a robust error-handling mechanism that allows developers to catch and handle API errors. When an error occurs, the SDK will raise an exception that contains information about the error, such as the HTTP status code, error message, and error details. Developers can catch these exceptions and handle them appropriately in their code.
90 changes: 21 additions & 69 deletions databricks/sdk/config.py
@@ -3,19 +3,18 @@
import logging
import os
import pathlib
import platform
import sys
import urllib.parse
from typing import Dict, Iterable, List, Optional, Tuple
from typing import Dict, Iterable, Optional

import requests

from . import useragent
from .clock import Clock, RealClock
from .credentials_provider import CredentialsStrategy, DefaultCredentials
from .environments import (ALL_ENVS, AzureEnvironment, Cloud,
DatabricksEnvironment, get_environment_for_hostname)
from .oauth import OidcEndpoints, Token
from .version import __version__

logger = logging.getLogger('databricks.sdk')

@@ -44,30 +43,14 @@ def __repr__(self) -> str:
        return f"<ConfigAttribute '{self.name}' {self.transform.__name__}>"


_DEFAULT_PRODUCT_NAME = 'unknown'
_DEFAULT_PRODUCT_VERSION = '0.0.0'
_STATIC_USER_AGENT: Tuple[str, str, List[str]] = (_DEFAULT_PRODUCT_NAME, _DEFAULT_PRODUCT_VERSION, [])


def with_product(product: str, product_version: str):
    """[INTERNAL API] Change the product name and version used in the User-Agent header."""
    global _STATIC_USER_AGENT
    prev_product, prev_version, prev_other_info = _STATIC_USER_AGENT
    logger.debug(f'Changing product from {prev_product}/{prev_version} to {product}/{product_version}')
    _STATIC_USER_AGENT = product, product_version, prev_other_info
    useragent.with_product(product, product_version)


def with_user_agent_extra(key: str, value: str):
    """[INTERNAL API] Add extra metadata to the User-Agent header when developing a library."""
    global _STATIC_USER_AGENT
    product_name, product_version, other_info = _STATIC_USER_AGENT
    for item in other_info:
        if item.startswith(f"{key}/"):
            # ensure that we don't have duplicates
            other_info.remove(item)
            break
    other_info.append(f"{key}/{value}")
    _STATIC_USER_AGENT = product_name, product_version, other_info
    useragent.with_extra(key, value)


class Config:
@@ -111,21 +94,12 @@ def __init__(self,
                 # Deprecated. Use credentials_strategy instead.
                 credentials_provider: CredentialsStrategy = None,
                 credentials_strategy: CredentialsStrategy = None,
                 product=_DEFAULT_PRODUCT_NAME,
                 product_version=_DEFAULT_PRODUCT_VERSION,
                 product=None,
                 product_version=None,
                 clock: Clock = None,
                 **kwargs):
        self._header_factory = None
        self._inner = {}
        # as in SDK for Go, pull information from global static user agent context,
        # so that we can track additional metadata for mid-stream libraries, as well
        # as for cases, when the downstream product is used as a library and is not
        # configured with a proper product name and version.
        static_product, static_version, _ = _STATIC_USER_AGENT
        if product == _DEFAULT_PRODUCT_NAME:
            product = static_product
        if product_version == _DEFAULT_PRODUCT_VERSION:
            product_version = static_version
        self._user_agent_other_info = []
        if credentials_strategy and credentials_provider:
            raise ValueError(
@@ -147,8 +121,7 @@ def __init__(self,
            self._fix_host_if_needed()
            self._validate()
            self.init_auth()
            self._product = product
            self._product_version = product_version
            self._init_product(product, product_version)
        except ValueError as e:
            message = self.wrap_debug_info(str(e))
            raise ValueError(message) from e
@@ -260,47 +233,19 @@ def is_any_auth_configured(self) -> bool:
    @property
    def user_agent(self):
        """ Returns User-Agent header used by this SDK """
        py_version = platform.python_version()
        os_name = platform.uname().system.lower()

        ua = [
            f"{self._product}/{self._product_version}", f"databricks-sdk-py/{__version__}",
            f"python/{py_version}", f"os/{os_name}", f"auth/{self.auth_type}",
        ]
        if len(self._user_agent_other_info) > 0:
            ua.append(' '.join(self._user_agent_other_info))
        # as in SDK for Go, pull information from global static user agent context,
        # so that we can track additional metadata for mid-stream libraries. this value
        # is shared across all instances of Config objects intentionally.
        _, _, static_info = _STATIC_USER_AGENT
        if len(static_info) > 0:
            ua.append(' '.join(static_info))
        if len(self._upstream_user_agent) > 0:
            ua.append(self._upstream_user_agent)
        if 'DATABRICKS_RUNTIME_VERSION' in os.environ:
            runtime_version = os.environ['DATABRICKS_RUNTIME_VERSION']
            if runtime_version != '':
                runtime_version = self._sanitize_header_value(runtime_version)
                ua.append(f'runtime/{runtime_version}')

        return ' '.join(ua)

    @staticmethod
    def _sanitize_header_value(value: str) -> str:
        value = value.replace(' ', '-')
        value = value.replace('/', '-')
        return value
        # global user agent includes SDK version, product name & version, platform info,
        # and global extra info. Config can have specific extra info associated with it,
        # such as an override product, auth type, and other user-defined information.
        return useragent.to_string(self._product_info,
                                   [("auth", self.auth_type)] + self._user_agent_other_info)

    @property
    def _upstream_user_agent(self) -> str:
        product = os.environ.get('DATABRICKS_SDK_UPSTREAM', None)
        product_version = os.environ.get('DATABRICKS_SDK_UPSTREAM_VERSION', None)
        if product is not None and product_version is not None:
            return f"upstream/{product} upstream-version/{product_version}"
        return ""
        return " ".join(f"{k}/{v}" for k, v in useragent._get_upstream_user_agent_info())

    def with_user_agent_extra(self, key: str, value: str) -> 'Config':
        self._user_agent_other_info.append(f"{key}/{value}")
        self._user_agent_other_info.append((key, value))
        return self

    @property
@@ -505,6 +450,13 @@ def init_auth(self):
        except ValueError as e:
            raise ValueError(f'{self._credentials_strategy.auth_type()} auth: {e}') from e

    def _init_product(self, product, product_version):
        if product is not None or product_version is not None:
            default_product, default_version = useragent.product()
            self._product_info = (product or default_product, product_version or default_version)
        else:
            self._product_info = None

    def __repr__(self):
        return f'<{self.debug_string()}>'

144 changes: 144 additions & 0 deletions databricks/sdk/useragent.py
@@ -0,0 +1,144 @@
import copy
import logging
import os
import platform
import re
from typing import List, Optional, Tuple

from .version import __version__

# Constants
RUNTIME_KEY = 'runtime'
CICD_KEY = 'cicd'
AUTH_KEY = 'auth'

_product_name = "unknown"
_product_version = "0.0.0"

logger = logging.getLogger("databricks.sdk.useragent")

_extra = []

# Precompiled regex patterns
alphanum_pattern = re.compile(r'^[a-zA-Z0-9_.+-]+$')
semver_pattern = re.compile(r'^v?(\d+\.)?(\d+\.)?(\*|\d+)$')


def _match_alphanum(value):
    if not alphanum_pattern.match(value):
        raise ValueError(f"Invalid alphanumeric value: {value}")


def _match_semver(value):
    if not semver_pattern.match(value):
        raise ValueError(f"Invalid semantic version: {value}")


def _match_alphanum_or_semver(value):
    if not alphanum_pattern.match(value) and not semver_pattern.match(value):
        raise ValueError(f"Invalid value: {value}")


def product() -> Tuple[str, str]:
    """Return the global product name and version that will be submitted to Databricks on every request."""
    return _product_name, _product_version


def with_product(name: str, version: str):
    """Change the product name and version that will be submitted to Databricks on every request."""
    global _product_name, _product_version
    _match_alphanum(name)
    _match_semver(version)
    logger.debug(f'Changing product from {_product_name}/{_product_version} to {name}/{version}')
    _product_name = name
    _product_version = version


def _reset_product():
    """[Internal API] Reset product name and version to the default values.
    Used for testing purposes only."""
    global _product_name, _product_version
    _product_name = "unknown"
    _product_version = "0.0.0"


def with_extra(key: str, value: str):
    """Add extra metadata to all requests submitted to Databricks.
    User-specified extra metadata can be inserted into request headers to provide additional context to Databricks
    about usage of different tools in the Databricks ecosystem. This can be useful for collecting telemetry about SDK
    usage from tools that are built on top of the SDK.
    """
    global _extra
    _match_alphanum(key)
    _match_alphanum_or_semver(value)
    logger.debug(f'Adding {key}/{value} to User-Agent')
    _extra.append((key, value))


def extra() -> List[Tuple[str, str]]:
    """Returns the current extra metadata that will be submitted to Databricks on every request."""
    return copy.deepcopy(_extra)


def _reset_extra(extra: List[Tuple[str, str]]):
    """[INTERNAL API] Reset the extra metadata to a new list.
    Prefer using with_user_agent_extra instead of this method to avoid overwriting other information included in the
    user agent."""
    global _extra
    _extra = extra


def with_partner(partner: str):
    """Adds the given partner to the metadata submitted to Databricks on every request."""
    with_extra("partner", partner)


def _get_upstream_user_agent_info() -> List[Tuple[str, str]]:
    """[INTERNAL API] Return the upstream product and version if specified in the system environment."""
    product = os.getenv("DATABRICKS_SDK_UPSTREAM")
    version = os.getenv("DATABRICKS_SDK_UPSTREAM_VERSION")
    if not product or not version:
        return []
    return [("upstream", product), ("upstream-version", version)]


def _get_runtime_info() -> List[Tuple[str, str]]:
    """[INTERNAL API] Return the runtime version if running on Databricks."""
    if 'DATABRICKS_RUNTIME_VERSION' in os.environ:
        runtime_version = os.environ['DATABRICKS_RUNTIME_VERSION']
        if runtime_version != '':
            runtime_version = _sanitize_header_value(runtime_version)
            return [('runtime', runtime_version)]
    return []


def _sanitize_header_value(value: str) -> str:
    value = value.replace(' ', '-')
    value = value.replace('/', '-')
    return value


def to_string(alternate_product_info: Optional[Tuple[str, str]] = None,
              other_info: Optional[List[Tuple[str, str]]] = None) -> str:
    """Compute the full User-Agent header.
    The User-Agent header contains the product name, version, and other metadata that is submitted to Databricks on
    every request. There are some static components that are included by default in every request, like the SDK version,
    OS name, and Python version. Other components can be optionally overridden or augmented in DatabricksConfig, like
    the product name, product version, and extra user-defined information."""
    base = []
    if alternate_product_info:
        base.append(alternate_product_info)
    else:
        base.append((_product_name, _product_version))
    base.extend([("databricks-sdk-py", __version__), ("python", platform.python_version()),
                 ("os", platform.uname().system.lower()), ])
    if other_info:
        base.extend(other_info)
    base.extend(_extra)
    base.extend(_get_upstream_user_agent_info())
    base.extend(_get_runtime_info())
    return " ".join(f"{k}/{v}" for k, v in base)
15 changes: 5 additions & 10 deletions tests/test_config.py
@@ -2,24 +2,17 @@

import pytest

from databricks.sdk import useragent
from databricks.sdk.config import Config, with_product, with_user_agent_extra
from databricks.sdk.version import __version__

from .conftest import noop_credentials


def test_config_copy_preserves_product_and_product_version():
    c = Config(credentials_strategy=noop_credentials, product='foo', product_version='1.2.3')
    c2 = c.copy()
    assert c2._product == 'foo'
    assert c2._product_version == '1.2.3'


def test_config_supports_legacy_credentials_provider():
    c = Config(credentials_provider=noop_credentials, product='foo', product_version='1.2.3')
    c2 = c.copy()
    assert c2._product == 'foo'
    assert c2._product_version == '1.2.3'
    assert c2._product_info == ('foo', '1.2.3')


@pytest.mark.parametrize('host,expected', [("https://abc.def.ghi", "https://abc.def.ghi"),
@@ -54,7 +47,7 @@ def system(self):

    assert config.user_agent == (
        f"test/0.0.0 databricks-sdk-py/{__version__} python/3.0.0 os/testos auth/basic"
        f" test-extra-1/1 test-extra-2/2 upstream/upstream-product upstream-version/0.0.1"
        " test-extra-1/1 test-extra-2/2 upstream/upstream-product upstream-version/0.0.1"
        " runtime/13.1-anything-else")

    with_product('some-product', '0.32.1')
@@ -76,6 +69,8 @@ def test_config_copy_deep_copies_user_agent_other_info(config):
    assert "test/test2" in config_copy.user_agent
    assert "test/test2" not in config.user_agent

    original_extra = useragent.extra()
    with_user_agent_extra("blueprint", "0.4.6")
    assert "blueprint/0.4.6" in config.user_agent
    assert "blueprint/0.4.6" in config_copy.user_agent
    useragent._reset_extra(original_extra)