DBUtils implementation for Volumes #623
This PR breaks backwards compatibility for databrickslabs/ucx downstream. See build logs for more details. Running from downstreams #65
databricks/sdk/mixins/files.py
```python
    return
queue = [self]
while queue:
    next_path, queue = queue[0], queue[1:]
```
Slicing the list creates a copy, so each dequeue is O(N). We can improve this to O(1) by using a `deque` and calling `.popleft()` to get the first element.
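A minimal sketch of the suggested change, assuming the directory tree is represented by a hypothetical `children` mapping rather than the SDK's real `_Path` objects. `collections.deque.popleft()` removes the head of the queue in O(1), whereas `queue[0], queue[1:]` copies the remaining elements on every iteration.

```python
from collections import deque
from typing import Dict, Iterator, List


def walk_breadth_first(root: str, children: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every path reachable from `root`, level by level.

    `children` is a hypothetical stand-in for directory listings: it maps a
    directory path to the paths directly inside it.
    """
    queue = deque([root])
    while queue:
        # popleft() is O(1); queue[1:] on a list would copy the whole tail.
        next_path = queue.popleft()
        yield next_path
        queue.extend(children.get(next_path, []))
```

For example, with `tree = {'/a': ['/a/b', '/a/c.txt'], '/a/b': ['/a/b/d.txt']}`, `list(walk_breadth_first('/a', tree))` visits `/a`, `/a/b`, `/a/c.txt`, then `/a/b/d.txt`.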
databricks/sdk/mixins/files.py
```python
    return
queue = [self]
while queue:
    next_path, queue = queue[0], queue[1:]
```
same here
```python
    return _DbfsPath(self, src)
if src.startswith('dbfs:'):
    src = src[len('dbfs:'):]
if str(src).startswith('/Volumes'):
```
Should we consider typos here? From https://docs.databricks.com/en/connect/unity-catalog/volumes.html:

> Paths are also reserved for potential typos for these paths from Apache Spark APIs and dbutils, including /volumes, /Volume, /volume, whether or not they are preceded by dbfs:/. The path /dbfs/Volumes is also reserved, but cannot be used to access volumes.
I think those paths are reserved but (hopefully) cannot themselves be used to look up volumes. Since we use the REST API, and these paths map directly to REST API parameters, we may need to accept a narrower set of paths than DBR does.
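A narrow check in the spirit of this reply might look like the sketch below. `is_volume_path` is a hypothetical helper, not part of the SDK: it strips an optional `dbfs:` scheme and accepts only `/Volumes` itself or paths beneath `/Volumes/`, rejecting the reserved typo variants. Note that a bare `startswith('/Volumes')` check, as in the excerpt above, would also match a path such as `/VolumesBackup/x`; requiring the trailing separator avoids that.

```python
def is_volume_path(path: str) -> bool:
    """Return True only for canonical UC Volumes paths.

    Hypothetical helper: strips an optional `dbfs:` scheme, then accepts
    `/Volumes` itself or anything beneath `/Volumes/`. Reserved typo
    variants such as `/volumes` or `/Volume` are deliberately rejected,
    as is `/dbfs/Volumes`, which cannot be used to access volumes.
    """
    if path.startswith('dbfs:'):
        path = path[len('dbfs:'):]
    return path == '/Volumes' or path.startswith('/Volumes/')
```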
…#631)

## Changes

#623 introduced DBUtils support for volumes but also caused a small regression in listing behavior: `dbutils.fs.ls()` should not include the `dbfs:` scheme. This PR fixes that. Additionally, it fixes a small bug in recursive listing for volumes: only file paths are now included, matching the behavior for DBFS.

## Tests

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
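The two listing fixes from #631 can be illustrated with a small generator. This is a sketch under stated assumptions, not the SDK's code: `Entry` is a hypothetical stand-in for `FileInfo`, `list_dir` is a hypothetical callback returning the direct children of a directory, and `strip_scheme` is an illustrative helper. Recursive listing traverses directories but yields only files, and the yielded paths carry no `dbfs:` scheme.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical listing record standing in for the SDK's FileInfo."""
    path: str
    is_dir: bool


def strip_scheme(path: str) -> str:
    """Drop a leading `dbfs:` so listed paths carry no scheme."""
    return path[len('dbfs:'):] if path.startswith('dbfs:') else path


def list_recursive(root: str, list_dir: Callable[[str], List[Entry]]) -> Iterator[Entry]:
    """Lazily yield only the files under `root`, descending into directories.

    Directories are queued for traversal but never yielded themselves.
    """
    queue = deque([root])
    while queue:
        for entry in list_dir(queue.popleft()):
            if entry.is_dir:
                queue.append(entry.path)
            else:
                yield Entry(strip_scheme(entry.path), False)
```

Because `list_recursive` is a generator, callers can consume entries lazily instead of materializing the full tree first.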
### New Features

* DBUtils implementation for Volumes ([#623](#623), [#634](#634), [#631](#631)). You can now use `w.dbutils.fs` with UC volumes paths. Error handling for non-UC, non-DBFS and non-local paths has also been improved.

### Bug Fixes

* Fixed codecov for repository ([#636](#636)).

API Changes:

* Added `ingestion_definition` field for `databricks.sdk.service.pipelines.CreatePipeline`.
* Added `ingestion_definition` field for `databricks.sdk.service.pipelines.EditPipeline`.
* Added `ingestion_definition` field for `databricks.sdk.service.pipelines.PipelineSpec`.
* Added `databricks.sdk.service.pipelines.IngestionConfig` dataclass.
* Added `databricks.sdk.service.pipelines.ManagedIngestionPipelineDefinition` dataclass.
* Added `databricks.sdk.service.pipelines.SchemaSpec` dataclass.
* Added `databricks.sdk.service.pipelines.TableSpec` dataclass.
* Changed `create()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service. New request type is `databricks.sdk.service.serving.CreateAppRequest` dataclass.
* Changed `create()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service to return `databricks.sdk.service.serving.App` dataclass.
* Removed `delete_app()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Removed `get_app()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Removed `get_app_deployment_status()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Removed `get_apps()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Removed `get_events()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `create_deployment()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `delete()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `get()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `get_deployment()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `get_environment()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `list()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `list_deployments()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `stop()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `update()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
* Added `get_open_api()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service.
* Removed `databricks.sdk.service.serving.AppEvents` dataclass.
* Removed `databricks.sdk.service.serving.AppManifest` dataclass.
* Removed `databricks.sdk.service.serving.AppServiceStatus` dataclass.
* Removed `databricks.sdk.service.serving.DeleteAppResponse` dataclass.
* Removed `databricks.sdk.service.serving.DeployAppRequest` dataclass.
* Removed `databricks.sdk.service.serving.DeploymentStatus` dataclass.
* Removed `databricks.sdk.service.serving.DeploymentStatusState` dataclass.
* Removed `databricks.sdk.service.serving.GetAppDeploymentStatusRequest` dataclass.
* Removed `databricks.sdk.service.serving.GetAppResponse` dataclass.
* Removed `databricks.sdk.service.serving.GetEventsRequest` dataclass.
* Removed `databricks.sdk.service.serving.ListAppEventsResponse` dataclass.
* Changed `apps` field for `databricks.sdk.service.serving.ListAppsResponse` to `databricks.sdk.service.serving.AppList` dataclass.
* Added `databricks.sdk.service.serving.App` dataclass.
* Added `databricks.sdk.service.serving.AppDeployment` dataclass.
* Added `databricks.sdk.service.serving.AppDeploymentState` dataclass.
* Added `databricks.sdk.service.serving.AppDeploymentStatus` dataclass.
* Added `databricks.sdk.service.serving.AppEnvironment` dataclass.
* Added `databricks.sdk.service.serving.AppState` dataclass.
* Added `databricks.sdk.service.serving.AppStatus` dataclass.
* Added `databricks.sdk.service.serving.CreateAppDeploymentRequest` dataclass.
* Added `databricks.sdk.service.serving.CreateAppRequest` dataclass.
* Added `databricks.sdk.service.serving.EnvVariable` dataclass.
* Added `databricks.sdk.service.serving.GetAppDeploymentRequest` dataclass.
* Added `databricks.sdk.service.serving.GetAppEnvironmentRequest` dataclass.
* Added `databricks.sdk.service.serving.GetOpenApiRequest` dataclass.
* Added `any` dataclass.
* Added `databricks.sdk.service.serving.ListAppDeploymentsRequest` dataclass.
* Added `databricks.sdk.service.serving.ListAppDeploymentsResponse` dataclass.
* Added `databricks.sdk.service.serving.ListAppsRequest` dataclass.
* Added `databricks.sdk.service.serving.StopAppRequest` dataclass.
* Added `any` dataclass.
* Added `databricks.sdk.service.serving.UpdateAppRequest` dataclass.
* Removed [w.csp_enablement](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/csp_enablement.html) workspace-level service.
* Removed [w.esm_enablement](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/esm_enablement.html) workspace-level service.
* Added [w.compliance_security_profile](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/compliance_security_profile.html) workspace-level service.
* Added [w.enhanced_security_monitoring](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/enhanced_security_monitoring.html) workspace-level service.
* Removed `databricks.sdk.service.settings.CspEnablement` dataclass.
* Removed `databricks.sdk.service.settings.CspEnablementSetting` dataclass.
* Removed `databricks.sdk.service.settings.EsmEnablement` dataclass.
* Removed `databricks.sdk.service.settings.EsmEnablementSetting` dataclass.
* Removed `databricks.sdk.service.settings.GetCspEnablementSettingRequest` dataclass.
* Removed `databricks.sdk.service.settings.GetEsmEnablementSettingRequest` dataclass.
* Removed `databricks.sdk.service.settings.UpdateCspEnablementSettingRequest` dataclass.
* Removed `databricks.sdk.service.settings.UpdateEsmEnablementSettingRequest` dataclass.
* Added `databricks.sdk.service.settings.ComplianceSecurityProfile` dataclass.
* Added `databricks.sdk.service.settings.ComplianceSecurityProfileSetting` dataclass.
* Added `databricks.sdk.service.settings.EnhancedSecurityMonitoring` dataclass.
* Added `databricks.sdk.service.settings.EnhancedSecurityMonitoringSetting` dataclass.
* Added `databricks.sdk.service.settings.GetComplianceSecurityProfileSettingRequest` dataclass.
* Added `databricks.sdk.service.settings.GetEnhancedSecurityMonitoringSettingRequest` dataclass.
* Added `databricks.sdk.service.settings.UpdateComplianceSecurityProfileSettingRequest` dataclass.
* Added `databricks.sdk.service.settings.UpdateEnhancedSecurityMonitoringSettingRequest` dataclass.
* Added `tags` field for `databricks.sdk.service.sql.DashboardEditContent`.
* Added `tags` field for `databricks.sdk.service.sql.QueryEditContent`.
* Added `catalog` field for `databricks.sdk.service.sql.QueryOptions`.
* Added `schema` field for `databricks.sdk.service.sql.QueryOptions`.
* Added `tags` field for `databricks.sdk.service.sql.QueryPostContent`.
* Added `query` field for `databricks.sdk.service.sql.Visualization`.

OpenAPI SHA: 9bb7950fa3390afb97abaa552934bc0a2e069de5, Date: 2024-05-02
## Changes

UC Volumes has been released for some time, but users are unable to use UC Volumes with `dbutils.fs` from the SDK. This PR implements support for `/Volumes` paths in DBUtils in the SDK.

I've done this primarily by extending `DbfsExt` to work with UC Volumes paths. A new class, `_VolumePath`, implements the set of base operations defined on the `_Path` abstract parent class for volume paths. An accompanying `_VolumesIO` class provides a consistent interface for reading from and writing to UC Volumes files. This matters mainly for writing: the `download` API already returns a `BinaryIO`, but the `upload` API accepts a `BinaryIO`, so this adapter lets a user "open" a Volumes path for writing.

In order to properly implement `ls`, I changed the existing implementation of `list` in `_Path` subclasses to return a generator of `FileInfo`s. This allows for better reuse of this common functionality between `ls`, `cp` and `mv`.

Open questions:

- … `/Volumes` on disk and UC Volumes? E.g. via a different scheme?

## Tests

- `make test` run locally
- `make fmt` applied
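The write-side adapter idea behind `_VolumesIO` can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class name `BufferedUploadWriter` and the `upload(path, stream)` callback are hypothetical stand-ins for the SDK's upload call. Because the upload API consumes a whole `BinaryIO`, writes are buffered in memory and handed over once the writer is closed.

```python
import io
from typing import BinaryIO, Callable


class BufferedUploadWriter(io.BytesIO):
    """Writable file-like object that uploads its contents on close.

    Hypothetical sketch: `upload` stands in for an API that accepts a
    readable BinaryIO. It is called at most once, when the writer closes.
    """

    def __init__(self, path: str, upload: Callable[[str, BinaryIO], None]):
        super().__init__()
        self._path = path
        self._upload = upload

    def close(self) -> None:
        if self.closed:
            return  # already uploaded (or failed); never upload twice
        try:
            self.seek(0)  # rewind so the uploader reads from the start
            self._upload(self._path, self)
        finally:
            super().close()
```

Used as a context manager, `with BufferedUploadWriter(path, upload) as f: f.write(b'...')` triggers exactly one upload on exit. Buffering keeps the sketch simple, but it means the whole file is held in memory until `close()`.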