Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve symbolic links to mags and store them in db #8350

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

frcroth
Copy link
Member

@frcroth frcroth commented Jan 22, 2025

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Create ds with symlinks in mag, ds with symlinks in layer, remote datasets, ...
  • Check DB or GET /datasets/:datasetId/layers/:layer/linkedMags

Overview

In the datastore we now scan for mags and symlinks. Symlinks are resolved for layer directories and mag directories. The result is send to WK via a new route (subject to discussion).
In WK the database stores per mag:

  • path: The path (stored as URI) that wk uses to read the data
  • realPath: The path (stored as URI) where the data actually resides
  • hasLocalData: Is the data in the realPath local to the dataset.

So that

  • For normal datasets, realPath = path and hasLocalData = true
  • For remote datasets, realPath = path and hasLocalData = false with the URIs also indicating the remote scheme
  • For symlinked mags and layers, realPath != path and hasLocalData = false
  • For locally explored datasets realPath = path (as there is no layer directory and WK directly uses the path property in the mag) and hasLocalData = false

Using hasLocalData and realPath we will be able to handle deletions.

I added a route to get a list of mags that use a mags path, this is for testing purposes and can be deleted before merge.

TODOs:

  • Test symlinked mags
  • Test mags with path as file://
  • Store in DB / Find a way to differentiate a ds with a mag that has a file path and the ds which has the data
  • Use toRealPath?
  • When removing directory from local directory whitelist an error occurs and no more paths are added to the db

Issues:


(Please delete unneeded items, merge only when none are left open)

@frcroth frcroth self-assigned this Jan 22, 2025
Copy link
Contributor

coderabbitai bot commented Jan 22, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • test/db/dataSet_mags.csv is excluded by !**/*.csv

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive changes to dataset path management and metadata handling in the WEBKNOSSOS application. The modifications span multiple components, including controllers, services, and database schemas. Key additions include new methods for retrieving and updating dataset paths, introducing case classes to represent dataset magnification information, and extending the database schema to support real paths and local data tracking. The changes enhance the system's ability to manage and track dataset metadata more precisely.

Changes

File Change Summary
app/controllers/DatasetController.scala Added MagLinkInfo case class, new getLinkedMags method for retrieving linked dataset magnifications
app/controllers/WKRemoteDataStoreController.scala Added updatePaths method to update data source paths
app/models/dataset/Dataset.scala Introduced DatasetMagInfo case class, added methods in DatasetMagsDAO for managing dataset magnitude paths
app/models/dataset/DatasetService.scala Added methods updateRealPath, updateRealPaths, and getPathsForDatalayer
conf/evolutions/126-mag-real-paths.sql Added columns realPath, path, and hasLocalData to dataset_mags table
conf/webknossos.latest.routes Added new routes for linked magnifications and path updates
webknossos-datastore/* Added support for data source path handling across multiple services

Assessment against linked issues

Objective Addressed Explanation
Find datasets sharing a layer (6726) New methods like findPathsForDatasetAndDatalayer and getPathsForDatalayer provide infrastructure for tracking shared layers

Possibly related PRs

Suggested labels

backend, enhancement, new feature

Poem

🐰 Paths of data, now so clear,
Magnifications dancing near,
Real paths mapped with gentle care,
Dataset secrets we now share,
A rabbit's code, precise and bright! 🔍


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@frcroth frcroth changed the title wip resolve symbolic links to mags and store them in db Resolve symbolic links to mags and store them in db Jan 29, 2025
@frcroth frcroth marked this pull request as ready for review January 29, 2025 09:17
@frcroth frcroth requested a review from fm3 January 29, 2025 09:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)

Line range hint 49-64: Consider using canonical paths and typed exceptions for security and clarity.

While this logic properly checks remote vs. local schemes, relying on startsWith for whitelisting can be susceptible to path traversal (e.g., embedded "../"). Additionally, throwing raw Exception can obscure the root cause. Use a canonical path check (e.g., toRealPath) and a specialized exception type to enhance security and clarity.

🧹 Nitpick comments (13)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3)

91-105: Enhance error logging for better debugging.

While the error handling is good, the error messages could be more descriptive to aid in debugging.

Consider adding more context to error messages:

-          logger.error(s"Failed to determine real paths for mags of $ds: $msg")
+          logger.error(s"Failed to determine real paths for mags of dataset ${ds.id}: $msg. This may affect path resolution.")
-          logger.error(s"Failed to determine real paths for mags of $ds")
+          logger.error(s"Failed to determine real paths for mags of dataset ${ds.id}. This may affect path resolution.")

153-158: Add input validation and documentation.

The method would benefit from input validation and clear documentation of its purpose and behavior.

Consider these improvements:

+  /**
+   * Resolves a potentially relative path against a base path.
+   * If the relative path is already absolute, returns it unchanged.
+   * Otherwise, resolves it against the base path and normalizes the result.
+   *
+   * @param basePath The base path to resolve against
+   * @param relativePath The path to resolve
+   * @return The resolved absolute path
+   * @throws NullPointerException if either path is null
+   */
   private def resolveRelativePath(basePath: Path, relativePath: Path): Path = {
+    require(basePath != null, "basePath cannot be null")
+    require(relativePath != null, "relativePath cannot be null")
     if (relativePath.isAbsolute) {
       relativePath
     } else {
       basePath.resolve(relativePath).normalize().toAbsolutePath
     }
   }

91-158: Consider implementing a dedicated PathResolver service.

The new path resolution functionality is well-implemented but could benefit from being extracted into a dedicated service. This would:

  1. Improve separation of concerns
  2. Make the path resolution logic more testable
  3. Reduce the complexity of the DataSourceService class
  4. Make it easier to modify path resolution strategies in the future

Consider creating a new PathResolverService class to handle all path resolution logic, including symlink resolution and URI generation. This would make the codebase more maintainable and modular.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (2)

77-83: Check for invalid local directories and handle them explicitly.

If neither localDirWithScalarMag nor localDirWithVec3Mag exists, the code silently defaults to localDirWithVec3Mag. Updating the logic or adding a warning/error for missing directories can avoid unintentional fallbacks and help debug user misconfiguration.


86-98: Consolidate path resolution logic in a single place.

uriForMagLocator calls resolveMagPath and then applies an additional toAbsolutePath.toUri if not remote. Consider directly returning an absolute path from resolveMagPath for consistency, or unifying their logic to avoid duplication and reduce potential path resolution discrepancies.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1)

49-53: Add error handling and logging around real path reporting.

This method simply calls reportRealPaths without any additional handling for potential failures or partial updates. Consider adding logging or structured error handling to ensure visibility into path reporting issues.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

40-50: Document or validate new path-related fields.

The new case classes DataSourcePathInfo and MagPathInfo introduce important fields (e.g., layerName, mag, path, realPath, hasLocalData). Brief documentation or validation logic would help ensure correct usage across the codebase, especially for mag or path fields.

app/controllers/WKRemoteDataStoreController.scala (1)

224-238: Consider adding request logging and error details.

The method implementation is correct but could benefit from additional logging to aid in debugging:

  1. Log the number of path infos being processed
  2. Include more details in the warning message for invalid JSON

Apply this diff to enhance logging:

 def updatePaths(name: String, key: String): Action[JsValue] = Action.async(parse.json) { implicit request =>
   dataStoreService.validateAccess(name, key) { _ =>
     request.body.validate[List[DataSourcePathInfo]] match {
       case JsSuccess(infos, _) =>
+        logger.info(s"Updating paths for ${infos.length} data sources from data store '$name'")
         for {
           _ <- datasetService.updateRealPaths(infos)(GlobalAccessContext)
         } yield {
           JsonOk
         }
       case e: JsError =>
-        logger.warn("Data store reported invalid json for data source paths.")
+        logger.warn(s"Data store '$name' reported invalid json for data source paths: ${JsError.toJson(e)}")
         Fox.successful(JsonBadRequest(JsError.toJson(e)))
     }
   }
 }
app/models/dataset/DatasetService.scala (1)

343-356: Consider adding error logging for dataset not found case.

The error handling is correct, but adding logging would help track issues in production.

Apply this diff to enhance error logging:

 private def updateRealPath(pathInfo: DataSourcePathInfo)(implicit ctx: DBAccessContext): Fox[Unit] =
   if (pathInfo.magPathInfos.isEmpty) {
     Fox.successful(())
   } else {
     val dataset = datasetDAO.findOneByDataSourceId(pathInfo.dataSourceId).futureBox
     dataset.flatMap {
       case Full(dataset) => datasetMagsDAO.updateMagPathsForDataset(dataset._id, pathInfo.magPathInfos)
       case Empty => // Dataset reported but ignored (non-existing/forbidden org)
+        logger.debug(s"Ignoring path update for non-existing dataset: ${pathInfo.dataSourceId}")
         Fox.successful(())
       case e: EmptyBox =>
+        logger.warn(s"Failed to find dataset: ${pathInfo.dataSourceId}", e)
         Fox.failure("dataset.notFound", e)
     }
   }
app/controllers/DatasetController.scala (1)

137-149: Consider adding pagination for large datasets.

The method correctly implements the required functionality but might need pagination for datasets with many mags.

Apply this diff to add pagination:

-def getLinkedMags(datasetId: String, dataLayerName: String): Action[AnyContent] =
+def getLinkedMags(datasetId: String, dataLayerName: String, limit: Option[Int], offset: Option[Int]): Action[AnyContent] =
   sil.SecuredAction.async { implicit request =>
     for {
       datasetIdValidated <- ObjectId.fromString(datasetId)
       _ <- datasetDAO.findOne(datasetIdValidated) ?~> notFoundMessage(datasetId) ~> NOT_FOUND
       _ <- Fox.bool2Fox(request.identity.isAdmin) ?~> "notAllowed" ~> FORBIDDEN
       magsAndLinkedMags <- datasetService.getPathsForDatalayer(datasetIdValidated, dataLayerName)
-      returnValues = magsAndLinkedMags.map {
+      returnValues = magsAndLinkedMags.drop(offset.getOrElse(0)).take(limit.getOrElse(Int.MaxValue)).map {
         case (mag, linkedMags) => MagLinkInfo(mag, linkedMags)
       }
     } yield Ok(Json.toJson(returnValues))
   }
conf/webknossos.latest.routes (1)

94-94: Update route documentation.

The routes are correctly defined but should be documented in the API documentation.

Add comments above each route to describe their purpose:

+# Get linked magnifications for a dataset layer
 GET           /datasets/:datasetId/layers/:layer/linkedMags                                     controllers.DatasetController.getLinkedMags(datasetId: String, layer: String)

+# Update data source paths in the datastore
 PUT           /datastores/:name/datasources/paths                                               controllers.WKRemoteDataStoreController.updatePaths(name: String, key: String)

Also applies to: 112-112

app/models/dataset/Dataset.scala (1)

767-784: Consider improving the handling of uninitialized paths.

Instead of using a magic string "uninitialized" for null paths, consider:

  1. Using an Option type to represent potentially uninitialized paths
  2. Handling the null case at the application level where the context is clearer
-                         row.path.getOrElse("uninitialized"),
-                         row.realpath.getOrElse("uninitialized"),
+                         row.path.getOrElse(""),
+                         row.realpath.getOrElse(""),
conf/evolutions/126-mag-real-paths.sql (1)

1-12: Well-structured SQL evolution with proper safeguards!

The SQL evolution:

  • Uses transaction for atomicity
  • Includes schema version check
  • Adds columns with appropriate types
  • Sets sensible default for hasLocalData

However, consider adding NOT NULL constraints to path and realPath columns since they are required fields in the DatasetMagInfo case class.

-ALTER TABLE webknossos.dataset_mags ADD COLUMN realPath TEXT;
-ALTER TABLE webknossos.dataset_mags ADD COLUMN path TEXT;
+ALTER TABLE webknossos.dataset_mags ADD COLUMN realPath TEXT NOT NULL DEFAULT '';
+ALTER TABLE webknossos.dataset_mags ADD COLUMN path TEXT NOT NULL DEFAULT '';
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd97cd5 and d839176.

📒 Files selected for processing (13)
  • MIGRATIONS.unreleased.md (1 hunks)
  • app/controllers/DatasetController.scala (2 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (3 hunks)
  • conf/evolutions/126-mag-real-paths.sql (1 hunks)
  • conf/evolutions/reversions/126-mag-real-paths.sql (1 hunks)
  • conf/webknossos.latest.routes (2 hunks)
  • tools/postgres/schema.sql (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (16)
conf/evolutions/reversions/126-mag-real-paths.sql (3)

1-3: LGTM! Transaction and schema version check are properly implemented.

The transaction block ensures atomicity, and the schema version assertion provides a critical safety check against incorrect migrations.


9-11: LGTM! Schema version update and transaction commit are correct.

The schema version is properly decremented to 125, and the transaction is committed to ensure all changes are applied atomically.


5-7: Verify the impact of dropping these columns.

The column drops align with the PR objectives for managing symlinked mags. However, ensure that:

  1. Any existing data in these columns has been properly migrated or is safe to discard
  2. All application code references to these columns are removed
  3. The corresponding forward migration (126-mag-real-paths.sql) correctly creates these columns
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3)

11-11: LGTM: Required imports added for new functionality.

The new imports for MagLocator and DataVaultService align with the PR's objective to handle mag symlinks.

Also applies to: 15-15


81-81: LGTM: Path resolution integrated into inbox scanning.

The addition of sendRealPaths call ensures that real paths are processed and published whenever inbox sources are scanned.


143-151: LGTM: Clean and focused implementation.

The method follows the single responsibility principle and clearly handles mag URI resolution.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)

65-70: Raise an error if neither dataset nor layer path exists.

Currently, if magPathRelativeToDataset does not exist, the code falls back to magPathRelativeToLayer. Consider raising an error or logging a warning if neither path is found, ensuring graceful handling of invalid file structures.

app/controllers/WKRemoteDataStoreController.scala (1)

10-10: LGTM!

The import statement correctly includes the new DataSourcePathInfo type needed for the path update functionality.

app/models/dataset/DatasetService.scala (2)

17-17: LGTM!

The imports are correctly organized and include all necessary types for the new functionality.

Also applies to: 23-23


357-361: LGTM!

The updateRealPaths method correctly serializes updates to maintain consistency.

app/controllers/DatasetController.scala (1)

74-78: LGTM!

The case class and its JSON format are correctly defined.

app/models/dataset/Dataset.scala (3)

24-24: LGTM!

The import for MagPathInfo is correctly added to support the new mag path handling functionality.


712-721: Well-structured data model for mag information!

The DatasetMagInfo case class and its companion object are well-designed:

  • Encapsulates all necessary fields for mag path information
  • Provides JSON serialization support
  • Aligns with the PR objectives for storing symbolic link information

748-765: Robust implementation with proper transaction handling!

The implementation includes:

  • Proper SQL escaping and type handling
  • Transaction with serializable isolation
  • Retry logic for handling serialization failures
  • Batch updates for efficiency
MIGRATIONS.unreleased.md (1)

21-22: LGTM!

The migration guide is correctly updated to include the new SQL evolution.

tools/postgres/schema.sql (1)

23-23: LGTM!

The schema changes:

  • Correctly update the schema version
  • Add the new columns matching the SQL evolution
  • Set appropriate default for hasLocalData

Also applies to: 181-183

Comment on lines +107 to +141
private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
dataSource.toUsable match {
case Some(usableDataSource) => {
usableDataSource.dataLayers.flatMap { dataLayer =>
val rawLayerPath = datasetPath.resolve(dataLayer.name)
val absoluteLayerPath = if (Files.isSymbolicLink(rawLayerPath)) {
resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
} else {
rawLayerPath.toAbsolutePath
}
dataLayer.mags.map { mag =>
val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
if (isRemote) {
MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
} else {
val magPath = Paths.get(magURI)
val realPath = magPath.toRealPath()
// Does this dataset have local data, i.e. the data that is referenced by the mag path is within the dataset directory
val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
val unresolvedPath =
rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
MagPathInfo(dataLayer.name,
mag.mag,
unresolvedPath.toUri.toString,
realPath.toUri.toString,
hasLocalData = isLocal)
}
}
}
}
case None => List()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for symlink resolution and improve code structure.

The method handles complex path resolution logic but has some areas for improvement:

  1. Missing error handling for symlink resolution
  2. Complex nested structure could be simplified
  3. Missing documentation for the business logic

Consider these improvements:

   private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
+    // Resolves paths for each mag in the data source, handling both local and remote data
+    // Returns a list of MagPathInfo containing the original and resolved paths
     val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
     val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
     dataSource.toUsable match {
       case Some(usableDataSource) => {
         usableDataSource.dataLayers.flatMap { dataLayer =>
           val rawLayerPath = datasetPath.resolve(dataLayer.name)
-          val absoluteLayerPath = if (Files.isSymbolicLink(rawLayerPath)) {
-            resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
-          } else {
-            rawLayerPath.toAbsolutePath
-          }
+          val absoluteLayerPath = try {
+            if (Files.isSymbolicLink(rawLayerPath)) {
+              resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
+            } else {
+              rawLayerPath.toAbsolutePath
+            }
+          } catch {
+            case e: Exception =>
+              logger.error(s"Failed to resolve layer path for ${dataLayer.name}: ${e.getMessage}")
+              rawLayerPath.toAbsolutePath
+          }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
dataSource.toUsable match {
case Some(usableDataSource) => {
usableDataSource.dataLayers.flatMap { dataLayer =>
val rawLayerPath = datasetPath.resolve(dataLayer.name)
val absoluteLayerPath = if (Files.isSymbolicLink(rawLayerPath)) {
resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
} else {
rawLayerPath.toAbsolutePath
}
dataLayer.mags.map { mag =>
val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
if (isRemote) {
MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
} else {
val magPath = Paths.get(magURI)
val realPath = magPath.toRealPath()
// Does this dataset have local data, i.e. the data that is referenced by the mag path is within the dataset directory
val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
val unresolvedPath =
rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
MagPathInfo(dataLayer.name,
mag.mag,
unresolvedPath.toUri.toString,
realPath.toUri.toString,
hasLocalData = isLocal)
}
}
}
}
case None => List()
}
}
private def determineMagRealPathsForDataSource(dataSource: InboxDataSource) = tryo {
// Resolves paths for each mag in the data source, handling both local and remote data
// Returns a list of MagPathInfo containing the original and resolved paths
val organizationPath = dataBaseDir.resolve(dataSource.id.organizationId)
val datasetPath = organizationPath.resolve(dataSource.id.directoryName)
dataSource.toUsable match {
case Some(usableDataSource) => {
usableDataSource.dataLayers.flatMap { dataLayer =>
val rawLayerPath = datasetPath.resolve(dataLayer.name)
val absoluteLayerPath = try {
if (Files.isSymbolicLink(rawLayerPath)) {
resolveRelativePath(datasetPath, Files.readSymbolicLink(rawLayerPath))
} else {
rawLayerPath.toAbsolutePath
}
} catch {
case e: Exception =>
logger.error(s"Failed to resolve layer path for ${dataLayer.name}: ${e.getMessage}")
rawLayerPath.toAbsolutePath
}
dataLayer.mags.map { mag =>
val (magURI, isRemote) = getMagURI(datasetPath, absoluteLayerPath, mag)
if (isRemote) {
MagPathInfo(dataLayer.name, mag.mag, magURI.toString, magURI.toString, hasLocalData = false)
} else {
val magPath = Paths.get(magURI)
val realPath = magPath.toRealPath()
// Does this dataset have local data, i.e. the data that is referenced by the mag path is within the dataset directory
val isLocal = realPath.startsWith(datasetPath.toAbsolutePath)
val unresolvedPath =
rawLayerPath.toAbsolutePath.resolve(absoluteLayerPath.relativize(magPath)).normalize()
MagPathInfo(dataLayer.name,
mag.mag,
unresolvedPath.toUri.toString,
realPath.toUri.toString,
hasLocalData = isLocal)
}
}
}
}
case None => List()
}
}

Comment on lines +117 to +122
def reportRealPaths(dataSourcePaths: List[DataSourcePathInfo]): Fox[_] =
rpc(s"$webknossosUri/api/datastores/$dataStoreName/datasources/paths")
.addQueryString("key" -> dataStoreKey)
.silent
.putJson(dataSourcePaths)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider explicit result type and user feedback.

reportRealPaths returns Fox[_] and uses .silent. In case of partial or complete failure, there is no user-facing feedback or logs. Specifying a clearer return type (e.g., Fox[ReportPathsResult]) and/or logging errors can facilitate troubleshooting.

Comment on lines +362 to +371
def getPathsForDatalayer(datasetId: ObjectId, layerName: String): Fox[List[(DatasetMagInfo, List[DatasetMagInfo])]] =
for {
magInfos <- datasetMagsDAO.findPathsForDatasetAndDatalayer(datasetId, layerName)
magInfosAndLinkedMags <- Fox.serialCombined(magInfos)(magInfo =>
for {
pathInfos <- datasetMagsDAO.findAllByRealPath(magInfo.realPath)
} yield (magInfo, pathInfos.filter(!_.equals(magInfo))))

} yield magInfosAndLinkedMags

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding input validation for layer name.

The method should validate the layer name before querying the database.

Apply this diff to add validation:

 def getPathsForDatalayer(datasetId: ObjectId, layerName: String): Fox[List[(DatasetMagInfo, List[DatasetMagInfo])]] =
   for {
+    _ <- assertValidLayerNameLax(layerName)
     magInfos <- datasetMagsDAO.findPathsForDatasetAndDatalayer(datasetId, layerName)
     magInfosAndLinkedMags <- Fox.serialCombined(magInfos)(magInfo =>
       for {
         pathInfos <- datasetMagsDAO.findAllByRealPath(magInfo.realPath)
       } yield (magInfo, pathInfos.filter(!_.equals(magInfo))))
   } yield magInfosAndLinkedMags

Committable suggestion skipped: line range outside the PR's diff.

app/models/dataset/Dataset.scala Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)

765-782: Remove unnecessary empty Fox.successful.

The empty Fox.successful(()) on line 767 is unnecessary and can be removed.

  def updateMagPathsForDataset(datasetId: ObjectId, magPaths: List[MagPathInfo]): Fox[Unit] =
    for {
-     _ <- Fox.successful(())
      updateQueries = magPaths.map(pathInfo => {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d839176 and 3553612.

📒 Files selected for processing (1)
  • app/models/dataset/Dataset.scala (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (4)
app/models/dataset/Dataset.scala (4)

712-721: LGTM! Well-structured case class implementation.

The DatasetMagInfo case class and its companion object are well-designed, providing a clean representation of mag path metadata with proper JSON serialization support. The implementation aligns perfectly with the PR objectives for storing mag paths.


736-738: Replace "uninitialized" with empty string as default value.

Using "uninitialized" as a default value for paths is not ideal. Consider using an empty string instead, as suggested in previous reviews.

Apply this diff:

-                  row.path.getOrElse("uninitialized"),
-                  row.realpath.getOrElse("uninitialized"),
+                  row.path.getOrElse(""),
+                  row.realpath.getOrElse(""),

784-794: LGTM! Well-implemented query and mapping logic.

The method correctly retrieves and maps mag paths for a dataset and layer, with proper error handling using Fox combinators.


795-810: Replace "uninitialized" with empty string and consider adding an index.

  1. Replace "uninitialized" with empty string as default value.
  2. Consider adding an index on the realPath column since we're querying by it.

Apply this diff for the default values:

-                         row.path.getOrElse("uninitialized"),
-                         row.realpath.getOrElse("uninitialized"),
+                         row.path.getOrElse(""),
+                         row.realpath.getOrElse(""),

Consider adding the following index to improve query performance:

CREATE INDEX IF NOT EXISTS dataset_mags_realpath_idx ON webknossos.dataset_mags(realPath);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Find all datasets that share a layer
1 participant