Allow renaming datasets & dataset with duplicate names #8075

Open
wants to merge 71 commits into
base: master


@MichaelBuessemeyer MichaelBuessemeyer commented Sep 12, 2024

Further Notes:

  • Many of the line changes result from moving the ObjectId class to the utils package so that all wk backend servers have access to it.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Give two datasets the same name and check whether annotations and related features still work
  • Test whether the task system still works with duplicate dataset names
  • Check dataset upload
    • dataset upload
    • add remote
    • compose
  • ...

TODOs:

  • Add evolution and reversion
    • testing needed
  • Test uploading:
    • Report upload fails
  • Adjust the worker to the newest job arguments, as the dataset name can no longer be used to uniquely identify a dataset
  • Rename organization_name in the worker to organization_id (see "Rename organization_name to organization_id in worker args" #8038)
  • Dataset Name settings field has an unwanted spinner (see upload view)
  • Check the job list
  • Properly implement legacy searching for datasets when old URI param is used
  • Adjust legacy API routes to return dataset in old format
    • It is just an additional field. Thus, I would say it should be fine.
  • datasets appear to be duplicated in the db
    • Maybe these are created by jobs with an output dataset
  • Fix dataset insert
  • Skeleton & VolumeTracings address a dataset via its name
    • Not really used; only during task / annotation creation
    • Use a heuristic upon upload and temporarily patch the Tracing case classes to carry the datasetId during the creation process once the dataset has been identified.
    • Task creation works
    • Needs testing
      • fix annotation upload
    • needs to support old nmls
  • Put datasetId into newly created nmls
  • In the backend LinkedLayerIdentifier still uses the datasetName as an identifier
    • Used in wklibs. Maybe just interpret the name as a path and work with this. In case it cannot be found, the user needs to update wklibs. Add a comment for this!
  • The dataset C555_tps_demo has quite a few bucket loading errors. Unsure why some buckets do not work. The dataset seems to be broken; this could be reproduced on other branches.
  • Notion-style URLs are missing (i.e. -, but only the id part is actually used)
  • Maybe remove DatasetURIParser
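
To illustrate the two URL-related TODOs above (Notion-style URLs and legacy lookup by the old URI parameter), here is a minimal Python sketch. It is not the code from this PR. It assumes that a Notion-style URL segment has the form `<name>-<id>`, that ids are 24-character lowercase hex strings, and that legacy URLs carry only the dataset name. The function names and the two lookup dictionaries are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumption: dataset ids are 24-character lowercase hex strings (ObjectId-like).
_ID_SUFFIX = re.compile(r"(?:^|-)([0-9a-f]{24})$")


def extract_dataset_id(segment: str) -> Optional[str]:
    """Return the trailing id of a '<name>-<id>' segment, or None if there is none."""
    match = _ID_SUFFIX.search(segment)
    return match.group(1) if match else None


def resolve_dataset(
    segment: str,
    datasets_by_id: Dict[str, dict],
    datasets_by_name: Dict[str, List[dict]],
) -> Optional[dict]:
    """Resolve a URL segment to a dataset record.

    Only the id part of a Notion-style segment is used. Segments without an id
    are treated as legacy names and resolve only if the name is unambiguous.
    """
    dataset_id = extract_dataset_id(segment)
    if dataset_id is not None:
        return datasets_by_id.get(dataset_id)
    candidates = datasets_by_name.get(segment, [])
    return candidates[0] if len(candidates) == 1 else None
```

Because only the id is used for lookup, a renamed dataset stays reachable through an old link whose name part is stale. The legacy branch returns nothing when several datasets share a name, since the name alone no longer identifies a dataset.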

Issues:


(Please delete unneeded items, merge only when none are left open)

@MichaelBuessemeyer
Contributor Author

@fm3 The backend should be ready for a first glance 👀

Member

@fm3 fm3 left a comment

I started writing review comments, but that was pre-empted by our in-person meeting :) Still sending these two comments now.

Michael Büßemeyer added 2 commits October 23, 2024 17:20
…thDatasetId and make nml parser return its own result case class
@MichaelBuessemeyer
Contributor Author

Note: We agreed to rename dataset.path to dataset.directory_name, as this is clearer. See here for more details: https://scm.slack.com/archives/C5AKLAV0B/p1729689141538769
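
As a rough illustration of why `directory_name` reads more clearly than `path`, here is a small Python sketch. It is not the actual schema. It assumes a dataset record carries a unique id, a display name that can be changed and may be shared by several datasets, and a directory name that stays fixed. The `id` and `name` fields and the `rename` helper are hypothetical; only `directory_name` comes from the note above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Dataset:
    id: str              # hypothetical: unique identifier, never changes
    name: str            # hypothetical: display name, may be duplicated
    directory_name: str  # name of the directory holding the data


def rename(dataset: Dataset, new_name: str) -> Dataset:
    """Return a copy with a new display name; id and directory stay untouched."""
    return replace(dataset, name=new_name)
```

Under these assumptions, renaming only touches the display name, so anything that addresses the data by id or directory keeps working after a rename.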

@hotzenklotz
Member

@coderabbitai full review

coderabbitai bot commented Oct 24, 2024

✅ Actions performed

Full review triggered.

coderabbitai bot commented Oct 24, 2024

Note

Reviews paused

@MichaelBuessemeyer
Contributor Author

@coderabbitai pause

as the PR is too large anyway

coderabbitai bot commented Oct 29, 2024

@MichaelBuessemeyer: Understood, I will pause the automatic reviews for this PR.

✅ Actions performed

Reviews paused.

@MichaelBuessemeyer MichaelBuessemeyer marked this pull request as ready for review October 29, 2024 15:13
@MichaelBuessemeyer
Contributor Author

I think this should be ready to go now. If you notice any errors, please tell me 🙏

@MichaelBuessemeyer MichaelBuessemeyer changed the title WIP: Allow renaming datasets & dataset with duplicate names Allow renaming datasets & dataset with duplicate names Oct 29, 2024
Development

Successfully merging this pull request may close these issues.

Allow to rename datasets
3 participants