feat: support Azure cloud storage service principal authentication - expedite #5926
base: develop
🔍 Existing Issues For Review: Your pull request is modifying functions with the following pre-existing issues: 📄 File: label_studio/io_storages/base_models.py |
Is this a copy of #5765? |
I resolved this issue; all tests are passing now 🚀 🚀 🚀 This storage authentication supports all types of Label Studio inputs: presigned URLs, blobs, and JSON. Any chance someone can take a look at this, @makseq? 🗡️ 💯 😀 |
ready to merge 😃 - go to CI build |
I encrypted the password, @FrsECM |
I added the encryption to avoid storing passwords and have even more security. :) |
Great to see it! Just for my understanding: I saw that you replace blob URLs with something like: … Is that the standard way to handle it in Label Studio? |
This is the expected behavior, @FrsECM; I learned it by setting up S3 sync. Azure, GCP, and Redis also work this way, with support for URIs in JSON like <scheme>://<container>/<blob>. Label Studio supports different sources: a) JSON or b) blob. When it is a) JSON, it uses a JSON convention like {'text': 'some_text', 'image': uri}, where the URI is something like s3://<bucket>/<blob>, azure-blob://<container>/<blob>, or in our case azure-spi://<container>/<blob>; the backend then resolves it to the actual blob URL https://<account>.microsoft.dfs.com/path/to/blob/<sas_token>. When it is b) blob, it reads all objects in the container as blobs using https://<account>.microsoft.dfs.com/path/to/blob/<sas_token>. This is explained in more detail here: https://labelstud.io/guide/storage |
In our case, we use Azure Blob storage, not DFS.
The URL then looks like this:
* https://<account name>.blob.core.windows.net/<container_name>/<path><sas_token>
How do you handle that case with azure_spi?
Maybe it's worth adding an extra property on the storage mixin to define the type of storage/URLs?
With your URL system, this information is lost.
________________________________
From: Manrique Vargas ***@***.***>
Sent: Wednesday, July 10, 2024 7:43:22 AM
To: HumanSignal/label-studio ***@***.***>
Cc: François Ponchon ***@***.***>; Mention ***@***.***>
Subject: Re: [HumanSignal/label-studio] feat: Feature/azure service principal - expedite (PR #5926)
Yes, Label Studio supports different sources: a) JSON or b) blob. When it is a) JSON, it uses a JSON convention like {'text': 'some_text', 'image': uri}, where the URI is something like s3://<bucket>/<blob>, azure-blob://<container>/<blob>, or in our case azure-spi://<container>/<blob>; the backend then resolves it to the actual blob URL https://<account>.microsoft.dfs.com/path/to/blob/<sas_token>.
When it is b) blob, it reads all objects in the container as blobs using https://<account>.microsoft.dfs.com/path/to/blob/<sas_token>.
This is explained in more detail here: https://labelstud.io/guide/storage
|
There is no dfs involved. Azure spi will create that https:// path using this template
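The template referenced above is not preserved in this thread. As a rough sketch only, assuming the Blob endpoint format quoted earlier (`https://<account name>.blob.core.windows.net/<container_name>/<path><sas_token>`), resolving an `azure-spi://<container>/<blob>` URI might look like the following. The function name `resolve_azure_spi_uri` and its parameters are hypothetical and are not taken from the PR.

```python
from urllib.parse import quote, urlparse


def resolve_azure_spi_uri(uri: str, account_name: str, sas_token: str) -> str:
    """Turn azure-spi://<container>/<blob> into an HTTPS Blob endpoint URL.

    Hypothetical helper: the host format is an assumption based on the
    Blob endpoint URL quoted in this thread, not on the PR's code.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "azure-spi":
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")

    container = parsed.netloc
    blob_path = parsed.path.lstrip("/")
    if not container or not blob_path:
        raise ValueError(f"expected azure-spi://<container>/<blob>, got {uri!r}")

    # Percent-encode the blob path, keeping "/" so virtual folders survive.
    encoded_path = quote(blob_path, safe="/")
    # Accept the SAS token with or without a leading "?".
    token = sas_token.lstrip("?")
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container}/{encoded_path}?{token}"
    )
```

Keeping the scheme in the task JSON and resolving it only when a URL is requested means the stored tasks never contain a short-lived SAS token.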
|
OK, good!
I'll try to test it, because my original implementation didn't do it the same way.
Thanks!
|
I ended up removing the encryption settings from the PR, @FrsECM, because the UI still has some trouble with them. My last attempt to add the encryption is in this separate branch: https://github.com/machov/label-studio/tree/dev/man/add_sp |
This PR is stale because it has been open 45 days with no activity. Remove |
PR fulfills these requirements
[fix|feat|ci|chore|doc]: TICKET-ID: Short description of change made
ex.fix: DEV-XXXX: Removed inconsistent code usage causing intermittent errors
Change has impacts in these area(s)
Describe the reason for change
Currently, the only way to authenticate to Azure is with an account key. The problem is that an account key grants broad rights on the storage account, which may not be necessary for read-only operations.
What does this fix?
It is now possible to set up an Azure integration based on a service principal.
What is the new behavior?
We create an app registration in Azure:
We give this registration specific rights depending on what we want to do in Label Studio:
In the UI, we create a new storage integration :
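For readers unfamiliar with the flow, here is a minimal sketch of how a service principal can produce a read-only URL for a single blob, using the `azure-identity` and `azure-storage-blob` packages. It is not the PR's implementation: the function names `account_url` and `read_only_sas_url` and all parameter names are hypothetical, and the Azure imports are done lazily only so the module loads without the SDK installed.

```python
from datetime import datetime, timedelta, timezone


def account_url(account_name: str) -> str:
    """Blob endpoint for a storage account (public Azure cloud assumed)."""
    return f"https://{account_name}.blob.core.windows.net"


def read_only_sas_url(tenant_id: str, client_id: str, client_secret: str,
                      account_name: str, container: str, blob_name: str,
                      valid_for: timedelta = timedelta(hours=1)) -> str:
    """Build a read-only SAS URL for one blob via a service principal."""
    # Imported here so this sketch can be loaded without the Azure SDK.
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import (
        BlobSasPermissions,
        BlobServiceClient,
        generate_blob_sas,
    )

    # Authenticate as the app registration instead of using an account key.
    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    service = BlobServiceClient(account_url(account_name), credential=credential)

    start = datetime.now(timezone.utc)
    expiry = start + valid_for
    # The user delegation key signs the SAS on behalf of the principal.
    delegation_key = service.get_user_delegation_key(
        key_start_time=start, key_expiry_time=expiry
    )
    sas = generate_blob_sas(
        account_name=account_name,
        container_name=container,
        blob_name=blob_name,
        user_delegation_key=delegation_key,
        permission=BlobSasPermissions(read=True),
        expiry=expiry,
    )
    return f"{account_url(account_name)}/{container}/{blob_name}?{sas}"
```

The app registration needs a role on the storage account that allows requesting a user delegation key and reading blobs; which role to assign depends on your setup.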
What is the current behavior?
There is no impact on the previous behavior, it's just a new one.
What libraries were added/updated?
Azure Identity
Does this change affect performance?
Getting a delegation key is quite slow, which is why there is:
There is a second minor change:
(Today there is a buggy behavior).
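The mechanism the PR uses to offset the slow delegation key request is not shown here. One common way to amortize a slow call like this is a small time-based cache, sketched below. The class name `ExpiringCache` and the `fetch`, `ttl_seconds`, and `clock` parameters are illustrative assumptions, not names from the PR.

```python
import time
from typing import Any, Callable


class ExpiringCache:
    """Cache one value and refetch it only after a time-to-live elapses."""

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # slow call, e.g. a delegation key request
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can fake time
        self._has_value = False
        self._value: Any = None
        self._expires_at = 0.0

    def get(self) -> Any:
        now = self._clock()
        if not self._has_value or now >= self._expires_at:
            # Refresh on first use or once the cached value has expired.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
            self._has_value = True
        return self._value
```

In practice the TTL should be shorter than the key's own validity window, so that a cached key is never used to sign a SAS that outlives it.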
Does this change affect security?
It should improve security for people who only need specific rights on the container. For the moment, the "write" behavior on the container has not been developed or tested.
What alternative approaches were there?
We could use an enterprise application and SSO to link to a user's rights and act on their behalf.
What feature flags were used to cover this change?
None
Does this PR introduce a breaking change?
(check only one)
What level of testing was included in the change?
Which logical domain(s) does this change affect?