(OpenSearch): Repeated CloudFormation errors while deleting OpenSearch VectorIndex #667

Open
1 task done
krokoko opened this issue Aug 27, 2024 · 1 comment
Labels: backlog, bug (Something isn't working)

Comments

@krokoko
Collaborator

krokoko commented Aug 27, 2024

Describe the bug

Opening on behalf of an internal user

When CloudFormation attempts to delete an OpenSearch VectorIndex instance, the user encounters repeated CloudFormation errors (DELETE_FAILED), but the deployment eventually succeeds anyway.

The issue adds several minutes to the stack deployment, because CloudFormation repeatedly attempts to delete the OpenSearch VectorIndex, fails, and then backs off before retrying.

Expected Behavior

Gracefully delete the OpenSearch index without any issues.

Current Behavior

Deletion fails repeatedly with DELETE_FAILED errors before eventually succeeding.

Reproduction Steps

None as of now; a code snippet is needed to reproduce.

Possible Solution

The custom resource (CR) Lambda uses this IAM policy: https://github.com/awslabs/generative-ai-cdk-constructs/blob/main/src/cdk-lib/opensearchserverless/vector-collection.ts#L138-L151
The problem is that this IAM policy is scoped to a single OpenSearch collection, while the Lambda is a single resource shared by the entire CloudFormation stack. If you deploy multiple Knowledge Bases, each with its own OpenSearch collection and indexes, in the same stack, the Lambda won't have permissions to modify all of them.
The user manually modified that IAM policy to allow access to all collections within the account and confirmed that the OpenSearch index is then deleted gracefully, without errors.
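To make the scoping problem concrete, here is a dependency-free TypeScript sketch (not the aws-cdk-lib API) that models the IAM policy document as plain JSON and checks which collection ARNs it covers. It assumes, as the linked lines appear to do, that the policy grants aoss:APIAccessAll on a single collection ARN. The region, account ID, and collection IDs kb-a and kb-b are placeholders, and the resource matcher only handles an exact ARN or a trailing /* wildcard, which is far simpler than real IAM policy evaluation.

```typescript
// Minimal IAM policy document shape (hypothetical types, not the aws-cdk-lib API).
interface Statement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: Statement[];
}

// OpenSearch Serverless collection ARN: arn:aws:aoss:<region>:<account>:collection/<id>
function collectionArn(region: string, account: string, id: string): string {
  return `arn:aws:aoss:${region}:${account}:collection/${id}`;
}

// Assumption: mirrors the shape of the current aossPolicy (aoss:APIAccessAll on one resource).
function policyFor(resource: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: ["aoss:APIAccessAll"], Resource: [resource] }],
  };
}

// Simplified matching: exact ARN, or a trailing "/*" wildcard. Real IAM evaluation is richer.
function resourceMatches(pattern: string, arn: string): boolean {
  return pattern === arn || (pattern.endsWith("/*") && arn.startsWith(pattern.slice(0, -1)));
}

// True if any Allow statement lists the action and a matching resource.
function allows(doc: PolicyDocument, action: string, arn: string): boolean {
  return doc.Statement.some(
    (s) => s.Effect === "Allow" && s.Action.includes(action) && s.Resource.some((r) => resourceMatches(r, arn)),
  );
}

// Placeholder values for illustration only.
const region = "us-east-1";
const account = "123456789012";
const kbA = collectionArn(region, account, "kb-a");
const kbB = collectionArn(region, account, "kb-b");

// The shared CR Lambda holds a policy scoped to the first collection only.
const scoped = policyFor(kbA);
// The manual workaround: every collection in this account and region.
const accountWide = policyFor(collectionArn(region, account, "*"));

console.log(allows(scoped, "aoss:APIAccessAll", kbA)); // true
console.log(allows(scoped, "aoss:APIAccessAll", kbB)); // false
console.log(allows(accountWide, "aoss:APIAccessAll", kbB)); // true
```

Under these assumptions, the scoped policy covers only the first collection, which matches the permission gap described above, while the wildcard resource covers both, which matches the manual workaround.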

We may not want to make this change directly in the CDK library, because doing so would introduce a bug in grantDataAccess: it would then grant access more broadly than it should.

Instead, we may want to create a dedicated IAM policy for the custom resource Lambda that grants access to all collections within the account, rather than pointing it at props.collection.aossPolicy as it does now.
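The trade-off behind the proposed fix can be sketched the same way. The following TypeScript models roles and attached policies as plain objects; grantDataAccessModel is a hypothetical stand-in for what grantDataAccess is described as doing here (attaching the collection's shared aossPolicy to a caller's role), not the real construct method. The ARNs and collection names are placeholders.

```typescript
// Hypothetical, dependency-free model of the two options above. Not the CDK API.
type Policy = { name: string; resources: string[] };
type Role = { name: string; attached: Policy[] };

// Placeholder account and region.
const ACCOUNT = "123456789012";
const REGION = "us-east-1";
const arn = (id: string): string => `arn:aws:aoss:${REGION}:${ACCOUNT}:collection/${id}`;

// A role can reach a collection if an attached policy lists its exact ARN or a trailing "/*" wildcard.
function canAccess(role: Role, target: string): boolean {
  return role.attached.some((p) =>
    p.resources.some((r) => r === target || (r.endsWith("/*") && target.startsWith(r.slice(0, -1)))),
  );
}

// Stand-in for grantDataAccess: attaches the collection's shared policy to the caller's role.
function grantDataAccessModel(aossPolicy: Policy, role: Role): void {
  role.attached.push(aossPolicy);
}

// Option A (not recommended): widen the shared aossPolicy itself.
function widenSharedPolicy(): { app: Role; lambda: Role } {
  const aossPolicy: Policy = { name: "aossPolicy", resources: [arn("*")] };
  const app: Role = { name: "app", attached: [] };
  const lambda: Role = { name: "cr-lambda", attached: [aossPolicy] };
  grantDataAccessModel(aossPolicy, app); // the app role inherits the wildcard too
  return { app, lambda };
}

// Option B (proposed): keep aossPolicy narrow and give the CR Lambda its own broad policy.
function dedicatedLambdaPolicy(): { app: Role; lambda: Role } {
  const aossPolicy: Policy = { name: "aossPolicy", resources: [arn("collection-a")] };
  const crPolicy: Policy = { name: "crLambdaPolicy", resources: [arn("*")] };
  const app: Role = { name: "app", attached: [] };
  const lambda: Role = { name: "cr-lambda", attached: [crPolicy] };
  grantDataAccessModel(aossPolicy, app);
  return { app, lambda };
}

const other = arn("collection-b");
const widened = widenSharedPolicy();
const dedicated = dedicatedLambdaPolicy();

console.log(canAccess(widened.app, other)); // true: app role is over-privileged
console.log(canAccess(dedicated.app, other)); // false
console.log(canAccess(dedicated.lambda, other)); // true
```

In this model, widening the shared policy also over-privileges every role granted access through it, while a dedicated policy keeps application roles scoped to their own collection and still lets the custom resource Lambda reach every collection in that account and region.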

Additional Information/Context

No response

CDK CLI Version

2.154.1

Framework Version

No response

Node.js Version

20

OS

macOS

Language

TypeScript, Python, .NET, Go

Language Version

No response

Region experiencing the issue

any

Code modification

no

Other information

No response

Service quota

  • I have reviewed the service quotas for this construct
krokoko added the bug and needs-triage labels on Aug 27, 2024
krokoko added the backlog label and removed the needs-triage label on Oct 23, 2024
scottschreckengaust self-assigned this on Oct 30, 2024
@scottschreckengaust
Collaborator

I was unable to reproduce this. Is there a particular example I can follow up with to fix it?
