App upgrade notification shown for incorrect app #9613

Closed
thehejik opened this issue Aug 29, 2023 · 21 comments · Fixed by #9806
Labels: area/charts, kind/bug, QA/dev-automation, size/2, teams/mapps

Comments

@thehejik

Setup

Describe the bug
When the Epinio extension is installed, the UI offers it the wrong update, 102.0.3+up1.8.1, which belongs to a different app: the official Epinio Rancher app (an app, not an extension).

To Reproduce

  • Install and Enable the Epinio extension (add the extension repo https://epinio.github.io/ui, name it epinio-extension)
  • Go to local cluster -> Apps -> Installed Apps and filter to the cattle-ui-plugin-system namespace

Result
Note the update "version tag", which belongs to a different app (the Epinio app).
image

Expected Result
It should recognize only updates for the extension.

@richard-cox
Member

richard-cox commented Aug 29, 2023

I think this is a general bug in how we connect installed applications and Helm repo charts. There's no direct link between the two other than names (different source repo, different namespaces, etc.).

So keeping this out of the area/extension label

Edit: Name, also version matches
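
To illustrate the point about name-based linking, here is a minimal TypeScript sketch of how a name-only lookup can tie an installed app to a chart from an unrelated repo. The `InstalledApp` and `RepoChart` shapes, the `sourceRepo` field and both helper functions are hypothetical and are not the dashboard's actual code; the extension chart version is made up for illustration.

```typescript
// Hypothetical shapes for illustration; not the dashboard's real types.
interface InstalledApp {
  name: string;
  namespace: string;
  sourceRepo: string; // repo the app was installed from (assumed to be known)
}

interface RepoChart {
  repoName: string;
  name: string;
  latestVersion: string;
}

// Name-only lookup: any chart with the same name matches, whatever its repo.
function matchByName(app: InstalledApp, charts: RepoChart[]): RepoChart[] {
  return charts.filter(function (chart) {
    return chart.name === app.name;
  });
}

// Stricter lookup: the chart must also come from the repo the app was installed from.
function matchByNameAndRepo(app: InstalledApp, charts: RepoChart[]): RepoChart[] {
  return charts.filter(function (chart) {
    return chart.name === app.name && chart.repoName === app.sourceRepo;
  });
}

const charts: RepoChart[] = [
  { repoName: "rancher-charts", name: "epinio", latestVersion: "102.0.3+up1.8.1" },
  { repoName: "epinio-extension", name: "epinio", latestVersion: "1.8.1" }, // illustrative version
];

const extension: InstalledApp = {
  name: "epinio",
  namespace: "cattle-ui-plugin-system",
  sourceRepo: "epinio-extension",
};

const loose = matchByName(extension, charts);         // both charts match
const strict = matchByNameAndRepo(extension, charts); // only the extension chart matches
```

With name-only matching the extension picks up the Rancher app chart as a candidate upgrade; adding the source repo to the match removes it. Whether the source repo is always known for an installed app is an open question in this thread.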

@richard-cox richard-cox added this to the v2.8.0 milestone Aug 29, 2023
@richard-cox richard-cox changed the title from "Epinio Extension offers an update for Rancher Epinio App" to "App upgrade notification shown for incorrect app" Aug 29, 2023
@gaktive
Member

gaktive commented Sep 6, 2023

This is happening with all extensions and charts, not just Epinio.

@gaktive gaktive added size/2 Size Estimate 2 [zube]: Groomed and removed [zube]: Backlog labels Sep 6, 2023
@cnotv cnotv self-assigned this Sep 13, 2023
@zube zube bot removed the [zube]: Groomed label Sep 13, 2023
@cnotv
Member

cnotv commented Sep 13, 2023

@gaktive @richard-cox this issue doesn't say how to reproduce it with other charts, and I cannot reproduce it by following the steps. Any hints?
Screenshot 2023-09-13 at 15 20 16

@richard-cox
Member

For the original bug

  1. In the local cluster, install epinio (it should be available by default in Apps --> Charts). You'll need to install +up1.9.0 (I can't see it in my 2.7.5 system; you may need 2.7-head or 2.8-head)
  2. Install the epinio extension (see https://github.com/epinio/ui/tree/main/docs/developer#production-flow)

If there are issues bringing up that system, feel free to reach out to Tomas; he may be able to provide you access to his.

Failing that, it might be tricky. I think you need to get two helm charts that contain the same app version and name

@cnotv
Member

cnotv commented Sep 13, 2023

At this point I'd rather mock the data where the function is used, since we have to write the test anyway.

@cnotv
Member

cnotv commented Sep 22, 2023

@richard-cox I get

Error: execution error at (epinio/templates/validate-cert-manager-crd.yaml:16:7): 
Required CRDs are missing. Please install the corresponding CRD chart before installing this chart.

@richard-cox
Member

@thehejik Would you be able to provide the environment from the description to @cnotv (no changes to the instance will be made)?

@cnotv
Member

cnotv commented Sep 22, 2023

Error is weird though, what does it mean?

Screenshot 2023-09-22 at 22 12 02

@richard-cox
Member

The error means a prerequisite for the chart is missing, in this case cert-manager (https://docs.epinio.io/installation/install_epinio#cert-manager). However, depending on how you installed Rancher, it should be there already, so this one would take a bit of debugging / investigation to solve. Tomas might be able to help there, though; he will have scripts / the process to bring up the env pretty quickly.

@cnotv
Member

cnotv commented Sep 27, 2023

Seems like the issue is related to the fact that we have 3 different names for "epinio" (versions are for this case):

  • cluster/rancher-charts/epinio: 102.0.4+up1.9.0
  • cluster/epinio-extension/epinio: 1.9.0-0
  • cluster/epinio-repo/epinio: 1.10.0

We also usually retrieve information from the annotations to define the ID (it's made of 3 parts):

  • catalog.cattle.io/ui-source-repo-type
  • catalog.cattle.io/ui-source-repo

Is this information supposed to always be there, @richard-cox?
@thehejik mentioned that we have a special case here.

Should we instead skip checking for updates if the values are not available?
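
A minimal TypeScript sketch of that idea, assuming the ID has the `type/repo/name` shape seen in the list above (for example `cluster/rancher-charts/epinio`) and that its third part is the chart name, which this comment does not state. `chartIdFromAnnotations` and `shouldCheckForUpdates` are hypothetical helpers, not existing dashboard functions.

```typescript
// Annotation keys quoted in the comment above.
const REPO_TYPE_KEY = "catalog.cattle.io/ui-source-repo-type";
const REPO_NAME_KEY = "catalog.cattle.io/ui-source-repo";

// Hypothetical helper: returns "type/repo/name", or null when an annotation is missing.
// Using the chart name as the third part is an assumption.
function chartIdFromAnnotations(
  annotations: { [key: string]: string },
  chartName: string
): string | null {
  const repoType = annotations[REPO_TYPE_KEY];
  const repoName = annotations[REPO_NAME_KEY];
  if (!repoType || !repoName) {
    return null; // not enough data to identify the source chart
  }
  return [repoType, repoName, chartName].join("/");
}

// Hypothetical guard: only look for upgrades when the source chart can be identified.
function shouldCheckForUpdates(
  annotations: { [key: string]: string },
  chartName: string
): boolean {
  return chartIdFromAnnotations(annotations, chartName) !== null;
}
```

An app installed outside the UI would typically lack these annotations, so under this guard it would simply show no available update rather than a possibly wrong one.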

@cnotv
Member

cnotv commented Sep 27, 2023

As mentioned in the discussion today, we should also check whether the issue extends to chart installation/edit.

@zube zube bot modified the milestones: v2.8.0, v2.8.next1 Sep 28, 2023
@cnotv
Member

cnotv commented Sep 29, 2023

As there's no parameter to match cases with malformed data, these cases will just not display available updates.

Another issue is how to compare 1.9.0 to 102.0.4+up1.9.0, as this is not a standard semver format or any other format that I am aware of.
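
For context, under SemVer 2.0.0 everything after `+` is build metadata and is ignored when determining precedence, so 102.0.4+up1.9.0 compares as 102.0.4 and the embedded 1.9.0 has to be extracted separately. The TypeScript sketch below assumes the `+up<version>` suffix carries the upstream version, as the versions in this thread suggest; `upstreamVersion` and `compareCore` are hypothetical helpers, and the comparison deliberately ignores pre-release tags such as the `-0` in 1.9.0-0.

```typescript
// Hypothetical helper: pull the upstream version out of a "<chart>+up<upstream>" string.
// Returns the input unchanged when there is no "+up" suffix.
function upstreamVersion(version: string): string {
  const marker = "+up";
  const index = version.indexOf(marker);
  return index === -1 ? version : version.slice(index + marker.length);
}

// Compare the numeric major.minor.patch parts only. Pre-release tags are dropped,
// which is a simplification compared to full SemVer precedence rules.
// Returns a negative number, zero, or a positive number, like a sort comparator.
function compareCore(a: string, b: string): number {
  const partsA = (a.split("-")[0] || "").split(".");
  const partsB = (b.split("-")[0] || "").split(".");
  for (let i = 0; i < 3; i++) {
    const numA = parseInt(partsA[i] || "0", 10) || 0;
    const numB = parseInt(partsB[i] || "0", 10) || 0;
    if (numA !== numB) {
      return numA - numB;
    }
  }
  return 0;
}

// Versions taken from this thread.
const sameUpstream = compareCore(upstreamVersion("102.0.4+up1.9.0"), "1.9.0") === 0;
const repoIsNewer = compareCore("1.10.0", upstreamVersion("102.0.4+up1.9.0")) > 0;
```

Even with such a comparison, matching upstream versions only says the two charts package the same application release; it does not say they are the same chart.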

@cnotv
Member

cnotv commented Oct 20, 2023

@richard-cox
Can we identify, data-wise, what the issue was for #9957? The behavior alone doesn't make it clear, given what we had earlier.
Since the issue was tested with data from another area, it would be great to have some insight into the logic, if any.

@cnotv
Member

cnotv commented Dec 4, 2023

Due to the lack of tests, documentation, comments, interfaces and data of any kind, we'll create a test for each case mentioned in #9958 for the views:

  • Installed Apps (/c/local/apps/catalog.cattle.io.app), missing update availability
  • Chart install (/c/local/apps/charts/install), current version not selected #9957
  • Kubewarden add repository function not working (tbd where and how) #518

The existing tests will also be reviewed, since the mentioned issues are already at least partially covered:

@gaktive
Member

gaktive commented Jun 21, 2024

Moving to 2.10.0 since we're getting close to feature complete for 2.9.0 and this feels like it can be merged after. However, if this is in a branch, that makes this less risky to merge.

@cnotv
Member

cnotv commented Jun 21, 2024

Oh, this has already been completed but needs a rebase, review comments addressed, etc.
I'm not going to touch it anytime soon, I guess, until we complete the Vue3 migration (or, better said, rewrite the entire configuration).

@nwmac nwmac modified the milestones: v2.10.0, v2.11.0 Jul 4, 2024
@jamesharr

FWIW, this issue affected a Longhorn upgrade. The original install was done using Helm, but the upgrade was accidentally done via the UI. The net effect was that persistent storage on our cluster crashed, and we had to completely uninstall Longhorn, re-install it, and restore volumes from backup.

The inconsistency in install/upgrade methods was a mistake on our part, but it had a very outsized impact on the cluster, considering there weren't any guards/warnings along the journey through the UI. So in my opinion it raises the importance of this issue, and I thought I'd mention it.

The UI Journey

The Rancher UI offers an upgrade

image

Only Rancher-catalog chart versions are presented during version selection. This is also the case after adding the Longhorn-project repository to Apps > Repositories.

image

More information about what went wrong

The root cause of the Longhorn crash is that the Rancher-catalog uses two charts to separate out CRDs and resources (longhorn-crds + longhorn), while the Longhorn-project uses a single chart that includes CRDs and resources. The Rancher chart does offer some protections by checking for CRDs during template rendering. However, this protection only works on fresh installs and not for upgrades. Helm will very happily see that the CRDs do in fact exist, finish the render process, then proceed to perform the upgrade, which involves deleting things that don't exist in the new chart: the CRDs.
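
The failure mode described above comes down to a set difference: anything tracked by the current release that the new chart does not render gets deleted. The TypeScript sketch below shows what a pre-upgrade check along those lines might look like. `ResourceRef`, `resourcesRemovedByUpgrade` and `removedCrds` are hypothetical names, the resource lists are assumed to have been extracted already from the release manifest and the rendered chart, and this is not a check that Helm or Rancher is known to provide in this form.

```typescript
// Hypothetical minimal description of a Kubernetes resource in a release.
interface ResourceRef {
  kind: string;
  name: string;
}

function refKey(ref: ResourceRef): string {
  return ref.kind + "/" + ref.name;
}

// Resources present in the current release but absent from the new chart.
function resourcesRemovedByUpgrade(current: ResourceRef[], next: ResourceRef[]): ResourceRef[] {
  const kept: { [key: string]: boolean } = {};
  next.forEach(function (ref) {
    kept[refKey(ref)] = true;
  });
  return current.filter(function (ref) {
    return kept[refKey(ref)] !== true;
  });
}

// Names of the CRDs that the upgrade would delete.
function removedCrds(current: ResourceRef[], next: ResourceRef[]): string[] {
  return resourcesRemovedByUpgrade(current, next)
    .filter(function (ref) {
      return ref.kind === "CustomResourceDefinition";
    })
    .map(function (ref) {
      return ref.name;
    });
}

// Small excerpt modelled on the upgrade log below: a single-chart install
// (CRDs included) upgraded to a chart that ships no CRDs.
const currentRelease: ResourceRef[] = [
  { kind: "DaemonSet", name: "longhorn-manager" },
  { kind: "CustomResourceDefinition", name: "volumes.longhorn.io" },
  { kind: "CustomResourceDefinition", name: "engines.longhorn.io" },
];
const newChart: ResourceRef[] = [{ kind: "DaemonSet", name: "longhorn-manager" }];

const crdsAtRisk = removedCrds(currentRelease, newChart);
```

A non-empty `crdsAtRisk` would be a reasonable trigger for a warning before the upgrade proceeds, since deleting a CRD also deletes the custom resources of that type.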

Upgrade Log

2024-08-27T21:44:33.870015408Z W0827 21:44:33.869790   	7 proxy.go:175] Request filter disabled, your proxy is vulnerable to XSRF attacks, please be cautious
2024-08-27T21:44:33.870799122Z Starting to serve on 127.0.0.1:8001
2024-08-27T21:44:34.824202451Z helm upgrade --history-max=5 --install=true --namespace=longhorn-system --timeout=10m0s --values=/home/shell/helm/values-longhorn-103.1.1-up1.4.4.yaml --version=103.1.1+up1.4.4 --wait=true longhorn /home/shell/helm/longhorn-103.1.1-up1.4.4.tgz
2024-08-27T21:44:38.641711402Z checking 23 resources for changes
2024-08-27T21:44:38.64790738Z Patch ServiceAccount "longhorn-service-account" in namespace longhorn-system
2024-08-27T21:44:38.655548835Z Patch ServiceAccount "longhorn-support-bundle" in namespace longhorn-system
2024-08-27T21:44:38.663810151Z Patch ConfigMap "longhorn-default-setting" in namespace longhorn-system
2024-08-27T21:44:38.674485147Z Patch ConfigMap "longhorn-storageclass" in namespace longhorn-system
2024-08-27T21:44:38.682916246Z Patch ClusterRole "longhorn-role" in namespace
2024-08-27T21:44:38.692027211Z Created a new ClusterRole called "longhorn-admin" in
2024-08-27T21:44:38.692062949Z nomsg
2024-08-27T21:44:38.698900942Z Created a new ClusterRole called "longhorn-edit" in
2024-08-27T21:44:38.698931791Z nomsg
2024-08-27T21:44:38.706384468Z Created a new ClusterRole called "longhorn-view" in
2024-08-27T21:44:38.706419484Z nomsg
2024-08-27T21:44:38.711144227Z Patch ClusterRoleBinding "longhorn-bind" in namespace
2024-08-27T21:44:38.71984195Z Patch ClusterRoleBinding "longhorn-support-bundle" in namespace
2024-08-27T21:44:38.728420498Z Patch Service "longhorn-backend" in namespace longhorn-system
2024-08-27T21:44:38.737750708Z Patch Service "longhorn-frontend" in namespace longhorn-system
2024-08-27T21:44:38.746673969Z Patch Service "longhorn-conversion-webhook" in namespace longhorn-system
2024-08-27T21:44:38.755335112Z Patch Service "longhorn-admission-webhook" in namespace longhorn-system
2024-08-27T21:44:38.763454581Z Patch Service "longhorn-recovery-backend" in namespace longhorn-system
2024-08-27T21:44:38.772507777Z Patch Service "longhorn-engine-manager" in namespace longhorn-system
2024-08-27T21:44:38.780178316Z Patch Service "longhorn-replica-manager" in namespace longhorn-system
2024-08-27T21:44:38.795556464Z Patch DaemonSet "longhorn-manager" in namespace longhorn-system
2024-08-27T21:44:38.809209217Z Patch Deployment "longhorn-driver-deployer" in namespace longhorn-system
2024-08-27T21:44:38.838076739Z Patch Deployment "longhorn-recovery-backend" in namespace longhorn-system
2024-08-27T21:44:38.854781477Z Patch Deployment "longhorn-ui" in namespace longhorn-system
2024-08-27T21:44:38.872395167Z Patch Deployment "longhorn-conversion-webhook" in namespace longhorn-system
2024-08-27T21:44:38.897482413Z Patch Deployment "longhorn-admission-webhook" in namespace longhorn-system
2024-08-27T21:44:38.907191912Z Deleting CustomResourceDefinition "backingimagedatasources.longhorn.io" in namespace ...
2024-08-27T21:44:38.923260026Z Deleting CustomResourceDefinition "backingimagemanagers.longhorn.io" in namespace ...
2024-08-27T21:44:38.944054085Z Deleting CustomResourceDefinition "backingimages.longhorn.io" in namespace ...
2024-08-27T21:44:38.962241561Z Deleting CustomResourceDefinition "backups.longhorn.io" in namespace ...
2024-08-27T21:44:38.990231401Z Deleting CustomResourceDefinition "backuptargets.longhorn.io" in namespace ...
2024-08-27T21:44:39.042664718Z Deleting CustomResourceDefinition "backupvolumes.longhorn.io" in namespace ...
2024-08-27T21:44:39.097535379Z Deleting CustomResourceDefinition "engineimages.longhorn.io" in namespace ...
2024-08-27T21:44:39.24090891Z Deleting CustomResourceDefinition "engines.longhorn.io" in namespace ...
2024-08-27T21:44:39.451364096Z Deleting CustomResourceDefinition "instancemanagers.longhorn.io" in namespace ...
2024-08-27T21:44:39.772294653Z Deleting CustomResourceDefinition "nodes.longhorn.io" in namespace ...
2024-08-27T21:44:39.927287798Z Deleting CustomResourceDefinition "orphans.longhorn.io" in namespace ...
2024-08-27T21:44:40.060627978Z Deleting CustomResourceDefinition "recurringjobs.longhorn.io" in namespace ...
2024-08-27T21:44:40.132828842Z Deleting CustomResourceDefinition "replicas.longhorn.io" in namespace ...
2024-08-27T21:44:40.193108852Z Deleting CustomResourceDefinition "settings.longhorn.io" in namespace ...
2024-08-27T21:44:40.244160625Z Deleting CustomResourceDefinition "sharemanagers.longhorn.io" in namespace ...
2024-08-27T21:44:40.28040071Z Deleting CustomResourceDefinition "snapshots.longhorn.io" in namespace ...
2024-08-27T21:44:40.320840857Z Deleting CustomResourceDefinition "supportbundles.longhorn.io" in namespace ...
2024-08-27T21:44:40.367652443Z Deleting CustomResourceDefinition "systembackups.longhorn.io" in namespace ...
2024-08-27T21:44:40.397909955Z Deleting CustomResourceDefinition "systemrestores.longhorn.io" in namespace ...
2024-08-27T21:44:40.434806304Z Deleting CustomResourceDefinition "volumes.longhorn.io" in namespace ...
2024-08-27T21:44:40.477347367Z beginning wait for 23 resources with timeout of 10m0s
2024-08-27T21:44:40.543548764Z Deployment is not ready: longhorn-system/longhorn-driver-deployer. 0 out of 1 expected pods are ready
... repeat messages
2024-08-27T21:54:00.707075985Z Deployment is not ready: longhorn-system/longhorn-driver-deployer. 0 out of 1 expected pods are ready
2024-08-27T21:54:02.701917798Z DaemonSet is not ready: longhorn-system/longhorn-manager. 2 out of 5 expected pods have been scheduled
2024-08-27T21:54:04.723281044Z Deployment is not ready: longhorn-system/longhorn-driver-deployer. 0 out of 1 expected pods are ready
... repeat messages
2024-08-27T21:54:38.705476089Z Deployment is not ready: longhorn-system/longhorn-driver-deployer. 0 out of 1 expected pods are ready
2024-08-27T21:54:40.519726173Z Error: UPGRADE FAILED: context deadline exceeded

@cnotv
Member

cnotv commented Sep 3, 2024

There's a PR that should address this issue but has been left hanging due to priorities on other tech debts: #10180

@jamesharr

Thanks for the reference and I understand. Is it worth re-posting my info over in #10180?

@cnotv
Member

cnotv commented Sep 3, 2024

No problem :)
The reference is fine here in the issue. I will have to investigate your specific case and, if needed, create a unit test if one is missing or add the parameters.

@richard-cox
Member

Closing in favour of #11465. It's newer but contains more up-to-date repo instructions, SURE links and a path forward.

@richard-cox richard-cox closed this as not planned Sep 5, 2024
@nwmac nwmac modified the milestones: v2.12.0, v2.11.0 Nov 1, 2024
Projects
None yet
8 participants