Print cloud-init logs when a deployment fails #1060

Open
srbarrios opened this issue Mar 4, 2022 · 8 comments

Comments

@srbarrios
Member

srbarrios commented Mar 4, 2022

@rjmateus Can we print cloud-init logs when a deployment fails in that way?

Maybe we need to contribute this to the libvirt provider? Somewhere here?
https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/resource_libvirt_domain.go
https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/resource_libvirt_cloud_init.go
https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/cloudinit_def.go

The idea is to append it to this error message:

│ Error: Error: couldn't retrieve IP address of domain.Please check following:
│ 1) is the domain running proplerly?
│ 2) has the network interface an IP address?
│ 3) Networking issues on your libvirt setup?
│  4) is DHCP enabled on this Domain's network?
│ 5) if you use bridge network, the domain should have the pkg qemu-agent installed
│ IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup
│  timeout while waiting for state to become 'all-addresses-obtained' (last state: 'waiting-addresses', timeout: 5m0s)
│
│   with module.cucumber_testsuite.module.debian-minion.module.minion.module.host.libvirt_domain.domain[0],
│   on /home/jenkins/jenkins-build/workspace/manager-Head-dev-acceptance-tests-NUE/results/sumaform/backend_modules/libvirt/host/main.tf line 68, in resource "libvirt_domain" "domain":
│   68: resource "libvirt_domain" "domain" {

@nodeg
Member

nodeg commented Mar 4, 2022

You mean that we have colored output?

@srbarrios
Member Author

srbarrios commented Mar 4, 2022

> You mean that we have colored output?

hahaha nono
I want the files /var/log/cloud-init.log and /var/log/cloud-init-output.log printed as part of the sumaform deployment, together with the message that I shared.

These files can give us very useful information about the deployment failure.

(I won't object to colors in any case 😆 )

@srbarrios
Member Author

See an example of the information they provide:

Cloud-init v. 20.2-8.48.1 running 'modules:config' at Mon, 28 Feb 2022 10:09:23 +0000. Up 32.66 seconds.
Retrieving repository 'os_pool_repo' metadata [.done]
Building repository 'os_pool_repo' cache [....done]
All repositories have been refreshed.
Loading repository data...
Reading installed packages...
'qemu-guest-agent' is already installed.
Package 'qemu-guest-agent' is not available in your repositories. Cannot reinstall, upgrade, or downgrade.
Resolving package dependencies...

The following 5 NEW packages are going to be installed:
  avahi libavahi-common3 libavahi-core7 libdaemon0 nss-mdns

5 new packages to install.
Overall download size: 348.8 KiB. Already cached: 0 B. After the operation, additional 894.7 KiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
Retrieving package libavahi-common3-0.8-150400.4.4.x86_64 (1/5),  42.3 KiB ( 51.2 KiB unpacked)
Retrieving: libavahi-common3-0.8-150400.4.4.x86_64.rpm [done]
Retrieving package libdaemon0-0.14-1.23.x86_64 (2/5),  29.4 KiB ( 59.4 KiB unpacked)
Retrieving: libdaemon0-0.14-1.23.x86_64.rpm [done]
Retrieving package libavahi-core7-0.8-150400.4.4.x86_64 (3/5), 101.5 KiB (220.8 KiB unpacked)
Retrieving: libavahi-core7-0.8-150400.4.4.x86_64.rpm [done]
Retrieving package avahi-0.8-150400.4.4.x86_64 (4/5), 136.0 KiB (431.2 KiB unpacked)
Retrieving: avahi-0.8-150400.4.4.x86_64.rpm [done]
Retrieving package nss-mdns-0.14.1-150400.8.3.x86_64 (5/5),  39.7 KiB (132.0 KiB unpacked)
Retrieving: nss-mdns-0.14.1-150400.8.3.x86_64.rpm [done]

Checking for file conflicts: [.......done]
(1/5) Installing: libavahi-common3-0.8-150400.4.4.x86_64 [.....done]
(2/5) Installing: libdaemon0-0.14-1.23.x86_64 [.....done]
(3/5) Installing: libavahi-core7-0.8-150400.4.4.x86_64 [..........done]
(4/5) Installing: avahi-0.8-150400.4.4.x86_64 [...........done]
Additional rpm output:
Updating /etc/sysconfig/avahi ...
Created symlink /etc/systemd/system/dbus-org.freedesktop.Avahi.service -> /usr/lib/systemd/system/avahi-daemon.service.
Created symlink /etc/systemd/system/multi-user.target.wants/avahi-daemon.service -> /usr/lib/systemd/system/avahi-daemon.service.
Created symlink /etc/systemd/system/sockets.target.wants/avahi-daemon.socket -> /usr/lib/systemd/system/avahi-daemon.socket.


(5/5) Installing: nss-mdns-0.14.1-150400.8.3.x86_64 [..........done]
Failed to start qemu-ga@virtio\x2dports-org.qemu.guest_agent.0.service: Unit qemu-ga@virtio\x2dports-org.qemu.guest_agent.0.service not found.
Cloud-init v. 20.2-8.48.1 running 'modules:final' at Mon, 28 Feb 2022 10:09:25 +0000. Up 34.03 seconds.
2022-02-28 10:09:31,188 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [5]
2022-02-28 10:09:31,195 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2022-02-28 10:09:31,196 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
ci-info: no authorized SSH keys fingerprints found for user sles.
Cloud-init v. 20.2-8.48.1 finished at Mon, 28 Feb 2022 10:09:31 +0000. Datasource DataSourceNoCloud [seed=/dev/sr0][dsmode=net].  Up 40.28 seconds
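
To show how the interesting lines in output like the above could be singled out, here is a minimal sketch. It is not part of sumaform or the libvirt provider; the function name `find_failures` and the pattern (lines tagged `[WARNING]` or `[ERROR]`, or lines starting with `Failed`) are assumptions chosen to match the sample log, not an exhaustive rule.

```python
import re

# Assumed heuristic: cloud-init tags its own problems with [WARNING]/[ERROR],
# and systemd-style messages in the sample start with "Failed".
FAILURE_PATTERN = re.compile(r"\[(WARNING|ERROR)\]|^Failed\b")


def find_failures(log_text):
    """Return the lines of a cloud-init output log that look like failures."""
    return [line for line in log_text.splitlines() if FAILURE_PATTERN.search(line)]
```

Run against the log above, this would keep the `Failed to start qemu-ga@...` line and the three `[WARNING]` lines, and drop the package installation noise.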

@rjmateus
Member

@srbarrios we don't control the log from the sumaform side. That would need changes to the terraform libvirt provider.
I'm not sure if this issue should be reported here. Upstream may not accept a change like this, because this is not a domain creation problem but a cloud-init problem inside the machine (I don't even know if the provider can get data from the disk inside the machine).
As far as I know, all the data the provider retrieves is obtained from libvirt.

@moio
Contributor

moio commented Mar 14, 2022

@rjmateus it's true that we do not control it, but we could still just echo it from a Salt script (if we ever get to that stage)

@rjmateus
Member

@moio Good point. Do you think we should always echo that, and should it be the first state to apply (since cloud-init runs at start-up)?

@moio
Contributor

moio commented Mar 16, 2022

Please try and take a look, if you find it too verbose it could be behind a flag.
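
A minimal sketch of what echoing the logs behind a flag could look like, assuming a Python helper runs on the deployed machine (for example, called from a Salt state). The helper name `dump_cloud_init_logs` and the `enabled` flag are hypothetical; only the two log paths come from this thread.

```python
import os
import sys

# The two files named earlier in this issue.
CLOUD_INIT_LOGS = ("/var/log/cloud-init.log", "/var/log/cloud-init-output.log")


def dump_cloud_init_logs(paths=CLOUD_INIT_LOGS, enabled=True, out=sys.stdout):
    """Write each existing log to `out`; return the list of paths printed."""
    printed = []
    if not enabled:
        # Flag is off: stay quiet so the deployment output is not flooded.
        return printed
    for path in paths:
        if not os.path.isfile(path):
            # cloud-init may not be in use on this image; skip silently.
            continue
        out.write(f"===== {path} =====\n")
        with open(path, errors="replace") as handle:
            out.write(handle.read())
        out.write("\n")
        printed.append(path)
    return printed
```

Skipping missing files means the helper does nothing on images that do not use cloud-init, and `enabled=False` gives the quiet default if the output turns out to be too verbose.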

@Bischoff
Contributor

I'm not sure this is possible on the libvirt provider side: it has no idea whether cloud-init is used or not.

Even assuming that cloud-init is used, the provider would need access to the inside of the VM to read the logs. And we are precisely in a situation where the VM cannot be accessed, be it via the network or via qemu-agent...
