fallback presents a scary message to users #418

Open
jprvita opened this issue Sep 21, 2021 · 4 comments · May be fixed by #445

Comments

@jprvita
Contributor

jprvita commented Sep 21, 2021

In the past, before fallback cared about TPMs, it would always chain-load the entry it had created, but to better support TPMs it now reboots the system if a TPM is present. Because some firmware implementations override any changes made outside of the firmware, the system may end up in a BOOTX64.EFI reboot loop, as already reported by @lcp in #128.

This problem is currently addressed by commit a5db51a, which presents a screen warning the user the system is about to reboot, with a countdown, allowing them to tell the system to keep booting instead. While this solution works (and thanks @lcp for implementing it!), it has a few shortcomings:

  1. It makes an otherwise glitch-free boot process no longer smooth.
  2. The message presented is not accessible and is potentially scary for non-technical users: if they press a key to interrupt the boot process, the meaning of each option is not clear to anyone unfamiliar with how shim / fallback work.
  3. The experience is made worse by the fact that after selecting "Continue boot" / "Always continue boot", the screen remains frozen until something else draws on the framebuffer. If GRUB is configured to be quiet for a glitch-free boot, this may last several seconds, until the kernel has started and loaded the manufacturer logo from BGRT, which gives the impression that the whole boot process froze.
  4. The Boot Option Restoration screen overwrites all the debug information printed before it is displayed, essentially neutering FALLBACK_VERBOSE and SHIM_VERBOSE and making it impossible to enable debugging without rebuilding fallback.

Some of these may be seen as non-issues for distributions that use a traditional installer, like Fedora and openSUSE, where the installer creates the correct boot entry before first boot and the fallback path is only taken if the boot entry was invalidated somehow (the ESP UUID changed, the firmware was factory-reset, etc.). But for distributions shipped as a "ready-to-dd" disk image, like Endless OS, fallback is responsible for creating the boot entry on first boot and then updating it on the second boot, after the ESP UUID changes during the first-boot re-partitioning process (since we don't want the partitions on every installation to have the same UUIDs). This means users will see this message on both the first and second boots in such scenarios.

jprvita added a commit to endlessm/shim that referenced this issue Sep 21, 2021
The firmware on some Acer machines (and maybe others) always resets the
boot entries and BootOrder variable to what was defined in the firmware
setup program, overriding any external changes (including the changes
made by fallback).

Before shim cared about TPMs this was not a problem in practice, as
fallback would create and chain-load a boot entry for the OS on every
boot. However, since commit 431b8a2 the system is restarted if a TPM
is detected on the system, triggering an infinite reboot loop in systems
with such firmware. This is a known problem which has been previously
reported on rhboot#128.

More recently, the problem has been addressed by commit a5db51a,
which presents a screen with a countdown to the user, where they can
interrupt boot and choose to have fallback always chain-load the new
entry instead of restarting the system, to break out of the reboot loop.
While this solution works, it has a few shortcomings:

 1. It makes an otherwise glitch-free boot process not smooth anymore.
 2. The message presented is not accessible / potentially scary for
    non-technical users: if they press a key to interrupt the boot
    process, the meaning of each option is not really clear for users
    not familiar with how shim and fallback work.
 3. The whole experience is made a bit worse by the fact that after
    selecting "Continue boot" / "Always continue boot", the screen will
    remain frozen until something else draws on the framebuffer. If GRUB
    is configured to be quiet, for a glitch-free boot, this may last
    several seconds until the kernel has started and loaded the
    manufacturer logo from BGRT, which gives the impression that the
    whole boot process froze.
 4. This Boot Option Restoration screen overwrites all the debug
    information printed before it is displayed, essentially neutering
    FALLBACK_VERBOSE or SHIM_VERBOSE and making it impossible to enable
    debug without rebuilding fallback.

This commit tries to automatically detect and break out of the reboot
loop without requiring any user interaction. To achieve this, a boot
counter is stored in an EFI variable and incremented every time fallback
is about to reboot the system. If the counter ever reaches a maximum
value configurable at build time (currently defaulting to 3), another
EFI variable is set to tell fallback to always chain-load the new entry
(FB_NO_REBOOT, to make it backwards compatible with the previous
solution). The counter is then reset when shim is started and knows it
is not going to load fallback.

Fixes: rhboot#418

Signed-off-by: João Paulo Rechi Vita <[email protected]>
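
For readers who want to see the counter logic from the commit message above in code form, here is a rough sketch. It is not the actual patch: the variable store is a plain struct standing in for the firmware's GetVariable()/SetVariable() services, and the names `var_store`, `reboot_count`, `FB_MAX_REBOOTS`, `fallback_decide` and `shim_normal_boot` are made up for illustration. Only FB_NO_REBOOT and the default limit of 3 come from the commit message, and the exact point at which the real code increments and compares the counter may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build-time limit; the commit message gives 3 as the default. */
#ifndef FB_MAX_REBOOTS
#define FB_MAX_REBOOTS 3
#endif

/* Stand-in for the firmware variable store (assumption: real code would
 * read and write EFI variables instead of struct fields). */
struct var_store {
    uint8_t reboot_count; /* hypothetical counter variable */
    bool fb_no_reboot;    /* models the FB_NO_REBOOT variable */
};

enum fb_action { FB_REBOOT, FB_CHAIN_LOAD };

/* Called at the point where fallback would otherwise reboot the system. */
enum fb_action fallback_decide(struct var_store *vars)
{
    /* A previous boot (or the user) already asked to skip the reboot. */
    if (vars->fb_no_reboot)
        return FB_CHAIN_LOAD;

    /* Count this attempt; once the limit is reached, assume a loop. */
    vars->reboot_count++;
    if (vars->reboot_count >= FB_MAX_REBOOTS) {
        vars->fb_no_reboot = true;
        return FB_CHAIN_LOAD;
    }
    return FB_REBOOT;
}

/* Called by shim when it knows it is not going to load fallback. */
void shim_normal_boot(struct var_store *vars)
{
    vars->reboot_count = 0;
}
```

With the default limit, two calls to `fallback_decide()` on a zeroed store ask for a reboot and the third one sets the flag and chain-loads, which matches the "break out without user interaction" behaviour described above.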
@jprvita
Contributor Author

jprvita commented Sep 21, 2021

To avoid the problems mentioned above, we have been shipping a downstream change on Endless OS to always try to chain-load the new boot entry, despite the presence of a TPM. But I am now trying to improve things upstream through #419.

@dsd

dsd commented Sep 22, 2021

It's touched upon in the issues linked here, but I'd like to also emphasise the scope of the problem: we believe that a large range of consumer products will always execute bootx64, effectively ignoring BootXXXX & BootOrder variables, unless the desired binary is manually enrolled as trusted within the firmware boot menu. Such systems by default encounter the infinite boot loop mentioned above.

We had multiple generations of Acer products in our office that got affected by this, getting stuck in this loop when the TPM-related reboot behaviour was introduced to shim. There are many reports around the web of this behaviour e.g. this big Fedora bug report. Since the problem is widespread, any improvement to the user experience beyond the blue screen menu would be very valuable.

@vathpela
Contributor

I don't think this is the right solution, especially the mechanism for detecting boot loops. If we hit boot loops, it's very often because variables aren't working. Maybe just boot variables because of some horribly broken BDS, maybe all variables. Going forward, that'll especially be true with EBBR-style devices that have no NV storage, and various cloud setups that are similar in that regard.
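
To make the objection concrete, here is a small simulation of a persisted reboot counter on firmware whose variables do not survive a reset. It is only a sketch under assumptions: `MAX_REBOOTS` and `boots_until_loop_detected` are hypothetical names, and the "store" is a local integer rather than real EFI variables.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REBOOTS 3 /* hypothetical loop-detection threshold */

/* Returns the boot on which the counter reaches the threshold, or -1 if it
 * never does within boot_limit boots. */
int boots_until_loop_detected(bool vars_persist, int boot_limit)
{
    int stored = 0; /* value last written to the variable store */

    for (int boot = 1; boot <= boot_limit; boot++) {
        /* Without working NV storage, each boot reads back a fresh store. */
        int count = vars_persist ? stored : 0;

        count++;
        stored = count;
        if (count >= MAX_REBOOTS)
            return boot;
    }
    return -1;
}
```

When variables persist, the threshold is reached on the third boot; when they do not, the counter is stuck at 1 and the loop is never detected, however many boots are allowed.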

The right thing to do here is to fully integrate fallback into shim, and completely get rid of it as a separate program. That alleviates the TPM issue[0], and lets us switch back to always doing one of:

  • booting the second stage in shim's directory
  • booting the created entry
  • definitively failing with no further steps forward

[0] we also need to do this with mokmanager, which will also help dramatically improve that experience

jprvita added a commit to endlessm/shim that referenced this issue Feb 1, 2022
The firmware on some Acer machines (and maybe others) always resets the
boot entries and BootOrder variable to what was defined in the firmware
setup program, overriding any external changes (including the changes
made by fallback).

Before shim cared about TPMs this was not a problem in practice, as
fallback would create and chain-load a boot entry for the OS on every
boot. However, since commit 431b8a2 the system is restarted if a TPM
is detected on the system, triggering an infinite reboot loop in systems
with such firmware. This is a known problem which has been previously
reported on rhboot#128.

More recently, the problem has been addressed by commit a5db51a,
which presents a screen with a countdown to the user, where they can
interrupt boot and choose to have fallback always chain-load the new
entry instead of restarting the system, to break out of the reboot loop.
While this solution works, it has a few shortcomings:

 1. It makes an otherwise glitch-free boot process not smooth anymore.
 2. The message presented is not accessible / potentially scary for
    non-technical users: if they press a key to interrupt the boot
    process, the meaning of each option is not really clear for users
    not familiar with how shim and fallback work.
 3. The whole experience is made a bit worse by the fact that after
    selecting "Continue boot" / "Always continue boot", the screen will
    remain frozen until something else draws on the framebuffer. If GRUB
    is configured to be quiet, for a glitch-free boot, this may last
    several seconds until the kernel has started and loaded the
    manufacturer logo from BGRT, which gives the impression that the
    whole boot process froze.
 4. This Boot Option Restoration screen overwrites all the debug
    information printed before it is displayed, essentially neutering
    FALLBACK_VERBOSE or SHIM_VERBOSE and making it impossible to enable
    debug without rebuilding fallback.

This commit adds a build-time flag that forces fallback to always try to
chain-load the newly created boot entry, in the same way it did before
TPM support was added.

Fixes: rhboot#418

Signed-off-by: João Paulo Rechi Vita <[email protected]>
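
As an illustration of what a build-time switch like the one in the commit message above might look like, here is a simplified sketch. The macro name `FALLBACK_ALWAYS_CHAIN_LOAD` and the function `fallback_action` are hypothetical, not taken from the patch, and the default branch leaves out the countdown screen and the FB_NO_REBOOT check.

```c
#include <assert.h>
#include <stdbool.h>

enum fb_action { FB_REBOOT, FB_CHAIN_LOAD };

/* Decide what fallback does after creating the boot entry.
 * Build with -DFALLBACK_ALWAYS_CHAIN_LOAD (hypothetical flag name) to get
 * the pre-TPM behaviour on every system. */
enum fb_action fallback_action(bool tpm_present)
{
#ifdef FALLBACK_ALWAYS_CHAIN_LOAD
    (void)tpm_present; /* TPM presence no longer affects the decision */
    return FB_CHAIN_LOAD;
#else
    /* Simplified default: reboot when a TPM is present. */
    return tpm_present ? FB_REBOOT : FB_CHAIN_LOAD;
#endif
}
```

Since the decision is made by the preprocessor, a distribution that enables the flag ships a fallback binary with no reboot path at all, so there is nothing for the user to be asked about at boot time.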
@jprvita
Contributor Author

jprvita commented Feb 1, 2022

Thanks for the feedback here @vathpela, I have now closed #419 per the potential issues you mentioned.

I agree that the right solution is to integrate fallback and mokmanager into shim, and I see you have that planned for shim 16.

For shim 15.x, we are going to continue to disable, at compile time, the logic that asks the user what to do in those scenarios. I have put our changes behind a build-time flag and opened #445, in case other distributors also want to follow that route (and for us, the fewer downstream changes we have, the easier it is to go through shim-review). But I understand you may not want to accept it since the architecture of fallback is about to change, in which case feel free to just close this as wontfix and we'll keep carrying it downstream until it is time to switch to fallback-inside-shim.
