Random shutdowns #608

pablomendezroyo opened this issue Jan 22, 2024 · 0 comments


pablomendezroyo commented Jan 22, 2024

Some users have been reporting random shutdowns on their dappnodes for more than a year. The issue occurs on dappnodes sold by dappnode, which are mainly Intel NUC hardware. The OS installed on these dappnodes has been Debian: Buster, Bullseye and Bookworm.

Reasons

The main reasons why a random shutdown may occur in dappnode are:

  • Overheating: If the CPU or GPU overheats, the system will shut down to prevent damage. This can be due to dust buildup, failing fans, poor ventilation, or thermal paste that needs replacing.

  • Power Supply Issues: A faulty or inadequate power supply can cause unexpected shutdowns. This could be due to power fluctuations, an aging power supply unit, or issues with the power source.

  • Hardware Failure: Failing hardware components like RAM, the motherboard, or hard drives/SSDs can lead to shutdowns. Testing with a hardware diagnostic tool or swapping out components can help identify the issue.

  • Software and Drivers: Missing or outdated drivers, particularly for the chipset and graphics card, can cause instability. Ensure all drivers are up to date. Also, check for any system updates or patches that might address stability issues.

  • Kernel Panic: A kernel panic in Linux could lead to a sudden shutdown. This can be caused by hardware incompatibilities, driver issues, or corrupted system files. Checking system logs can provide insights into whether this is happening.

  • BIOS/UEFI Issues: An outdated or misconfigured BIOS/UEFI firmware can cause stability issues. Check for firmware updates and ensure settings are correct for your hardware.

  • Memory Issues: Faulty or incompatible RAM can cause random shutdowns. Running a memory test like Memtest86 can help diagnose this.
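Several of the causes above leave traces in the system logs. As a rough sketch of how such traces could be triaged, the snippet below sorts log lines into a few of these categories. The `PATTERNS` table and the `classify_log_lines` helper are hypothetical names, and the keyword patterns are illustrative assumptions: real kernel messages differ between kernel versions and drivers, so a match is a hint to look closer, not a diagnosis.

```python
import re

# Illustrative keyword patterns only (an assumption, not an exhaustive list).
PATTERNS = {
    "overheating": re.compile(r"critical temperature|temperature above threshold", re.I),
    "kernel_panic": re.compile(r"kernel panic", re.I),
    "hardware_error": re.compile(r"hardware error|machine check", re.I),
}


def classify_log_lines(lines):
    """Return {category: [matching lines]} for every line that hits a pattern."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

It could be fed the lines of a saved copy of `/var/log/syslog` or of the `dmesg` output, for example `classify_log_lines(open("syslog.txt", errors="replace"))`, where `syslog.txt` is a placeholder file name.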

Debugging

To debug an affected dappnode, collect the following information:

  • OS and OS version: e.g. Debian Bullseye
  • Hardware model and version: e.g. Intel NUC 13th generation
  • Dappnode version: e.g. 0.2.85
  • Errors in the dappnode install logs:
    • /usr/src/dappnode/logs/dappnode_install.log
    • /usr/src/dappnode/logs/iso_install.log
  • System logs:
    • /var/log/syslog
    • dmesg
  • Intel Linux drivers:
    • intel-microcode
    • ?
  • Temperature: collect the temperature of the different components of the Intel NUC to detect abnormal values
  • Count of shutdowns: journalctl --list-boots | wc -l (this counts the lines of the boot list, roughly one per boot recorded in the journal, including the current one)
  • Memory issues: this should be the last step, since it will erase data. Use the tool Memtest86
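For the temperature item, one possible approach that needs no extra packages is to read the kernel's thermal zones from sysfs, where each `thermal_zone*/temp` file reports millidegrees Celsius. The sketch below assumes the NUC exposes its sensors under `/sys/class/thermal`, which depends on the kernel and loaded drivers. The function names and the 90 °C limit are hypothetical choices for illustration, not values taken from Intel or dappnode.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"<zone>:<type>": degrees Celsius} for each readable thermal zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone is unreadable or reports no numeric value
        readings[f"{zone.name}:{zone_type}"] = millidegrees / 1000.0
    return readings


def abnormal(readings, limit_celsius=90.0):
    """Keep only readings at or above the limit (the limit is an assumed value)."""
    return {name: temp for name, temp in readings.items() if temp >= limit_celsius}
```

Calling `abnormal(read_thermal_zones())` periodically and logging the result would show whether temperatures climb before a shutdown.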

To-dos

  • Install temperature sensor drivers by default on dappnode
  • Script to collect info
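A minimal sketch of what such a collection script might look like is given below. It is not an existing dappnode tool: `collect`, `run`, `read_file` and `count_boots` are hypothetical helpers, and the set of collected fields is an assumption based on the debugging list above. `count_boots` counts only lines that carry a boot index and a 32-character boot ID, so a header line in the `journalctl --list-boots` output is not counted.

```python
import re
import subprocess

# A boot entry starts with an index (0, -1, -2, ...) followed by a 32-hex boot ID.
BOOT_LINE = re.compile(r"^\s*-?\d+\s+[0-9a-f]{32}\b")


def count_boots(list_boots_output):
    """Count boot entries in the text printed by `journalctl --list-boots`."""
    return sum(1 for line in list_boots_output.splitlines() if BOOT_LINE.match(line))


def run(cmd):
    """Run a command and return its stdout, or a failure note if it cannot run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<failed: {exc}>"


def read_file(path):
    """Return the file contents, or a failure note if it cannot be read."""
    try:
        with open(path, errors="replace") as fh:
            return fh.read()
    except OSError as exc:
        return f"<failed: {exc}>"


def collect():
    """Gather the debugging information into one dictionary."""
    return {
        "os_release": read_file("/etc/os-release"),
        "kernel": run(["uname", "-a"]),
        "boot_count": count_boots(run(["journalctl", "--list-boots"])),
        "dmesg": run(["dmesg"]),
        "install_log": read_file("/usr/src/dappnode/logs/dappnode_install.log"),
        "iso_install_log": read_file("/usr/src/dappnode/logs/iso_install.log"),
    }
```

Reading `dmesg` and the journal may require root privileges on some setups; in that case the corresponding fields would come back empty or with a failure note rather than raising an error.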

Alternatives
