Discussion: Optimal Fan Curves #180

Open
MilesBHuff opened this issue Apr 21, 2021 · 12 comments

@MilesBHuff

MilesBHuff commented Apr 21, 2021

HEATUP/COOLDOWN

The rapid and sudden spin-ups of fans on System76 laptops seem to be a common complaint in reviews of some models. Thankfully, this should be largely addressed by #190; still, it may be worth deliberating on the optimal values for HEATUP and COOLDOWN, since they play a large role in this.

When they're too low, the fans constantly spin up and down as the CPU temperature fluctuates, creating a lot of distracting noise (as described by @leviport in a comment on #38);
when HEATUP is too high, the CPU fan takes too long to adjust, which creates a jarring, sudden jump from a lower speed to a higher one; and
when COOLDOWN is too high, the fans can keep blowing loudly for a long time after the CPU has already cooled down.

I know that HEATUP=5 and COOLDOWN=10 seem to work well on the oryp7. (The defaults are 5 and 20, respectively; 20 is definitely too long, and leaves the fans at max for 10+ seconds after the system has already cooled down.) But I don't have a good logical argument for why these values are good, only a qualitative one.
Other models, such as the darp5, seem to do best with very low values for these (see: #135).
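To make the tradeoff above concrete, here is a minimal sketch in C of one way HEATUP and COOLDOWN could act, assuming both are sliding windows measured in samples (one sample per second is assumed): the fan only rises to the lowest request seen over the last HEATUP samples, and only falls to the highest of those smoothed values over the last COOLDOWN samples. The struct and function names (`fan_smoother`, `window_min`, `window_max`, `fan_smooth`) are hypothetical, and this is an illustration, not code taken from the EC source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed window lengths, in samples (one sample per second is assumed). */
#define HEATUP 5
#define COOLDOWN 10

/* Hypothetical smoother state: the most recent values in each window. */
struct fan_smoother {
    uint8_t heatup[HEATUP];     /* last HEATUP duty requests from the curve */
    uint8_t cooldown[COOLDOWN]; /* last COOLDOWN outputs of the heatup stage */
};

/* Push `duty` into `win` (the oldest value drops out); return the window minimum. */
static uint8_t window_min(uint8_t *win, size_t len, uint8_t duty) {
    assert(len > 0);
    uint8_t lowest = duty;
    for (size_t i = 0; i + 1 < len; i++) {
        win[i] = win[i + 1];
        if (win[i] < lowest) {
            lowest = win[i];
        }
    }
    win[len - 1] = duty;
    return lowest;
}

/* Push `duty` into `win` (the oldest value drops out); return the window maximum. */
static uint8_t window_max(uint8_t *win, size_t len, uint8_t duty) {
    assert(len > 0);
    uint8_t highest = duty;
    for (size_t i = 0; i + 1 < len; i++) {
        win[i] = win[i + 1];
        if (win[i] > highest) {
            highest = win[i];
        }
    }
    win[len - 1] = duty;
    return highest;
}

/* One control step: the fan only speeds up once every request in the HEATUP
 * window is high, and only slows down once every value in the COOLDOWN
 * window is low. */
uint8_t fan_smooth(struct fan_smoother *s, uint8_t duty) {
    uint8_t rising = window_min(s->heatup, HEATUP, duty);
    return window_max(s->cooldown, COOLDOWN, rising);
}
```

Under these assumptions, a one-sample spike never reaches the fan, a sustained rise shows up after HEATUP - 1 samples, and the fan stays up for COOLDOWN - 1 samples after the load drops. That is the tension described above: a larger HEATUP filters more noise but delays the response, and a larger COOLDOWN lengthens the tail after the CPU has cooled.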

Are there criteria that could be used to determine, in principle, what values are ideal?

Bounds and interpolation

Once #190 is merged, fan curves can be simplified a fair bit, since intermediate values can then be interpolated, as I understand it. This means the only values that need to be listed explicitly in the board.mk files are the ones that have very specific reasons to be there.

Reading the second half of #163, it seems some fans have a lower bound on useful duty percentages. The oryp7, for example, has to run its fans at 25% or higher, or many users will experience rumbling vibrations. So maybe the lower bound for each model could be set to the lowest percentage that works without issue?

For fans that have a broken value in the middle of their range (say Fan A rattles only at 51%-53%), two nodes 1°C apart could be specified to skip that range: 60°C:50%::61°C:54%.
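A minimal sketch in C of linear interpolation between curve nodes, combining the ideas above: a 25% floor, a 1°C step that skips a rattling 51%-53% band, and 100% at the top. It assumes the fan is off below the first node, holds the last node's duty above the last node, and that temperatures are read in whole degrees; the names `fan_point`, `CURVE`, and `fan_duty` and all node values are hypothetical, not taken from any board.mk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One node of a fan curve: at `temp` degrees C, run the fan at `duty` percent. */
struct fan_point {
    int16_t temp;
    uint8_t duty;
};

/* Hypothetical curve: 25% floor at 50 C, a 50% -> 54% jump between 60 C and
 * 61 C to skip a rattling 51-53% band, and 100% at 70 C. Illustrative only. */
static const struct fan_point CURVE[] = {
    { 50, 25 },
    { 60, 50 },
    { 61, 54 },
    { 70, 100 },
};

#define CURVE_LEN (sizeof(CURVE) / sizeof(CURVE[0]))

/* Linearly interpolate the duty for `temp` (nodes must have increasing temps).
 * Assumed behaviour outside the curve: fan off below the first node, last
 * node's duty above the last node. */
uint8_t fan_duty(const struct fan_point *points, size_t len, int16_t temp) {
    assert(len > 0);
    if (temp < points[0].temp) {
        return 0;
    }
    for (size_t i = 1; i < len; i++) {
        const struct fan_point *lo = &points[i - 1];
        const struct fan_point *hi = &points[i];
        if (temp <= hi->temp) {
            int32_t span = hi->temp - lo->temp;
            int32_t rise = (int32_t)hi->duty - lo->duty;
            /* Integer division truncates, so intermediate duties round down. */
            return (uint8_t)(lo->duty + rise * (temp - lo->temp) / span);
        }
    }
    return points[len - 1].duty;
}
```

With whole-degree readings, no temperature maps into 51%-53%: 60°C gives 50% and 61°C gives 54%. If the temperature were read with finer resolution, linear interpolation between those two nodes would still pass briefly through the skipped band.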

The highest value should always be 100%, unless that speed is broken for a given model's fans.
Currently, most System76 models seem to reach it at around 90°C. While this may seem like a worthwhile tradeoff between CPU lifetime and noise, it costs more than just lifetime: when the computer does not spin the fans to max until the CPU is already throttling itself heavily (as it is at 90°C), the computer's performance takes a hit, and that hit can be substantial. And running at high temperatures does more than shorten the CPU's lifespan; it can also decrease the chip's reliability, since its fragile transistors are directly affected by heat.
All computers should reach their maximum fan speed by the time they reach Tjunction, the point at which the CPU has to start throttling. This value differs between processors, and the exact specs can be found on Intel's website (https://www.intel.com/content/www/us/en/support/articles/000005597/processors.html); but generally, hitting 100% fan at 70°C will avoid throttling on most or all CPUs, as far as I understand.

But what about the temperature at which the fans should turn on? I'm unsure of an objective way to determine this, but 50°C seems reasonable for laptops.

For the oryp7, the above gives an interpolated fan curve with just two nodes: 25% @ 50°C and 100% @ 70°C. You can find these settings (with hard-coded interpolation as a temporary workaround until #190 is merged) in my pull request, #179.
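Assuming straight linear interpolation between those two nodes, the resulting curve for 50°C ≤ T ≤ 70°C works out as follows, with d in percent and T in °C (behavior outside that range is not specified here):

$$
d(T) = 25 + (100 - 25)\,\frac{T - 50}{70 - 50} = 25 + 3.75\,(T - 50), \qquad 50 \le T \le 70
$$

$$
d(60) = 25 + 3.75 \times 10 = 62.5
$$

So each degree above 50°C adds 3.75% fan duty, and the midpoint of the range lands at 62.5%.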

Closing remarks

I just want to say, I think you guys are awesome and thank you thank you thank you for working so hard to give us open source firmware. Like, honestly, the simple fact that I can even discuss the BIOS with the devs on GitHub and tweak the code for my own uses is just amazing.

I'm thinking through all of the above to improve the fan settings and fix the issues the defaults cause on my personal laptop; and I'm sharing my thought process here in case it's of some help, either to System76 or other power-users seeking to get the most out of their new laptops.
If nothing else, though, I really encourage you guys to consider lowering the temperature at which the fans reach full speed; 90°C is definitely too high.

Thanks for your time.

@curiousercreative
Contributor

@MilesBHuff Responding only to the 90°C remarks: I felt the same way initially and adjusted my fan curve to something like what you describe, spinning the fans early and hitting 100% between 70-80°C. I ran with that fan curve for a while and thought it was good. Then I set out to understand the power limits (and how they relate to thermal limits) of my new i7-1165G7, which led me here and ultimately to understanding the System76 power profiles in Pop!_OS and their intended use. The fan curves actually correspond to the power profiles. The battery profile (on my device) sets a thermal limit of 68°C, while the fans don't spin at all below 70°C. I can't find the thread now, but I've seen @jackpot51 or another engineer write that the battery profile should be silent. The balanced profile limits temperature to 88°C, which allows up to 90% fan, and you must use the performance profile to reach 100% fan.

So for some time now, I've been running this fan curve to make the battery profile work as intended, and to keep fan speeds low on balanced when I'm using the machine casually and don't mind throttling if it keeps my baby from waking up.

Worth checking out as well

@kolya182

kolya182 commented Apr 22, 2021

Fan spikes are really annoying and distracting on the lemp10. Here are some examples from my machine:
https://www.dropbox.com/sh/053qfpkuvioopab/AADGAKeTD5WtrR7hVK46duE2a?dl=0

Distribution (run cat /etc/os-release):
NAME="Pop!_OS"
VERSION="20.10"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.10"
VERSION_ID="20.10"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=groovy
UBUNTU_CODENAME=groovy

[Screenshot: 2021-04-21 12 53 50]

@curiousercreative
Contributor

curiousercreative commented Apr 22, 2021

@kolya182 would you care to re-run your tests with #139, which attempts to fix this problem? You'd need to build from the pull request's branch and add these build flags to lemp10/board.mk before building and flashing: https://github.com/curiousercreative/ec/blob/galp5/src/board/system76/galp5/board.mk#L44

@kolya182

kolya182 commented Apr 23, 2021

@kolya182 care to re-run your tests with an attempt to fix such a problem? #139. You'd need to build from the branch in pull request and add these build flags to lemp10/board.mk before building/flashing: https://github.com/curiousercreative/ec/blob/galp5/src/board/system76/galp5/board.mk#L44

I can't; I use my machine for work and can only install official software update releases. @jackpot51, do you have a time frame for when the fan curve changes will be released? I listened to your conversation on a podcast where you mentioned work on fan curves, and that's when I made the final decision to get a System76 laptop. Is it normal for the processor to sit at 62-76°C under minimal load?

@MilesBHuff
Author

@curiousercreative, regarding your comment above:
What if System76 were to provide two fan curves per laptop: one that reaches 100% at Tjunction, and one that does what you're describing? "Performance" and "Quiet".

@MilesBHuff
Author

@kolya182 Your comments are off-topic here; please post them in #139.

@curiousercreative
Contributor

@curiousercreative, regarding your comment above:
What if System76 were to provide two fan curves per laptop: one that reaches 100% at Tjunction, and one that does what you're describing? "Performance" and "Quiet".

Sure, but at that level of effort it'd make sense to expose fan speed control to the OS. My old Mac Pro has fixed (conservative) EC fan speed control in addition to OS-controlled fan speed; it runs at the higher of the two speeds requested by the OS and the EC.
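A minimal sketch in C of the arbitration rule described for the Mac Pro: the fan runs at whichever duty is higher, the EC's own conservative curve or the OS request. The function name `arbitrate_duty` and the 0-100 percent duty scale are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Return the duty (percent) to apply when both the EC's built-in curve and
 * the OS request a fan speed: the higher of the two wins. */
uint8_t arbitrate_duty(uint8_t ec_duty, uint8_t os_duty) {
    assert(ec_duty <= 100 && os_duty <= 100);
    return os_duty > ec_duty ? os_duty : ec_duty;
}
```

Because the result is a maximum, an OS-side tool can only add cooling, never remove it; an OS that requests 0% leaves the EC's curve in charge.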

@MilesBHuff
Author

@curiousercreative I feel like that's probably even more involved, but I'd definitely support it. There's an issue for it here: pop-os/system76-acpi-dkms#9, which I see you already gave a thumbs-up; thanks!
System76 could then provide a choice of profiles in userland, and default to quiet as they currently prefer to do.

@Raikiri

Raikiri commented Jun 9, 2021

I definitely feel like there cannot be a universal fan curve that all users will be happy with. Some users have sleeping kids and want their fans to run as quietly as possible. Other users don't care; they just buy a bunch of spare fans pre-emptively and run aggressive cooling profiles: if it kills the fans earlier, they just replace them.

So it's great that there's already an issue for controlling the curves from Pop!_OS, but I feel like it needs to be higher-level. One should be able to control those curves at runtime from any OS (or at least from the BIOS).

@MilesBHuff
Author

MilesBHuff commented Jun 12, 2021

@Raikiri

I definitely feel like there cannot be a universal fan curve that all users will be happy with.

Of course. But I think the option of a performance curve vs a quiet curve would cover 90% of users. And, indeed, it is commonplace in the industry for such an option to exist in the BIOS.
In the absence of such an option, the defaults should strike a balance that compromises on as little as possible.

Other users don't care [...]: if it kills the fans earlier, they just replace them.

FWIW, what I recommend above doesn't run the fans at 100% all the time, only when needed.

I feel like it needs to be higher-level. One should be able to control those curves at runtime from any OS

pop-os/system76-acpi-dkms#9 would work from any Linux distro, not just Pop!_OS. In fact, OP's distro is Manjaro.


@crawfxrd
Member

crawfxrd commented Aug 1, 2024

My long-term plan is now to move most of the logic to system firmware:

  • EC only implements mechanisms
    • Fans turn on with some non-zero value (with delay?), in case the system hangs before the OS loads (coreboot or edk2 dies)
    • Set fan duty, based on target RPM?
      • Presumably what Clevo does since fan tables no longer use PWM duty and only use RPM?
      • Should check what Intel DPTF uses
    • Report fan PWM duty and RPM (fixed by firmware-open#563, "Add support for second fan without a dGPU")
  • System firmware implements the policies
    • edk2 runtime driver?
    • "Auto/Manual" (firmware/OS) fan control setting
      • Maybe even handle max fan toggle in system firmware instead of EC?
    • Profiles
      • Clevo provides sample fan profiles such as: Entertainment, Performance, Quiet
      • Not really required if it's possible to configure via OS driver; just need 1
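The "EC only implements mechanisms" split, including the non-zero fallback for a hang before the OS, could look roughly like the C sketch below. Everything in it is an assumption for illustration: the names (`ec_fan`, `ec_fan_set`, `ec_fan_tick`), the 40% fallback, the 10 s timeout, and the requirement that the host re-send its duty periodically as a heartbeat. It is not code from firmware-open.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define FALLBACK_DUTY 40  /* non-zero duty (percent) used when the host goes quiet */
#define HOST_TIMEOUT_S 10 /* seconds without a host update before falling back */

/* Hypothetical EC-side state: the EC only stores what the host asked for. */
struct ec_fan {
    uint8_t host_duty;       /* last duty (percent) set by the host policy */
    uint16_t since_update_s; /* seconds since the host last set a duty */
};

/* Mechanism exposed to the host (system firmware or OS driver). */
void ec_fan_set(struct ec_fan *fan, uint8_t duty) {
    assert(duty <= 100);
    fan->host_duty = duty;
    fan->since_update_s = 0;
}

/* Called once per second by the EC; returns the duty to program. */
uint8_t ec_fan_tick(struct ec_fan *fan) {
    if (fan->since_update_s < UINT16_MAX) {
        fan->since_update_s++;
    }
    if (fan->since_update_s > HOST_TIMEOUT_S) {
        /* Host looks hung: never go below the fallback duty. */
        return fan->host_duty > FALLBACK_DUTY ? fan->host_duty : FALLBACK_DUTY;
    }
    return fan->host_duty;
}
```

In this sketch the EC holds the host's last request (0% at boot) until the host has been silent for longer than the timeout, then applies the fallback without ever lowering a higher request. The policy side (curves, profiles, Auto/Manual) would then live entirely in system firmware or the OS.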

Ref: system76/firmware-open#571
