runtime: revert Windows change to boot-time timers #35482

Closed
networkimprov opened this issue Nov 9, 2019 · 130 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. OS-Windows release-blocker

@networkimprov

networkimprov commented Nov 9, 2019

An engineering lead on the Windows Base team (kernel, fs, etc) asked us to revert d85072, from #31528, because it changed Windows timers to advance during sleep; everywhere else Go has monotonic timers (see also #24595 #35012).

Quoting @jstarks from #35447 (comment)

The Windows kernel team changed timer behavior in Windows 8 to stop advancing relative timeouts on wake. Otherwise when you open your laptop lid, every timer in the system goes off all at once and you get a bunch of unpredictable errors. Software is generally written to assume that local processes will make forward progress over reasonable time periods, and if they don't then something is wrong. When the machine is asleep, this assumption is violated. By making relative timers behave like threads, so that they both run together or they both don't, the illusion is maintained. You can claim these programs are buggy, but they obviously exist. Watchdog timers are well-known constructs.

This was a conscious design decision in Windows, and so it's disappointing to see the Go runtime second guess this several years later in a bug fix.

@alexbrainman suggested an alternative approach to fixing the reported issue, via QueryUnbiasedInterruptTime() in #31528 (comment). Let's try to adopt that for 1.14.

We should backport that to 1.12 & 1.13, also reverting the commit which landed in 1.13.3, see #34130.

cc @ianlancetaylor @rsc @aclements @zx2c4 @jmontgomery-jc
@gopherbot add OS-Windows
@gopherbot add release-blocker

@ianlancetaylor
Contributor

Is anybody working on the alternative approach mentioned above?

@alexbrainman
Member

Is anybody working on the alternative approach mentioned above?

No, I am not making this change.

@zx2c4 said in #31528 (comment)

It's impossible to implement WireGuard securely if timers don't take into account sleep time.

CL 191957 fixes this problem. If we revert the CL and replace it with QueryUnbiasedInterruptTime, the problem will reappear.

Also using QueryUnbiasedInterruptTime will make nanotime implementation about 2 times slower. See #31528 (comment) for details.

Alex

@jstarks

jstarks commented Nov 15, 2019

I can find some time to make the change that Alex originally prototyped if no one else is available. But in the meantime, a patch release of Go has regressed programs running in containers, so should we consider reverting this and then apply a more appropriate change later?

It's impossible to implement WireGuard securely if timers don't take into account sleep time.

CL 191957 fixes this problem. If we revert the CL and replace it with QueryUnbiasedInterruptTime, the problem will reappear.

As I understand it, Go on Linux has the same behavior as Go on Windows used to have before this change. WireGuard is specialized software and may have to work around this in both Windows and Linux.

I also think the fix in CL 191957 is incomplete--AFAIK, PowerRegisterSuspendResumeNotification only provides notifications for machines transitioning across classic sleep states, not when machines enter connected standby (which is used instead of sleep in some newer devices). In these cases, you will still see a difference between (biased) interrupt time and WaitForSingleObject's relative timers, so WireGuard presumably will still run into problems.

The right fix for WireGuard may be to offer a new kind of timer that uses absolute (wall clock) timeouts on Windows, which is affected by changes to system UTC time (NTP or otherwise) but not by sleep states. If that's insufficient, I can help investigate if there are other options that might be appropriate.

Also using QueryUnbiasedInterruptTime will make nanotime implementation about 2 times slower. See #31528 (comment) for details.

If slowing nanotime down from 2ns to 4ns is problematic, we can look at whether the internal definition of QueryUnbiasedInterruptTime is stable enough to inline into Go's runtime (which is what was done for the current version of nanotime: it was apparently cloned from QueryInterruptTime). I'd really like to avoid this, though, because the current definition relies on a private export from ntdll to the kernel that is not part of the external API or ABI.

@networkimprov
Author

This patch wouldn't have been accepted if the discussion about it had referenced #24595.

@zx2c4 said in #31528 (comment)

It's impossible to implement WireGuard securely if timers don't take into account sleep time.

Jason has stated that WireGuard requires timer patches for Linux, Darwin, and (prior to 1.13.3) Windows. Reverting this won't harm WG. I contacted him yesterday to point out this issue, and he ack'd, so I imagine he'll respond soon.

The right fix for WireGuard may be to offer a new kind of timer that uses absolute (wall clock) timeouts

I've been advocating for this on #24595 but so far, no traction...

@DmitriyMV
Contributor

Quoting @mpx:

With BOOTTIME buggy use of timers may fail, with MONOTONIC correct use of timers may fail (Eg, #25248, #35012).

I don't think that trading correctness for the sake of backward compatibility is the right choice. I also don't think that adding a new API to the time package would be wise - most people don't care about the difference between MONOTONIC and BOOTTIME, and it will at best leave them confused, and at worst lead them to incorrect assumptions.

@networkimprov
Author

@DmitriyMV, that probably belongs in the thread you quoted; it's off-topic here.

@jstarks

jstarks commented Nov 15, 2019

@DmitriyMV , right now, with this change, we are inconsistent between Linux and Windows, and inconsistent between different Windows devices (connected standby vs. classic sleep). That inconsistency seems like the worst possible situation to be in.

@alexbrainman
Member

I can find some time to make the change that Alex originally prototyped if no one else is available. But in the meantime, a patch release of Go has regressed programs running in containers, so should we consider reverting this and then apply a more appropriate change later?

I have no opinion on this matter. I will let Ian decide what to do here.

Alex

@zx2c4
Contributor

zx2c4 commented Nov 21, 2019

Please close this issue and do not make any such revert.

The premises here are flawed.

  • Both Linux and macOS seem interested in moving to a BOOTTIME world. From discussion with the maintainers of the timer code in Linux, not doing this earlier is considered an unfortunate historical mishap. It'd be nice to change everything over in the kernel now all at once (which was tried some months ago), but compatibility issues make this a bit more tricky. So instead it's going to be a gradual shift in that direction.
  • MONOTONIC instead of BOOTTIME makes implementing network protocols, such as WireGuard, impossible. Go must use BOOTTIME if we're going to have WireGuard or other similar network protocols.
  • From the original post, "The Windows kernel team changed timer behavior in Windows 8 to stop advancing relative timeouts on wake. Otherwise when you open your laptop lid, every timer in the system goes off all at once and you get a bunch of unpredictable errors." There are a lot of things wrong with this: firstly, I'm shocked that Microsoft broke compatibility -- they usually don't do that. Secondly, it's often the case that processes receive timer events "all at once" bunched up. On Unix, this happens all the time with SIGSTOP...SIGCONT, and Go code is generally fine. Heck, Go code better be fine. And, on highly loaded systems, scheduler latency often means this happens naturally.
  • From the original post, "This was a conscious design decision in Windows, and so it's disappointing to see the Go runtime second guess this several years later in a bug fix." Conscious or not, it's an annoying and bad decision, but not one that really matters to us. We're not "second guessing" - we're simply implementing the Go runtime as reasonably as we can given what the OS kernel provides.

@zx2c4
Contributor

zx2c4 commented Nov 21, 2019

@DmitriyMV , right now, with this change, we are inconsistent between Linux and Windows, and inconsistent between different Windows devices (connected standby vs. classic sleep). That inconsistency seems like the worst possible situation to be in.

Doing things the right way on Linux and other platforms is a work in progress. Feel happy for the rare case in which the Windows implementation achieves the correct implementation (using BOOTTIME) first.

@jstarks

jstarks commented Nov 21, 2019

This was a breaking change in a bug fix release that introduces inconsistent behavior between operating systems. There is clearly no consensus that this is the right change for Linux or the change would have been made already.

It's very strange to me that this is considered an acceptable approach to the evolution of the Go runtime.

@zx2c4
Contributor

zx2c4 commented Nov 21, 2019

More false claims. A lot to unpack in three sentences. Here we go:

There is clearly no consensus that this is the right change for Linux or the change would have been made already.

"Would have been made already" is a ridiculous conclusion to jump to. They tried it, but it broke some userland in unexpected ways. The timer maintainers want to do it. It's just a matter of figuring out how.

This was a breaking change in a bug fix release

No, it fixed a regression with Windows timer buckets. Before the fix, Go timers were not reliable following a compatibility-breaking OS change from Microsoft.

It also keeps WireGuard viable on Windows. Not taking into account sleep time makes WireGuard and other network protocols impossible to implement. It's possible your branch of Microsoft isn't interested in WireGuard, but I'm told some NDIS people are playing with it.

inconsistent behavior between operating systems.

As mentioned, it's an ongoing work in progress to bring BOOTTIME support to other operating systems.

that introduces

Clearly "introduces" is the wrong word, since Windows had always been like this, until Windows 8, at which time there were actually two sets of timer semantics being used at the same time, causing problems. The bug fix moved things back to only using one set of timer semantics, fixing the problem.

And guess which set of timer semantics it chose in order to fix the problem? The one that had always been used on Windows in Go since the beginning. It didn't introduce a new one, as that could have caused problems. Instead it went back to providing the same timer semantics that Go had originally.

@jstarks

jstarks commented Nov 21, 2019

I'm going to double check my Windows 8 claim--it may have been Windows 7 (which would make sense because that's when QueryUnbiasedInterruptTime was introduced). I'll ask someone down the hall who has worked on the Windows timer infrastructure when I get a chance.

In any case, before the change to the Go runtime, Go programs inherited the system timer behavior. As far as I know, there was never a case before Go 1.13.3 that Go attempted to force a BOOTTIME-style timer behavior on Windows (or on Linux).

So yes, this change was a breaking change to Go timer semantics on Windows. I'm certainly sympathetic that WireGuard needs a solution here, but there is other Go software out there too.

@zx2c4
Contributor

zx2c4 commented Nov 21, 2019

I'm going to double check my Windows 8 claim--it may have been Windows 7

Your Windows 8 claim is correct. MSDN confirms this, as does my own testing. From https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject : "Windows XP, Windows Server 2003, Windows Vista, Windows 7, Windows Server 2008 and Windows Server 2008 R2: The dwMilliseconds value does include time spent in low-power states. For example, the timeout does keep counting down while the computer is asleep. Windows 8, Windows Server 2012, Windows 8.1, Windows Server 2012 R2, Windows 10 and Windows Server 2016: The dwMilliseconds value does not include time spent in low-power states. For example, the timeout does not keep counting down while the computer is asleep."

As far as I know, there was never a case before Go 1.13.3 that Go attempted to force a BOOTTIME-style timer behavior on Windows (or on Linux).

Before 1.13.3, Go relied on BOOTTIME-style timer behavior working, even though it did not on certain newer Windows platforms, and so Go was broken on those platforms and we got bug reports. The Go behavior on Windows has always been to rely on this BOOTTIME-style of behavior, but Windows 8 put us in an inconsistent state. 1.13.3 fixed the inconsistency by reverting to the semantics Windows users of Go have always relied on. What you're suggesting here is introducing totally new semantics that we've never relied on. That sounds like a new proposal, and something I emphatically n'ack.

@networkimprov
Author

I'm writing an app for multiple laptop platforms (Windows, MacOS, Linux) and want the same timer behavior across them. If the default behavior on Windows (or certain versions thereof) differs, I'd expect a switch to throw which aligns them.

That means a switch to let Windows 7 work like 8/10 and Unix, or vice versa. My current project could carry a runtime patch for this (I already need 2 stdlib patches on Windows).

I agree that Go should provide timers with boot-time semantics, but probably not as an upgrade to time.Timer/Ticker. How would apps that depend on timers reliably evaluate such an upgrade? What is the cost to adapt if the evaluation indicates trouble?

Go timers have been "broken" on Windows 8/10 for seven years and I've seen one bug report, filed in 2019; are there others? What did WireGuard do on Windows 8/10 before this patch? How does it handle timers on MacOS/Unix?

@dmitshur
Contributor

@zx2c4 Could you please elaborate on why you wanted to remove the release-blocker label?

We were going over release-blocker issues in a meeting, and because no one knew why it was removed, we thought it was a gopherbot bug and re-added it. We learned in #35755 that you requested it to be removed, but it wasn't visible to us at the time.

@zx2c4
Contributor

zx2c4 commented Nov 21, 2019

What did WireGuard do on Windows 8/10 before this patch?

Sleep was broken.

How does it handle timers on MacOS/Unix?

We patch Golang.

@zx2c4 Could you please elaborate on why you wanted to remove the release-blocker label?
We were going over release-blocker issues in a meeting, and because no one knew why it was removed, we thought it was a gopherbot bug and re-added it. We learned in #35755 that you requested it to be removed, but it wasn't visible to us at the time.

Oh, whoops, didn't think that'd be a big deal. The actual release-blocker bug is the docker issue somebody reported earlier -- #35447. Breaking docker seems very bad. This thread here, on the other hand, is some bikeshedding on if we should change Go's behavior from how it was originally designed way back when to something new and different. Not sure why this discussion would need to block a release.

@jstarks

jstarks commented Nov 21, 2019

I think it's disrespectful to brush this issue off as bikeshedding. I don't mind if you clear the release blocking tag (I didn't add it), but discussing the technical merits of a bug fix that you implemented is anything but bikeshedding.

@networkimprov
Author

So we have a likely near-term solution:
a) revert the patch in question,
b) provide a switch and/or fix to align Win7 with the other runtimes,
c) WireGuard can patch its runtime as it does for other platforms.

@zx2c4
Contributor

zx2c4 commented Dec 13, 2019

NEWS FLASH NEWS FLASH NEWS FLASH NEWS FLASH NEWS FLASH NEWS FLASH

Some new results just in, which will basically change this entire debate and allow us to entirely defer big invasive changes until 1.15.

Check out this discrepancy between Docker and non-Docker during S3 sleep: https://data.zx2c4.com/docker-uses-program-time-windows-dec-2019.mp4 This is running:

package main

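import _ "unsafe" // needed for go:linkname

// nanotime reads the runtime's internal clock, in nanoseconds.
//
//go:linkname nanotime runtime.nanotime
func nanotime() int64

func main() {
	start := nanotime()
	lastSecond := int64(0)
	// Busy-loop, printing once for each elapsed second the clock reports.
	for {
		now := nanotime()
		secondsSinceStart := (now - start) / 1000000000
		if secondsSinceStart > lastSecond {
			println(secondsSinceStart)
			lastSecond = secondsSinceStart
		}
	}
}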

What you see in that screencast is that Docker uses "program time", whereas real Windows uses "real time".

THIS MEANS THAT THE ORIGINAL SIMPLE COMMIT FIXES THE DOCKER ISSUE

That behavior there makes the entire system consistent.

So at this point, I'd strongly recommend merging that commit, closing this issue, and starting a new discussion on "real time" vs "program time" and new APIs for Go 1.15.

@gopherbot
Contributor

Change https://golang.org/cl/208317 mentions this issue: runtime: do not use PowerRegisterSuspendResumeNotification on systems with "program time" timer

@gopherbot
Contributor

Change https://golang.org/cl/211280 mentions this issue: [release-branch.go1.13] runtime: do not use PowerRegisterSuspendResumeNotification on systems with "program time" timer

@aclements
Member

Check out this discrepancy between Docker and non-Docker during S3 sleep: https://data.zx2c4.com/docker-uses-program-time-windows-dec-2019.mp4

Oh goodness. Just making sure I understand completely, since runtime.nanotime is reading "interrupt time" in your example, "interrupt time" in Docker for Windows is actually "unbiased interrupt time" ("program time") and there's perhaps no monotonic clock that's actually on "real time" in Docker?

THIS MEANS THAT THE ORIGINAL SIMPLE COMMIT FIXES THE DOCKER ISSUE
https://go-review.googlesource.com/c/go/+/208317

Are we fairly certain this is the only cause of error 2?

This means running on bare Windows and running on Docker for Windows will behave differently, but 1) maybe there's no way around that, and 2) maybe it doesn't matter so much because people don't tend to run Docker on laptops anyway?

Thanks for working hard on the Docker issue. As you pointed out, this makes option 1 viable, where we stay on "real time" for both Sub and Sleep for Windows and try to come up with a more unified, consistent answer for 1.15. I'm okay with that because, if we do change the semantics for 1.15, we just have one big convergence of time behavior in 1.15, rather than changing Windows behavior in 1.14 and then again in 1.15.

@gopherbot
Contributor

Change https://golang.org/cl/211307 mentions this issue: runtime: use monontonic time consistently on Windows

@zx2c4
Contributor

zx2c4 commented Dec 13, 2019

Check out this discrepancy between Docker and non-Docker during S3 sleep: https://data.zx2c4.com/docker-uses-program-time-windows-dec-2019.mp4

Oh goodness. Just making sure I understand completely, since runtime.nanotime is reading "interrupt time" in your example, "interrupt time" in Docker for Windows is actually "unbiased interrupt time" ("program time") and there's perhaps no monotonic clock that's actually on "real time" in Docker?

Yes, exactly.

THIS MEANS THAT THE ORIGINAL SIMPLE COMMIT FIXES THE DOCKER ISSUE
https://go-review.googlesource.com/c/go/+/208317

Are we fairly certain this is the only cause of error 2?

Pretty sure. I'm now chasing this around in the kernel in IDA. It looks to me that when it's unable to find the power management node for the powrprof functions, that same absence also will result in the hooks to the timer advancement code not being run.

This means running on bare Windows and running on Docker for Windows will behave differently, but 1) maybe there's no way around that, and 2) maybe it doesn't matter so much because people don't tend to run Docker on laptops anyway?

Thanks for working hard on the Docker issue. As you pointed out, this makes option 1 viable, where we stay on "real time" for both Sub and Sleep for Windows and try to come up with a more unified, consistent answer for 1.15. I'm okay with that because, if we do change the semantics for 1.15, we just have one big convergence of time behavior in 1.15, rather than changing Windows behavior in 1.14 and then again in 1.15.

Right. Glad we're on the same page here. I too would like to see everything unified across platforms, and 1.15 seems like the right time to do that.

@networkimprov
Author

THE ORIGINAL SIMPLE COMMIT FIXES THE DOCKER ISSUE

Excellent, now you have a safe patch for the Windows runtime in the WireGuard build, to go with your runtime patches for MacOS #35012 (comment) & Linux #24595 (comment) \o/

@zx2c4
Contributor

zx2c4 commented Dec 13, 2019

THE ORIGINAL SIMPLE COMMIT FIXES THE DOCKER ISSUE

Excellent, now you have a safe patch for the Windows runtime in the WireGuard build, to go with your runtime patches for MacOS #35012 (comment) & Linux #24595 (comment) \o/

No, WireGuard for Windows isn't going to be "patching the runtime", and Go shouldn't be introducing regressions either without careful consideration. The Docker bug was a significant regression; that's fixed now. No need to introduce yet-another-one.

However, I'm up for revisiting all behavior for 1.15, where we'll have plenty of time to discuss this and prepare, code-wise, for whatever the implications.

@ianlancetaylor
Contributor

One thing I'm uncertain of from reading that is: it sounds like you now have two mechanisms, one based on GetQueuedCompletionStatusEx and another based on WaitForMultipleObjects. Will notewakeup(&sched.sysmonnote) dislodge both of them, regardless of which one is being used at the present?

No. It will only wake up the sysmon thread sleeping in WaitForMultipleObjects. But that's OK, because that thread will then check the time and start an M to run all timers that are ready. In the 1.13 timer code, timers could only be run by the goroutine that handled the bucket to which they were assigned, so it was necessary to wake up that goroutine in order to do anything. In the 1.14 timer code, timers can be run by any M.

If it is possible to notewakeup(&sched.sysmonnote) to dislodge either, then it sounds like all that's necessary is to do such a notewakeup(&sched.sysmonnote) from the suspend/resume notifier. Is that all correct?

That last part is correct: the suspend/resume notifier should only need to call notewakeup(&sched.sysmonnote), with an appropriate check of sched.sysmonwait.

Does this, then, mean that keeping nanotime() on "real time" and running notewakeup(&sched.sysmonnote) on suspend/resume is enough to have 1.14's time.Timer on "real time"?

Yes.

@zx2c4
Contributor

zx2c4 commented Dec 13, 2019

No. It will only wake up the sysmon thread sleeping in WaitForMultipleObjects. But that's OK, because that thread will then check the time and start an M to run all timers that are ready. In the 1.13 timer code, timers could only be run by the goroutine that handled the bucket to which they were assigned, so it was necessary to wake up that goroutine in order to do anything. In the 1.14 timer code, timers can be run by any M.

Oh interesting. That's a nice design improvement.

If it is possible to notewakeup(&sched.sysmonnote) to dislodge either, then it sounds like all that's necessary is to do such a notewakeup(&sched.sysmonnote) from the suspend/resume notifier. Is that all correct?

That last part is correct: the suspend/resume notifier should only need to call notewakeup(&sched.sysmonnote), with an appropriate check of sched.sysmonwait.

Does this, then, mean that keeping nanotime() on "real time" and running notewakeup(&sched.sysmonnote) on suspend/resume is enough to have 1.14's time.Timer on "real time"?

Yes.

Great. After the Docker-fix CL is committed, I'll send a simplification for 1.14 that runs notewakeup(&sched.sysmonnote) instead of the rather complex thing that's currently there.

@networkimprov
Author

if we do change the semantics for 1.15, we just have one big convergence of time behavior in 1.15, rather than changing Windows behavior in 1.14 and then again in 1.15.

@aclements what was the possible "change again" in 1.15? Addition of NewTimerAt() and friends?

A key rationale for changing to program/monotonic in 1.14 is that it's the native model for Win8/10, which has been in use for 7 years. @jstarks noted that it's odd for Go to second-guess that. At this point, most Windows laptops in use are running Win8/10.

@networkimprov
Author

@aclements also pointed out that the patch in question ignores some sleep states, see #35482 (comment)

Is that acceptable? Is it fixable?

@zx2c4
Contributor

zx2c4 commented Dec 13, 2019

@aclements also pointed out that the patch in question ignores some sleep states, see #35482 (comment)

I wasn't able to reproduce that claim, actually. That other patch I sent uses a different notification mechanism, though, and I think if we encounter bug reports from users we'll be able to swap out mechanisms. In my testing though, the existing one was fine.

@aclements
Member

@aclements what was the possible "change again" in 1.15? Addition of NewTimerAt() and friends?

That's right. Or, at least some sort of OS convergence that we've had more time to think about, whether that's NewTimerAt or something else.

A key rationale for changing to program/monotonic in 1.14 is that it's the native model for Win8/10, which has been in use for 7 years. @jstarks noted that it's odd for Go to second-guess that. At this point, most Windows laptops in use are running Win8/10.

I'm not that concerned with the "native" model of any particular OS, especially when OSes can't agree on what that model should be (e.g., Windows moving to monotonic time, Linux trying [though failing] to move away from monotonic time). I think monotonic time is definitely part of the right answer, but it's also not the whole answer, which is why I'm okay with putting this on hold to minimize design thrashing, and making headway on a whole answer for 1.15.

@networkimprov
Author

networkimprov commented Dec 13, 2019

OK. One more use case to consider: apps which use TCP to reach other apps on the same laptop, e.g. a "localhost web app" (which is what I'm building). When you suspend such a system, you don't want either side to time-out and drop a connection on resume.

I occasionally see failures like this in my app on a Win7 laptop. That hasn't appeared in the modest amount of testing I've done on Win8/10.

EDIT: presumably because the browser may get a timeout on Win7, but not Win8/10.

@networkimprov
Author

If the stdlib exported a function that returns "interrupt time" (the current source of runtime.nanotime), could that be used to implement @bradfitz's suggestion in #35482 (comment)?

@alexbrainman
Member

Although I don't understand the reference to CL 198417.

Issue #31528 was fixed by CL 191957. But after CL 191957 was submitted, Austin suggested an improvement to it. And Jason implemented the improvement in CL 198417.

I could have selected CL 191957 to test the issue. Any commit after CL 191957 is good. I chose CL 198417.

Just to confirm: did you run those tests on current tip?

I did not test current tip. I think issue #31528 is still fixed on current tip. But the issue is broken again on CL 210437.

I see three options here:

@aclements I don't have time to invest into this. Whatever you decide, I will be happy.

What you see in that screencast is that Docker uses "program time", whereas real Windows uses "real time".

@zx2c4 thank you for confirming that Docker uses "program time".

THIS MEANS THAT THE ORIGINAL SIMPLE COMMIT FIXES THE DOCKER ISSUE

Yes. CL 208317 will allow Docker to run. But Docker time behavior is still different from real Windows. And we need to bring them in line somehow.

So at this point, I'd strongly recommend merging that commit, closing this issue, and starting a new discussion on "real time" vs "program time" and new APIs for Go 1.15.

Sounds reasonable to me.

Alex

@networkimprov
Author

networkimprov commented Dec 16, 2019

Since timing unification is likely in 1.15, could we add a runtime env var in 1.14 to change Windows to program/monotonic timing? That would let Windows devs & users preview the change.

That same env var could be used to switch Windows back to real/boot timing in 1.15. We don't need to support two modes indefinitely, but two releases with both options would be helpful.

@aclements
Member

I don't think it's clear that program/monotonic time for everything is the obvious path forward for 1.15, so that wouldn't be an effective way to "preview" the change.

@networkimprov
Author

Well runtime.nanotime either uses "interrupt time" or "unbiased interrupt time"; is there another option?

I'm only suggesting that Unix-style timing/sleep be accessible in 1.14.

@gopherbot
Contributor

Change https://golang.org/cl/213198 mentions this issue: [release-branch.go1.12] runtime: do not use PowerRegisterSuspendResumeNotification on systems with "program time" timer

gopherbot pushed a commit that referenced this issue Jan 3, 2020
…eNotification on systems with "program time" timer

Systems where PowerRegisterSuspendResumeNotification returns
ERROR_FILE_NOT_FOUND are also systems where nanotime() is on "program time"
rather than "real time". The chain for this is:

powrprof.dll!PowerRegisterSuspendResumeNotification ->
  umpdc.dll!PdcPortOpen ->
    ntdll.dll!ZwAlpcConnectPort("\\PdcPort") ->
      syscall -> ntoskrnl.exe!AlpcpConnectPort

Opening \\.\PdcPort fails with STATUS_OBJECT_NAME_NOT_FOUND when pdc.sys
hasn't been initialized. Pdc.sys also provides the various hooks for
sleep resumption events, which means if it's not loaded, then our "real
time" timer is actually on "program time". Finally,
STATUS_OBJECT_NAME_NOT_FOUND is passed through RtlNtStatusToDosError, which
returns ERROR_FILE_NOT_FOUND. Therefore, in the case where the function
returns ERROR_FILE_NOT_FOUND, we don't mind, since the timer we're using
will correspond fine with the lack of sleep resumption notifications. This
applies, for example, to Docker users.

Updates #35447
Updates #35482
Fixes #35746

Change-Id: I9e1ce5bbc54b9da55ff7a3918b5da28112647eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/211280
Run-TryBot: Jason A. Donenfeld <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
Reviewed-by: Jason A. Donenfeld <[email protected]>
gopherbot pushed a commit that referenced this issue Jan 3, 2020
…eNotification on systems with "program time" timer

Systems where PowerRegisterSuspendResumeNotification returns
ERROR_FILE_NOT_FOUND are also systems where nanotime() is on "program time"
rather than "real time". The chain for this is:

powrprof.dll!PowerRegisterSuspendResumeNotification ->
  umpdc.dll!PdcPortOpen ->
    ntdll.dll!ZwAlpcConnectPort("\\PdcPort") ->
      syscall -> ntoskrnl.exe!AlpcpConnectPort

Opening \\.\PdcPort fails with STATUS_OBJECT_NAME_NOT_FOUND when pdc.sys
hasn't been initialized. Pdc.sys also provides the various hooks for
sleep resumption events, which means if it's not loaded, then our "real
time" timer is actually on "program time". Finally,
STATUS_OBJECT_NAME_NOT_FOUND is passed through RtlNtStatusToDosError, which
returns ERROR_FILE_NOT_FOUND. Therefore, in the case where the function
returns ERROR_FILE_NOT_FOUND, we don't mind, since the timer we're using
will correspond fine with the lack of sleep resumption notifications. This
applies, for example, to Docker users.

Updates #35447
Updates #35482
Fixes #36377

Change-Id: I9e1ce5bbc54b9da55ff7a3918b5da28112647eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/208317
Reviewed-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
Run-TryBot: Jason A. Donenfeld <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-on: https://go-review.googlesource.com/c/go/+/213198
@networkimprov
Author

There's another report of PowerRegisterSuspendResumeNotification failure in docker, #36557
