Fixing interruption behaviour #3183

Angel-O · 2023-03-25T11:18:07Z

Summary

Fixes groupWithin: inconsistent behaviour on source termination #3169
Adding missing tests around error/interruption propagation
Adding stress/integrity test

Changes

Increasing the supply by Int.MaxValue + chunkSize on upstream finalization. This prevents the supply semaphore from blocking when upstream is interrupted while downstream is still waiting for the timeout to expire or for enough elements to arrive.

Notes

The stress test "all elements are processed" is nothing but a copy of the benchmark with an integrity check at the end. While it may seem redundant (i.e. the "should never lose any elements" test should cover it), in reality it is possible to write an implementation that passes that test but fails this one. (~~on a side note, that explains the big performance gain of the implementation linked earlier, so~~ having this test in place could prevent being misled when looking at benchmark results while working on a new implementation 🙏🏾 )

* adding error & interruption propagation and integrity test

armanbilge · 2023-03-31T16:23:14Z

How does this PR relate to #3186? Should this be reviewed first?

Angel-O · 2023-03-31T20:41:37Z

I ended up making all the changes in #3186, since I wanted to verify the correctness of that implementation, but I'm happy for this one to be considered first. I'll sync up the branch, run sbt prePR and push again if needed.

armanbilge · 2023-04-02T11:45:54Z

core/shared/src/test/scala/fs2/StreamCombinatorsSuite.scala

+
+        val downstream = source.groupWithin(100, 2.seconds)
+
+        downstream.intercept[SevenNotAllowed.type]


Should we make an assertion here about what the downstream has / has not received before the error?

armanbilge · 2023-04-02T11:51:17Z

core/shared/src/test/scala/fs2/StreamCombinatorsSuite.scala

+                .timeout(downstreamTimeout)
+                .flatTap(_ => IO.monotonic.flatMap(ref.set))
+                .flatMap(emit => ref.get.map(timeLapsed => (timeLapsed, emit)))


Suggested change

-                .timeout(downstreamTimeout)
-                .flatTap(_ => IO.monotonic.flatMap(ref.set))
-                .flatMap(emit => ref.get.map(timeLapsed => (timeLapsed, emit)))
+                .timed

Co-authored-by: Arman Bilge <[email protected]>

armanbilge · 2023-04-02T16:01:21Z

core/shared/src/test/scala/fs2/StreamCombinatorsSuite.scala

@@ -874,6 +874,7 @@ class StreamCombinatorsSuite extends Fs2Suite {
            source.groupWithin(Int.MaxValue, 1.day)

          downstream.compile.lastOrError
+            .timeout(downstreamTimeout)


Why is this timeout necessary, since it's an executeEmbed test?

it's because if the test fails we get a slightly better error message

java.util.concurrent.TimeoutException: 7500 milliseconds which can be easily associated to downstreamTimeout

otherwise we get this value on the diff which looks a bit random

_1 = 86405500000000 nanoseconds,

but I'm happy to remove it

Got it, that is a nicer error :) thanks!

armanbilge · 2023-04-02T16:07:04Z

core/shared/src/main/scala/fs2/Stream.scala

        def endSupply(result: Either[Throwable, Unit]): F2[Unit] =
-          buffer.update(_.copy(endOfSupply = Some(result))) *> supply.releaseN(Int.MaxValue)
+          buffer.update(_.copy(endOfSupply = Some(result))) *> supply.releaseN(
+            Int.MaxValue + outputLong
+          )


Sorry, dumb question: why is Int.MaxValue a "magic number" in this context? I would have thought it's effectively maxing out the semaphore, but if it needs + outputLong to work then I feel like it must have more significance?

Legit question to be fair. Had to think about it again.

Interruption of the upstream fiber (i.e. Outcome.Cancelled) is handled downstream by doing nothing (permits are never released).
So by increasing the supply by Int.MaxValue we are just evening out the negative balance (Int.MaxValue accounts for the worst-case scenario: the chunkSize parameter can be at most Int.MaxValue).

    val waitSupply = supply.acquireN(outputLong).guaranteeCase {
      case Outcome.Succeeded(_) => supply.releaseN(outputLong)
      case _                    => F.unit
    }

Now after getting past the "checkpoint" above we are acquiring outputLong permits again

    acq <- F.race(F.sleep(timeout), waitSupply).flatMap {
      case Left(_)  => onTimeout
      case Right(_) => supply.acquireN(outputLong).as(outputLong)
    }

So in order to get past this point we need to release an additional outputLong permits and that allows the stream to be unblocked

EDIT

> Interruption of the upstream fiber (i.e. Outcome.Cancelled)

uhm well actually I've just tested it, it is not handled with Outcome.Cancelled...

Thanks for that explanation!

> Int.MaxValue is to account for the worst case scenario: at most the chunkSize parameter will be equal to Int.MaxValue

So could we just use chunkSize here, instead of Int.MaxValue ?

@armanbilge apologies I was wrong, that's not what's happening here. I'm just doing some tests to figure out why we need the additional outputLong

Btw, if these implementation details are no longer relevant after your rewrite in the other PR, then let's not get too hung up on this one :)

ok I think I've figured it out (might be useful for the other implementation actually)

basically the problem is that we need enough supply to cover 2 iterations of the race loop. So if we only increase it by Int.MaxValue the following will happen

(current iteration): supply is unblocked

(next iteration): supply gets stuck (not enough supply because upstream was interrupted)

if instead we increase it by Int.MaxValue + outputLong

(current iteration): supply is unblocked

(next iteration): supply is not blocked thanks to the additional outputLong

So since the chunkSize can be as high as Int.MaxValue then the minimum supply to unblock the semaphore should be Int.MaxValue + outputLong

> So since the chunkSize can be as high as Int.MaxValue then the minimum supply to unblock the semaphore should be Int.MaxValue + outputLong

Key word being "can". Wouldn't chunkSize + outputLong be sufficient?

yeah that should work. The test still passes, I'll change it to outputLong * 2 since chunkSize == outputLong
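
The permit arithmetic discussed in this thread can be sketched with a toy model. This is only an illustration under stated assumptions, not the fs2 implementation: it uses `java.util.concurrent.Semaphore` instead of the cats-effect `Semaphore`, small numbers instead of Int.MaxValue, and a single thread. `SupplySketch` and `iterationsBeforeBlocking` are hypothetical names. Each loop iteration mirrors the two steps quoted above: the waitSupply checkpoint (acquire `n` permits, then return them) followed by the acquire that keeps `n` permits.

```scala
import java.util.concurrent.Semaphore

object SupplySketch {
  // Hypothetical helper: counts how many iterations of the race loop finish
  // before one would have to wait, assuming upstream has ended after
  // releasing `released` permits and each iteration works with `n` permits.
  def iterationsBeforeBlocking(released: Int, n: Int, maxIterations: Int = 10): Int = {
    val supply = new Semaphore(0)
    supply.release(released) // stands in for the releaseN call in endSupply
    var completed = 0
    var blocked = false
    while (!blocked && completed < maxIterations) {
      // waitSupply checkpoint: acquire n permits and hand them straight back
      if (supply.tryAcquire(n)) {
        supply.release(n)
        // Right(_) branch: acquire n permits again, this time keeping them.
        // This cannot block here because the permits were just returned.
        supply.acquire(n)
        completed += 1
      } else {
        // Not enough supply left: the real loop would wait at this point
        blocked = true
      }
    }
    completed
  }
}
```

With `n = 4`, `SupplySketch.iterationsBeforeBlocking(4, 4)` lets one iteration through, while `SupplySketch.iterationsBeforeBlocking(8, 4)` lets two through, which matches the "outputLong * 2" reasoning above.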

He-Pin · 2023-04-02T18:45:36Z

Have you stress-tested the latency of many fs2 streams with groupWithin?

Angel-O · 2023-04-02T18:50:19Z

> Have you stress-tested the latency of many fs2 streams with groupWithin?

Hey @He-Pin, may I ask what you mean exactly? Running many streams concurrently and using groupWithin on the resulting stream, i.e. after calling .parJoinUnbounded?

He-Pin · 2023-04-02T18:55:48Z

> > Have you stress-tested the latency of many fs2 streams with groupWithin?

> Hey @He-Pin, may I ask what you mean exactly? Running many streams concurrently and using groupWithin on the resulting stream, i.e. after calling .parJoinUnbounded?

E.g., using groupWithin to build a lock-step game server, where latency matters.

Angel-O · 2023-04-02T19:10:56Z

> E.g., using groupWithin to build a lock-step game server, where latency matters.

Thanks for the clarification. I must admit that unfortunately I've never heard of a lock-step server, so I haven't done any work (or rather, written any tests) with this particular use case in mind. Also keep in mind that I'm fairly new to open source contributions, so maybe this is more of a question for the regular maintainers.

Nonetheless, I'd be interested to find out more: can you point me towards an example or a beginner friendly resource? Thank you 🙏🏾

armanbilge · 2023-04-02T19:13:03Z

@He-Pin you are welcome to contribute some benchmarks for your use case; that will probably be the best way to answer your question :)

He-Pin · 2023-04-02T19:15:39Z

> > E.g., using groupWithin to build a lock-step game server, where latency matters.

> Thanks for the clarification. I must admit that unfortunately I've never heard of a lock-step server, so I haven't done any work (or rather, written any tests) with this particular use case in mind. Also keep in mind that I'm fairly new to open source contributions, so maybe this is more of a question for the regular maintainers.

> Nonetheless, I'd be interested to find out more: can you point me towards an example or a beginner friendly resource? Thank you 🙏🏾

Keep performance in mind; this is the way, Mandalorian.

He-Pin · 2023-04-02T19:18:28Z

> @He-Pin you are welcome to contribute some benchmarks for your use case; that will probably be the best way to answer your question :)

That's true; for my use case I need to use a hashed timer instead.

I'm learning more CE code too, and will open a PR later.

armanbilge

Thanks for chasing this one down! The new tests specifying the expected behavior are great 👍

Angel-O added 2 commits March 25, 2023 10:50

* fixing interruption behaviour

b720a5b

* adding error & interruption propagation and integrity test

fmt

b03e1f8

Angel-O mentioned this pull request Mar 25, 2023

groupwithin new implementation #3162

Closed

reducing rangeLength by a factor of 10 to avoid timeout on CI

4955b51

Angel-O mentioned this pull request Mar 25, 2023

Groupwithin improvements #3186

Open

Angel-O added 4 commits March 31, 2023 21:47

Merge branch 'typelevel:main' into fixing-interruption-behaviour

b785e49

simplifying stream, using testcontrol

8182c8e

minor

fdc2d16

reducing rangeLength by a factor of 10 to prevent timeout on js

d0f0a3a

armanbilge reviewed Apr 2, 2023

Angel-O and others added 3 commits April 2, 2023 16:52

Update core/shared/src/test/scala/fs2/StreamCombinatorsSuite.scala

ff688be

Co-authored-by: Arman Bilge <[email protected]>

removing unused ref

c73ab66

downstreamtimeout

b64eb5a

armanbilge reviewed Apr 2, 2023

adding assertion

8d57590

armanbilge reviewed Apr 2, 2023

supply increase made clear

9ae18d7

armanbilge approved these changes Apr 4, 2023

Merge branch 'typelevel:main' into fixing-interruption-behaviour

89889fa

armanbilge merged commit bfbb489 into typelevel:main May 12, 2023

biochimia mentioned this pull request Aug 30, 2023

Stuck partitions - question about the pause/resume logic fd4s/fs2-kafka#1230

Open
