-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate dataloader to use Fiber#transfer for modern scheduler compatibility #4003
Comments
Hi, thanks for sharing your research on this! Bummer that there are two different control passing approaches. For now, I'd like to try to support both, so that anyone who's already using Fiber-based concurrency can continue using it the way they already are. Do you mind sharing your work on this so far? Even if it's not all passing, I'd like to see the approach you took. |
Sure thing: #4004 Not much of an approach I'm afraid, just tried the simplest thing that could possibly work, which turned out too simple 😅 |
Cool, thanks for sharing. Mostly, I'm happy to confirm my understanding of the kind of refactor required: worker fibers will need references to their parent fibers, so that they can explicitly transfer control to them (rather than implicitly yielding, which was automatically going back to the parent fiber). |
Hi, what you guys are doing is bleeding edge work. I'm pretty sure no serious app/gem is using raw fiber scheduler interface - yet... About the Example - you have two fibers: fiber1 and fiber2. If What to use instead:
|
All of this is hard work and a lot of research. |
Thanks for chiming in, @bruno-!
The thing is, I think I want to do that -- I want some code to run (GraphQL execution), then I want to pause it at an application-defined point -- They really are order-dependent: graphql execution accumulates paramaters for data loading, then data loading fetches data and hands off back to graphql execution. That said, maybe we could interact with the scheduler differently, like you described above. |
I chatted with @dbackeus about the problem you're describing. I understand it as the following:
Does that sound like a problem description to you? Although the above is simple, I don't think raw ruby fibers have a read-made solution for this. I worked a quick solution using Ruby's Queue: require "open-uri"
require "fiber_scheduler"
queue = Queue.new
Fiber.set_scheduler FiberScheduler.new
5.times do |i|
Fiber.schedule do
delay = (rand 20).fdiv(10)
URI.open("https://httpbin.org/delay/#{delay}")
puts "Done #{i} with delay #{delay}"
queue << "."
end
end
Fiber.schedule do
5.times do
queue.pop
end
puts "All 5 requests are done"
end The benefit of this approach is: no direct fiber scheduling (doing |
I think for this gem in particular it's a bit more complicated than just a single batch of jobs that need to be completed. It's more like an hierarchical graph of jobs where ordering of execution is important, so some jobs need to complete before others. But @rmosolgo would be able to explain it better than me 😅 |
Queue does not use a mutex in CRuby. |
I tried supporting So instead, I did a separate dataloader class that uses |
Is your feature request related to a problem? Please describe.
The current implementation of Dataloader uses
yield
/resume
for Fiber management which makes it incompatible with modern Ruby 3.1 based fiber schedulers which rely ontransfer
instead (see https://github.com/bruno-/fiber_scheduler_list for a reference).AFAICT there is no production ready Ruby 3.1 based scheduler that can be used today with the GraphQL gem.
The documentation for non-blocking dataloading at https://graphql-ruby.org/dataloader/nonblocking.html mentions "not having been able to figure out how to make Async::Scheduler work". The reason is the same -
Async::Scheduler
does not work withyield
/resume
.Describe the solution you'd like
I've had some Slack interaction with Bruno Sutic, author of the above scheduler list as well as the fiber_scheduler gem.
According to him migrating toward using
transfer
would be the right way to go for the GraphQL gem as he expects that to be the default paradigm for Fiber schedulers going forward.I'm not sure how relevant it is to keep backwards compatibility with ruby 3.0 based schedulers. If needed then I guess we would need two different fiber management implementations and offer developers a configuration option to switch between them.
Describe alternatives you've considered
I spent a few hours trying to implement the change myself. I got the async_dataloader_spec.rb tests passing with
bothAsync::Scheduler
andFiberScheduler
but a lot of deeper integration tests ended up failing and it appears too time consuming for me to make sense of all the parts to make progress with the time I have on hand.The text was updated successfully, but these errors were encountered: