
Migrate dataloader to use Fiber#transfer for modern scheduler compatibility #4003

Closed
dbackeus opened this issue Mar 24, 2022 · 10 comments

dbackeus commented Mar 24, 2022

Is your feature request related to a problem? Please describe.

The current implementation of Dataloader uses Fiber#yield / Fiber#resume for fiber management, which makes it incompatible with modern Ruby 3.1-based fiber schedulers that rely on Fiber#transfer instead (see https://github.com/bruno-/fiber_scheduler_list for a reference).

AFAICT there is no production-ready, Ruby 3.1-based scheduler that can be used with the GraphQL gem today.

The documentation for non-blocking dataloading at https://graphql-ruby.org/dataloader/nonblocking.html mentions "not having been able to figure out how to make Async::Scheduler work". The reason is the same: Async::Scheduler does not work with yield / resume.

Describe the solution you'd like

I've had some Slack interaction with Bruno Sutic, author of the above scheduler list as well as the fiber_scheduler gem.

According to him, migrating to transfer would be the right way to go for the GraphQL gem, as he expects it to become the default paradigm for fiber schedulers going forward.

I'm not sure how relevant it is to keep backwards compatibility with Ruby 3.0-based schedulers. If it is, I guess we would need two different fiber management implementations and a configuration option letting developers switch between them.

Describe alternatives you've considered

I spent a few hours trying to implement the change myself. I got the async_dataloader_spec.rb tests passing with both Async::Scheduler and FiberScheduler, but a lot of deeper integration tests ended up failing, and making sense of all the moving parts looks too time-consuming for the time I have on hand.

rmosolgo (Owner) commented

Hi, thanks for sharing your research on this!

Bummer that there are two different control passing approaches. For now, I'd like to try to support both, so that anyone who's already using Fiber-based concurrency can continue using it the way they already are.

Do you mind sharing your work on this so far? Even if it's not all passing, I'd like to see the approach you took.

dbackeus (Author) commented Mar 24, 2022

Sure thing: #4004

Not much of an approach, I'm afraid; I just tried the simplest thing that could possibly work, and it turned out to be too simple 😅

rmosolgo (Owner) commented

Cool, thanks for sharing. Mostly, I'm happy to confirm my understanding of the kind of refactor required: worker fibers will need references to their parent fibers so that they can explicitly transfer control back to them (rather than implicitly yielding, which automatically returned control to the parent fiber).
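
For illustration, here is a minimal sketch (not GraphQL-Ruby code) of the difference: with yield/resume, control implicitly returns to whoever called resume, while with transfer the worker must hold an explicit reference to its parent and hand control back itself.

# yield/resume: control implicitly returns to whoever called `resume`
worker = Fiber.new do
  puts "worker: started"
  Fiber.yield                # implicitly hands control back to the resumer
  puts "worker: finished"
end
worker.resume                # runs the worker until Fiber.yield
worker.resume                # runs the worker to completion

# transfer: the worker needs an explicit reference to its parent fiber
parent = Fiber.current
worker = Fiber.new do
  puts "worker: started"
  parent.transfer            # explicitly hands control back to the parent
  puts "worker: finished"
end
worker.transfer              # runs the worker until parent.transfer
worker.transfer              # runs the worker to completion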

bruno- (Contributor) commented Mar 24, 2022

Hi,

What you guys are doing is bleeding-edge work. I'm pretty sure no serious app or gem is using the raw fiber scheduler interface yet...

About the yield/resume vs #transfer approach: ideally the gem wouldn't use either of these methods. You see, if you start manually calling Fiber#transfer, then your code is scheduling the execution of fibers, when instead the "fiber scheduler" should be responsible for, well, "scheduling".

For example, say you have two fibers: fiber1 and fiber2. If fiber1 is running and it calls fiber2.transfer, then your code is responsible for resuming fiber1 some time later. No "fiber scheduler" will resume fiber1 after you do this.

What to use instead (see the sketch after this list):

  • sleep 0 - transfers control to the fiber scheduler (which will run other "ready" fibers), but the scheduler will then automatically resume the current fiber when it gets the chance.
  • Fiber.scheduler.block(nil, 0) - similar to sleep 0, though I'd give a slight preference to sleep 0.
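
A minimal sketch of the sleep 0 approach, assuming Ruby 3.1+ and the fiber_scheduler gem used later in this thread (any Fiber.set_scheduler-compatible scheduler should behave the same):

require "fiber_scheduler"

Fiber.set_scheduler FiberScheduler.new

2.times do |i|
  Fiber.schedule do
    3.times do |step|
      puts "fiber #{i}, step #{step}"
      sleep 0   # hand control to the scheduler; it resumes this fiber when it gets the chance
    end
  end
end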

bruno- (Contributor) commented Mar 24, 2022

All of this is hard work and a lot of research.

rmosolgo (Owner) commented

Thanks for chiming in, @bruno-!

"your code is scheduling the execution of fibers"

The thing is, I think I want to do that -- I want some code to run (GraphQL execution), then I want to pause it at an application-defined point -- Fiber.yield -- then I want to run some other code (batch loaders), then I want to transfer back to the code that previously yielded (GraphQL execution).

They really are order-dependent: GraphQL execution accumulates parameters for data loading, then data loading fetches the data and hands control back to GraphQL execution.

That said, maybe we could interact with the scheduler differently, like you described above.
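
A minimal sketch of that handoff (purely illustrative, not GraphQL-Ruby internals): execution accumulates IDs, yields at an application-defined point, the batch loader runs, then control returns to execution.

pending_ids = []

execution = Fiber.new do
  pending_ids << 1
  pending_ids << 2
  Fiber.yield                # pause at an application-defined point until data is loaded
  puts "execution resumed with data for #{pending_ids.inspect}"
end

execution.resume             # run "GraphQL execution" until it yields
puts "batch-loading records #{pending_ids.inspect} in one call"
execution.resume             # hand control back to execution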

bruno- (Contributor) commented Apr 7, 2022

I chatted with @dbackeus about the problem you're describing. I understand it as the following:

  • Run a main fiber (GraphQL execution).
  • Spawn multiple fibers that do some network requests (batch loaders).
  • Wait in the main fiber until all "batch loader" fibers are done.

Does that sound like an accurate description of the problem to you?

Although the above is simple, I don't think raw Ruby fibers have a ready-made solution for it. I worked out a quick solution using Ruby's Queue:

require "open-uri"
require "fiber_scheduler"

queue = Queue.new

Fiber.set_scheduler FiberScheduler.new

5.times do |i|
  Fiber.schedule do
    delay = (rand 20).fdiv(10)
    URI.open("https://httpbin.org/delay/#{delay}")
    puts "Done #{i} with delay #{delay}"
    queue << "."
  end
end

Fiber.schedule do
  5.times do
    queue.pop
  end

  puts "All 5 requests are done"
end

The benefit of this approach: no direct fiber scheduling (no Fiber#transfer calls). The downside is that we have to use a Queue, which I think uses mutexes internally, and those are somewhat "heavy".

dbackeus (Author) commented Apr 7, 2022

I think for this gem in particular it's a bit more complicated than a single batch of jobs that all need to complete. It's more like a hierarchical graph of jobs where the order of execution matters, so some jobs need to complete before others. But @rmosolgo would be able to explain it better than me 😅

ioquatix commented Oct 6, 2022

Queue does not use a mutex in CRuby.

rmosolgo (Owner) commented

I tried supporting .transfer generally in #4625 and it kinda worked, but it was never quite there for using async's fiber scheduler stand-alone.

So instead, I made a separate dataloader class that uses async's concurrency primitives directly, and it worked great: #4727. It'll be released in GraphQL-Ruby 2.2.0 shortly 👍
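
For anyone landing here later, a rough sketch of how the new class gets wired up, going by the GraphQL-Ruby 2.2 docs (QueryType is a placeholder for your own query root, and the async gem has to be in your bundle):

require "graphql"
require "async"   # AsyncDataloader builds on the async gem's primitives

class MySchema < GraphQL::Schema
  query QueryType                            # placeholder for your own query root type
  use GraphQL::Dataloader::AsyncDataloader   # instead of `use GraphQL::Dataloader`
end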
