Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a recursive descent parser #4718

Merged
merged 20 commits into from
Dec 11, 2023
Merged

Use a recursive descent parser #4718

merged 20 commits into from
Dec 11, 2023

Conversation

rmosolgo
Copy link
Owner

@rmosolgo rmosolgo commented Nov 30, 2023

I'm interested in incorporating optimizations from @tenderlove's work in:

It will be a bit of work because I want to maintain GraphQL-Ruby's comment tracking (required in order to losslessly parse and dump schemas, for example) and retain (didn't happen -- just using strings now) compatibility for AST nodes. If that all works, then I will replace GraphQL::Language::Parser with GraphQL::Language::RecursiveDescentParser before I merge this branch.

Also, I think this will provide an opportunity to improve parse error messages.

  • Port and integrate TinyGQL's scanner
  • Port and integrate TinyGQL's parser
  • Test new error messages
  • Revisit AST node initialization & state

Note

  • This parser doesn't parse comment descriptions, only string descriptions. I plan to release with this breaking change because comment descriptions aren't in the spec anymore.

Fixes #4646

@rmosolgo
Copy link
Owner Author

So far, it's not as fast as the C parser (looking at the introspection query), but it does make fewer allocations (presumably due to the new lexer/token design).

$ RUBY_YJIT_ENABLE=1 ruby parser_bench.rb
ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-darwin22]
Warming up --------------------------------------
           Old Parse   164.000  i/100ms
           New Parse   897.000  i/100ms
             C Parse     1.124k i/100ms
Calculating -------------------------------------
           Old Parse      1.642k (± 2.4%) i/s -      8.364k in   5.096597s
           New Parse      9.352k (± 2.8%) i/s -     47.541k in   5.087829s
             C Parse     11.540k (± 1.8%) i/s -     58.448k in   5.066357s

Comparison:
             C Parse:    11540.4 i/s
           New Parse:     9351.6 i/s - 1.23x  slower
           Old Parse:     1642.1 i/s - 7.03x  slower

Allocations: Old Parse: 1149
Allocations: New Parse: 258
Allocations: C Parse: 443
Parser Benchmark

require "bundler/inline"

gemfile do
  gem "graphql", path: "./"
  gem "graphql-c_parser"
  gem "benchmark-ips"
end

puts RUBY_DESCRIPTION

query_str = GraphQL::Introspection::INTROSPECTION_QUERY

Benchmark.ips do |x|
  x.report("Old Parse") { GraphQL::Language::Parser.parse(query_str) }
  x.report("New Parse") { GraphQL::Language::RecursiveDescentParser.parse(query_str) }
  x.report("C Parse") { GraphQL::CParser.parse(query_str) }
  x.compare!
end


def check_allocations(label)
  prev = GC.stat(:total_allocated_objects)
  yield
  allocations = GC.stat(:total_allocated_objects) - prev
  puts "Allocations: #{label}: #{allocations}"
end

check_allocations("Old Parse") { GraphQL::Language::Parser.parse(query_str) }
check_allocations("New Parse") { GraphQL::Language::RecursiveDescentParser.parse(query_str) }
check_allocations("C Parse") { GraphQL::CParser.parse(query_str) }

@rmosolgo rmosolgo added this to the 2.2.0 milestone Dec 8, 2023
@rmosolgo rmosolgo changed the base branch from master to 2.2-dev December 8, 2023 16:07
@rmosolgo rmosolgo changed the base branch from 2.2-dev to master December 8, 2023 16:08
@rmosolgo rmosolgo changed the base branch from master to 2.2-dev December 8, 2023 16:08
@rmosolgo rmosolgo force-pushed the recursive-descent-parser branch from 1175e68 to 4f6e18b Compare December 8, 2023 16:11
@tenderlove
Copy link
Contributor

ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-darwin22]

Is this on Rosetta? If so, I'd try on ARM or an Intel. Also, I think I only tested with YJIT on Ruby 3.3 and we've made very good improvements there, so it might be worth testing.

@tenderlove
Copy link
Contributor

Also, one thing I'd suggest (though it might be outside the scope of this PR) is to refactor the constructors for nodes. It looks like they take a hash and then do a bunch of stuff. If these are changed to positional parameters that just set IVs (as they are in TinyGQL), then node allocation will be faster.

Of course this will make constructing a nodes more cumbersome, but I'd posit that nobody actually manually allocates a node (it's all done via the parser).

@rmosolgo
Copy link
Owner Author

rmosolgo commented Dec 8, 2023

constructors for nodes

Agreed -- that's kind of what I meant by my last todo item, unintelligible as it was. I think the C Parser will also benefit from that.

I saw how TinyGQL uses .rb.erb to generate code. That's the first time I've seen it but I think I'll give it a try to replace the really complicated metaprogramming that nodes.rb has in it right now.

@tenderlove
Copy link
Contributor

I saw how TinyGQL uses .rb.erb to generate code. That's the first time I've seen it but I think I'll give it a try to replace the really complicated metaprogramming that nodes.rb has in it right now.

Please take any of it you find helpful. The code is all yours! 😄

@rmosolgo
Copy link
Owner Author

I tried the refactor on Node initialization, and maybe it's faster, but it didn't remove the allocation for the hash :S I think I'm learning that, for def initialize, Ruby still makes a hash in order to pass it to Class.new. There's an old report here:

And I wrote a simple script to demonstrate the issue:

require "memory_profiler"
class A; def initialize(a: nil); end; end
report = MemoryProfiler.report do
  A.new(a: 1)
end
puts report.pretty_print
# allocated objects by class
# -----------------------------------
#          1  A
#          1  Hash

report = MemoryProfiler.report do
  A.new
end
puts report.pretty_print
# allocated objects by class
# -----------------------------------
#          1  A

So although simplifying that flow might make things faster, I'll have to get away from keywords in order to get rid of the hash allocations.

@rmosolgo
Copy link
Owner Author

ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-darwin22]
Warming up --------------------------------------
           Old Parse   165.000  i/100ms
           New Parse   912.000  i/100ms
             C Parse     1.220k i/100ms
Calculating -------------------------------------
           Old Parse      1.640k (± 5.4%) i/s -      8.250k in   5.046602s
           New Parse      9.026k (± 3.3%) i/s -     45.600k in   5.057824s
             C Parse     12.547k (± 2.2%) i/s -     63.440k in   5.058800s

Comparison:
             C Parse:    12546.8 i/s
           New Parse:     9026.1 i/s - 1.39x  slower
           Old Parse:     1640.0 i/s - 7.65x  slower

Allocations: Old Parse: 1225
Allocations: New Parse: 260
Allocations: C Parse: 443

@rmosolgo
Copy link
Owner Author

Comparing vs. master now that I've replaced the old parser with the new one:

ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-darwin22]
 Calculating -------------------------------------
-           Ruby Parse      1.536k (± 2.3%) i/s -      7.803k in   5.083115s
+           Ruby Parse      8.558k (± 2.6%) i/s -     42.800k in   5.004758s
-              C Parse     10.939k (± 1.5%) i/s -     54.800k in   5.010856s
+              C Parse     11.789k (± 6.3%) i/s -     59.731k in   5.090992s



- Allocations: Ruby Parse: 1150
+ Allocations: Ruby Parse: 261
  Allocations: C Parse: 443
  • The Ruby parser is 457% faster and uses 77% fewer objects
  • The C parser is ~8% faster -- presumably because of the Node initialization improvements.

@rmosolgo rmosolgo merged commit 007a1ea into 2.2-dev Dec 11, 2023
12 checks passed
@rmosolgo rmosolgo deleted the recursive-descent-parser branch December 11, 2023 21:25
@@ -15,6 +15,7 @@ def initialize(multiplex: nil, query: nil, **_options)
@query = query
end

# The Ruby parser doesn't call this method (`graphql/c_parser` does.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ the new parser

Wish this behaviour was more widely communicated given the Ruby parser is the default (our CI started failing)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, sorry for the nasty surprise -- at least I added a changelog entry for this change in 7c3d899

@arielvalentin
Copy link

arielvalentin commented Dec 22, 2023

@rmosolgo The OTel Instrumentation test suite started to fail starting with [email protected]. It seems the lex callback is not being invoked in the test suite since the span is missing because I presume we are defaulting to using a Ruby parser instead of the C version of the parser?

I see that this PR includes an environment variable that will run different versions of the parser. This may be something we have to include in our test suite as well.

Would you mind giving me a hand with OTel Instrumentation and seeing what the best way forward is here? open-telemetry/opentelemetry-ruby-contrib#773

@ravangen
Copy link
Contributor

I presume we are defaulting to using a Ruby parser instead of the C version of the parser?

Yes. A client would have to require "graphql/c_parser" to have the implementation switched out.

In terms of tracing tests expecting lex to be called, I ended up with a check like:

def trace_lex_supported?
  return @trace_lex_supported if defined?(@trace_lex_supported)
  @trace_lex_supported = Gem::Requirement.new("< 2.2").satisfied_by?(Gem::Version.new(GraphQL::VERSION))
end

@rmosolgo
Copy link
Owner Author

Thanks for sharing what worked for you, @ravangen, LGTM 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Try a hand-written recursive descent parser
4 participants