Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handrolled Parser #729

Merged
merged 2 commits into from
Jan 16, 2025
Merged

Handrolled Parser #729

merged 2 commits into from
Jan 16, 2025

Conversation

byroot
Copy link
Member

@byroot byroot commented Jan 15, 2025

@byroot
Copy link
Member Author

byroot commented Jan 15, 2025

Alright, the remaining failures are error messages being a bit different.

I haven't tried to profile or optimize yet, but it's already a bit faster:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   958.000 i/100ms
Calculating -------------------------------------
               after      9.788k (± 0.6%) i/s  (102.17 μs/i) -     49.816k in   5.089729s

Comparison:
              before:     9351.2 i/s
               after:     9787.9 i/s - 1.05x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    97.000 i/100ms
Calculating -------------------------------------
               after    983.509 (± 0.8%) i/s    (1.02 ms/i) -      4.947k in   5.030289s

Comparison:
              before:      857.8 i/s
               after:      983.5 i/s - 1.15x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    44.000 i/100ms
Calculating -------------------------------------
               after    453.758 (± 1.5%) i/s    (2.20 ms/i) -      2.288k in   5.043427s

Comparison:
              before:      405.7 i/s
               after:      453.8 i/s - 1.12x  faster

@byroot
Copy link
Member Author

byroot commented Jan 15, 2025

Actually, there's one correctness bug left, we don't validate that string don't contain unescaped control characters.

That may explain part of the difference.

@byroot byroot force-pushed the handrolled branch 3 times, most recently from aacfff7 to 7ae7644 Compare January 15, 2025 19:32
@byroot
Copy link
Member Author

byroot commented Jan 15, 2025

Yeah, no longer look as good after the fix:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   874.000 i/100ms
Calculating -------------------------------------
               after      8.695k (± 0.9%) i/s  (115.01 μs/i) -     43.700k in   5.026376s

Comparison:
              before:     9413.8 i/s
               after:     8694.8 i/s - 1.08x  slower


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    91.000 i/100ms
Calculating -------------------------------------
               after    907.893 (± 1.0%) i/s    (1.10 ms/i) -      4.550k in   5.012050s

Comparison:
              before:      900.9 i/s
               after:      907.9 i/s - same-ish: difference falls within error


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    44.000 i/100ms
Calculating -------------------------------------
               after    450.179 (± 0.9%) i/s    (2.22 ms/i) -      2.288k in   5.082934s

Comparison:
              before:      425.8 i/s
               after:      450.2 i/s - 1.06x  faster

But I haven't profiled on anything yet, so I think we can get some speed out of this.

@byroot
Copy link
Member Author

byroot commented Jan 16, 2025

I fixed the last few tests that were failing because of different error messages, and cleaned up the code a bit.

Before merging I'd like to figure out how to get at least on par on activitypub.json, it's not super clear to me why we regressed here, but I suspect it's string parsing, as activitypub is the most string heavy of all of them.

It's really when I added back the (unsigned char)*state->cursor < 0x20 check that we fell behind.

@byroot
Copy link
Member Author

byroot commented Jan 16, 2025

It's really when I added back the (unsigned char)*state->cursor < 0x20 check that we fell behind.

Yep, removing that check (hence no longer being a valid JSON parser), put us back way above:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     1.041k i/100ms
Calculating -------------------------------------
               after     10.300k (± 0.4%) i/s   (97.09 μs/i) -     52.050k in   5.053527s

Comparison:
              before:     9436.6 i/s
               after:    10299.9 i/s - 1.09x  faster

@eregon
Copy link
Member

eregon commented Jan 16, 2025

Maybe it's worth storing the result of *state->cursor in some local variable vs reading it 3 times?
I'm not sure the C compiler is able to do that (it might not if e.g. it considers concurrent mutations, restrict could help but is difficult to use correctly).

@byroot
Copy link
Member Author

byroot commented Jan 16, 2025

Maybe it's worth storing the result of *state->cursor in some local variable vs reading it 3 times?

I just tried that, it made the perf slightly worse :/

@byroot
Copy link
Member Author

byroot commented Jan 16, 2025

Lookup tables to the rescue:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     1.057k i/100ms
Calculating -------------------------------------
               after     10.587k (± 0.7%) i/s   (94.46 μs/i) -     53.907k in   5.092232s

Comparison:
              before:     9857.2 i/s
               after:    10586.6 i/s - 1.07x  faster

@byroot
Copy link
Member Author

byroot commented Jan 16, 2025

On the 3 main benchmarks:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     1.041k i/100ms
Calculating -------------------------------------
               after     10.516k (± 0.7%) i/s   (95.09 μs/i) -     53.091k in   5.048809s

Comparison:
              before:     9804.3 i/s
               after:    10516.1 i/s - 1.07x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   105.000 i/100ms
Calculating -------------------------------------
               after      1.051k (± 0.4%) i/s  (951.49 μs/i) -      5.355k in   5.095268s

Comparison:
              before:      912.2 i/s
               after:     1051.0 i/s - 1.15x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    48.000 i/100ms
Calculating -------------------------------------
               after    491.687 (± 0.2%) i/s    (2.03 ms/i) -      2.496k in   5.076439s

Comparison:
              before:      442.4 i/s
               after:      491.7 i/s - 1.11x  faster

@byroot
Copy link
Member Author

byroot commented Jan 16, 2025

It also progressed on all micro-benchmarks but one:

== Parsing mixed utf8 (5002001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    58.000 i/100ms
Calculating -------------------------------------
               after    580.174 (± 2.2%) i/s    (1.72 ms/i) -      2.900k in   5.001120s

Comparison:
              before:      327.5 i/s
               after:      580.2 i/s - 1.77x  faster


== Parsing mostly utf8 (1668001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    58.000 i/100ms
Calculating -------------------------------------
               after    588.648 (± 1.2%) i/s    (1.70 ms/i) -      2.958k in   5.025750s

Comparison:
              before:      322.2 i/s
               after:      588.6 i/s - 1.83x  faster


== Parsing lots_unescape (40301 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     1.722k i/100ms
Calculating -------------------------------------
               after     17.227k (± 0.1%) i/s   (58.05 μs/i) -     87.822k in   5.097996s

Comparison:
              before:    15936.8 i/s
               after:    17226.8 i/s - 1.08x  faster


== Parsing small nested array (121 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   164.245k i/100ms
Calculating -------------------------------------
               after      1.699M (± 0.3%) i/s  (588.66 ns/i) -      8.541M in   5.027620s

Comparison:
              before:  1202686.5 i/s
               after:  1698775.5 i/s - 1.41x  faster


== Parsing small hash (65 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   394.315k i/100ms
Calculating -------------------------------------
               after      4.222M (± 0.1%) i/s  (236.87 ns/i) -     21.293M in   5.043606s

Comparison:
              before:  3913405.7 i/s
               after:  4221788.7 i/s - 1.08x  faster


== Parsing test from oj (258 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    52.215k i/100ms
Calculating -------------------------------------
               after    542.246k (± 0.6%) i/s    (1.84 μs/i) -      2.715M in   5.007477s

Comparison:
              before:   558907.5 i/s
               after:   542245.6 i/s - 1.03x  slower

So I think this is good to go. I'll cleanup the git history though.

@byroot byroot marked this pull request as ready for review January 16, 2025 12:47
And get rid of the Ragel parser.

This is 7% faster on activitypub, 15% after on twitter and 11% faster
on citm_catalog.

There might be some more optimization opportunities, I did a quick
optimization pass to fix a regression in string parsing, but other
than that I haven't dug much in performance.
@byroot byroot merged commit 7ac73b4 into ruby:master Jan 16, 2025
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants