Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory management #1337

Open
3 tasks
kddnewton opened this issue Aug 26, 2023 · 7 comments
Open
3 tasks

Memory management #1337

kddnewton opened this issue Aug 26, 2023 · 7 comments
Labels
enhancement New feature or request
Milestone

Comments

@kddnewton
Copy link
Collaborator

At the moment, we allocate memory as we need it, making a single malloc call for every node in the tree, as well as whichever fields on those nodes need memory. When we're done with the tree, we recursively visit the tree and deallocate all at once. This is a really good candidate for arena allocation. But more than that, it would be good enough just to centralize our memory management functions, which we need to do anyway. Below are a couple of tasks that are related to memory that would improve our general memory story.

  • Centralize our memory management functions. Currently we call the stdlib malloc/calloc/realloc/free functions whenever we need memory. (We purposefully avoid alloca because it's not C standard.) Those functions calls are scattered throughout the codebase. For all subsequent bullets, we need them to go through centralized functions. These functions should all accept a yp_parser_t *, but it can be marked as unused for now (YP_ATTRIBUTE_UNUSED). For this first task, they should all simply call the stdlib function. For example, a yp_malloc function would call malloc with the same arguments.
  • Provide the ability to pass memory function pointers to a yp_parser_t such that the yp_ version of the memory functions call the function pointers if they are provided. We need this for CRuby, which provides xmalloc which will attempt a GC if a call to malloc fails. This should follow the same pattern we have for the encoding_changed callback, which is that it provides a function that accepts a function pointer.
  • Request a certain number of pages at a time, estimated by the size of the input string. This would be our own arena allocation. yp_malloc would then change to call malloc/xmalloc only when the current arena has run out of allocatable memory. We can be very naive with this and make it simply a bump allocator since we so rarely deallocate nodes while parsing.

The first two of these tasks are necessary for our integration into CRuby. The last one is a nice-to-have enhancement, and is not necessary for v1.0.

@flavorjones
Copy link
Contributor

flavorjones commented Aug 28, 2023

There is some relevant information about the impact of using ruby_xmalloc on performance at sparklemotion/nokogiri#2843

TL;DR the libxml2 parser was measured as being 1.12x - 1.34x slower when using ruby_xmalloc as compared to stdlib malloc. I'd be interested to see if there's a similar slowdown in YARP.

@eregon
Copy link
Member

eregon commented Aug 28, 2023

I think the best solution here is to only use arena allocations.
And to allocate the arena itself then malloc/xmalloc can be used.
That would provide the best performance and use xmalloc semantics without a significant overhead.

Since this seems performance-critical I think it should be in for 1.0. But also I don't know what is the vision/general requirements for 1.0. I think we need to review and cleanup all nodes for 1.0, as after that it will be expected the AST is fully stable.

@kddnewton
Copy link
Collaborator Author

Yeah @flavorjones I remember you showing me the nokogiri metrics. I think we should build it either way in order to be able to measure it on YARP for sure.

@kddnewton
Copy link
Collaborator Author

@HParker did you want to take a look at the first bullet here?

@HParker
Copy link
Collaborator

HParker commented Aug 28, 2023

Sure! I might have already started looking into it. :)

@kddnewton kddnewton added the enhancement New feature or request label Sep 6, 2023
@kddnewton kddnewton added this to the Ruby 3.4.0 milestone Dec 6, 2023
@kddnewton kddnewton modified the milestones: CRuby unblocked, Long-term Feb 14, 2024
@amomchilov
Copy link
Contributor

@HParker Do you have any findings you'd like to share? I'm looking into adding an arena allocator to RBS, which I'd like to share with whatever ends up being used in Prism. I haven't started yet, and the more I can rip off lovingly copy, the better :)

@HParker
Copy link
Collaborator

HParker commented Jan 9, 2025

Hey @amomchilov 👋

Yeah, my research here so far was largely testing how difficult getting prism working with its own simple arena allocator would be. I saw some, but not massive performance difference.

The place I left off was realizing that finding the relationship between source size and Prism allocation size could lead to better initial arena size which might be where the benefit lives in an area allocator.

I do think there is more research needed to be sure that an arena allocator is worth it for Prism. Since this issue was created we were able to find and fix some memory bugs by using memory sanitization tools. I suspect that with arena memory that might be more difficult. We should consider a plan for that where at least some mode can interact well with Valgrind and ASAN/LSAN.

Feel free to copy everything I came up with so far, but as far as a sophisticated arena allocator, I am very interested in doing that work, but didn't do it yet here.

Hope that is helpful! feel free to reach out more I would love to collaborate more if you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants