-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[0.19] Address caching issue in schema compiler that can lead to OOMs #1329
base: series/0.19
Are you sure you want to change the base?
Conversation
The CachedSchemaCompiler.Impl class suffered from a pretty dire problem of only caching what is passed to the schema visitor. However, schemas are often pre-processed before reaching the schema visitors, which, combined with a lack of guarantee on Schema equality (in particular for Enumeration Schemas in 0.18), and global caches posed a risk of OOMs in some cases. This change makes it so that the initial schema passed to the compiler acts as a cache key to a cache that's separate to the one that is typically passed to schema visitors.
protected type Aux[_] | ||
type Cache = CompilationCache[Aux] | ||
type AuxCache = CompilationCache[Aux] | ||
case class Cache(outer: CompilationCache[F], inner: AuxCache) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the core of the change : we need an inner cache to be passed to some schema visitor, but we also need an outer cache to prevent schema-preprocessing from being re-applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is it only a problem because the preprocessing doesn't produce deterministic (as per hashCode/equals) outputs?
Do we know which transformation was causing the particular issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's 2 problems : most pernicious one is what you describe, and it's the total function held by enumeration schemas which is at fault. Hence the changes in my other PR.
Less pernicious one is that the it's the input of the schema visitor call that gets cached instead of the input of the schema compiler.
This means that we're not protecting against re-running the inefficient pre-processing of schemas that may occur (like hint masks), which is really bad performance wise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good info, thanks
…g-issue-in-schema-compiler
Closes #1328
PR Checklist (not all items are relevant to all PRs)