Proposal: Convert to Package infrastructure #77
I have had this thought a few times in the past, tbh. Not 100% opposed to it, but also a LITTLE apprehensive about having a huge wrapper function that is called for its 10,000 side-effects. Good topic for tech review?
@jdhoffa @cjyetman to light a fire under this: memory management is kind of a huge issue for data prep, since when I try to run on a machine with 16GB of RAM, it's killed by out-of-memory errors (while connecting scenarios to ABCD). Not saying that making a package is a cure-all for that, but I think it would help (context: 16GB is the max memory for a standard instance of Azure Container Instances).
I don't think it would help; the memory-intensive step is already wrapped in a function, I believe. Unless there's something I'm missing?
I don't see how "converting to package infrastructure" would solve any memory problems, but it's an interesting experiment. |
Overall, I think all the calls to … It's entirely possible that I'm wrong on this one, but I think that even if it doesn't solve the memory problems, we have enough repeated code (the sqlite export, for example) that it's probably worth exploring.
Yeah, that's fair, having a …
You could simply wrap things in curly braces and let R do the garbage collection. Personally, I think the explicitness is better.
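For illustration, a minimal sketch of the scoping idea being discussed (using `local()`, which evaluates a block in a throwaway environment; wrapping the step in a function, as the package approach does, gives the same effect — the names here are stand-ins, not actual workflow code):

```r
# Intermediates created inside local() (or inside a function body) go out
# of scope when the block returns, so R's garbage collector can reclaim
# them -- no explicit rm() calls needed.
result <- local({
  big_intermediate <- seq_len(1e6)  # stand-in for a large data-prep object
  sum(big_intermediate)             # only the small result survives
})
# big_intermediate is not visible here; its memory is collectable.
```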
This has been partially achieved by adding a …
I've had pretty good success with using an R package infrastructure with {workflow.factset}, since it makes it pretty easy to maintain sections of the workflow as individual functions (which is nice since dealing with garbage collection is less of an issue: no more calls to `rm()` to free up memory), but it also makes things easier for dependency management, testing the functions that should be tested, and documentation.

The general idea is that you have a primary function that is meant to be called, and then you define whatever behaviors you need in each child file.
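For illustration, a hypothetical sketch of what such a primary entry-point function might look like (the helper names below are assumptions standing in for the "child file" functions, not actual code from this repo):

```r
# Hypothetical top-level entry point for the data-prep package.
# read_config(), import_abcd(), connect_scenarios(), and export_sqlite()
# are illustrative placeholders for functions defined in child files.
run_pacta_data_prep <- function(config) {
  cfg <- read_config(config)

  # Each step is its own function, so its intermediates are freed when
  # the function returns, without explicit rm()/gc() calls.
  abcd <- import_abcd(cfg)
  matched <- connect_scenarios(abcd, cfg)
  export_sqlite(matched, cfg)

  invisible(matched)
}
```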
Then getting everything ready to go is pretty straightforward: prepare the code, and just call

workflow.data.preparation::run_pacta_data_prep(config = "/path/to/config.yml")

or something like that. A bit of a simplification, but overall not a terrible conversion.

cc @cjyetman @jdhoffa