Proposal: Convert to Package infrastructure #77

AlexAxthelm · 2024-01-19T11:54:50Z

I've had pretty good success with using an R package infrastructure with {workflow.factset}, since it makes it pretty easy to maintain sections of the workflow as individual functions (which is nice since dealing with garbage collection is less of an issue, no more calls to rm() to free up memory), but it also makes things easier for dependency management and if there are functions that should be tested, testing them, and documentation.

The general idea is that you have a primary function that is meant to be called:

run_pacta_data_prep <- function(cfg){
  check_all_the_files_are_there(cfg$inputs)
  export_currencies(cfg$currencies_path)
  export_index_regions(cfg$regions_path)
  export_financial_data(cfg$factset_financial_data_path, cfg$outputs_path)
  export_blah_blah_blah()
  check_everything_exported_properly()
  put_it_all_in_a_zip_file()
}

and then define whatever behaviors you need in each child file.

then getting everything ready to go is pretty straightforward for preparing the code, and just calling workflow.data.preparation::run_pacta_data_prep(config = "/path/to/config.yml") or something like that. A bit of a simplification, but overall not a terrible conversion.

cc @cjyetman @jdhoffa

The text was updated successfully, but these errors were encountered:

jdhoffa · 2024-01-19T14:01:14Z

I have had this thought a few times in the past tbh. Not 100% opposed to it, but also a LITTLE apprehensive so having a huge wrapper function that is called for it's 10000 side-effects

Good topic for tech review?

AlexAxthelm · 2024-01-21T11:59:35Z

@jdhoffa @cjyetman to light a fire under this, memory management is kind of a huge issue for data prep, since when I try to run on a machine with 16GB of ram, it's killing with out-of-memory errors (while connecting scnearios to ABCD). Not saying that making a package is a cureall for that, but I think it would help

(context: 16GB is the max memory for a standard instance of Azure Container Instances)

jdhoffa · 2024-01-21T13:40:12Z

I don't think it would help, the memory intensive step is all already wrapped in a function I believe.

jdhoffa · 2024-01-21T13:40:27Z

Unless there's something I'm missing?

cjyetman · 2024-01-21T13:43:52Z

I don't see how "converting to package infrastructure" would solve any memory problems, but it's an interesting experiment.

AlexAxthelm · 2024-01-22T10:57:30Z

Overall, I think all the calls to rm() are a pretty big code smell (and that we're leaving some objects on the table that don't need to be there), and letting R's garbage collection deal with cleaning up when a function exits are preferable.

It's entirely possible that I'm wrong on this one, but I think that even if it doesn't solve the problems, we have enough repeated code (sqlite export, for example) that it's probably worth exploring

jdhoffa · 2024-01-22T11:11:49Z

Yeah that's fair, having a main() function seems like an easy next step, that doesn't require us going all the way to Package infrastructure, but allows us to get rid of the rm() calls, which I tend to agree is an improvement.

cjyetman · 2024-01-22T11:40:43Z

You could simply wrap things in curly braces and let R do the rms for you if it makes you feel more confident about it.

Personally I think the explicitness is better.

cjyetman · 2024-02-10T05:59:49Z

This has been partially achieved by adding a DESCRIPTION file in #83, but I doubt we will go any further with this since we will likely rather follow the suggestion in #94 to refactor much of the code in run_pacta_data_preparation.R to pacta.data.preparation.

cjyetman mentioned this issue Feb 9, 2024

add DESCRIPTION file RMI-PACTA/workflow.transition.monitor#231

Merged

cjyetman closed this as completed Feb 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Convert to Package infrastructure #77

Proposal: Convert to Package infrastructure #77

AlexAxthelm commented Jan 19, 2024

jdhoffa commented Jan 19, 2024

AlexAxthelm commented Jan 21, 2024 •

edited

Loading

jdhoffa commented Jan 21, 2024

jdhoffa commented Jan 21, 2024

cjyetman commented Jan 21, 2024

AlexAxthelm commented Jan 22, 2024

jdhoffa commented Jan 22, 2024

cjyetman commented Jan 22, 2024 •

edited

Loading

cjyetman commented Feb 10, 2024

Proposal: Convert to Package infrastructure #77

Proposal: Convert to Package infrastructure #77

Comments

AlexAxthelm commented Jan 19, 2024

jdhoffa commented Jan 19, 2024

AlexAxthelm commented Jan 21, 2024 • edited Loading

jdhoffa commented Jan 21, 2024

jdhoffa commented Jan 21, 2024

cjyetman commented Jan 21, 2024

AlexAxthelm commented Jan 22, 2024

jdhoffa commented Jan 22, 2024

cjyetman commented Jan 22, 2024 • edited Loading

cjyetman commented Feb 10, 2024

AlexAxthelm commented Jan 21, 2024 •

edited

Loading

cjyetman commented Jan 22, 2024 •

edited

Loading