-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use explicit filenames for FactSet files #99
Conversation
This is a bit preemptive because we don't yet know what the timestamped FactSet files should really be for each quarter, but the goal of this PR is to structurally change the config file so that the expectation is that FactSet filenames are explicitly named in each config set. Doing so will facilitate testing and continued development while we wait for the the approved FactSet files to be selected, so given that, I think it's alright to merge this now and I'll make an issue to update the explicit FactSet filenames once they are known (#108). |
I don't really have anything relevant to review here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that the checks for existence will fail if the factset_data_path
exists, but the files that are supposed to be there don't.
workflow.data.preparation/run_pacta_data_preparation.R
Lines 111 to 116 in caa6ccd
stopifnot(file.exists(factset_financial_data_path)) | |
stopifnot(file.exists(factset_entity_info_path)) | |
stopifnot(file.exists(factset_entity_financing_data_path)) | |
stopifnot(file.exists(factset_fund_data_path)) | |
stopifnot(file.exists(factset_isin_to_fund_table_path)) | |
stopifnot(file.exists(factset_iss_emissions_data_path)) |
file.exists()
returns TRUE
for directories that exist, so if it's checking on file.exists(file.path(some_dir, ""))
, it's only checking if some_dir
exists. So if the wrong config is passed in, we'll eventually get errors about those files not existing, but it could bypass these checks (which are meant to cause a fast fail).
Suggest either stengthening the checks, or setting the defaults to NULL
rather than ""
, which will cause file.path()
to return an empty string.
Good point! Even file.exists(file.path("~/Desktop/"))
#> [1] TRUE
file.exists(file.path("~/Desktop/", ""))
#> [1] TRUE
stopifnot(file.exists(file.path("~/Desktop/", "")))
file.exists(file.path("~/Desktop/", NULL))
#> logical(0)
stopifnot(file.exists(file.path("~/Desktop/", NULL)))
test <- NULL
file.exists(file.path("~/Desktop/", test))
#> logical(0)
stopifnot(file.exists(file.path("~/Desktop/", test))) |
I guess we'll need a new section that tests if certain parameters are not factset_financial_data_filename <- "test"
stopifnot(isFALSE(is.null(factset_financial_data_filename) || factset_financial_data_filename == ""))
factset_financial_data_filename <- ""
stopifnot(isFALSE(is.null(factset_financial_data_filename) || factset_financial_data_filename == ""))
#> Error: isFALSE(is.null(factset_financial_data_filename) || factset_financial_data_filename == .... is not TRUE
factset_financial_data_filename <- NULL
stopifnot(isFALSE(is.null(factset_financial_data_filename) || factset_financial_data_filename == ""))
#> Error: isFALSE(is.null(factset_financial_data_filename) || factset_financial_data_filename == .... is not TRUE like # check that parameters are not blank ------------------------------------------
stopifnot(isFALSE(is.null(factset_financial_data_filename) || factset_financial_data_filename == ""))
stopifnot(isFALSE(is.null(factset_entity_info_filename) || factset_entity_info_filename == ""))
stopifnot(isFALSE(is.null(factset_entity_financing_data_filename) || factset_entity_financing_data_filename == ""))
stopifnot(isFALSE(is.null(factset_fund_data_filename) || factset_fund_data_filename == ""))
stopifnot(isFALSE(is.null(factset_isin_to_fund_table_filename) || factset_isin_to_fund_table_filename == ""))
stopifnot(isFALSE(is.null(factset_iss_emissions_data_filename) || factset_iss_emissions_data_filename == "")) |
I think you can simplify the logic a fair bit by just putting the setwd(tempdir())
dir_name <- "foo"
dir.create(dir_name)
dir.exists(dir_name)
#> [1] TRUE
file.exists(dir_name)
#> [1] TRUE
file_one <- "file1.txt"
file.exists(file.path(dir_name, file_one))
#> [1] FALSE
file_two <- ""
file.exists(file.path(dir_name, file_two))
#> [1] TRUE
isFALSE(file.info(file.path(dir_name, file_one))$isdir)
#> [1] FALSE
file_three <- NULL
file.exists(file.path(dir_name, file_three))
#> logical(0)
## Suggested Implementation
isTRUE(file.exists(file.path(dir_name, file_three)))
#> [1] FALSE Created on 2024-02-09 with reprex v2.0.2 Session infosessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.2 (2023-10-31)
#> os macOS Sonoma 14.2
#> system aarch64, darwin23.0.0
#> ui unknown
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Belgrade
#> date 2024-02-09
#> pandoc 3.1.7 @ /opt/homebrew/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
#> digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.2)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
#> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.1)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1)
#> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
#> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.1)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
#> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2)
#> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2)
#>
#> [1] /opt/homebrew/lib/R/4.3/site-library
#> [2] /opt/homebrew/Cellar/r/4.3.2/lib/R/library
#>
#> ────────────────────────────────────────────────────────────────────────────── |
but then it doesn't capture dir_name <- "~/Desktop/"
isTRUE(file.exists(file.path(dir_name, "dataprep")))
#> [1] TRUE
isTRUE(file.exists(file.path(dir_name, "FOO")))
#> [1] FALSE
isTRUE(file.exists(file.path(dir_name, "")))
#> [1] TRUE
isTRUE(file.exists(file.path(dir_name, NULL)))
#> [1] FALSE |
I'm going to remove the parameters check from this PR and track that issue in #118. While they're reasonable, the need has existed for a long time and is not specific to nor created by the changes in this PR. The places where I added/edited parameters to be |
Have you checked if this problem persists using functions from https://fs.r-lib.org/ ? I know there's a hesitance to add dependencies which I totally understand but given that this repository is concerned primarily with file I/O, I don't think depending on a package that facilitates better file I/O in R is a terrible idea |
What problem are you referring to? There's nothing directly I/O related in this PR. |
Apologies I meant "file system"**, not file I/O And maybe I misunderstood. I was just curious, since the base R dir.exists doesn't seem to produce the desired output, if
does? It may not! |
Maybe, but I'm avoiding any checking of directories (or anything for that matter) in this PR. This PR is only to change the structure of the There's a linked issue #118 to add checking for valid input, which has never been done before. |
I was just responding to the above conversation! Which it sounds like you would prefer happens in a new issue :-) |
I will see myself out now haha :-) |
yes, I prefer PRs to be precise and specific |
I don't have the context or knowledge of the new "Factset" datasets to review this safely, so will leave to @AlexAxthelm to review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving with note that the files referenced here are not the final files we want to use, but this puts the infrastructure in place for when we do have a set of files we like.
Noted. #108 |
depends on #99 This PR computes the FactSet timestamp/s from the filenames of all input FactSet data files, and adds that info to the manifest. Note that it would be possible to use FactSet data files from different timestamps... in which case all unique timestamps would be added to the manifest as an array. In the future, e.g. in #118, we could consider giving an error or warning if all of these timestamps do not match.
depends on #99 not sure why this was being done since the repo gets permanently copied into the Docker image. @jdhoffa? Co-authored-by: Jackson Hoffart <[email protected]>
closes #82
based on a seemingly rough consensus here: #75 (comment)