Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore if fs::file_exists() is more convenient for checking data path existence in main run_pacta_data_preparation.R #119

Closed
jdhoffa opened this issue Feb 10, 2024 · 7 comments

Comments

@jdhoffa
Copy link
Member

jdhoffa commented Feb 10, 2024

Consider if the file.exists() checks in run_pacta_data_preparation.R can be improved to take into account the possibility of the paths specifying a directory instead of a file.

related #118

# check that everything is ready to go -----------------------------------------
stopifnot(file.exists(masterdata_ownership_path))
stopifnot(file.exists(masterdata_debt_path))
stopifnot(file.exists(ar_company_id__factset_entity_id_path))
stopifnot(file.exists(factset_financial_data_path))
stopifnot(file.exists(factset_entity_info_path))
stopifnot(file.exists(factset_entity_financing_data_path))
stopifnot(file.exists(factset_fund_data_path))
stopifnot(file.exists(factset_isin_to_fund_table_path))
stopifnot(file.exists(factset_iss_emissions_data_path))
stopifnot(file.exists(data_prep_outputs_path))
if (!update_currencies) {
stopifnot(file.exists(currencies_data_path))
}

originally from @jdhoffa

Have you checked if this problem persists using functions from https://fs.r-lib.org/ ?

I know there's a hesitance to add dependencies which I totally understand but given that this repository is concerned primarily with file I/O, I don't think depending on a package that facilitates better file I/O in R is a terrible idea

Originally posted by @jdhoffa in #99 (comment)

@cjyetman
Copy link
Member

For reference, the documentation of fs::dir_exists() is here: https://fs.r-lib.org/reference/file_access.html

Currently, the base R equivalent dir.exists() is not used anywhere in this repo. This could be relevant to #118 if it was determined that the need to check for directory existence was needed.

@jdhoffa jdhoffa changed the title Explore if fs::dir_exists() is more convenient for checking data path existence in main run_data_preparation Explore if fs::file_exists() is more convenient for checking data path existence in main run_data_preparation Feb 10, 2024
@jdhoffa
Copy link
Member Author

jdhoffa commented Feb 10, 2024

Title updated.

@cjyetman
Copy link
Member

Generally, these are the primary differences between functions in fs and base R equivalents, according to the maintainers of fs

https://github.com/r-lib/fs?tab=readme-ov-file#comparison-vs-base-equivalents

Comparison vs base equivalents

fs functions smooth over some of the idiosyncrasies of file handling
with base R functions:

  • Vectorization. All fs functions are vectorized, accepting multiple
    paths as input. Base functions are inconsistently vectorized.

  • Predictable return values that always convey a path. All fs
    functions return a character vector of paths, a named integer or a
    logical vector, where the names give the paths. Base return values are
    more varied: they are often logical or contain error codes which
    require downstream processing.

  • Explicit failure. If fs operations fail, they throw an error. Base
    functions tend to generate a warning and a system dependent error
    code. This makes it easy to miss a failure.

  • UTF-8 all the things. fs functions always convert input paths to
    UTF-8 and return results as UTF-8. This gives you path encoding
    consistency across OSes. Base functions rely on the native system
    encoding.

  • Naming convention. fs functions use a consistent naming
    convention. Because base R’s functions were gradually added over time
    there are a number of different conventions used (e.g. path.expand()
    vs normalizePath(); Sys.chmod() vs file.access()).

I am also aware of another significant difference: that the way fs functions deal with ~ on Windows is more consistent.

Specifically to the context that was discussed in #99, the fs equivalent functions have the same behavior...

dir_name <- "~/Desktop/"

isTRUE(file.exists(file.path(dir_name, "dataprep")))
#> [1] TRUE
isTRUE(file.exists(file.path(dir_name, "FOO")))
#> [1] FALSE
isTRUE(file.exists(file.path(dir_name, "")))
#> [1] TRUE
isTRUE(file.exists(file.path(dir_name, NULL)))
#> [1] FALSE


isTRUE(fs::file_exists(file.path(dir_name, "dataprep")))
#> [1] TRUE
isTRUE(fs::file_exists(file.path(dir_name, "FOO")))
#> [1] FALSE
isTRUE(fs::file_exists(file.path(dir_name, "")))
#> [1] TRUE
isTRUE(fs::file_exists(file.path(dir_name, NULL)))
#> [1] FALSE

isTRUE(fs::file_exists(fs::path(dir_name, "dataprep")))
#> [1] TRUE
isTRUE(fs::file_exists(fs::path(dir_name, "FOO")))
#> [1] FALSE
isTRUE(fs::file_exists(fs::path(dir_name, "")))
#> [1] TRUE
isTRUE(fs::file_exists(fs::path(dir_name, NULL)))
#> [1] FALSE

@cjyetman cjyetman changed the title Explore if fs::file_exists() is more convenient for checking data path existence in main run_data_preparation Explore if fs::file_exists() is more convenient for checking data path existence in main run_data_preparation Feb 10, 2024
@jdhoffa
Copy link
Member Author

jdhoffa commented Feb 10, 2024

👌

@jdhoffa jdhoffa closed this as not planned Won't fix, can't repro, duplicate, stale Feb 10, 2024
@cjyetman
Copy link
Member

re-opening with a renewed focus on the file.exists() checks in run_pacta_data_preparation.R

@cjyetman cjyetman reopened this Feb 11, 2024
@cjyetman cjyetman changed the title Explore if fs::file_exists() is more convenient for checking data path existence in main run_data_preparation Explore if fs::file_exists() is more convenient for checking data path existence in main run_pacta_data_preparation.R Feb 11, 2024
@AlexAxthelm
Copy link
Collaborator

I'd like to flag this for further investigation. I've had issues with the fs copy functions when using an SMB mount for Azure File Shares (errors, not warnings), and have been preferring the base functions in my code since then out of an abundance of caution. I'm open to trying the fs functions again, but I'd like to ensure that they'll work in our target environment.

I don't expect the _exists functions to cause problems, but before we go too far down this path, it's worth checking.

@cjyetman
Copy link
Member

Since there are concerns with both fs::file_exists() and base::file.exists(), I lean towards "don't fix what ain't broke". closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants