Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

review invalid ISINs in FactSet data #196

Closed
cjyetman opened this issue Mar 25, 2024 · 4 comments
Closed

review invalid ISINs in FactSet data #196

cjyetman opened this issue Mar 25, 2024 · 4 comments

Comments

@cjyetman
Copy link
Member

cjyetman commented Mar 25, 2024

#> Error in base::tryCatch(base::withCallingHandlers({: 1 assertions failed:
#>  * Variable 'data$isin': must contain only valid ISINs, but has
#>  * additional elements "US78467KA#42", "US85255AA*13", "NOR6778@AB29",
#>  * "US048303E#47", "US313855F*48", "US372460A#28", "CA820439A#42",
#>  * "CA008474E@39", "US941848D#79", "US524908@431", "US00253#AB60",
#>  * "US615369A@49", "US893578A@31", "US57555*AD16", "BEB7227*AB60",
#>  * "US031100A@92", "US628464A#60", "US10468*AE44", …, "US758750B*36",
#>  * and "US15135#BQ49".

AFAIK, those are valid ISINs, they are just ISINs of private securities (venture capital, and things like that). Had a look at this when I was looking into Private equity, it's sorta cool that we have them actually! I would loosen the restrictions on the expectation there personally (but maybe ask Nick for a second opinion?)

But in any case, TM doesn't publicize that it can cover private equity, so I think it's totally fine to just filter them out.

Originally posted by @jdhoffa in #185 (comment)

root_dir <- "~/data/workflow-data-preparation-outputs/2023Q4_20240303T082642Z"

pacta.data.validation::validate_financial_data(
  readRDS(file.path(root_dir, "financial_data.rds"))
)
#> Error in base::tryCatch(base::withCallingHandlers({: 1 assertions failed:
#>  * Variable 'data$isin': must contain only valid ISINs, but has
#>  * additional elements "US78467KA#42", "US85255AA*13", "NOR6778@AB29",
#>  * "US048303E#47", "US313855F*48", "US372460A#28", "CA820439A#42",
#>  * "CA008474E@39", "US941848D#79", "US524908@431", "US00253#AB60",
#>  * "US615369A@49", "US893578A@31", "US57555*AD16", "BEB7227*AB60",
#>  * "US031100A@92", "US628464A#60", "US10468*AE44", …, "US758750B*36",
#>  * and "US15135#BQ49".

pacta.data.validation::validate_abcd_flags_equity(
  readRDS(file.path(root_dir, "abcd_flags_equity.rds"))
)
#> Error in base::tryCatch(base::withCallingHandlers({: 1 assertions failed:
#>  * Variable 'data$isin': must contain only valid ISINs, but has
#>  * additional elements "US78467KA#42", "US85255AA*13", "NOR6778@AB29",
#>  * "US048303E#47", "US313855F*48", "US372460A#28", "CA820439A#42",
#>  * "CA008474E@39", "US941848D#79", "US524908@431", "US00253#AB60",
#>  * "US615369A@49", "US893578A@31", "US57555*AD16", "BEB7227*AB60",
#>  * "US031100A@92", "US628464A#60", "US10468*AE44", …, "US758750B*36",
#>  * and "US15135#BQ49".

AB#10378

@cjyetman
Copy link
Member Author

Because of pacta.portfolio.audit:::overall_validity_flag(), any ISIN/holding like this will ultimately be flagged as "Invalid or missing ISIN" when the "audit" process is run, e.g. workflow.transition.monitor/web_tool_script_1.R

That being said, I exported a sample/test portfolio using all of these invalid ISINs and ran it through web_tool_script_1.R, and it is able to match some financial and entity information to some of them (all are "Bonds", "Others", or "Unclassifiable"), and some of those even have matching assets in the ABCD.

My current inclination is to not change the definition of pacta.data.validation::validate_financial_data() and/or pacta.data.validation::validate_abcd_flags_equity() at the moment, and to not implement those validations here in workflow.data.preparation for the moment. That would better allow the possibility of exploring what usefulness we could get out of matching some financial and entity info to holdings of these ISINs in the future (if we remove these from our dataset now, we're more likely to forget that they are available and never explore it).

There are 2513 of these invalid ISINs in the current 23Q4 FactSet data that we pull. These don't cause a problem with anything, so it doesn't hurt to leave them in. But equally so, I don't believe there's a significant advantage to prioritizing an exploration of what we can do with them.

@jdhoffa thoughts?

@jdhoffa
Copy link
Member

jdhoffa commented Apr 18, 2024

  • These ISINs represent private equity
  • Private equity is not considered to be included in the analysis, so it is totally fine that we flag them as "Invalid ISIN", because for the purposes of CTM/ P4I they ARE invalid ISINs (even if overall they are not)
  • The likelihood that a user is inputting private equity is relatively low BUT even if they do, and they notice that we say "invalid ISIN" I highly doubt they will be surprised

It would be unfortunate to lose validation entirely just because of these handful of ISINs though... I would vastly prefer that you:

  • Update pacta.data.validation::validate_financial_data() to have the following behaviour:
  • Pass if the ISIN is formatted as you currently expect (checksum and all that)
  • Warn if an ISIN of this particular type ((A-Z){2}(numeric/ special char){10})
  • Error only if the ISIN is totally bunk (any other format)

and then implement pacta.data.validation in workflow.data.preparation

That way we will at least get a record of the presence of these ISINs in the logs every time we run data prep.

@cjyetman
Copy link
Member Author

ok... gonna move this over as a task in pacta.data.validation then

@cjyetman
Copy link
Member Author

closing in favor of #222

cjyetman added a commit that referenced this issue Apr 18, 2024
closes #18

Most validation errors originally found (below) have been resolved.
Validation of `financial_data` and `abcd_flags_equity` has been removed
for now and will be added in the future.

investigation issues:
- #196
- #197
- #198
- #198

relevant fixes in pacta.data.validation:
- RMI-PACTA/pacta.data.validation#65
- RMI-PACTA/pacta.data.validation#66
- RMI-PACTA/pacta.data.validation#67
- RMI-PACTA/pacta.data.validation#68

relevant fix in pacta.data.preparation
- RMI-PACTA/pacta.data.preparation#18

validation of `financial_data` and `abcd_flags_equity` has been removed
from this PR, and future intended implementation is tracked here
- #222
- dependent on
RMI-PACTA/pacta.data.validation#69
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants