Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow technically invalid, but likely intentional ISINs with a warning #69

Closed
cjyetman opened this issue Apr 18, 2024 · 5 comments
Closed

Comments

@cjyetman
Copy link
Member

cjyetman commented Apr 18, 2024

  • These ISINs represent private equity
  • Private equity is not considered to be included in the analysis, so it is totally fine that we flag them as "Invalid ISIN", because for the purposes of CTM/ P4I they ARE invalid ISINs (even if overall they are not)
  • The likelihood that a user is inputting private equity is relatively low BUT even if they do, and they notice that we say "invalid ISIN" I highly doubt they will be surprised

It would be unfortunate to lose validation entirely just because of these handful of ISINs though... I would vastly prefer that you:

  • Update pacta.data.validation::validate_financial_data() to have the following behaviour:
  • Pass if the ISIN is formatted as you currently expect (checksum and all that)
  • Warn if an ISIN of this particular type ((A-Z){2}(numeric/ special char){10})
  • Error only if the ISIN is totally bunk (any other format)

and then implement pacta.data.validation in workflow.data.preparation

That way we will at least get a record of the presence of these ISINs in the logs every time we run data prep.

Originally posted by @jdhoffa in RMI-PACTA/workflow.data.preparation#196 (comment)


applies to both:

  • pacta.data.validation::validate_financial_data()
  • pacta.data.validation::validate_abcd_flags_equity()

related:

AB#10854

cjyetman added a commit to RMI-PACTA/workflow.data.preparation that referenced this issue Apr 18, 2024
closes #18

Most validation errors originally found (below) have been resolved.
Validation of `financial_data` and `abcd_flags_equity` has been removed
for now and will be added in the future.

investigation issues:
- #196
- #197
- #198
- #198

relevant fixes in pacta.data.validation:
- RMI-PACTA/pacta.data.validation#65
- RMI-PACTA/pacta.data.validation#66
- RMI-PACTA/pacta.data.validation#67
- RMI-PACTA/pacta.data.validation#68

relevant fix in pacta.data.preparation
- RMI-PACTA/pacta.data.preparation#18

validation of `financial_data` and `abcd_flags_equity` has been removed
from this PR, and future intended implementation is tracked here
- #222
- dependent on
RMI-PACTA/pacta.data.validation#69
@jdhoffa jdhoffa added the ADO label Apr 30, 2024
@cjyetman
Copy link
Member Author

cjyetman commented Jun 3, 2024

I began experimenting with adding a checksum argument to is_valid_isin() to optionally turn off the validation of the checksum while still validating the structure of the passed ISINs before realizing that the set of pseudo-ISINs that triggered this contain non-alphanumeric characters.

I could even further dilute the functionality of is_valid_isin() so that the only thing it checks is that the string is 12 characters long, but at that point I seriously question the utility of it.

@jdhoffa I'm tempted to close this... thoughts?

@jdhoffa
Copy link
Member

jdhoffa commented Jun 7, 2024

From what I remember, this was originally opened while trying to implement the pacta.data.validation package in workflow.data.preparation:
RMI-PACTA/workflow.data.preparation#185

The issue opened here with the errenous ISINs was to track the fact that we never implemented validate_financial_data in RMI-PACTA/workflow.data.preparation#185 (since this blocked us from doing so).

Does closing this issue effectively mean we aim to no longer implement validate_financial_data in workflow.data.preparation?

@cjyetman
Copy link
Member Author

cjyetman commented Jun 7, 2024

Does closing this issue effectively mean we aim to no longer implement validate_financial_data in workflow.data.preparation?

there are at least a few options:

  1. allow all ISINs from FactSet data and do not validate
  2. remove invalid ISINs from FactSet data and then validate
  3. allow all ISINs from FactSet and validate after watering down validation so that it only check if each ISIN has 12 characters in it (this one I think is pointless)

@jdhoffa
Copy link
Member

jdhoffa commented Jun 7, 2024

Fair enough, then sounds like option 2 is your preferred option?

Ok to close this issue so long as there is an appropriate follow-up issue (somewhere) that tracks option 2

@cjyetman
Copy link
Member Author

Ok to close this issue so long as there is an appropriate follow-up issue (somewhere) that tracks option 2

closing with tracking of implementing validation as-is here RMI-PACTA/workflow.data.preparation#222

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants