Make this repo public #9

Closed
1 task done
jdhoffa opened this issue Mar 28, 2023 · 19 comments

Comments

@jdhoffa
Member

jdhoffa commented Mar 28, 2023

  • Be careful, as there is some FactSet data in here (e.g. the FactSet issue-type symbology that maps to PACTA investment types)
  • depends on RMI-PACTA/archive.pacta.data.preparation#321

If this lands first, that should light a fire under https://github.com/RMI-PACTA/pacta.scenario.preparation/issues/123

This will facilitate RMI-PACTA/workflow.data.preparation#102

AB#10435

@cjyetman
Member

This might be achievable once RMI-PACTA/archive.pacta.data.preparation#321 is resolved.

@AlexAxthelm

With RMI-PACTA/archive.pacta.data.preparation#335 and RMI-PACTA/archive.pacta.data.preparation#336 closed, and RMI-PACTA/archive.pacta.data.preparation#338 waiting on merge, it's probably time to revive this thread 🥳

@cjyetman
Member

yup! I'd appreciate thorough consideration of this from @AlexAxthelm and @jdhoffa now that (I think) we've gotten rid of any proprietary FactSet stuff.

I guess a show of consensus here in this issue is adequate, after which I'll flip the switch in the settings.

@jdhoffa
Member Author

jdhoffa commented Feb 16, 2024

Well, given that the FactSet data will forever live in the git history, I guess "flipping the switch" isn't really an option? Unless you scrub the history?

@AlexAxthelm thoughts?

@cjyetman
Member


True... sorry, I should have said that, as I remember our previous conversations, we were already keeping this private out of an abundance of caution: the possibly sensitive FactSet data was actually very minimal and not much of a concern, so we thought just getting it off main would be adequate.

But I'm also willing to go the "proper" route if that is what's currently preferred.

@jdhoffa
Member Author

jdhoffa commented Feb 16, 2024

Yes, I also agree that the data LIKELY is fine to be public, and we are being very cautious.

I just wanted to make it clear that if we set the setting to public, we are still exposing that dataset, so we need to ensure we are comfortable with that.

Personally, I am comfortable with that 👍

@cjyetman
Member


Thanks for making it explicit in this conversation.

@AlexAxthelm

To be clear about the options (and summarize prior discussions about similar repos):

  • Get everything off main, keep things as they are (history unchanged)
  • Hard fork, reset history to current(ish) main as initial commit, put the historied repo as a private archive under a new name (a variant on what we've done with other private to public conversions before)
  • Rewrite history via git filter-repo or BFG, force push
  • [Combo Option] Copy current state to archived private repo, redact the files in question in history via git filter-branch with a placeholder

Of these, the first is the simplest, but it leaves the pesky sensitive history available to anyone who wants to inspect it. The last option is potentially attractive because most of the history remains available, but it would still break any commit-specific references such as links (tags can be re-tagged), since every SHA from the first rewritten commit onward would change. Any signed commits would also become unsigned, but we don't worry too much about that.
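To make the SHA point concrete, here is a minimal throwaway sketch of the combo option, assuming a local toy repository. The file names sensitive.csv and script.R are hypothetical, and it uses the built-in git filter-branch rather than git filter-repo (which may not be installed). It commits a pretend sensitive file, deletes it in a later commit (as was done on main), then replaces its contents with a placeholder throughout history:

```shell
#!/bin/sh
# Throwaway demo (hypothetical file names): redacting a file in history rewrites commit SHAs.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# Commit 1 adds a pretend sensitive file; commit 2 deletes it, so it only lives in history.
echo "proprietary,mapping" > sensitive.csv
echo "code" > script.R
git add . && git commit -q -m "add sensitive mapping"
git rm -q sensitive.csv && git commit -q -m "remove sensitive mapping"

old_root=$(git rev-list --max-parents=0 HEAD)
old_tip=$(git rev-parse HEAD)

# Replace the file's contents with a placeholder in every commit where it exists.
# FILTER_BRANCH_SQUELCH_WARNING=1 skips filter-branch's deprecation notice and pause.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter \
  'if [ -f sensitive.csv ]; then echo "REDACTED" > sensitive.csv; fi' -- --all >/dev/null 2>&1

new_root=$(git rev-list --max-parents=0 HEAD)
new_tip=$(git rev-parse HEAD)

echo "root: $old_root -> $new_root"
echo "tip:  $old_tip -> $new_tip"
```

Both commits get new SHAs, including the second one, whose tree never contained the file: a commit's SHA covers its parent's SHA, so a rewrite anywhere propagates to every descendant.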

I might be overly cautious, but I'm inclined to go with the second option, and put the current state in pacta.data.preparation.archive or something like that, and start fresh.

cc @hodie for input.

@jdhoffa
Member Author

jdhoffa commented Feb 16, 2024

A glorious tech review topic!

@AlexAxthelm

Additional note: git clone --depth=1 lets us keep the same commit SHA for our starting point while still dropping the history:

sh-3.2$ pwd
/Users/aaxthelm/Documents/pacta/pacta.data.preparation

sh-3.2$ git log --oneline -10
9e57e0d (HEAD -> main, origin/main, origin/HEAD) remove old `input` and `output` directories (#339)
1df2d07 remove `factset_manual_pacta_sector_override` (#338)
554c8b0 (origin/reduce-mem) remove `factset_industry_map_bridge` (#336)
0969f1d remove `factset_issue_code_bridge` (#335)
530af7a replace `dplyr::group_indices` with `dplyr::cur_group_id()` (#332)
0d3a49e include information about external software with `get_sessionInfo()` (#330)
6858432 fix spelling mistake :facepalm: (#329)
5f5f2f4 allow for passing a FactSet specific directory to `write_manifest()` (#328)
3865952 Ref new pacta.pkgdown.rmitemplate (#326)
e5931e8 minor formatting fix (#324)

sh-3.2$ git clone --depth=1 git@github.com:RMI-PACTA/pacta.data.preparation.git ~/Downloads/foo
Cloning into '/Users/aaxthelm/Downloads/foo'...
remote: Enumerating objects: 73, done.
remote: Counting objects: 100% (73/73), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 73 (delta 4), reused 37 (delta 3), pack-reused 0
Receiving objects: 100% (73/73), 839.76 KiB | 1.76 MiB/s, done.
Resolving deltas: 100% (4/4), done.

sh-3.2$ cd !$
cd ~/Downloads/foo

sh-3.2$ pwd
/Users/aaxthelm/Downloads/foo

sh-3.2$ git log --oneline -10
9e57e0d (grafted, HEAD -> main, origin/main, origin/HEAD) remove old `input` and `output` directories (#339)

So we could pick up where we left off and have a clear point of connection between the two repos (we might have done this last time; I can't recall).
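The same behaviour can be checked without touching GitHub. This is a minimal sketch, assuming throwaway local repositories in temporary directories (all names hypothetical): it builds a three-commit repo, shallow-clones it with --depth=1, and compares the tip SHAs.

```shell
#!/bin/sh
# Local reproduction with hypothetical throwaway repos: a --depth=1 clone keeps
# the tip commit's SHA but drops its ancestors.
set -eu

src=$(mktemp -d)
git -C "$src" init -q
git -C "$src" config user.email "demo@example.com"
git -C "$src" config user.name "Demo"
git -C "$src" config commit.gpgsign false
for i in 1 2 3; do
  echo "change $i" > "$src/file.txt"
  git -C "$src" add file.txt
  git -C "$src" commit -q -m "commit $i"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
dst=$(mktemp -d)/shallow
git clone -q --depth=1 "file://$src" "$dst"

src_tip=$(git -C "$src" rev-parse HEAD)
dst_tip=$(git -C "$dst" rev-parse HEAD)
echo "source tip:  $src_tip ($(git -C "$src" rev-list --count HEAD) commits)"
echo "shallow tip: $dst_tip ($(git -C "$dst" rev-list --count HEAD) commit)"
```

The tip SHA matches because the shallow clone stores the original commit object unchanged; git just records in .git/shallow that its parents are absent, which is why git log labels it "grafted".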

@cjyetman
Member

To me the history (both commit history and issues/PRs) is important and very useful. I recognize that other things are also important here, but want to make sure that the importance of the history is acknowledged here too.

@jdhoffa
Member Author

jdhoffa commented Feb 16, 2024

I have added a Tech Review topic (to be filled in) here

I would suggest we discuss and decide there, so this doesn't just float around with no decision :-)

@cjyetman
Member

@AlexAxthelm I'm more interested in getting this done than saving the commit history and issues/PRs... can you help do the "hard fork" process?

@AlexAxthelm

AlexAxthelm commented Mar 13, 2024

Sure. The repo is ready?

Steps I'll take:

  • move this repo to RMI-PACTA/archive.pacta.data.preparation
  • shallow-copy the current HEAD to a new RMI-PACTA/pacta.data.preparation repo
  • transfer issues to the new repo
  • make archive.pacta.data.preparation private and archive it

@cjyetman
Member

great, thanks! we should also transfer all the current issues... would it be feasible to leave RMI-PACTA/archive.pacta.data.preparation public long enough to transfer issues and then make it private?

@AlexAxthelm


added to my list.

@jdhoffa
Member Author

jdhoffa commented Apr 9, 2024

In a discussion with @cjyetman we decided that this process should occur after scenario preparation is public.
Depends on RMI-PACTA/workflow.scenario.preparation#9 and RMI-PACTA/workflow.scenario.preparation#10

@cjyetman
Member

@jdhoffa I think I'm ready to pull the trigger on this. maybe we can do it together? or?

I'd like to:

  • archive this repo
  • fork it to a new public repo called pacta.data.preparation (the same name)
  • if possible, temporarily make this repo (after it's archived?) public so that we can transfer issues to the new repo, then make the archive private again

@jdhoffa jdhoffa transferred this issue from another repository Apr 15, 2024
@jdhoffa
Member Author

jdhoffa commented Apr 15, 2024

Closed by the existence of this repo
🕺

@jdhoffa jdhoffa closed this as completed Apr 15, 2024

3 participants