Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a class for manipulating repo data #20

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

domoscargin
Copy link
Contributor

@domoscargin domoscargin commented Jan 3, 2025

Resolves #13, resolves #12, resolves #10, resolves #9

Example action: https://github.com/alphagov/design-system-github-stats/actions/runs/12678615737

And PR: #23

Changes

Adds vitest for testing

  • Mostly because I wanted to try vitest, and it seems nice and speedy and plays nice with ESM.
  • Tests are a bit patchy at the moment, but I've done a few basic ones

Abstracts Octokit

  • Adds an Octokit helper file

Adds a RepoData class

  • Simplifies the build script and makes it easier to unit test things
  • Responsible for running checks and gathering the repo data

Adds a Result class

  • Stores information on the repo
  • Set up as a class to allow normalisation of data for different outputs

Updates the script behaviour and return data

  • Now writes to file in batches, rather than each time we analysed a repo
  • Uses the RepoData class extensively
  • Uses a free GraphQL query to fetch initial information so the script speed is faster, and we're not close to getting to our rate limit
  • Now searches for ALL package.json files, in case they're not at root, and/or there's more than one of them
    • For lockfiles, we try root and the same directory as each of the package files
  • Returns a more detailed suite of dependency info
    • Will find and output all govuk-frontend versions which are DIRECT dependencies in directDependencies
    • Will find and output all govuk-frontend versions which are INDIRECT dependencies in indirectDependencies

Dead ends

I did a lot of thinking about how to reduce the script time and API calls. The free GraphQL request for created at, pushed at and latest commit I've implemented is the biggest enhancement, and batch writing to files reduces I/O a lot. But there were some dead ends:

  • I investigated using the dependency graph API, but it's a bit underbaked at the moment, and there's no easy way to get the full dependency tree for dependents. I think GitHub are working on this API though, so there may be methods in the future that allow us to skip most of our script.
  • I also looked at the search API as an easier way to easily find govuk-frontend dependencies in relevant files, but the rate limit is severely restrictive (10 calls per minute for code search). Could potentially be more useable if we're doing caching, but probably not a goer.
  • I tried to get the repo tree during the initial GraphQL query. Getting the root file list is a very light cost, but there's no easy way to recurse through the file structure. Whereas the REST API just takes 1 call to give you the full structure.

@domoscargin domoscargin marked this pull request as ready for review January 3, 2025 11:20
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 11 times, most recently from 4d1d55e to 5c95154 Compare January 5, 2025 21:39
@domoscargin
Copy link
Contributor Author

domoscargin commented Jan 6, 2025

🐛 BUG

Parent dependencies: these are all null in the latest dataset. Suspect this is because we send the "this is an indirect dependency" examples straight to finding a lockfile version, which will generally be successful and find the govuk-frontend dependency at top level, rather than digging in the sub-levels. Probably makes sense to just fold the parent dependency in with this data, to be honest, rather than have a special path just for "indirect" dependencies.

SOLVED: 47fec86

@domoscargin domoscargin force-pushed the bk-repo-data-class branch 5 times, most recently from 0871587 to 7aeda71 Compare January 6, 2025 21:52
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 2 times, most recently from e6009c5 to 11c43e4 Compare January 9, 2025 16:22
@domoscargin domoscargin marked this pull request as draft January 9, 2025 19:48
standardjs is having some governance problems. Neostandard is a fork which is compatible with eslint 9.

At least some contributors at standardjs have indicated they want to fold the changes back in when the governance issue is sorted.
- strict equality
- use 'with' instead of 'assert' for import assertions
- No more dangling commas
@domoscargin domoscargin changed the base branch from process-all-data to main January 13, 2025 00:37
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 17 times, most recently from 8b04217 to 81d9fe7 Compare January 16, 2025 10:12
Adds helpers/octokit.mjs for queries to the GitHub API.

getRepo() gets some basic repo data
getTree() gets the repo file tree
getFileContent() gets the text content of a file
getRemainingRateLimit() gets the remaining REST API rate limit
Rather hastily done, and we're not testing the Octokit class just yet, but probably better than nothing
Creates a dated unprocessedItems.json file with any repos that weren't processed.
Adds coverage to tests
Adds the originating branch name to the PR branch name so we can differentiate by originating PRs.
Stores and emits the result. Could be used for validation and normalisation of result data.
@domoscargin domoscargin changed the base branch from bk-linting to main January 16, 2025 17:26
@domoscargin domoscargin marked this pull request as ready for review January 16, 2025 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant