Versioning and releasing strategy #39

TylerMatteo · 2023-12-16T15:54:49Z

TylerMatteo
Dec 16, 2023
Maintainer

Intro

I would like for us to introduce some new tools and processes for how we use git and how we manage versions and releases of our code bases. Decisions made in this discussion will eventually inform changes to other new repos, but I think this repo is a good "testing ground" for doing this work. Once we have a process we like implemented here, we can define how it translates to our other repos.

Problems

I'll start by highlighting some known pain points with the current status quo and giving a rough explanation of possible solutions:

Branching strategy

Most of our "legacy" repos have several long-lived branches, usually tied to a particular environment. They all have a master or main branch for production, but they each have some combination of branches like develop, staging, qa, data-qa, etc. They were likely set up this way because having a branch per environment lets us easily deploy to separate environments with Heroku and Netlify without writing any CI/CD pipelines: those tools can be configured to deploy certain branches to certain URLs as they are updated. While there is value in that, the downside is that we have to manage all of these branches. Going from a feature branch all the way to production can mean creating and merging as many as 4 pull requests. This creates a lot of admin work for more senior devs and an error-prone workflow with far too much reliance on humans.

Solution

The solution here is, on paper, relatively simple. In fact, we have already begun to work in this new way in some of our newer repos such as this one, equity-tool, and ae-zoning-api. The idea is to have only one "long-lived" branch - main. Developers will create feature branches off of main and open PRs against main. I say "on paper" because getting rid of the "branch per environment" way of working means we have to write automation to deploy to lower environments programmatically. I'll go into more detail as to what that should look like later on, but we already took a swing at implementing a very simple version of this in equity-tool's GH actions.

Merging strategy and non-linear histories

By default, GitHub gives us 3 options for how PRs are merged into their target branch:

Merge commit
Squash and merge
Rebase

While admins can disable any of these options in the settings for each repo, our legacy repos tend to have all 3 enabled and, historically, the team has often used merge commits. In some of our newer repos, we've been trying to get away from merge commits, so they have merge commits disabled, or may even have only rebasing enabled.

Allowing merge commits is a common way that non-linear commits get introduced into main. This is undesirable because it makes the commit history harder to read and can make it more difficult to identify which commit introduced a given bug. Another way non-linear commits are introduced is when a developer merges main into their feature branch to get it up to date with changes made to main since they created their branch. The issues related to having non-linear histories are compounded in repos that use the branch-per-environment pattern because the same changes can show up under different commits in the various branches, leading to PR diffs that show "changes" that are only changes in the sense that the commit hashes are different, even though the actual content of the source code is the same.

Solution

We should update our GH settings to only allow rebase merging for Pull Requests. While squash-and-merge would also avoid non-linear commits, using this feature in GH allows devs to type out a commit message for the squashed commit in the GH UI, which introduces its own problems for tooling related to other issues I'll discuss later. Allowing only rebase gives us the most opportunity for quality control. We should also turn on the branch protection setting offered by GH to require a linear history. This will block merge commits from ever landing in protected branches. Finally, engineers can update their local Git settings to default to rebasing rather than creating merge commits.

Lack of versioning strategy

Most of our repos don't really attempt to associate changes over time with "versions" of the piece of software they make up. For instance, the version field in Zola's package.json still says 0.0.0. I think it would be a great step towards becoming a more mature engineering organization to adopt a standard approach for versioning our software.

Solution

Adopt Semantic Versioning. See "Versioning" under "Using standards to address issues" below for details.
Document how semver relates to the different kinds of repos we have. See the '"Types" of repos' section for details.

Inconsistent commit messages

Our team hasn't historically enforced standards for the commit messages written by individual contributors. We have mostly relied on the engineer doing the commit to write commit messages that are succinct, descriptive, and helpful, with PR reviewers occasionally speaking up when they feel a commit message could be better. While I don't necessarily think we should place restrictions on the commit messages that developers make locally for tracking progress, I do think it would be good for us to enforce some standardization for commits that are going to hit main for the following reasons:

It helps to "say what good looks like". Having a standard makes it clear to engineers what is expected of them ahead of time, thereby improving productivity
It can enable automations. The approach I suggest we adopt is a standard known as conventional commits. This means our commits will be machine readable, and the data contained in them can feed into automation tools I'll get into in the "Tooling we can use" section below.
For posterity - adopting a standard will lead to just plain better commit messages in the long term, which will help DCP engineers maintaining our code in the future to do their jobs more easily.

See "Commits" under "Using standards to address issues" for more details on Conventional Commits.

Automated quality checks

Most of our repos create some "Checks" on PRs via GitHub Actions. In most cases, these checks do things like run the linter and tests. We also enforce most of these things locally with commit hooks, but having checks in GH makes sure that changes committed locally with --no-verify don't hit main on the remote repo. As we work on new CI pipelines, we should consider how to expand and standardize these checks for new repos. Here is a list of checks I think we should consider going forward:

Existing:

Linting - For the vast majority of our code, this is running eslint, but things like running typechecking with typescript and augmenting our linter to also do basic a11y checks could be considered part of this as well
Tests - Running all unit, e2e, and integration tests on the repo, enforcing that they pass.

New:

Test coverage - This would look different depending on the nature of the repo (say, backend vs frontend), but I think it would be helpful for us to adopt some sort of minimum test coverage percentage for new commits. While test coverage can be a blunt instrument, blunt instruments can help filter not-ready-for-primetime branches for poor sleepy senior devs who have lots of PRs to review ;)
Commit message linting - This check would enforce that all commits on the PR comply with conventional commits

Using standards to address issues

In this section I'll talk about a few standards that can help to address the issues regarding commits and versioning described above. In this context, when I say standard I'm referring to "technical standards". Put another way, these are tech-agnostic "ways of doing things" agreed upon by the industry with well-documented specifications.

Commits

The standard for commits we should adopt is Conventional Commits. If you've ever looked at the commit histories of popular open source repos, you've likely seen commits like this:

feat: allow provided config object to extend other configs

If that looks familiar to you, odds are you were looking at a project that uses conventional commits. Please read over the summary at their website for details, but at a high level, conventional commits is a standard way of formatting commit messages so that they are machine readable and convey certain information about the changes in the commit, such as:

A type for the commit, such as fix, feat, or chore
An optional scope: a word, in parentheses after the type, for the feature that the commit concerns
A brief description of the commit meant to be read by fellow humans. This portion is probably most similar to what devs will be used to writing for the whole commit message
An optional ! character signifying that the commit is a breaking change
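To make the pieces listed above concrete, here is a minimal sketch of how a commit header could be split into those parts. The `parseCommitHeader` helper and `ConventionalCommit` shape are hypothetical names made up for illustration; they are not part of commitlint or any other tool mentioned in this post, and the sketch only looks at the header line (the spec also allows a breaking change to be flagged in a footer).

```typescript
// Hypothetical shape for illustration only.
interface ConventionalCommit {
  type: string; // e.g. "feat", "fix", "chore"
  scope: string | null; // optional word in parentheses after the type
  breaking: boolean; // true when "!" appears before the colon
  description: string; // the human-readable summary
}

// Matches "<type>(<optional scope>)<optional !>: <description>".
const HEADER_PATTERN = /^(\w+)(?:\(([^)]+)\))?(!)?: (.+)$/;

// Returns the parsed parts, or null when the header does not follow the format.
function parseCommitHeader(header: string): ConventionalCommit | null {
  const match = HEADER_PATTERN.exec(header);
  if (match === null) {
    return null;
  }
  return {
    type: match[1] ?? "",
    scope: match[2] ?? null,
    breaking: match[3] === "!",
    description: match[4] ?? "",
  };
}
```

With this sketch, the example commit above parses to type `feat` with no scope and no breaking flag, while a free-form message such as "updated stuff" is rejected.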

We, of course, wouldn't ask engineers to write out commit messages that satisfy this formatting "by hand". I'll go into tooling for conventional commits in the "Tooling we can use" section below.

Versioning

The standard we can adopt for versioning is Semantic Versioning (semver for short). Software versions are usually described by 3 numbers separated by .s, as in 1.2.3. Semver assigns semantic meaning to each of those three numbers in a way that makes it easy to discern which number should be incremented for a given change. Please read their documentation for more details on what each number means. Using semver means we can clearly communicate the difference between two versions of a piece of software at a glance - such as if a new version contains a bug fix but no new features, or if a new version contains a breaking change. The definitions of these numbers can vary slightly depending on what kind of software a given repo makes up. See ""Types" of repos" for info on that.

Fortunately for us, conventional commits and semver integrate well together. Much of the open source tooling around conventional commits also takes into consideration semver.

Where are versions actually tracked?

There are a couple places where the version of a repo is tracked:

In the version property of package.json. This is obviously specific to javascript projects but the ecosystems around other languages generally have some equivalent
In git using tags. For example, if a given commit is version 2.1.0, then that commit should be tagged as v2.1.0 or 2.1.0
For repos that publish packages to npm or another package registry, such as this one, each publish of the package to the registry will be associated with a version. In the JS npm ecosystem, the version published to the registry is governed by the version in package.json.

Note that the version in package.json and the versions tracked as git tags should be kept in sync. If someone checks out the git tag v2.1.0 and looks at the version in package.json, they should see 2.1.0.
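As a rough illustration of that sync rule, the check below compares a tag name with a package.json version string. `tagMatchesVersion` is a hypothetical helper, not something provided by git or npm, and it assumes tags are written either as `v2.1.0` or `2.1.0` as described above.

```typescript
// Hypothetical check for illustration: does a git tag name agree with the
// version string read from package.json? Accepts both "v2.1.0" and "2.1.0".
function tagMatchesVersion(tag: string, packageVersion: string): boolean {
  const tagVersion = tag.startsWith("v") ? tag.slice(1) : tag;
  return tagVersion === packageVersion;
}
```

A check along these lines could run in CI on tagged commits, though release tooling that writes both values in one step makes drift less likely in the first place.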

Changelogs

Now that we have standards that govern versions and the commits that make up those versions, how do we organize all of that information in a central, human-readable place? This is where changelogs come in. The concept of a "changelog" is just that - a log of changes made to the code base over time. However, there are standards for creating and formatting changelogs in a consistent way that automation tooling can take advantage of. The standard I suggest we use for this is known as "Keep a Changelog". If you've ever seen a CHANGELOG.md file in popular open source repos, odds are they were using this or a similar standard.

We can set up tooling to automatically update our CHANGELOG.md with information taken from our conventional commits and semver versions.
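To give a feel for what that automation does, here is a simplified sketch that turns a list of conventional commits into one changelog entry. Everything in it is an assumption for illustration: the `renderChangelogEntry` helper is hypothetical, and mapping `feat` to "Added", `fix` to "Fixed", and everything else to "Changed" is just one plausible way to line commit types up with Keep a Changelog's section names. Real tools have their own templates and configuration.

```typescript
// Hypothetical input shape: the type and description of a conventional commit.
interface ChangeItem {
  type: string;
  description: string;
}

// Assumed mapping from commit type to a Keep a Changelog section name.
const SECTION_FOR_TYPE = new Map<string, string>([
  ["feat", "Added"],
  ["fix", "Fixed"],
]);
const SECTION_ORDER = ["Added", "Changed", "Fixed"];

// Renders one version's entry as markdown, grouping commits by section.
function renderChangelogEntry(version: string, date: string, items: ChangeItem[]): string {
  const sections = new Map<string, string[]>();
  for (const item of items) {
    const section = SECTION_FOR_TYPE.get(item.type) ?? "Changed";
    const lines = sections.get(section) ?? [];
    lines.push(`- ${item.description}`);
    sections.set(section, lines);
  }
  const out = [`## [${version}] - ${date}`];
  for (const name of SECTION_ORDER) {
    const lines = sections.get(name);
    if (lines !== undefined) {
      out.push(`### ${name}`, ...lines);
    }
  }
  return out.join("\n");
}
```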

"Types" of repos

At this point it's worth grouping our repos into a few broad categories and discussing how concepts like Semantic Versioning apply to each in practice. If we look at all of our repos, we can roughly put the vast majority of them into three groups representing the type of software that the repo contains:

Frontends - User-facing web applications. This includes both SPAs that are deployed as sets of static files as well as sites that use Server Side Generation and deploy a server application
Backends - Any server-side application. For us, this usually means an API that a frontend calls to get data in the form of JSON
Packages - This is any repo that builds a software package that is meant to be installed as a dependency in other projects. While this repo technically outputs both a package (@nycplanning/streetscape) and a Storybook static site, for our purposes, we are mainly concerned with it as a package.

There is one other "type" of repos we have that I want to mention, but won't go into detail quite yet on how versioning applies to it. This other type occurs when we have a repo (or part of a repo) that essentially represents an ETL process. This is code such as ose-equity-tool-etl that downloads, processes, and outputs some sort of data product. We definitely need to have a separate discussion on how this code is organized and versioned, but I'd like to keep this discussion scoped to the three main types described above.

Semver describes the meaning of the three numbers that make up a version as follows:

Given a version number MAJOR.MINOR.PATCH, increment the:

1. MAJOR version when you make incompatible API changes
2. MINOR version when you add functionality in a backward compatible manner
3. PATCH version when you make backward compatible bug fixes
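As a small worked example of those three rules, the sketch below increments a plain MAJOR.MINOR.PATCH string. `bumpVersion` is a hypothetical helper written for this post, not an API of any semver library, and it deliberately ignores pre-release and build metadata suffixes.

```typescript
type ReleaseLevel = "major" | "minor" | "patch";

// Increments one part of a MAJOR.MINOR.PATCH version and resets the parts
// to its right to zero, as the semver rules above describe.
function bumpVersion(version: string, level: ReleaseLevel): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (level) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}
```

For example, starting from 1.2.3, a breaking change gives 2.0.0, a new feature gives 1.3.0, and a bug fix gives 1.2.4.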

It should be noted that these rules are meant to apply to releases >= 1.0.0. Semver's FAQ has some helpful tips about how to use 0.x.x versions to mitigate concerns about young projects having tons of breaking changes early in their life span.

We need to be clear about how these rules relate to each type of repo, with particular attention paid to what constitutes a "breaking change".

Backends - It should be pretty clear for backends, so long as we keep in mind that the rules are meant to be taken from the perspective of a consumer of the API. That is to say, someone calling the API's endpoints. Backwards compatible bug fixes are PATCH, new features (such as a new endpoint) are MINOR, and breaking changes (such as changing the structure of some data returned by an endpoint) are MAJOR. I would argue that changes that neither change the public API nor add new functionality should also be considered PATCH. This would include changes that refactor code without changing the public API, add documentation or tests, or even change the schema of a database table tracked in the code (so long as that change doesn't also change the "public API").
Packages - This one should also be pretty clear, as packages are the primary use case for semver. When considering what constitutes a "breaking change" for a package like this one, which publishes TS functions and React components, I think it's helpful to ask yourself "What is the public-facing API of this package?". Put another way, "does my change require consumers of this package to change their code?". For example, removing a prop that can be passed to a component published by this package would be a breaking change, because consumers of this package would have to change their code to not pass the removed prop.
Frontends - This one is harder to pin down. I think PATCH and MINOR are relatively straightforward - if you're fixing a bug or refactoring code, you want PATCH. If you're adding a new feature to the app, you want MINOR. I've seen conflicting opinions as to what should constitute a MAJOR version for user-facing applications, but there isn't really a hard and fast rule. I suggest we don't worry too much about MAJOR for now. We should have some heuristic for when a user-facing application reaches 1.0.0 but, after that, our apps can continue on 1.x.x pretty much forever if we wanted them to. We could also decide to do a MAJOR release if we have a set of changes that represent a major overhaul to the application, thereby taking a "you know it when you see it" approach. I'm open to others' thoughts on this, but in any case, the "stakes" of failing to mark the release of a user-facing app as "breaking" are basically nonexistent compared to those of a package or API.

Getting to 1.0.0

Because we plan to be doing a lot of greenfield development in the near-term, I wanted to call out how we decide when to take a repo from 0.x.x to 1.x.x. Semver's official site has some guidance on this that I think I agree with mostly. In general, I think we should try to go to 1.0.0 when we think a repo is in a relatively stable state. This is, of course, very vague, but it will be up to us to use our judgement. While there's nothing technically wrong with having a repo that has only existed for 6 months be on v39.x.x, I think it's something we should try to avoid by keeping repos in 0.x.x until the frequency of breaking changes has plateaued.

Semver's FAQ suggests that, if a codebase is being used in production, it should probably be at 1.0.0. While I like how this sounds on paper, I'm not sure that we need to be that ambitious in practice. If we were building APIs to be consumed by external parties, I would probably say we should follow this rule. However, for most of our projects, I think it's acceptable to have early-stage projects in production while still being 0.x.x. If we find ourselves in that position, we should probably take it as a sign that we should get the repo to 1.x.x sooner rather than later.

Tooling we can use

I've done some investigation into the open source tooling available to operationalize all of this. Note that this tooling is a combination of tools to be used locally, as part of each developer's local toolchain, or as part of our CI/CD pipelines, which for us usually means GitHub Actions.

Commit Tooling

Commitlint

Commitlint is an extensible family of NPM packages that make it easy to:

lint commit messages to enforce that they adhere to the Conventional Commits spec
author commit messages that adhere to the CC spec

You can use the commitlint CLI to lint commits between two commits. They even have documentation that shows how to use this in GitHub Actions to, for example, lint all of the commits being added by a PR.

For authoring commits, it gives two options: using the default @commitlint/prompt-cli package, or using Commitizen via the @commitlint/cz-commitlint package. I strongly suggest we go with cz-commitlint for the superior interactive CLI user experience it provides.

Commitlint also documents how to set up a git hook to lint commit messages as they are made, but we may decide against using that. The reason is that developers may want the ability to write non-CC-compliant commits as they complete their work, with the intention of squashing and/or rewriting commits once they are ready to put up a PR. Regardless, it is important that we enforce CC-compliant commits within our CI.

Commitizen

@commitlint/cz-commitlint is a thin wrapper around Commitizen that allows us to encapsulate all of the configuration we might need in commitlint.config.js. Commitizen gives us a user-friendly interactive CLI for writing conventional commit messages.

Tools for implementing Semantic Versioning, creating releases, and generating changelogs

As the heading suggests, this section covers tools for a few steps of the release process. This is because many tools in the ecosystem are responsible for a combination of those steps, so it makes sense to discuss them together.

In general, this part of the toolchain is responsible for a few things:

Determining how the version should be incremented based on the net effect of the conventional commits that have been created since the previous release. For instance, if the project is at 1.0.0 and there is one new feat commit with no breaking changes, it should increment the version to 1.1.0. Generally, this means these tools will handle things like updating version in package.json, creating a version commit, and tagging that commit.
Creating and automatically updating a CHANGELOG.md file according to a standard like Keep a Changelog. These tools will update that file with contents derived from the versioning and conventional commits.
In cases where the code is a package, such as how this code is an NPM package, it may be responsible for publishing the new version to NPM.
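The first responsibility above, working out the "net effect" of a set of commits, can be sketched roughly as follows. `releaseLevelFor` and `CommitSummary` are hypothetical names for illustration and are not how release-it or any other tool is actually implemented. Treating every commit type other than `feat` as PATCH follows the suggestion made earlier in this post and is an assumption; real tools may be configured to skip types like chore entirely.

```typescript
type BumpLevel = "major" | "minor" | "patch";

// Hypothetical summary of one conventional commit since the last release.
interface CommitSummary {
  type: string; // e.g. "feat", "fix", "chore"
  breaking: boolean; // true when the commit is marked as a breaking change
}

// Returns the most significant bump implied by the commits, or null when
// there are no commits to release.
function releaseLevelFor(commits: CommitSummary[]): BumpLevel | null {
  if (commits.length === 0) {
    return null;
  }
  if (commits.some((commit) => commit.breaking)) {
    return "major";
  }
  if (commits.some((commit) => commit.type === "feat")) {
    return "minor";
  }
  // Assumption: anything else (fix, refactor, docs, ...) counts as PATCH.
  return "patch";
}
```

In the example from the list, a project at 1.0.0 with a single non-breaking feat commit resolves to a minor bump, which takes it to 1.1.0.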

I did a deep dive on tooling for these concerns and looked into several tools.

Weighing pros and cons of each, I think the best one for us to try implementing is release-it. Here are some points that led me to that decision:

Well used and battle-tested. It seems to be popular enough that there is going to be solid support from the OSS community. Their readme even lists several big packages that use it including Axios and Redux.
It is extensible. It has an "ecosystem" of plugins for integrating it with other tools such as tools for Conventional Commits
Well documented. The readme and additional docs seem to cover just about every use case I could think of
Integrates with other tools without tight coupling. Their documentation shows how the tool can be used with other tooling we use, such as GitHub Actions and Releases, without requiring them.
Seems to support the workflow we're aiming for out of the box. release-please, in contrast, assumes we're going to do release PRs, which I'm trying to stay away from.

Breaking down the work

Oof, that's a lot to process. We should try to break down this work in a way that makes it possible to implement incrementally, while still meeting our high-level goals once it's all said and done. Here are a couple of outcomes that we want to eventually satisfy that are important to keep in mind:

The new code we write for putting all of this into action should eventually be re-usable. This will take a few forms depending on what part of the process we're concerned with. For example, it could be incorporated into new project templates we make so that we already have plumbing set up when we make new repos. It could also include reusable GH Action workflows that are kept in a central repo but referenced in application repos.
It should satisfy the varying requirements of different types of repos. Things like GH Actions will eventually need to be parameterized or customized so that they work as well for frontend repos as they do for packages, backends, etc.

Starting local

To help break this work down into digestible chunks, we can focus first on implementing these tools so that they work within the Streetscape repo when used manually by a developer in their local environment. This is as opposed to setting up the plumbing we will eventually need to integrate these tools with our CI/CD pipelines in the form of GitHub Actions. To that end, here are some specific tasks we can take on.

Install and configure conventional commit tooling with commitlint and commitizen - Captured by Set up local conventional commit tooling #41. This tooling allows developers to create and "lint" commit messages such that they conform to the conventional commits standard.
Install and configure release-it such that it can be used locally. The initial research for this is covered by Research implementation of release-it in Streetscape repo #45. This step is considered complete when a developer can use release-it locally to:
- Run code quality checks like linting and typechecking before doing any of the following steps
- Increment the package version to the correct next version following Semantic Versioning, based on the contents of the conventional commits since the last release. This should update the version in package.json and tag that commit with a git tag of the version
- Update a CHANGELOG.md file with the conventional commit messages included in this version
- Publish the new version of the package to NPM

TangoYankee · 2023-12-19T20:06:25Z

TangoYankee
Dec 19, 2023
Collaborator

The linear history article highlights the use of --force-with-lease, which is worth investigating as a best practice.

TangoYankee · 2023-12-19T20:23:06Z

TangoYankee
Dec 19, 2023
Collaborator

For the frontends, I think there is value in consulting the design/product team on changes to features that will "break" the way people understand the functionality of a feature. Coupled with a changelog, this could help highlight to users when we've changed parts of their workflow, whether that's reorganizing the layout of a data panel or using different events to select a feature, etc.

It might be good to actually release major frontend versions more frequently than major API or package versions.

TangoYankee · 2023-12-19T20:23:43Z

TangoYankee
Dec 19, 2023
Collaborator

No major surprises for me in this write-up. This is good documentation of previous conversations.
