Versioning and releasing strategy #39
The linear history article highlights the use of
For the frontends, I think there is value in consulting the design/product team on changes to features that will "break" the way people understand the functionality of a feature. Coupled with a changelog, this could help highlight to users when we've changed parts of their workflow, whether that's reorganizing the layout of a data panel or using different events to select a feature, etc. It might be good to actually release major frontend versions more frequently than major API or package versions.
No major surprises for me in this write-up. This is good documentation of previous conversations.
Intro
I would like for us to introduce some new tools and processes for how we use git and how we manage versions and releases of our code bases. Decisions made in this discussion will eventually inform changes to other new repos, but I think this repo is a good "testing ground" for doing this work. Once we have implemented a process we like here, we can define how it translates to our other repos.
Problems
I'll start by highlighting some known pain points with the current status-quo, and providing a rough explanation of possible solutions:
Branching strategy
Most of our "legacy" repos have several long-lived branches, usually tied to a particular environment. They all have a `master` or `main` branch for production, but they each have some combination of branches like `develop`, `staging`, `qa`, `data-qa`, etc. They were likely set up this way because having a branch per environment lets us easily deploy to separate environments with Heroku and Netlify without having to write any CI/CD pipelines: those tools can be configured to deploy certain branches to certain URLs as the branches are updated. While there is value in that, the downside is that we have to manage all of these branches. Going from a feature branch all the way to production can mean creating and merging as many as 4 pull requests. This creates a lot of admin work for more senior devs and an error-prone workflow with far too much reliance on humans.
Solution
The solution here is, on paper, relatively simple. In fact, we have already begun to work in this new way in some of our newer repos, such as this one, `equity-tool`, and `ae-zoning-api`. The solution is to have only one "long-lived" branch: `main`. Developers will create feature branches off of `main` and open PRs against `main`. I say "on paper" because getting rid of the "branch per environment" way of working means we have to write automation to deploy to lower environments programmatically. I'll go into more detail as to what that should look like later on, but we already took a swing at implementing a very simple version of this in `equity-tool`'s GH Actions.
Merging strategy and non-linear histories
By default, GitHub gives us 3 options for how PRs are merged into their target branch: create a merge commit, squash and merge, or rebase and merge.
While admins can disable any of these options in the settings for each repo, our legacy repos tend to have all 3 enabled and, historically, the team has often used merge commits. In some of our newer repos, we've been trying to get away from merge commits, so they have merge commits disabled, or may even have only rebasing enabled.
Allowing merge commits is a common way that non-linear commits get introduced into `main`. This is undesirable because it makes the commit history harder to read and can make it more difficult to identify which commit introduced a given bug. Another way non-linear commits are introduced is when a developer merges `main` into their feature branch to get it up to date with changes made to `main` since they created their branch. The issues related to non-linear histories are compounded in repos that use the branch-per-environment pattern, because the same changes can show up under different commits in the various branches. This leads to PR diffs that show "changes" that are only changes in the sense that the commit hashes are different, even though the actual source code is the same.
Solution
We should update our GH settings to only allow rebase merging for pull requests. While squash-and-merge would also avoid non-linear commits, this feature in GH lets devs type out a commit message for the squashed commit in the GH UI, which introduces its own problems with other tooling related to issues I'll discuss later. Allowing only rebase gives us the most opportunity for quality control. We should also turn on the branch protection setting GH offers to require a linear history. This will block merge commits from ever landing in protected branches. Finally, engineers can update their local Git settings to rebase rather than merge by default when pulling (for example, with `git config --global pull.rebase true`).
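For context on what the "require linear history" protection rejects: a merge commit is simply a commit with more than one parent. The TypeScript sketch below flags such commits in the output of `git log --format="%H %P"`, where `%H` is the commit hash and `%P` its space-separated parent hashes. The `CommitInfo` type and the function names are hypothetical (not part of any GitHub or git API), and this is only an illustration of the idea; git can list the same commits directly with `git log --merges`.

```typescript
// Hypothetical shape of one line of `git log --format="%H %P"` output:
// the commit hash followed by zero or more parent hashes.
interface CommitInfo {
  hash: string;
  parents: string[];
}

// Parses lines like "<hash> <parent1> [<parent2> ...]", skipping blank lines.
function parseLogLines(lines: string[]): CommitInfo[] {
  return lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [hash, ...parents] = line.split(/\s+/);
      return { hash, parents };
    });
}

// A merge commit is any commit with more than one parent.
function findMergeCommits(commits: CommitInfo[]): string[] {
  return commits.filter((c) => c.parents.length > 1).map((c) => c.hash);
}

// A history is linear when it contains no merge commits.
function isLinear(commits: CommitInfo[]): boolean {
  return findMergeCommits(commits).length === 0;
}

// Example: "c2" has two parents (c1 and f1), so it is reported as a merge commit.
const example = parseLogLines(["c3 c2", "c2 c1 f1", "f1 c1", "c1"]);
console.log(findMergeCommits(example)); // [ 'c2' ]
```

Run over something like `git log --format="%H %P" origin/main..HEAD` on a feature branch that has had `main` merged into it, this would report that merge commit, which is exactly what rebasing the branch instead avoids.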
Lack of versioning strategy
Most of our repos don't really attempt to associate changes over time with "versions" of the piece of software they make up. For instance, the `version` field in Zola's `package.json` still says `0.0.0`. I think adopting a standard approach for versioning our software would be a great step towards becoming a more mature engineering organization.
Solution
Inconsistent commit messages
Our team hasn't historically enforced standards for the commit messages written by individual contributors. We have mostly relied on the engineer doing the commit to write commit messages that are succinct, descriptive, and helpful, with PR reviewers occasionally speaking up when they feel a commit message could be better. While I don't necessarily think we should place restrictions on the commit messages that developers make locally for tracking progress, I do think it would be good for us to enforce some standardization for commits that are going to hit main for the following reasons:
See "Commits" under "Using standards to address issues" for more details on Conventional Commits.
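As a rough illustration of what "enforcing some standardization" could mean for commits headed to `main`, here is a hypothetical TypeScript check for a commit header in the Conventional Commits format (covered under "Commits" below). The allowed types and the 100-character limit are assumptions loosely modeled on the defaults of `@commitlint/config-conventional`; in practice we would use commitlint itself rather than hand-rolled rules like these.

```typescript
// Hypothetical rules, loosely modeled on @commitlint/config-conventional's defaults.
const ALLOWED_TYPES = ["build", "chore", "ci", "docs", "feat", "fix", "perf", "refactor", "revert", "style", "test"];
const MAX_HEADER_LENGTH = 100;

// Header format: <type>(<optional scope>)<optional !>: <subject>
const HEADER_PATTERN = /^(\w+)(?:\(([^)]+)\))?(!)?: (.+)$/;

// Returns a list of problems; an empty list means the message passes.
function lintCommitMessage(message: string): string[] {
  const problems: string[] = [];
  const header = message.split("\n")[0];

  if (header.length > MAX_HEADER_LENGTH) {
    problems.push(`header is longer than ${MAX_HEADER_LENGTH} characters`);
  }

  const match = HEADER_PATTERN.exec(header);
  if (!match) {
    problems.push("header must look like '<type>(<scope>): <subject>'");
    return problems;
  }

  const type = match[1];
  if (!ALLOWED_TYPES.includes(type)) {
    problems.push(`type '${type}' must be one of: ${ALLOWED_TYPES.join(", ")}`);
  }
  return problems;
}

console.log(lintCommitMessage("feat(api): add zoning endpoint")); // []
console.log(lintCommitMessage("fixed stuff").length); // 1
```

A check like this only matters for commits that reach `main`; local work-in-progress commits can stay free-form, as suggested above.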
Automated quality checks
Most of our repos create some "Checks" on PRs via GitHub Actions. In most cases, these checks do things like run the linter and tests. We also enforce most of these things at the local level with commit hooks, but having checks in GH makes sure that changes committed locally with `--no-verify` don't hit `main` on the remote repo. As we work on new CI pipelines, we should consider how to expand and standardize these checks for new repos. Here is a list of checks I think we should consider going forward:
Existing:
- `eslint`, but things like running typechecking with TypeScript and augmenting our linter to also do basic a11y checks could be considered part of this as well
New:
Using standards to address issues
In this section I'll talk about a few standards that can help address the issues regarding commits and versioning described above. In this context, when I say "standard" I'm referring to "technical standards". Put another way, these are tech-agnostic "ways of doing things", agreed upon by the industry, with well-documented specifications.
Commits
The standard for commits we should adopt is Conventional Commits. If you've ever looked at the commit histories for many popular open source repos, you've likely seen commits like this
If that looks familiar to you, odds are you were looking at a project that uses Conventional Commits. Please read over the summary on their website for details, but at a high level, Conventional Commits is a standard way of formatting commit messages so that they are machine-readable and convey certain information about the changes in the commit, such as:
- the type of change, such as `fix`, `feat`, or `chore`
- an optional `!` character signifying that the commit is a breaking change
We, of course, wouldn't ask engineers to write out commit messages that satisfy this formatting "by hand". I'll go into tooling for Conventional Commits in the "Tooling we can use" section below.
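To show what "machine-readable" buys us, here is a minimal TypeScript sketch that parses a conventional commit message into its parts. It only handles the header form `type(scope)!: subject` plus a `BREAKING CHANGE:` footer (the spec also accepts `BREAKING-CHANGE:`). The `ParsedCommit` shape and function name are hypothetical; real tooling uses a full parser that also handles bodies, footers, and reverts.

```typescript
// Hypothetical result of parsing a conventional commit message.
interface ParsedCommit {
  type: string;
  scope?: string;
  breaking: boolean;
  subject: string;
}

// Header format: <type>(<optional scope>)<optional !>: <subject>
const CONVENTIONAL_HEADER = /^(\w+)(?:\(([^)]+)\))?(!)?: (.+)$/;

// Returns null when the header doesn't follow the convention.
function parseConventionalCommit(message: string): ParsedCommit | null {
  const [header, ...bodyLines] = message.split("\n");
  const match = CONVENTIONAL_HEADER.exec(header);
  if (!match) return null;

  const [, type, scope, bang, subject] = match;
  // A breaking change is signalled by "!" before the colon or by a
  // "BREAKING CHANGE:" / "BREAKING-CHANGE:" footer in the message.
  const hasBreakingFooter = bodyLines.some((line) => /^BREAKING[ -]CHANGE: /.test(line));
  return { type, scope, breaking: bang === "!" || hasBreakingFooter, subject };
}

const parsed = parseConventionalCommit("feat(api)!: drop v1 routes");
console.log(parsed?.type, parsed?.breaking); // feat true
```

The type and the breaking flag are exactly the inputs that versioning and changelog tooling need, which is why the format pairs so well with the standards that follow.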
Versioning
The standard we can adopt for versioning is Semantic Versioning (semver for short). Software versions are usually described by 3 numbers separated by periods, as in `1.2.3`. Semver assigns semantic meaning to each of those three numbers in a way that makes it easy to discern which number should be incremented for a given change. Please read their documentation for more details on what each number means. Using semver means we can clearly communicate the difference between two versions of a piece of software at a glance, such as whether a new version contains a bug fix but no new features, or whether it contains a breaking change. The definitions of these numbers can vary slightly depending on what kind of software a given repo makes up. See '"Types" of repos' for more on that.
Fortunately for us, Conventional Commits and semver integrate well together. Much of the open source tooling around Conventional Commits also takes semver into consideration.
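Here is a minimal sketch of how the two standards connect: given the commits since the last release, pick the highest-ranking change and increment the corresponding number, resetting the numbers to its right. The `Bump` and `CommitSummary` types are hypothetical, the sketch rejects pre-release and build metadata (e.g. `1.2.3-beta.1`) rather than handling it, and treating every non-`feat`, non-breaking commit as a patch is an assumption (some tools skip a release entirely when there are only `chore` or `docs` commits).

```typescript
type Bump = "major" | "minor" | "patch";

// Hypothetical minimal summary of a parsed conventional commit.
interface CommitSummary {
  type: string;
  breaking: boolean;
}

// The highest-ranking change wins: any breaking change means MAJOR,
// otherwise any feat means MINOR, otherwise PATCH.
function recommendBump(commits: CommitSummary[]): Bump {
  if (commits.some((c) => c.breaking)) return "major";
  if (commits.some((c) => c.type === "feat")) return "minor";
  return "patch";
}

// Increments a plain MAJOR.MINOR.PATCH version, resetting the lower numbers.
function applyBump(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// 1.0.0 plus a single non-breaking feat commit becomes 1.1.0.
console.log(applyBump("1.0.0", recommendBump([{ type: "feat", breaking: false }]))); // 1.1.0
```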
Where are versions actually tracked?
There are a couple of places where the version of a repo is tracked:
- the `version` property of `package.json`. This is obviously specific to JavaScript projects, but the ecosystems around other languages generally have some equivalent
- git tags, such as `v2.1.0` or `2.1.0`, on the commit that sets the corresponding `version` in `package.json`
Note that the version in `package.json` and the versions tracked as git tags should be kept in sync. If someone checks out the git tag `v2.1.0` and looks at the version in `package.json`, they should see `2.1.0`.
Changelogs
Now that we have standards that govern versions and the commits that make up those versions, how do we organize all of that information in a central, human-readable place? This is where changelogs come in. A "changelog" is just that: a log of changes made to the code base over time. However, there are standards for creating and formatting changelogs that can take advantage of automation tooling. The standard I suggest we use is known as "Keep a Changelog". If you've ever seen a `CHANGELOG.md` file in popular open source repos, odds are they were using this or a similar standard.
We can set up tooling to automatically update our `CHANGELOG.md` with information taken from our conventional commits and semver versions.
"Types" of repos
At this point it's worth grouping our repos into a few broad categories and discussing how concepts like Semantic Versioning apply to each in practice. If we look at all of our repos, we can roughly put the vast majority of them into three groups representing the type of software that the repo contains:
`@nycplanning/streetscape` and a Storybook static site; for our purposes, we are mainly concerned with it as a package.
Semver describes the meaning of the three numbers that make up a version as follows:
We need to be clear about how these rules relate to each type of repo, with particular attention paid to what constitutes a "breaking change".
- For APIs, bug fixes are `PATCH`, new features (such as a new endpoint) are `MINOR`, and breaking changes (such as changing the structure of some data returned by an endpoint) are `MAJOR`. I would argue that changes that neither change the public API nor add new functionality should also be considered `PATCH`. This would include changes that refactor code without changing the public API, add documentation or tests, or even change the schema of a database table tracked in the code (so long as that change doesn't also change the "public API").
- For user-facing applications, `PATCH` and `MINOR` are relatively straightforward: if you're fixing a bug or refactoring code, you want `PATCH`. If you're adding a new feature to the app, you want `MINOR`. I've seen conflicting opinions as to what should constitute a `MAJOR` version for user-facing applications, and there isn't really a hard and fast rule. I suggest we don't worry too much about `MAJOR` for now. We should have some heuristic for when a user-facing application reaches `1.0.0` but, after that, our apps can continue on `1.x.x` pretty much forever if we want them to. We could also decide to do a `MAJOR` release if we have a set of changes that represent a major overhaul of the application, taking a "you know it when you see it" approach. I'm open to others' thoughts on this, but in any case, the "stakes" of failing to mark the release of a user-facing app as "breaking" are basically nonexistent compared to those of a package or API.
Getting to 1.0.0
Because we plan to be doing a lot of greenfield development in the near term, I wanted to call out how we decide when to take a repo from `0.x.x` to `1.x.x`. Semver's official site has some guidance on this that I mostly agree with. In general, I think we should try to go to `1.0.0` when we think a repo is in a relatively stable state. This is, of course, very vague, and it will be up to us to use our judgement. While there's nothing technically wrong with a repo that has only existed for 6 months being on `v39.x.x`, I think it's something we should try to avoid by keeping repos in `0.x.x` until the frequency of breaking changes has plateaued.
Semver's FAQ suggests that, if a codebase is being used in production, it should probably be at `1.0.0`. While I like how this sounds on paper, I'm not sure that we need to be that ambitious in practice. If we were building APIs to be consumed by external parties, I would probably say we should follow this rule. However, for most of our projects, I think it's acceptable to have early-stage projects in production while still at `0.x.x`. If we find ourselves in that position, we should probably take it as a sign that we should get the repo to `1.x.x` sooner rather than later.
Tooling we can use
I've done some investigation into the open source tooling available to operationalize all of this. Note that this tooling is a mix of tools used locally, as part of each developer's local toolchain, and tools used in our CI/CD pipelines, which for us usually means GitHub Actions.
Commit Tooling
Commitlint
Commitlint is an extensible family of NPM packages that make it easy to:
You can use the `commitlint` CLI to lint a range of commits (via its `--from` and `--to` options). They even have documentation that shows how to use this in GitHub Actions to, for example, lint all of the commits being added by a PR.
For authoring commits, it gives two options: use the default `@commitlint/prompt-cli` package, or use Commitizen via the `@commitlint/cz-commitlint` package. I strongly suggest we go with cz-commitlint for the superior interactive CLI user experience it provides.
Commitizen
`@commitlint/cz-commitlint` is a thin wrapper around Commitizen that allows us to encapsulate all of the configuration we might need in `commitlint.config.js`. Commitizen gives us a user-friendly interactive CLI for writing conventional commit messages.
Tools for implementing Semantic Versioning, creating releases, and generating changelogs
As the heading suggests, this section covers tools for a few steps of the release process. This is because many tools in the ecosystem are responsible for a combination of those steps, so it makes sense to discuss them together.
In general, this part of the toolchain is responsible for a few things:
- Determining the next version. For example, if the current version is `1.0.0` and there is one new `feat` commit with no breaking changes, it should increment the version to `1.1.0`. Generally, this means these tools will handle things like updating `version` in `package.json`, creating a version commit, and tagging that commit.
- Maintaining a `CHANGELOG.md` file according to a standard like Keep a Changelog. These tools will update that file with contents derived from the versioning and conventional commits.
I did a deep dive on tooling for these concerns and looked into tools including:
Weighing the pros and cons of each, I think the best one for us to try implementing is `release-it`. Here are some points that led me to that decision:
- `release-please`, in contrast, assumes we're going to do release PRs, which I'm trying to stay away from.
Breaking down the work
Oof, that's a lot to process. We should try to break down this work in a way that makes it possible to implement incrementally, while still meeting our high-level goals once it's all said and done. Here are a couple of outcomes that we eventually want to satisfy and that are important to keep in mind:
that are kept in a central repo but referenced in application repos
Starting local
To help break this work down into digestible chunks, we can focus on implementing these tools so that they work within the Streetscape repo when used manually by a developer in their local environment, as opposed to setting up the plumbing we will eventually need to integrate these tools with our CI/CD pipelines in the form of GitHub Actions. To that end, here are some specific tasks we can take on.
- Set up `release-it` such that it can be used locally. The initial research for this is covered by "Research implementation of `release-it` in Streetscape repo #45". This step is considered complete when a developer can use `release-it` locally to:
  - bump the `version` in `package.json`, commit it, and tag that commit with a git tag of the version
  - update the `CHANGELOG.md` file with the conventional commit messages included in this version
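In practice `release-it` (typically with a plugin such as `@release-it/conventional-changelog`) would do this work for us, but as a sketch of what the local release outcomes above amount to, the TypeScript below builds the git tag, the version-commit message, and a Keep a Changelog-style section for a release, and checks that a tag agrees with `package.json`. All names here are hypothetical, the new version is assumed to have been computed already, and the `chore: release vX.Y.Z` message format is an assumption chosen so the version commit itself would pass commitlint.

```typescript
// Hypothetical, simplified view of a conventional commit included in a release.
interface ReleaseCommit {
  type: string;
  subject: string;
}

interface ReleasePlan {
  tag: string;              // git tag placed on the version commit
  commitMessage: string;    // message for the commit that bumps package.json (format is an assumption)
  changelogSection: string; // text to add to the top of CHANGELOG.md
}

// Keep a Changelog groups entries under headings such as "Added" and "Fixed".
// Mapping feat -> Added and fix -> Fixed, and skipping other types, is an assumption of this sketch.
const CHANGELOG_HEADING_FOR_TYPE: Record<string, string> = { feat: "Added", fix: "Fixed" };

function renderChangelogSection(version: string, isoDate: string, commits: ReleaseCommit[]): string {
  const lines = [`## [${version}] - ${isoDate}`];
  for (const heading of ["Added", "Fixed"]) {
    const entries = commits.filter((c) => CHANGELOG_HEADING_FOR_TYPE[c.type] === heading);
    if (entries.length === 0) continue;
    lines.push("", `### ${heading}`, ...entries.map((c) => `- ${c.subject}`));
  }
  return lines.join("\n");
}

// `newVersion` is assumed to have already been computed from the commits using semver rules.
function planRelease(newVersion: string, commits: ReleaseCommit[], isoDate: string): ReleasePlan {
  return {
    tag: `v${newVersion}`,
    commitMessage: `chore: release v${newVersion}`,
    changelogSection: renderChangelogSection(newVersion, isoDate, commits),
  };
}

// package.json and the git tag must agree; accepts both "v2.1.0" and "2.1.0" tag styles.
function tagMatchesPackageVersion(tag: string, packageVersion: string): boolean {
  return tag.replace(/^v/, "") === packageVersion;
}

const plan = planRelease("1.1.0", [{ type: "feat", subject: "add zoning layer" }], "2024-01-15");
console.log(plan.tag); // v1.1.0
```

Keeping the tag derived from the same `newVersion` that is written to `package.json` is what guarantees the two stay in sync.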