Sync infrastructure assets from upstream "templates" #291

Merged
per1234 merged 7 commits into main from update-infra on Jun 13, 2024
Conversation

per1234 (Collaborator) commented on Jun 13, 2024

Arduino tooling projects use a standardized infrastructure. A centralized collection of reusable infrastructure assets is maintained in a dedicated repository:

https://github.com/arduino/tooling-project-assets

Since this project's infrastructure was installed, advancements have been made in the upstream "template" assets. This pull request brings the project's infrastructure up to date with the current state of the upstream assets.

There are multiple scopes at which the permissions of the GITHUB_TOKEN access token (which is automatically generated
for use in GitHub Actions workflow runs) can be configured:

- enterprise
- organization
- repository
- workflow
- job

The latter two scopes are configured using the `permissions` workflow key. The point of configuring permissions in the
workflow is that each workflow may have different requirements. Granular configuration means that the "principle of
least privilege" can be more closely followed, by only granting permissions in the specific scopes where they are
needed.

Previously, in cases where the same permissions configuration could be used for all jobs in a workflow, the
configuration was done at the workflow scope. Even though the two approaches are functionally equivalent, I think it is
semantically more appropriate to always set the permissions at the job scope. This more clearly communicates that the
intention is to make the most granular possible permissions configuration. Hopefully it will also serve as a model for
any additional jobs added to the workflow in the future and make it more likely that the appropriate permissions
configuration will be done there.

In cases where a project contains distinct components in subfolders, multiple license files might be present.

Previously the "Check License" workflow only supported checking the license file in the root of the repository. Support
for validating an arbitrary number of license files with arbitrary locations, types, and filenames has been added. A
job matrix is used to provide this support in a manner that makes application-specific configuration of the workflow
easy and avoids code duplication.
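
A sketch of how such a matrix might look (the paths, filenames, and license types shown are illustrative, not the workflow's actual configuration):

```yaml
jobs:
  check-license:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # One entry per license file in the repository.
        check:
          - path: ./
            filename: LICENSE.txt
            license-type: Apache-2.0
          - path: ./some-subproject/
            filename: LICENSE.md
            license-type: MIT
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      # Each entry is consumed via expressions like ${{ matrix.check.path }}.
```
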
The project infrastructure validates the package.json npm configuration files against their JSON schema.

Previously, in order to provide validation coverage for all package.json files in any locations in the repository, a
"globstar" was used to cause the validator to recursively search the entire file tree under the repository. That
approach is problematic because the repository contains externally maintained files (e.g., the npm packages under the
node_modules folder). Searching and validating these files is inefficient at best and the cause of spurious failures at
worst.

This is avoided by targeting the search. Support for a repository maintainer to configure any number of specific
locations of npm-managed projects in the "Check npm" workflow has been added, so this system is used to target the
validations. When the `npm:validate` task is run by a contributor on their local clone, it defaults to the root of the
repository, but the path can be configured by setting the PROJECT_PATH Taskfile variable via an argument to the task
invocation command.
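
An illustrative Taskfile excerpt showing the intended usage (the exact variable plumbing and validation command in the real task may differ; `package-json-schema.json` is a placeholder name for the downloaded schema):

```yaml
npm:validate:
  vars:
    # Defaults to the root of the repository; contributors can override it:
    #   task npm:validate PROJECT_PATH=./some-subproject
    PROJECT_PATH: '{{.PROJECT_PATH | default "./"}}'
  cmds:
    - npx --package=ajv-cli ajv validate -s package-json-schema.json -d "{{.PROJECT_PATH}}/package.json"
```
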
The "Check Files" (Task) template includes an asset task named `general:check-filenames` that checks for the presence of
non-portable filenames in the project.

Ironically, the task itself was non-portable. The problem was that it used the `--perl-regexp` flag in the `grep`
command. This flag is not supported by the BSD version of grep used on macOS and BSD machines. This caused the task to
fail spuriously with `grep: unrecognized option '--perl-regexp'` errors when run on a macOS or BSD machine.

The incompatibility is resolved by changing the `--perl-regexp` flag to `--extended-regexp`. This flag, which is
supported by both the BSD and GNU versions of grep, allows the use of the modern and reasonably capable POSIX ERE syntax on
all platforms.

Unfortunately the regular expression used in the previous command relied on one of the additional features only present
in the PCRE syntax. This syntax was used to check for the presence of a range of characters prohibited by the Windows
filename specification:

https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions

> Use any character [...] except for the following:
> - Integer value zero, sometimes referred to as the ASCII NUL character.
> - Characters whose integer representations are in the range from 1 through 31

Due to the nature of these characters, they must be represented by code in the regular expression. This was done using
the `\x{hhh..}` syntax supported by PCRE. Neither that syntax nor any of the equivalent escape patterns are supported by
POSIX ERE. A solution is offered in the GNU grep documentation:

https://www.gnu.org/software/grep/manual/grep.html#Matching-Non_002dASCII-and-Non_002dprintable-Characters

> the command `grep "$(printf '\316\233\t\317\211\n')"` is a portable albeit hard-to-read alternative

As also mentioned there:

> none of these techniques will let you put a null character directly into a command-line pattern

So the range of characters in the pattern cannot include NUL. However, it turns out that even the previous command did
not detect this character even though it was present in the pattern, so this limitation doesn't result in any
regression in practice.
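
A minimal sketch of the resulting check (the file listing command and exclusions are illustrative; the real task also checks other characters prohibited by the Windows filename specification):

```yaml
general:check-filenames:
  cmds:
    - |
      # printf embeds the control characters 0x01-0x1F directly into the bracket
      # expression, which POSIX ERE accepts on both GNU and BSD grep.
      # NUL (0x00) cannot be placed in a command-line pattern, so it is not checked.
      ! find . -path ./.git -prune -o -print | grep --extended-regexp "$(printf '[\001-\037]')"
```
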
Since projects often contain many Markdown files and new files may be added or the paths of the existing files changed
frequently, the best approach for validating Markdown files is to search the project file tree recursively for all
Markdown files, with exclusions configured for the paths of any externally maintained files.

The `markdown:check-links` task uses the markdown-link-check tool. This tool does not have a capability for discovering
Markdown files so it is necessary to use the `find` command to discover the files, then pass their paths to the
markdown-link-check tool.

Previously the discovery code used `find` to generate an array of paths, which was iterated over in a `for` loop, with
each path passed individually to markdown-link-check. The `for` loop is unnecessary because `find` has an `-exec` flag
that can be used to execute commands on the discovered paths. Although the syntax and behavior of this flag are
unintuitive, that disadvantage is outweighed by the significant amount of code it replaces. Since the
`-exec`/`-execdir` flags are already in use in the assets and project infrastructure, the maintainer will have to work
with them regardless.
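
A sketch of the pattern (the exclusions shown are illustrative, and how markdown-link-check itself is made available on the PATH is covered below):

```yaml
markdown:check-links:
  cmds:
    - |
      # -exec ... {} + hands each batch of discovered paths to a single
      # invocation of markdown-link-check, replacing the previous
      # array-plus-for-loop construction.
      find . \
        -type d -name node_modules -prune -o \
        -type f -name '*.md' \
        -exec markdown-link-check --quiet {} +
```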

Since it is managed as a project dependency using npm, the markdown-link-check tool is invoked using npx. Since the
`find` command must be run in combination with markdown-link-check, it is necessary to use the `--call` flag of npx.
Even though Windows contributors are required to use a POSIX-compliant shell such as Git Bash when working with the
assets, the commands run via the `--call` flag are executed by the native shell, meaning the Windows command
interpreter on a Windows machine, even if the task was invoked from a different shell. This causes commands that are
completely valid on a Linux or macOS machine to fail on a Windows machine due to the significant differences in the
Windows command interpreter's syntax.

During the original development of the task, a reasonably maintainable cross-platform command could not be found.
Lacking a better option, the hacky approach was taken of using a conditional to run a different command depending on
whether the task was running on Windows, with npx not used for the Windows command. This resulted in a degraded
experience for Windows contributors, who were forced to manually manage the markdown-link-check tool dependency and
make it available in the system PATH. It also resulted in duplication of the fairly complex code contained in the
task.

Following the elimination of unnecessary complexity in the task code, it became possible to use a single command on all
platforms.

The Windows command interpreter syntax still posed a difficulty even for the simplified command: A beneficial practice,
used throughout the assets, is to break commands into multiple lines to make them and the diffs of their development
easier to read. With a POSIX-compliant shell this is accomplished by escaping the introduced newlines with a backslash.
However, the Windows command interpreter does not recognize this syntax, making the commands formatted in that manner
invalid when the task was run on a Windows machine. The identified solution was to define the command via a Taskfile
variable. The YAML syntax was carefully chosen to support the use of the familiar backslash escaping syntax, while
producing a string that does not contain this non-portable escaping syntax after passing through the YAML parser.
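
A sketch of the resulting structure (the variable name, exclusions, and options are illustrative rather than the verbatim asset). Inside the double-quoted YAML scalar, the trailing backslashes escape the line breaks, so the string that reaches `npx --call` is a single line containing neither backslashes nor newlines:

```yaml
markdown:check-links:
  vars:
    # Written with the familiar backslash line continuations; the YAML parser
    # folds them away, so the value handed to npx --call is a single line that
    # the Windows command interpreter can also execute.
    CHECK_LINKS_COMMAND:
      "
        find . \
          -type d -name node_modules -prune -o \
          -type f -name \"*.md\" \
          -exec markdown-link-check --quiet {} +
      "
  cmds:
    - npx --package=markdown-link-check --call='{{.CHECK_LINKS_COMMAND}}'
```
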
The "npm:validate" task validates the repository's `package.json` npm manifest file against its JSON schema to catch any
problems with its data format.

In order to avoid duplication of content, JSON schemas may reference other schemas via the `$ref` keyword. The
`package.json` schema was recently updated to share resources with some other schema, which caused the validation to
start failing.

The solution is to configure the workflow to download the referenced schemas as well, and to provide their paths to
the ajv-cli validator via its `-r` flag.
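
An illustrative workflow step (the schema filenames are placeholders, and the steps that download the schemas are omitted):

```yaml
- name: Validate package.json
  run: |
    # package-json-schema.json is the schema for package.json; referenced-schema.json
    # is the schema it pulls in via $ref. Both are assumed to have been downloaded
    # by earlier steps.
    npx --package=ajv-cli ajv validate \
      -s package-json-schema.json \
      -r referenced-schema.json \
      -d package.json
```
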
per1234 added the type: imperfection, type: enhancement, and topic: infrastructure labels on Jun 13, 2024
per1234 self-assigned this on Jun 13, 2024
per1234 merged commit abab3f2 into arduino:main on Jun 13, 2024
37 checks passed
per1234 deleted the update-infra branch on June 13, 2024 at 21:55