Sync infrastructure assets from upstream "templates" #291

Merged
per1234 merged 7 commits into main from update-infra on Jun 13, 2024
Conversation

per1234 (Collaborator) commented on Jun 13, 2024

Arduino tooling projects use a standardized infrastructure. A centralized collection of reusable infrastructure assets is maintained in a dedicated repository:

https://github.com/arduino/tooling-project-assets

Since this project's infrastructure was installed, advancements have been made in the upstream "template" assets. This pull request brings the project's infrastructure up to date with the current state of the upstream assets.

There are multiple scopes at which the permissions of the GITHUB_TOKEN access token (which is automatically generated
for use in GitHub Actions workflow runs) can be configured:

- enterprise
- organization
- repository
- workflow
- job

The latter two scopes are configured using the `permissions` workflow key. The point of configuring permissions in the
workflow is that each workflow may have different requirements. Granular configuration means that the "principle of
least privilege" can be more closely followed, by only granting permissions in the specific scopes where they are
needed.

Previously, in cases where the same permissions configuration could be used for all jobs in a workflow, the
configuration was done at the workflow scope. Even though the two approaches are functionally equivalent, I think it is
semantically more appropriate to always set the permissions at the job scope. This more clearly communicates that the
intention is to make the most granular possible permissions configuration. Hopefully it will also serve as a model for
any additional jobs added to the workflow in the future and make it more likely that the appropriate permissions
configuration will be done there.

In cases where a project contains distinct components in subfolders, multiple license files might be present.

Previously the "Check License" workflow only supported checking the license file in the root of the repository. Support
for validating an arbitrary number of license files with arbitrary locations, types, and filenames has been added. A
job matrix is used to provide this support in a manner that makes application-specific configuration of the workflow
easy and avoids code duplication.
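
A sketch of how such a matrix might look (the paths, filenames, and license types shown are illustrative, not the workflow's actual configuration):

```yaml
jobs:
  check-license:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # One entry per license file in the repository.
        check:
          - path: ./
            filename: LICENSE.txt
            license-type: Apache-2.0
          - path: ./some-subproject/
            filename: LICENSE.md
            license-type: MIT
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      # Each entry is consumed via expressions like ${{ matrix.check.path }}.
```
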
The project infrastructure validates the package.json npm configuration files against their JSON schema.

Previously, in order to provide validation coverage for all package.json files in any locations in the repository, a
"globstar" was used to cause the validator to recursively search the entire file tree under the repository. That
approach is problematic because the repository contains externally maintained files (e.g., the npm packages under the
node_modules folder). Searching and validating these files is inefficient at best and the cause of spurious failures at
worst.

This is avoided by targeting the search. Support for a repository maintainer to configure any number of specific
locations of npm-managed projects in the "Check npm" workflow has been added, so this system is used to target the
validations. When the `npm:validate` task is run by a contributor on their local clone, it defaults to the root of the
repository, but the path can be configured by setting the PROJECT_PATH Taskfile variable via an argument to the task
invocation command.
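
An illustrative Taskfile excerpt showing the intended usage (the exact variable plumbing and validation command in the real task may differ; `package-json-schema.json` is a placeholder name for the downloaded schema):

```yaml
npm:validate:
  vars:
    # Defaults to the root of the repository; contributors can override it:
    #   task npm:validate PROJECT_PATH=./some-subproject
    PROJECT_PATH: '{{.PROJECT_PATH | default "./"}}'
  cmds:
    - npx --package=ajv-cli ajv validate -s package-json-schema.json -d "{{.PROJECT_PATH}}/package.json"
```
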
The "Check Files" (Task) template includes an asset task named `general:check-filenames` that checks for the presence of
non-portable filenames in the project.

Ironically, the task itself was non-portable. The problem was that it used the `--perl-regexp` flag in the `grep`
command. This flag is not supported by the BSD version of grep used on macOS and BSD machines. This caused the task to
fail spuriously with `grep: unrecognized option '--perl-regexp'` errors when run on a macOS or BSD machine.

The incompatibility is resolved by changing the `--perl-regexp` flag to `--extended-regexp`. This flag, which is
supported by both the BSD and GNU versions of grep, allows the use of the modern and reasonably capable POSIX ERE syntax on
all platforms.

Unfortunately the regular expression used in the previous command relied on one of the additional features only present
in the PCRE syntax. This syntax was used to check for the presence of a range of characters prohibited by the Windows
filename specification:

https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions

> Use any character [...] except for the following:
> - Integer value zero, sometimes referred to as the ASCII NUL character.
> - Characters whose integer representations are in the range from 1 through 31

Due to the nature of these characters, they must be represented by code in the regular expression. This was done using
the `\x{hhh..}` syntax supported by PCRE. Neither that syntax nor any of the equivalent escape patterns are supported by
POSIX ERE. A solution is offered in the GNU grep documentation:

https://www.gnu.org/software/grep/manual/grep.html#Matching-Non_002dASCII-and-Non_002dprintable-Characters

> the command `grep "$(printf '\316\233\t\317\211\n')"` is a portable albeit hard-to-read alternative

As also mentioned there:

> none of these techniques will let you put a null character directly into a command-line pattern

So the range of characters in the pattern cannot include NUL. However, it turns out that even the previous command did
not detect this character even though it was present in the pattern, so this limitation doesn't result in any
regression in practice.
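
A minimal sketch of the resulting check (the file listing command and exclusions are illustrative; the real task also checks other characters prohibited by the Windows filename specification):

```yaml
general:check-filenames:
  cmds:
    - |
      # printf embeds the control characters 0x01-0x1F directly into the bracket
      # expression, which POSIX ERE accepts on both GNU and BSD grep.
      # NUL (0x00) cannot be placed in a command-line pattern, so it is not checked.
      ! find . -path ./.git -prune -o -print | grep --extended-regexp "$(printf '[\001-\037]')"
```
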
Since projects often contain many Markdown files and new files may be added or the paths of the existing files changed
frequently, the best approach for validating Markdown files is to search the project file tree recursively for all
Markdown files, with exclusions configured for the paths of any externally maintained files.

The `markdown:check-links` task uses the markdown-link-check tool. This tool does not have a capability for discovering
Markdown files so it is necessary to use the `find` command to discover the files, then pass their paths to the
markdown-link-check tool.

Previously the discovery code used `find` to generate an array of paths, which was iterated over in a `for` loop, with
each path passed individually to markdown-link-check. The `for` loop is unnecessary because `find` has an `-exec` flag
that can be used to execute commands on the discovered paths. Although the syntax and behavior of this flag are
unintuitive, that disadvantage is outweighed by the significant amount of code it replaces. Since the
`-exec`/`-execdir` flags are already in use in the assets and project infrastructure, the maintainer will have to work
with them regardless.
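
A sketch of the pattern (the exclusions shown are illustrative, and how markdown-link-check itself is made available on the PATH is covered below):

```yaml
markdown:check-links:
  cmds:
    - |
      # -exec ... {} + hands each batch of discovered paths to a single
      # invocation of markdown-link-check, replacing the previous
      # array-plus-for-loop construction.
      find . \
        -type d -name node_modules -prune -o \
        -type f -name '*.md' \
        -exec markdown-link-check --quiet {} +
```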

Since it is managed as a project dependency using npm, the markdown-link-check tool is invoked using npx. Since the
`find` command must be run in combination with markdown-link-check, it is necessary to use the `--call` flag of npx.
Even though Windows contributors are required to use a POSIX-compliant shell such as Git Bash when working with the
assets, the commands run via the `--call` flag are executed by the native shell, meaning the Windows command
interpreter on a Windows machine, even if the task was invoked from a different shell. This causes commands that are
completely valid on a Linux or macOS machine to fail on a Windows machine due to the significant differences in the
Windows command interpreter's syntax.

During the original development of the task, a reasonably maintainable cross-platform command could not be found.
Lacking a better option, the hacky approach was taken of using a conditional to run a different command depending on
whether the task was running on Windows, with npx not used for the Windows command. This resulted in a degraded
experience for Windows contributors, who were forced to manually manage the markdown-link-check tool dependency and
make it available in the system PATH. It also resulted in duplication of the fairly complex code contained in the
task.

Following the elimination of unnecessary complexity in the task code, it became possible to use a single command on all
platforms.

The Windows command interpreter syntax still posed a difficulty even for the simplified command: A beneficial practice,
used throughout the assets, is to break commands into multiple lines to make them and the diffs of their development
easier to read. With a POSIX-compliant shell this is accomplished by escaping the introduced newlines with a backslash.
However, the Windows command interpreter does not recognize this syntax, making the commands formatted in that manner
invalid when the task was run on a Windows machine. The identified solution was to define the command via a Taskfile
variable. The YAML syntax was carefully chosen to support the use of the familiar backslash escaping syntax, while
producing a string that does not contain this non-portable escaping syntax after passing through the YAML parser.
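
A sketch of the resulting structure (the variable name, exclusions, and options are illustrative rather than the verbatim asset). Inside the double-quoted YAML scalar, the trailing backslashes escape the line breaks, so the string that reaches `npx --call` is a single line containing neither backslashes nor newlines:

```yaml
markdown:check-links:
  vars:
    # Written with the familiar backslash line continuations; the YAML parser
    # folds them away, so the value handed to npx --call is a single line that
    # the Windows command interpreter can also execute.
    CHECK_LINKS_COMMAND:
      "
        find . \
          -type d -name node_modules -prune -o \
          -type f -name \"*.md\" \
          -exec markdown-link-check --quiet {} +
      "
  cmds:
    - npx --package=markdown-link-check --call='{{.CHECK_LINKS_COMMAND}}'
```
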
The "npm:validate" task validates the repository's `package.json` npm manifest file against its JSON schema to catch any
problems with its data format.

In order to avoid duplication of content, JSON schemas may reference other schemas via the `$ref` keyword. The
`package.json` schema was recently updated to share resources with some other schema, which caused the validation to
start failing.

The solution is to configure the workflow to download the referenced schemas as well, and to provide their paths to
the ajv-cli validator via its `-r` flag.
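
An illustrative workflow step (the schema filenames are placeholders, and the steps that download the schemas are omitted):

```yaml
- name: Validate package.json
  run: |
    # package-json-schema.json is the schema for package.json; referenced-schema.json
    # is the schema it pulls in via $ref. Both are assumed to have been downloaded
    # by earlier steps.
    npx --package=ajv-cli ajv validate \
      -s package-json-schema.json \
      -r referenced-schema.json \
      -d package.json
```
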
per1234 added the type: imperfection, type: enhancement, and topic: infrastructure labels on Jun 13, 2024
per1234 self-assigned this on Jun 13, 2024
per1234 merged commit abab3f2 into arduino:main on Jun 13, 2024
37 checks passed
per1234 deleted the update-infra branch on June 13, 2024 at 21:55