This template repository helps make new Python projects easier to set up and more uniform. It contains a lot of infrastructure surrounding a minimal Python package named `cheshire` (the cat who isn't entirely there...). This template is mostly a lightly modified copy of Catalyst Cooperative's `cheshire`, with alterations for private work and alternative tools.
The goal of this template is to provide a uniform starting point for Python projects, with reasonable configurations for a suite of common tools. It is by no means comprehensive, but it generally errs on the side of including a tool rather than excluding it. In other words, it includes a number of tools that are unnecessary, and likely not worth configuring, for a basic Python project.
Its testing and GitHub Actions continuous integration (CI) configurations support accessing PUDL tables stored as Parquet files on Google Cloud Storage, which is the preferred way to access PUDL data when it needs to be part of CI. The CI side of this should 'just work' for any repository in rmi-electricity based on this template. Additional setup is required to get it working locally.
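For a sense of what that access pattern looks like, here is a hypothetical sketch using pandas. The bucket and table names are placeholders, not real PUDL paths, and `gcsfs` must be installed for pandas to resolve `gs://` URLs:

```python
# Hypothetical sketch: read a PUDL table stored as Parquet on Google Cloud
# Storage. "<pudl-bucket>" and "<table>" are placeholders, not real paths.
import pandas as pd

df = pd.read_parquet("gs://<pudl-bucket>/<table>.parquet")
print(df.head())
```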
Please read this whole getting started section before beginning.
- Choose a name for the new package that you are creating.
- The name of the repository should be the same as the name of the new Python package you are going to create, e.g. a repository at `rmi-electricity/cheshire` should be used to define a package named `cheshire`.
- Click the green "Use this template" button to create a new Python project repo. See these instructions for using a template.
- Create a release with a version tag if there isn't one already. This is required because various tools use it to set the version dynamically. See managing releases for more information.
- Clone the new repository to your development machine.
- Create the `cheshire` conda environment by running `conda env create -f environment.yml` in the top level of the repository.
- Activate the new conda environment with `conda activate cheshire`.
- If you intend to use PUDL data in your project, follow these setup instructions. If not, delete these three things (including the decorators above the latter two beginning with `@`):
  - `src/cheshire/dummy_pudl.py`
  - the `test_use_a_table_from_pudl` test from `tests/dummy_unit_test.py`
- Run `pre-commit install` in the newly cloned repository to install the pre-commit hooks defined in `.pre-commit-config.yaml`.
- Run `tox` from the top level of the repository to verify that everything is working correctly.
Once your forked version of the `cheshire` package is working, you can change the package and distribution names in your new repo to reflect the name of your package. The package name is determined by the name of the directory under `src/` which contains the source code, and is the name you'll use to import the package for use in a program, script, or notebook. E.g.:

```python
import cheshire
```
The distribution name is the name that is used to install the software using a program like `pip` or `conda`. We are using the `rmi` namespace for the packages that we publish, so the `dispatch` package would have the distribution name `rmi.dispatch`. The distribution name is determined by the `name` argument under `[project]` in `pyproject.toml`. See PEP 423 for more on Python package naming conventions.
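The distinction shows up at runtime, too. A minimal sketch, assuming the template's `cheshire` / `rmi.cheshire` naming:

```python
# The importable package name is the directory under src/ ...
import cheshire

# ... while the distribution name is the `name` under [project] in
# pyproject.toml, which is what importlib.metadata (and pip) know it by.
from importlib.metadata import version

print(version("rmi.cheshire"))
```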
The package and distribution names are used throughout the files in the template repository, and they all need to be replaced with the name of your new package.
- Rename the `src/cheshire` directory to reflect the new package name.
- Search for `cheshire` and replace it as appropriate everywhere. Sometimes this will be with a distribution name like `rmi.cheshire` and sometimes with the importable package name `cheshire`. You can use `grep -r` to search recursively through all of the files for the word `cheshire` at the command line, or use the search-and-replace functionality of your IDE / text editor. (Global search in PyCharm is command+shift+f.)
Now that everything is renamed, make sure all the renaming worked properly by running `tox` from the top level of the repository to verify that everything is working correctly. If it passes, you can commit your new skeleton package and get to work!
Warning

Unless you have relatively complete tests of your package, you will want to disable `.github/workflows/bot-auto-merge.yml` by either commenting out its contents or deleting the file. If you do this, do the same with `.github/dependabot.yml`.

If you leave these GitHub Actions in place with insufficient tests, GitHub might break your package by upgrading dependencies to versions that are not compatible with your package.
- Dummy code for a skeleton python package with the following structure:
  - The `src` directory contains the code that will be packaged and deployed on the user's system. That code is in a directory with the same name as the package.
    - A simple python module (`dummy.py`) and a separate module providing a command line interface to that module (`cli.py`) are included as examples (see the sketch after this list).
    - A module (`dummy_pudl.py`) that includes an example of how to access PUDL data.
    - Any files in the `src/package_data/` directory will also be packaged and deployed.
- Instructions for `pip` on how to install the package, and configurations for a number of tools, in `pyproject.toml`, including the following:
  - Package dependencies, including three sets of "extras" -- additional optional package dependencies that can be installed in special circumstances: `dev`, `doc`, and `tests`.
  - The CLI deployed using a `console_script` entrypoint.
  - `setuptools_scm` to obtain the package's version directly from `git` tags.
  - What files (beyond the code in `src/`) are included in or excluded from the package on the user's system.
  - Configurations for `ruff`, `doc8`, and `rstcheck`, described in the Code Formatting and Linters section below.
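As a rough sketch of how a module and its `console_script` CLI fit together, assuming this template's file names (the function and argument names here are hypothetical, not the template's actual dummy code):

```python
# src/cheshire/dummy.py -- a stand-in for real package logic (hypothetical).
def do_something(x: int) -> int:
    """Return twice the input."""
    return 2 * x
```

```python
# src/cheshire/cli.py -- a command line interface to dummy.py (hypothetical).
import argparse

from cheshire.dummy import do_something  # absolute import, as the linters prefer


def main() -> None:
    """Entry point referenced by a console_script in pyproject.toml."""
    parser = argparse.ArgumentParser(description="Run the dummy calculation.")
    parser.add_argument("x", type=int, help="An integer input.")
    args = parser.parse_args()
    print(do_something(args.x))


if __name__ == "__main__":
    main()
```

With PEP 621 metadata, a `[project.scripts]` entry mapping a command name to `cheshire.cli:main` is what turns this into an installed console script on the user's `PATH`.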
- A skeleton pytest testing setup is included in the `tests/` directory.
- Session-wide test fixtures, additional command line options, and other pytest configuration can be added to `tests/conftest.py` (see the example after this list).
- Exactly what pytest commands are run during continuous integration is controlled by Tox.
- We define several different test environments for use with Tox in `tox.ini`.
- Tox is used to run pytest in an isolated Python virtual environment.
- We also use Tox to coordinate running the code linters and building the documentation.
- The default Tox environment is named `ci`, and it will run the linters, build the documentation, run all the tests, and generate test coverage statistics.
- We use Tox and the pytest coverage plugin to measure and record what percentage of our codebase is being tested, and to identify which modules, functions, and individual lines of code are not being exercised by the tests.
- When you run `tox`, a summary of the test coverage will be printed at the end of the tests (assuming they succeed).

See GitHub Actions for additional tools that track coverage statistics.
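As mentioned above, `tests/conftest.py` is the place for shared fixtures. A minimal sketch (the fixture name and contents are hypothetical):

```python
# tests/conftest.py -- hypothetical session-scoped fixture shared by all tests.
import pytest


@pytest.fixture(scope="session")
def sample_inputs() -> dict[str, int]:
    """Build test data once per pytest session and reuse it everywhere."""
    return {"small": 1, "large": 1_000}
```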
- A variety of sanity checks are defined as git pre-commit hooks -- they run any time you try to make a commit, to catch common issues before they are saved. Many of these hooks are taken from the excellent pre-commit project.
- The hooks are configured in `.pre-commit-config.yaml`; see Code Formatting and Linters for details.
- For them to run automatically when you try to make a commit, you must install the pre-commit hooks in your cloned repository first. This only has to be done once, by running `pre-commit install` in your local repo.
- These checks are run as part of our GitHub automations, which will fail if the pre-commit hooks fail.
Most git GUI tools work with pre-commit, but not very well. The terminal-based `git` is usually the safer choice. See the notes on git for recommendations and instructions.
To avoid the tedium of meticulously formatting all the code ourselves, and to ensure a standard style of formatting and syntactical idioms across the codebase, we use several automatic code formatters, which run as pre-commit hooks. The following formatters are included in the template `.pre-commit-config.yaml`:
- Deterministic formatting with ruff (similar to black).
- Automatic fixes for some of the issues found by ruff (illustrated below), including:
  - Use only absolute import paths
  - Standardize the sorting of imports
  - Remove unnecessary f-strings
  - Upgrade type hints for built-in types
  - Upgrade Python syntax
- We also have a custom hook that clears Jupyter notebook outputs prior to committing.
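To make those fixes concrete, here is an illustrative before/after (hand-written for this README, not actual hook output):

```python
# Before: patterns the fixers flag.
from typing import List  # deprecated-style typing import


def label(things: List[str]) -> str:
    """Join inputs, using an f-string with no placeholders."""
    return f", ".join(things)
```

```python
# After: built-in generic type hint; unnecessary f-string removed.
def label(things: list[str]) -> str:
    """Join inputs."""
    return ", ".join(things)
```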
To catch additional errors before commits are made, and to ensure uniform formatting across the codebase, we also use ruff as a linter, along with other tools, to identify issues in code and documentation files. These tools don't change the files, but they will raise an error or warning when something doesn't look right so you can fix it.

- ruff is an extremely fast Python linter, written in Rust, that replaces a number of other tools.
- doc8 and rstcheck look for formatting issues in our docstrings and the standalone ReStructuredText (RST) files under the `docs/` directory.

See the notes on tests and linters for advice on how to avoid getting bogged down making the linters happy.
- We build our documentation using Sphinx.
- Standalone docs files are stored under the `docs/` directory, and the Sphinx configuration is there in `conf.py` as well.
- We use Sphinx AutoAPI to convert the docstrings embedded in the python modules under `src/` into additional documentation automatically (see the sketch after this list).
- The top level documentation index simply includes this `README.rst`; the `LICENSE.txt` and `code_of_conduct.rst` files are similarly referenced. The only standalone documentation file under `docs/` right now is `release_notes.rst`.
- Unless you're debugging something specific, the docs should always be built using `tox -e docs`, as that will lint the source files using `doc8` and `rstcheck`, and wipe previously generated documentation to build everything from scratch. The docs are also rebuilt as part of the normal Tox run (equivalent to `tox -e ci`).
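For reference, the AutoAPI wiring in `docs/conf.py` generally looks something like this (a sketch of typical settings, not necessarily the template's exact configuration):

```python
# docs/conf.py -- illustrative Sphinx AutoAPI settings.
extensions = [
    "autoapi.extension",  # Sphinx AutoAPI builds API docs from docstrings
]
autoapi_type = "python"
autoapi_dirs = ["../src"]  # scan the package source under src/
```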
- We use the GitHub Pages service to host our documentation.
- When you open a PR or push to `dev` or `main`, the associated documentation is automatically built and stored in a `gh-pages` branch.
- To make the documentation available, go to the repository's settings. Select 'Pages' under 'Code and automation', select 'Deploy from a branch', then select the `gh-pages` branch and `/(root)`, and click save.
- The documentation should then be available at https://rmi-electricity.github.io/<repo-name>/.
We use GitHub's Dependabot to automatically update the allowable versions of packages we depend on. This applies both to the Python dependencies specified in `pyproject.toml` and to the versions of the GitHub Actions that we employ. The Dependabot behavior is configured in `.github/dependabot.yml`. Unfortunately, it does not check or update `environment.yml`, so that must be done manually.

For Dependabot's PRs to be merged automatically, your repository must have access to the correct organization secrets and the rmi-electricity auto-merge Bot GitHub App. Contact Alex Engel for help setting this up.
Under `.github/workflows` are YAML files that configure the GitHub Actions associated with the repository. We use GitHub Actions to:

- Run continuous integration using tox on several different versions of Python.
- Build and publish docs to GitHub Pages.
- Merge passing Dependabot PRs.
When the tests are run via the `tox-pytest` workflow in GitHub Actions, the test coverage data from the `coverage.info` output is uploaded to a service called Coveralls. Coveralls saves historical data about our test coverage and provides a nice visual representation of the data -- identifying which subpackages, modules, and individual lines of code are being tested. For example, here are the results for the cheshire repo.