Welcome to the Marxan Cloud platform. We aim to bring the finest workflows for conservation planning to the planet.
This repository is a monorepo which includes all the microservices of the Marxan Cloud platform. Each microservice lives in a top-level folder.
Services are packaged as Docker images.
Microservices are set up to be run with or without Docker Compose for local development - see the sections below for more details.
The recommended setup for new developers is to run all the backend services (api and geoprocessing services, alongside their PostgreSQL and Redis databases) via Docker Compose, and the frontend app natively.
In CI, testing, staging and production environments, microservices are orchestrated via Kubernetes (see the relevant documentation).
Most of the commands listed in this README and referenced elsewhere in the repository are targeted at a GNU/Linux OS environment such as a recent Ubuntu, Arch or Debian system, whether running natively, in a VM, or under Windows Subsystem for Linux 2 (WSL 2). They should also work identically on macOS, but may need some adaptation to run on Windows systems.
In a nutshell, the Marxan solution is composed of the following components:
- A frontend application accessible through the browser - the `app`
- A public backend API - the `api`
- A geoprocessing-focused service used by the `api` - the `geoprocessing` api/application
- An HTML-to-PDF/HTML-to-PNG service - the `webshot` service
Besides these 4, there are other components that may be used in one-off situations, like seeding source data (see `/data`), testing (`/e2e-product-testing`) and others.
See ARCHITECTURE_infrastructure.md for details.
For development environments, a Sparkpost account separate from the one used for staging/production should be used. Unless the transactional email components of the platform are being actively worked on (email verification on signup, email confirmation for password changes, email flow for resetting forgotten passwords, etc.), there is no need to set up email templates within the Sparkpost account; only a Sparkpost API key will be needed (see the documentation on environment variables for details).
Before attempting to use the following steps, be sure to:
- Install Docker (19.03+)
- Install Docker Compose
- Create an `.env` file at the root of the repository, defining all the required environment variables. In most cases, for variables other than secrets, the defaults in `env.default` may just work - your mileage may vary.
For development environments, a `.env` file can be generated with default/generated values suitable to run a development instance, via the following command:

```bash
make .env
```

This will only generate a file if no `.env` file is present at the root of the repository.
The PostgreSQL credentials set via environment variables are used to create a database user when the PostgreSQL container is started for the first time. PostgreSQL data is persisted via a Docker volume.
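As an illustration only, such credentials typically follow the pattern used by the official PostgreSQL Docker images; the variable names below are placeholders, and the authoritative names and defaults are those listed in `env.default`:

```bash
# Illustrative placeholders only - see env.default for the actual variable names
POSTGRES_USER=marxan
POSTGRES_PASSWORD=change-me-locally
POSTGRES_DB=marxan-api
```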
Run `make start-api` to start all 4 services needed to run Marxan, as well as the required database services, in containers via Docker Compose.

The Docker build process may take a few minutes, depending on your hardware, software and internet connection. Once completed, the applications will start, and you should be able to access the Marxan site on `localhost`, on the port specified as `APP_SERVICE_PORT`.
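Once startup completes, a quick smoke test from the host can confirm the app is reachable (a sketch, assuming `APP_SERVICE_PORT` is set to `3000` in your `.env`):

```bash
# Should return an HTTP status line once the app has finished starting
curl -I http://localhost:3000
```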
To enable the Node inspector while running the MarxanCloud API services in containers, use `make debug-api` instead. Example configuration files for debugger setup in popular editors are provided in the `docs/developers/editors/` documentation folder.

When enabled, the Node inspector will start listening on the default port `9229/tcp` both in the API and geoprocessing containers, and by default Docker will forward this port to port `9230/tcp` on the host for the API service, and to port `9240/tcp` for the geoprocessing service, where the inspector can be reached by clients.
For security reasons (in case the host is, for example, a VM with a public IP address and without a firewall in front, for whatever reason), the inspector port will only be open on the loopback interface.
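Besides editor integrations, the built-in Node.js CLI debugger can attach to these forwarded ports; for example, for the API service:

```bash
# Attach the Node.js command-line debugger to the API's forwarded inspector port
node inspect 127.0.0.1:9230
```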
Make sure you have installed and configured all the dependencies locally. PostgreSQL (with PostGIS) and Redis need to be up and running.
When running the API and Geoprocessing services without relying on Docker
Compose for container orchestration, be sure to review and set the correct
environment variables before executing the application.
The `env.default` file and the `docker-compose` configuration files may give you some example values that work for docker-based executions, and that may be useful when implementing your native execution configuration.

The included Makefile has some useful build targets (commands) specifically targeted at native execution (prefixed with `native-`). Refer to the Makefile's inline documentation for more details.
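For a quick overview of the native-execution targets available in your checkout (a simple sketch; target names may evolve over time):

```bash
# List the targets prefixed with "native-" defined in the main Makefile
grep -E '^native-' Makefile
```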
If you'd like to run the application directly using Yarn, you can find a `package.json` inside the `/api` folder with dependencies and commands for both applications. After installing the nodejs dependencies, this is how you can start either application:

```bash
# Run the API
yarn start

# Run the geoprocessing service
yarn start geoprocessing
```
The Frontend application can be found in `/app`. Be sure to populate the `app/.env` file (note: this is an `.env` file distinct from the top-level `.env` file, which is used to configure backend microservices and data processing pipelines) according to the app documentation, and to install the necessary nodejs packages.
To start the application, run:

```bash
yarn dev
```
The frontend app will then be available on http://localhost:3000 (or at the URL shown when the app starts, if a different port has been configured).
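Putting the frontend steps together (a sketch, assuming a standard yarn workflow for installing the declared packages):

```bash
cd app
yarn install   # install the necessary nodejs packages
yarn dev       # start the frontend in development mode
```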
The webshot service can be found in the `/webshot` folder. After installing the necessary nodejs packages, you can start it by running:

```bash
yarn start:dev
```

Due to upstream packaging of the Chrome browser used by the Webshot service, it may not be possible to run the webshot service in `aarch64` environments (such as macOS on Apple silicon).
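If in doubt about which architecture your environment reports, you can check it with:

```bash
# Prints the machine hardware name: aarch64/arm64 on Apple silicon, x86_64 otherwise
uname -m
```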
Before running the test suites, load the test seed data (shown here for Marxan running natively):

```bash
make native-seed-api-with-test-data
```
Running the whole test suite requires running 3 commands, each focused on a specific type of test.

To run the unit tests for both the API and the Geoprocessing app:

```bash
yarn run test
```

To run the E2E tests for the API:

```bash
yarn run api:test:e2e
```

To run the E2E tests for the Geoprocessing app:

```bash
yarn run geoprocessing:test:e2e
```
Note that E2E tests may trigger cross-application requests, so:
- When running E2E tests for the API, you must have the Geoprocessing application running in the background.
- When running E2E tests for the Geoprocessing application, you must have the API running in the background.
Running tests requires the test seed data to have been loaded beforehand, and may modify data in the database - do not run tests against a database whose data you don't want to lose.
All fresh installations of Marxan (be it locally for development or in a cloud provider for production) start off with empty databases that need to be populated with seed data before the Marxan platform is fully functional. The seed data you'll want to import will depend on the goal of the installation you are setting up.
Please make sure to wait for all of the backend services (api, geoprocessing and webshot) to fully start, as database migrations will be run while the services are started: attempting to import seed data before migrations have run fully will result in errors.
There are three types of seed data available with the application:
- Geographic data: platform-wide spatial data for admin boundaries (GADM), protected areas (WDPA) and conservation features, such as the World Terrestrial Ecosystems database. These datasets should be available in every Marxan instance.
- User data: user accounts, intended only for development instances and for e2e/unit tests; these must not be imported in production-grade environments.
- Test data: intended only for environments where development or e2e/unit tests execution takes place, and must not be imported in production-grade environments.
Please review the following sections carefully to determine which best fits your needs for each deployment.
User data is necessary for all types of Marxan installations, but different user data import processes will best fit different use cases.
There are two ways to create user accounts:

Using the nodejs CLI:

```bash
cd api
yarn run console create:user EMAIL_ADDRESS PASSWORD [-f, --firstname <first name>] [-l, --lastname <last name>] [-d, --displayname <display name>]
```

Using Make:

```bash
# For Marxan running on Docker
make seed-api-init-data

# For Marxan running natively
make native-seed-api-init-data
```
The first option will allow you to create a custom user, and is targeted at environments where user accounts are meaningful - for example, production. To execute this on a cloud hosted version of Marxan, you should run the command above on the VM instance/docker container running the `api` application.
In contrast, the second approach will batch-create several users with insecure passwords and generic details; it is only suited for development, testing or otherwise ephemeral environments.
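For example, a single named account could be created with the CLI as follows (all values below are placeholders):

```bash
cd api
yarn run console create:user jane.doe@example.com 'a-strong-password' -f Jane -l Doe -d 'Jane Doe'
```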
Importing the initial geographic data executes a long-running data ETL pipeline that imports large amounts of data from publicly available datasets onto Marxan's PostgreSQL server - using both `api` and `geoprocessing` databases.
First, either set up a new Marxan instance from scratch, or reset an existing one to a clean-slate status (`make clean-slate && make start-api`) - this allows spatial data to be imported into a clean database, avoiding any user-uploaded data ending up in the seed data.

Once a clean Marxan instance is running, the easiest way to execute the spatial data import process is via the following `make` task, which runs a dockerized version of the tool:

```bash
make seed-geodb-data
```
Note that this process can complete successfully and exit with code 0, yet still show errors in the output logs. This is expected, and these log errors can be ignored.
The actual implementation can be found in the `/data` folder.
This will populate the metadata DB and will trigger the geoprocessing ETL pipelines to seed the geoprocessing DB with the full data that is needed for production-grade instances of Marxan.
Please note that this full DB setup will require at least 16GB of RAM and 40GB of disk space in order to carry out some of these tasks (the GADM and WDPA data import pipelines). Also, the number of CPU cores will impact the time needed to seed a new instance with the complete GADM and WDPA datasets, which will be 1h+ on ideal hardware.
To execute this on a cloud hosted version of Marxan, you have a couple of options:

- Run the import process locally against locally running PostgreSQL servers, then export the resulting `.sql` data locally and import it remotely.
- Run the import process locally, while having it connect directly to the remote `api` and `geoprocessing` databases, using `kubectl` to set up port forwarding (see the sketch below). You may need to set the container's network mode to `host` for this to work.
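A minimal sketch of the port-forwarding approach (the service names, namespace and ports below are hypothetical; check your cluster for the actual resources):

```bash
# Forward the remote PostgreSQL services to local ports; all names are hypothetical
kubectl port-forward --namespace marxan svc/api-postgres 5432:5432 &
kubectl port-forward --namespace marxan svc/geoprocessing-postgres 5433:5432 &
```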
While geographic data is technically necessary on all Marxan environments, there is a faster alternative to import equivalent data on development/test environments, which is discussed in the next section.
Test data includes both user data and (a small subset of) the geographical data described above, as well as extra data necessary to run certain types of automated tests. This data is meant for development/testing environments only, and should not be imported in production environments.
```bash
# For Marxan running on Docker
make seed-dbs

# For Marxan running natively
make native-seed-api-init-data
```
These commands will:
- Import generic user data (equivalent to `seed-api-init-data`/`native-seed-api-init-data` described above)
- Import a precomputed subset of the geographical data
- Create sample/test Marxan resources, like organizations, scenarios, etc.
The main `Makefile` provides a way to reset db instances from scratch. This can be useful to do regularly, to avoid keeping obsolete data in the local development instance.

```bash
make clean-slate
```
The main `Makefile` provides a set of commands to create new db dumps from upstream data sources, upload these dumps to an Azure storage bucket, and populate both dbs from these dumps.
Populating clean dbs this way will be much quicker than triggering the full ETL pipelines to import geographic data.
When uploading new dumps of seed data to an Azure storage container, or when downloading pre-prepared data seeds from it, the following environment variables must be defined in the root `.env` file:

```bash
DATA_SEEDS_AZURE_STORAGE_ACCOUNT_NAME=
DATA_SEEDS_AZURE_STORAGE_CONTAINER_NAME=
```

This will allow the `az storage blob` commands in the relevant Make recipes to run with suitable authorization.
Users should have suitable access to the storage container configured.
For data uploads, they will need to be logged into an Azure account that is allowed to write to the container:
- Install the Azure CLI tool (`az`)
- Get an Azure user set up, with suitable permissions to write to the relevant Azure storage account and container
- Log in to this Azure account via the `az` CLI tool (https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli)
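The login step itself, in its simplest interactive form (the subscription id is a placeholder):

```bash
az login
# Optionally, select the subscription that holds the storage account
az account set --subscription <subscription-id>
```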
For data downloads, the container itself needs to be created with "public blobs" settings so that individual blobs can be fetched via non-authenticated HTTP requests.
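Since blobs are public, downloads need no credentials; fetching one directly follows the standard Azure Blob Storage URL scheme (all names below are placeholders):

```bash
# Account, container and blob names are placeholders
curl -O "https://<storage-account>.blob.core.windows.net/<container>/<dump-file>.sql"
```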
To dump data from a previously seeded instance (for example, one seeded via the geoprocessing ETL pipelines described above) and upload the processed data to an Azure bucket:

```bash
make generate-content-dumps && make upload-dump-data
```
Other developers can then benefit from these pre-prepared data seeds when populating new development instances after their initial setup, by running the following command on a clean Marxan instance (that is, with empty databases, and after letting all the migrations run for both `api` and `geoprocessing` services):

```bash
make restore-dumps
```
This step is only needed when developing Python notebooks for Marxan.
Run `make notebooks` to start the jupyterlab service.
We use a lightweight git flow workflow: `develop`, `main`, feature/bug fix branches, and release branches (`release/vX.Y.Z-etc`).

Please use per component+task feature branches: `<feature type>/<component>/NNNN-brief-description`. For example: `feature/api/12345-helm-setup`.

PRs should be rebased on `develop`.
As feature types:

- `feature`
- `bugfix` (regular bug fix)
- `hotfix` (urgent bug fixes fast-tracked to `main`)
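For example, starting a new feature branch off `develop` following this convention (reusing the branch name from the example above):

```bash
git checkout develop && git pull
git checkout -b feature/api/12345-helm-setup
```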
Infrastructure code and documentation can be found under `/infrastructure`.
CI/CD is handled with GitHub Actions. More details can be found by reviewing the actual content of the `.github/workflows` folder but, in a nutshell, GitHub Actions will automatically run tests on code pushed as part of a Pull Request.
For code merged to key branches (currently `main` and `develop`), once tests run successfully, Docker images are built and pushed to a private Azure Container Registry.
The GitHub Actions workflows currently configured require a few secrets to be set on GitHub in order to work properly:
- `AZURE_CLIENT_ID`: obtain from Terraform's `azure_client_id` output
- `AZURE_TENANT_ID`: obtain from Terraform's `azure_tenant_id` output
- `AZURE_SUBSCRIPTION_ID`: obtain from Terraform's `azure_subscription_id` output
- `REGISTRY_LOGIN_SERVER`: obtain from Terraform's `azurerm_container_registry_login_server` output
- `REGISTRY_USERNAME`: obtain from Terraform's `azure_client_id` output
- `REGISTRY_PASSWORD`: obtain from Terraform's `azuread_application_password` output
Some of these values are obtained from Terraform output values, which will be documented in more detail in the Infrastructure docs.
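Each of these values can be read back from the Terraform state with the standard `terraform output` command, run from the relevant infrastructure folder; for example:

```bash
terraform output azure_client_id
```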
Please use the Marxan Cloud issue tracker to report bugs.
(C) Copyright 2020-2023 Vizzuality.
This program is free software: you can redistribute it and/or modify it under the terms of the MIT License as included in this repository.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the MIT License for more details.
You should have received a copy of the MIT License along with this program. If not, see https://spdx.org/licenses/MIT.html.