This repository contains a scheduled workflow, configuration and code to synchronise iNaturalist observations to the CAMS Weed App (running on the ArcGIS Online platform).
The CAMS Weed App enables ongoing monitoring and control of weeds, showing different colours and shapes for the current status of each weed patch. The status is periodically reset to "Purple - please check" and is then updated as each patch is checked.
The status of each observation can be updated by adding the observation to the Weed Management Aotearoa NZ iNaturalist project and setting the observation fields. See the user guide for instructions.
The code is intended to be scheduled to run regularly, e.g. hourly, picking up new and updated observations from iNaturalist. Note that the updates to CAMS are idempotent, so can be rerun without creating new CAMS records. The synchronisation will only pick up new or updated observations containing updates that we are interested in.
We keep this log of all observations and updates that have been synchronised.
The iNaturalist observations are selected based on taxon and place (e.g. old man's beard in Wellington). Each matching iNaturalist observation creates a new Feature in CAMS, with a parent WeedLocation record and a child WeedVisit record. Updates to the iNaturalist observation may create additional WeedVisit records, dependent on what caused the update, for example:
sequenceDiagram
Actor User
participant iNat as iNaturalist Observation
participant CAMS as CAMS Feature
User->>iNat: New observation
iNat->>CAMS: New feature (WeedLocation and WeedVisit)
Note right of iNat: When synchroniser runs
Note right of User: Sometime later
User->>iNat: Observation identification added
Note right of iNat: No update to CAMS needed
Note right of User: Sometime later
User->>iNat: Treated ? = Yes (on same observation)
iNat->>CAMS: New WeedVisit record added to feature
Note right of iNat: When synchroniser runs
Note right of User: Sometime later
User->>iNat: Status update (on same observation)
iNat->>CAMS: New WeedVisit record added to feature
Note right of iNat: When synchroniser runs
The time that the latest observation was updated is stored in a *_time_of_last_update.txt file. When the synchronisation is rerun, it checks for observations which have been updated since this timestamp (and then updates the file with the new last update timestamp).
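The timestamp handling described above can be sketched roughly as follows. This is an illustration only, not the repository's code: the function names, the `directory` parameter and the ISO 8601 file format are all assumptions.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamp_path(file_prefix: str, directory: Path) -> Path:
    # e.g. "ombfw" -> <directory>/ombfw_time_of_last_update.txt
    return directory / f"{file_prefix}_time_of_last_update.txt"


def read_time_of_last_update(file_prefix: str, directory: Path) -> Optional[datetime]:
    """Return the stored timestamp, or None if no file exists yet."""
    path = timestamp_path(file_prefix, directory)
    if not path.exists():
        return None  # no previous run recorded: every observation counts as new
    return datetime.fromisoformat(path.read_text().strip())


def write_time_of_last_update(file_prefix: str, directory: Path, when: datetime) -> None:
    """Store the newest update time seen during this run (assumed ISO 8601 format)."""
    timestamp_path(file_prefix, directory).write_text(when.isoformat())
```

Returning `None` when the file is missing mirrors the behaviour described later in this README, where deleting the file forces a full resynchronisation.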
The synchroniser is run regularly (currently hourly) by the synchronise-inat-to-cams workflow.
It can be triggered manually by clicking the Run workflow button on that page (assuming you are logged in and have permission to do so).
The schedule is configured in the workflow definition.
Under on: > schedule:, the cron: setting defines a cron expression.
For example,
- cron: '42 * * * *'
specifies that the workflow will be run at 42 minutes past each hour.
Note that the GitHub cron schedule uses the UTC timezone.
Credentials, such as the username and password for logging on, are encrypted and stored in GitHub Secrets.
These credentials can only be read by GitHub Actions and are masked in the log files.
Currently all environments are within the same ArcGIS account. The code requires two ArcGIS feature layers within this account:
An expendable feature layer for development and testing of new code. Prior to running the Behaviour Driven Development (BDD) tests, a check is made that the feature layer is intended for testing (see environment.py). This ensures that we are not creating and deleting test data in production.
The main feature layer containing CAMS weed data targeted by the synchroniser.
The environment is configured in the relevant workflow file.
We do not have a test environment for iNaturalist, so do not perform any automated testing against iNaturalist. The operations we currently perform do not need an iNaturalist account, so are performed anonymously.
If the workflow fails, a notification will be sent to the person who last updated the cron schedule or, if manually triggered, the person that triggered the workflow. See notifications for workflow runs for details.
Detailed logs can be viewed by clicking on the workflow run. See Using workflow run logs if you need help with this.
At one stage, an iNaturalist issue caused the GET request that reads changes to hang for 6 hours, until the GitHub job timed out. To avoid this happening again, we have implemented:
An overall timeout after 120 minutes configured in the workflow
timeout-minutes: 120
This should allow for large synchronisation jobs to be performed, while also reducing the overall minutes used when reads fail.
An additional timeout of 120 seconds is applied to the iNaturalist read in case this hangs.
We have sometimes had intermittent issues connecting to iNaturalist or ArcGIS. To increase the chances of success, we have added retry logic to iNaturalist and ArcGIS interface methods. These are currently set to retry 3 times with a 5 second wait between retries.
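A minimal sketch of this kind of retry logic is shown below. It is not the repository's implementation: the helper name `call_with_retries` is hypothetical, and it assumes that "retry 3 times" means one initial attempt followed by up to three further attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], retries: int = 3, wait_seconds: float = 5.0) -> T:
    """Call `operation`, retrying up to `retries` times with a fixed wait in between."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: surface the last error to the caller
            time.sleep(wait_seconds)
```

A call to iNaturalist or ArcGIS would be wrapped as `call_with_retries(lambda: some_interface_call())`, where `some_interface_call` stands in for whichever interface method is being protected.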
The workflow is currently running under the GitHub free account, limited to 2,000 minutes/month. Since the workflow runs hourly, this equates to about 2.7 minutes per workflow run.
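The per-run budget can be checked with a few lines of arithmetic. The sketch below assumes exactly one run per hour and counts only this workflow against the monthly allowance; the constant and function names are illustrative.

```python
FREE_MINUTES_PER_MONTH = 2000  # free account allowance quoted above
RUNS_PER_DAY = 24              # one scheduled run per hour


def minutes_per_run(days_in_month: int) -> float:
    """Average minutes available to each hourly run in a month of the given length."""
    return FREE_MINUTES_PER_MONTH / (RUNS_PER_DAY * days_in_month)


for days in (30, 31):
    print(f"{days}-day month: {minutes_per_run(days):.2f} minutes per run")
```

For a 31-day month this gives roughly 2.69 minutes per run, which is where the figure of about 2.7 minutes comes from.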
Most of the workflow time is spent installing cached dependencies. While our immediate dependencies currently use fixed versions, some of the transitive dependencies use version ranges, which can cause this time to escalate. It's worth keeping a periodic watch on the time taken by the workflows to ensure they normally complete within 2 minutes.
The synchronisation workflow updates several files which are subsequently committed and pushed back to GitHub. These files are:
- a *_time_of_last_update.txt file for each sync configuration
- sync_history.md containing details of all observations synchronised
Configuration files allow the following to be easily modified:
The sync_configuration file determines which observations are synchronised from iNaturalist to CAMS.
An example definition is:
{
"Old Man's Beard Free Wellington": {
"file_prefix": "ombfw",
"taxon_ids": ["160697"],
"place_ids": ["6868"]
}
}
where:
- "Old Man's Beard Free Wellington" is the project name, which is only used for logging purposes
- "file_prefix" is used as the file prefix for the last update timestamp file. For example, the above definition causes the ombfw_time_of_last_update.txt file to contain the time of the last record updated for this definition.
- "taxon_ids" contains a comma delimited list of iNaturalist taxa to be included. The iNaturalist taxa page includes a search bar to allow you to find the relevant taxon id (after selecting the species, click on the About tab, scroll to the bottom and copy the 6 digit code after iNaturalist:).
  Note that the taxon id can be of a taxon higher up in the taxon lineage. For example, we use the generic Section Elkea to cover all Banana Passionfruit. All Banana Passionfruit, including Passiflora tarminiana, Passiflora tripartita and hybrids, fall under this section. The average volunteer won't be able to tell the difference and all of them need tackling!
  The taxon_ids must also be defined in the taxon_mapping file (see below).
- "place_ids" contains a comma delimited list of iNaturalist places to be included. The iNaturalist places page includes a search bar to allow you to find the relevant place (or you can create a new place if needed). Clicking on Embed Place Widget will show the place id in the URL.
Observations that contain one of the taxon_ids within one of the place_ids will be synchronised. (Note that observations must have a location and date observed set as well as geoprivacy being set to Open for the observation to be synchronised.)
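To illustrate how a sync configuration could drive the iNaturalist request, the sketch below turns each entry into a set of query parameters. The names `taxon_id`, `place_id` and `updated_since` follow the iNaturalist observation search parameters, but whether the repository builds its requests this way is an assumption, and `build_queries` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

EXAMPLE_CONFIG = """
{
  "Old Man's Beard Free Wellington": {
    "file_prefix": "ombfw",
    "taxon_ids": ["160697"],
    "place_ids": ["6868"]
  }
}
"""


def build_queries(config_text: str,
                  last_updates: Dict[str, Optional[str]]) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (project name, query parameters) for each sync configuration entry."""
    queries = []
    for project_name, entry in json.loads(config_text).items():
        params: Dict[str, Any] = {
            "taxon_id": entry["taxon_ids"],   # any of these taxa...
            "place_id": entry["place_ids"],   # ...within any of these places
        }
        since = last_updates.get(entry["file_prefix"])
        if since is not None:
            params["updated_since"] = since   # only changes since the last run
        queries.append((project_name, params))
    return queries
```

When no timestamp is known for a file prefix, `updated_since` is omitted, so all matching observations would be requested.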
If you add a taxon or place to an existing entry, prior records for the new taxon or place will not automatically be synchronised. To force them to be synchronised, you must first delete the file_prefix_time_of_last_update.txt file (where file_prefix is replaced by the file prefix for the entry). Upon rerunning the synchronisation, all records will be resynchronised. Since the CAMS updates are idempotent, only the new entries for taxon or place will be added and existing entries won't be modified.
NOTE: any modifications to existing entries made through the CAMS app may be overwritten. It may be worth checking and/or backing up the data first in case of any issues.
The taxon_mapping file contains a mapping from the iNaturalist taxon to the CAMS taxon. Note that all taxon_ids listed in the sync_configuration file must have a taxon mapping entry.
An example definition is:
{
"160697": "OldMansBeard",
"285911": "CathedralBells",
"879226": "BananaPassionfruit"
}
where:
- "160697" is the iNaturalist taxon id for Old Man's Beard
- "OldMansBeard" is the CAMS taxon name for Old Man's Beard. The list of taxon names can be viewed on ArcGIS by opening the feature layer, clicking on Data > Fields > Species Menu Dropdown, then clicking the Edit button next to List of Values (Domain). The taxon name is the Code value.
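Because every taxon id in the sync configuration needs a mapping entry, a consistency check along the following lines can catch omissions early. This is a sketch under assumptions: `missing_taxon_mappings` is a hypothetical helper, and the README does not say whether the code performs such a check.

```python
from typing import Dict, List


def missing_taxon_mappings(sync_configuration: Dict[str, dict],
                           taxon_mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per project name, the taxon ids that have no taxon_mapping entry."""
    missing: Dict[str, List[str]] = {}
    for project_name, entry in sync_configuration.items():
        unmapped = [taxon_id for taxon_id in entry["taxon_ids"]
                    if taxon_id not in taxon_mapping]
        if unmapped:
            missing[project_name] = unmapped
    return missing
```

An empty result means the two configuration files agree; anything else lists the ids that still need adding to the taxon_mapping file.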
The cams_schema file contains the expected schema of the CAMS feature layer. This is used to:
- Validate the schema at startup to ensure that the CAMS schema has not deviated from the expected schema. If the schema has deviated, the code will abort with an error message, allowing the code (or schema) to be corrected.
- Map names of values to the coded Value.
An example definition is:
{
"WeedLocations": {
"Date First Observed": {
"name": "DateDiscovered",
"type": "Date"
},
"DataSource": {
"name": "SiteSource",
"type": "String",
"length": 39,
"values": {
"iNaturalist": "iNaturalist_v1"
}
},
...
}
}
where:
- "WeedLocations" is the name of the layer or table (note this is either WeedLocations or Weed_Visits with the current schema)
- "name" is the name of the field
- "type" is the type of the field (with the esriFieldType prefix removed)
- for String fields, "length" contains the maximum length of the String values, allowing the String to be truncated to fit
- for fields containing coded values, the "values" block maps the internal names used by our code to the CAMS code. (The CAMS codes can be viewed on ArcGIS by opening the feature layer, clicking on Data > Fields > relevant field, then clicking the Edit button next to List of Values (Domain). Copy the value from the Code column.)
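The two uses of the schema at write time, mapping coded values and truncating strings, could look something like the sketch below. The function `to_cams_value` and the "Notes" field in the example are hypothetical; only the "DataSource" entry is taken from the example definition above.

```python
from typing import Any, Dict, Tuple

EXAMPLE_SCHEMA: Dict[str, Dict[str, Dict[str, Any]]] = {
    "WeedLocations": {
        "DataSource": {
            "name": "SiteSource",
            "type": "String",
            "length": 39,
            "values": {"iNaturalist": "iNaturalist_v1"},
        },
        # Hypothetical free-text field, included only to demonstrate truncation
        "Notes": {"name": "NotesField", "type": "String", "length": 10},
    }
}


def to_cams_value(schema: dict, layer: str, field_label: str, value: Any) -> Tuple[str, Any]:
    """Return (CAMS field name, CAMS value) for an internal field label and value."""
    field = schema[layer][field_label]
    if "values" in field:
        # Coded field: translate our internal name into the CAMS code
        return field["name"], field["values"][value]
    if field["type"] == "String" and "length" in field and isinstance(value, str):
        # Plain string field: truncate so it fits the target field
        value = value[: field["length"]]
    return field["name"], value
```

Looking up an internal name that is absent from the "values" block raises a `KeyError` here; a real implementation might prefer a clearer error message.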
In addition to these configuration files, the following environment variables are needed to run the code locally (or must be set up in GitHub Secrets to run the GitHub Actions workflow):
- ARCGIS_URL must be set to the base URL of the target ArcGIS organisation (e.g. 'https://organisationname.maps.arcgis.com/')
- ARCGIS_USERNAME must be set to the username to log on with
- ARCGIS_PASSWORD must be set to the password to log on with
- ARCGIS_FEATURE_LAYER_ID must be set to the item id of the feature layer to be updated
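When running locally, it can help to fail fast if any of these variables is missing. The sketch below shows one way to do that; `load_arcgis_settings` is a hypothetical helper and not part of the repository.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARIABLES = (
    "ARCGIS_URL",
    "ARCGIS_USERNAME",
    "ARCGIS_PASSWORD",
    "ARCGIS_FEATURE_LAYER_ID",
)


def load_arcgis_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the required variables, raising a clear error if any are unset or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

Passing the environment as a parameter keeps the function easy to exercise in tests without touching the real process environment.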
The code is written in Python 3.11.
The dependencies are frozen so that new transitive dependencies do not break the GitHub Actions workflows. To update the dependencies for GitHub Actions:
- create a fresh virtualenv locally
- pip install -r requirements.txt
- pip freeze > requirements_lock.txt
The arcgis package only supports up to Python 3.11 as of 2023-09-25 (version 2.2.0 requires Python >=3.9, <3.12).
Dependencies include:
- the awesome pyinaturalist client for the iNaturalist API to read the iNaturalist data.
- the ArcGIS REST API to write to the CAMS weed app.
For development, we have used the free PyCharm IDE.
The folder structure is:
- .github contains the GitHub workflows and dependabot file (for notification of security vulnerabilities and package updates in dependencies)
- config contains configuration files
- features contains the feature files including automated test scenarios
- inat_to_cams contains the main code
This diagram shows the flow of the synchronisation from iNaturalist to ArcGIS CAMS.
When the synchroniser is invoked, it:
- Parses the sync_configuration file to determine the synchronisations to perform. A sync configuration contains the taxa and places to be synchronised. For each sync configuration:
  - The time of last update is read
  - A request is made to iNaturalist for any new observations for the relevant taxa and places since the previous last update time
  - For each new observation:
    - The observation is read by iNaturalist_reader.
    - This creates a complex data structure, which is flattened into an iNaturalist_observation.
    - The translator translates the observation into a cams_feature. There is some complexity to the translation, for example:
      - some weeds are mapped at a higher level of the taxonomy than an individual species. For example, Banana Passionfruit is mapped to Section Elkea which contains a number of species and their hybrids. The translation works up the taxonomic tree until it finds a matching taxon.
      - the visit date and status are calculated dependent on the latest of the date_controlled, date_of_status_update, date_first_observed fields. The status is translated to one of the CAMS colour status fields dependent on various fields.
      - dates and times are converted from UTC to local time
    - The cams_feature is written to the ArcGIS CAMS feature layer using the cams_writer. This uses a cams_reader to read the current record and check for differences before creating the feature and/or visit record if modified. Sometimes the changes in the iNaturalist observation are to fields that we aren't interested in and no changes need writing to CAMS.
      - String fields are truncated if they are longer than the target CAMS fields.
      - cams_writer and cams_reader delegate to cams_interface to interface with ArcGIS. This interface also checks that the fields in the CAMS feature layer and visits table are as expected (type, length etc)
    - A summary of any changes is logged using the summary logger. This is configured in setup_logging.
  - The updated time of last update is written to file.
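Two of the translation steps above, walking up the taxonomic tree and choosing the latest relevant date, can be sketched as follows. This is not the translator's actual code: the function names are hypothetical, the lineage is assumed to be a list of taxon ids ordered from the observed taxon upwards, and the taxon id "111111" in the usage note is invented for illustration.

```python
from datetime import datetime
from typing import Dict, List, Optional


def find_cams_taxon(lineage: List[str], taxon_mapping: Dict[str, str]) -> Optional[str]:
    """Return the CAMS taxon for the first taxon in the lineage that has a mapping.

    `lineage` is assumed to start at the observed taxon and work up towards the root.
    """
    for taxon_id in lineage:
        if taxon_id in taxon_mapping:
            return taxon_mapping[taxon_id]
    return None  # nothing in the lineage is a taxon we synchronise


def latest_date(*dates: Optional[datetime]) -> Optional[datetime]:
    """Return the most recent of the supplied dates, ignoring any that are unset."""
    known = [d for d in dates if d is not None]
    return max(known) if known else None
```

For example, an observation of a hypothetical species "111111" whose lineage includes "879226" would be reported as BananaPassionfruit, and `latest_date(date_controlled, date_of_status_update, date_first_observed)` would supply the date used for the visit.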
The project's features are described using Feature Files that are automated using Behave. Once the feature is well understood, the code to implement these features is then developed.
Explore our feature files.
The resultant reports are published as artifacts at the end of each Run Tests workflow run:
Unzipping the report file and opening the behave_reports.html file shows the status of each scenario: