Administration
We welcome your help with administering the OrcaHello real-time inference system. If you'd like to get involved, please check out the on-boarding info for the Orcasound open source community and let us know you'd like to help maintain the system as a devops volunteer.
Once on-boarded, your first step will be to read the information in this wiki and discuss any questions with existing administrators. Once you have a handle on how the system works, we can grant appropriate access to cloud-based services so you can begin helping.
The OrcaHello project is based in Azure. Most communication about the project occurs in the Orcasound Slack year-round and via Teams during Microsoft hackathons. Other aspects of Orcasound open source projects, including the management of the "swarm" of hydrophone nodes and the live-listening web app designed for community scientists, are deployed beyond Azure and are administered via general Orcasound communication tools (Orcasound Slack, GitHub, and Trello). Both OrcaHello and the Orcasound web app draw raw audio data from S3 buckets.
Here is a schematic of the OrcaHello system:
In a nutshell, some administration may be required for each pictured component of OrcaHello:
- Pulling raw audio data streamed from Orcasound hydrophones as quickly as possible (involves coordination with the broader Orcasound administrative community)
- Predicting when orca calls occur in the data stream (involves keeping the model running, strategizing with data scientists, and working with moderators to improve models)
- Maintaining databases and storage of model outputs and resources (including training data from other Orcasound projects like the Pod.Cast annotation tool)
- Maintaining the notification system(s)
An overall responsibility of administrators is to monitor the costs of the OrcaHello system. This was first done (2019-2022) to track usage of credits granted to the project by Microsoft's AI for Earth program via an Azure subscription sponsorship, first to Orca Conservancy, and later to the University of Washington's Detect2Protect initiative. In fall 2022, the AI for Earth program ended and all sponsorships expired on Oct 24, so costs are now monitored to ensure the Orcasound community can continue to fund OrcaHello through a pay-as-you-go subscription (covered by Beam Reach).
**Fall 2023 admin foci: quantify binary call classifier performance and fine-tune model(s); monitor costs as new nodes are added**
Nov. 2021 admin focus: stabilizing deployment
Prakruti posed a key question: **"What Azure service shall we use in the long run?"**
Michelle's initial suggestions of options (with pros / cons):
- Azure Container Instances (ACI)
- Pro: we're using it already
- Con: prone to failing frequently, no horizontal scalability (if/when we need it)
- Azure Kubernetes Service (AKS)
- Pro: horizontal scalability, declarative method allows for easy cluster migration/adding hydrophones, cheaper than ACI
- Con: need to manage the cluster (K8s version upgrades), higher barrier to entry knowledge-wise
- Virtual machines (VM)
- Pro: lowest bar to entry, cheapest option
- Con: managing VMs, no horizontal scalability
- Azure Container Apps
- Pro: no need to maintain infrastructure, horizontal scalability
- Con: preview service (just announced), most expensive option
In 2022, we transitioned to AKS in preparation for the annual fall (October) hackathon.
- Hydrophones are broken
- AWS buckets are not picking up audio correctly
- Inference system is broken (AKS + Azure storage)
- Cosmos DB is broken
- Moderator portal is broken
- Notification system is broken (Azure Functions)
- SendGrid is broken
- Ensure the containers are running. (In Azure, view container instances -> look for something like `live-inference-system-aci-allhydrophones...` -> State should be `running`)
- Make sure Orcasound data is reaching the Azure blob (look in e.g. `livemlaudiospecstorage.blob.core.windows.net/audiowavs`)
- Check to see that new rows have recently been added to the Cosmos DB database
View Azure Cosmos DB databases ->
Example queries:
- SELECT * FROM c WHERE c.timestamp LIKE "%2021-10-02%"
- SELECT * FROM c WHERE c.comments LIKE "%transients%" <- matches all records with comment that includes the string "transients"
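The example queries above can also be built programmatically when checking for recent rows. The following is a minimal sketch, not the project's own tooling: the helper names `like_query` and `rows_for_day` are hypothetical, and it assumes (based on the first example query) that the `timestamp` field is a string containing an ISO-style `YYYY-MM-DD` date. It only builds the query text and parameter list; it does not connect to Azure.

```python
from datetime import date


def like_query(field: str, substring: str) -> dict:
    """Build a parameterized query spec matching documents whose
    `field` contains `substring` (hypothetical helper)."""
    # Field names cannot be passed as query parameters, so accept only
    # simple identifiers before interpolating them into the query text.
    if not field.isidentifier():
        raise ValueError(f"unexpected field name: {field!r}")
    return {
        "query": f"SELECT * FROM c WHERE c.{field} LIKE @pattern",
        "parameters": [{"name": "@pattern", "value": f"%{substring}%"}],
    }


def rows_for_day(day: date) -> dict:
    """Query spec for rows whose timestamp string contains the given date.

    Assumes timestamps embed the date as YYYY-MM-DD, as in the example query.
    """
    return like_query("timestamp", day.isoformat())


spec = rows_for_day(date(2021, 10, 2))
print(spec["query"])
print(spec["parameters"])
```

If the `azure-cosmos` Python SDK is used, a spec like this could be passed to a container client's `query_items(query=spec["query"], parameters=spec["parameters"], enable_cross_partition_query=True)`. The account, database, and container names needed to create that client are not given here and would have to come from an existing administrator.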
After the 2024 hackathon improvements, this section of the guide could distill Michelle's step-by-step instructions shared in this 2023 Hackathon issue discussion of adding nodes to the ML pipeline.
Moderators can be added to or removed from the SendGrid email list by an Orcasound Azure administrator, using this UI in Azure...
...however, to reduce SendGrid costs, the preferred method in 2023 is to direct moderators to manage their subscription via the OrcaHello moderator listserv.
As of late 2022, the OrcaHello system sends emails to subscribers every time a moderator confirms a candidate contains at least one SRKW call. These emails are sent through SendGrid to a list maintained by Orcasound's administrator team.
One of the sendmail recipients is the orcahello-notifications listserv (a Mailman-based email distribution list hosted via Orcasound's dreamhost.com account). The listserv is available at http://lists.orcasound.net/listinfo.cgi/orcahello-notification-orcasound.net where message history and subscriber lists are available, along with options to subscribe or unsubscribe.
In the 2023 roadmap, raw and/or verified OrcaHello detections may be pushed to the Acartia.io data cooperative. The Acartia.io API would thereafter provide programmatic access to machine and human acoustic detections, along with visual and other observations, in real time. That access would enable development of additional notification schemes, such as geographic filters or temporal grouping.
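To illustrate what temporal grouping could mean for notifications, here is a minimal sketch that merges detections separated by no more than a fixed gap, so that a burst of calls could yield one notification rather than several. It is an assumption-laden illustration, not a planned design: the function name `group_by_gap`, the 30-minute gap, and the sample timestamps are all hypothetical, and no Acartia.io API calls are shown.

```python
from datetime import datetime, timedelta


def group_by_gap(times, max_gap=timedelta(minutes=30)):
    """Group detection times so that consecutive detections within
    `max_gap` of each other land in the same group (hypothetical helper)."""
    groups = []
    for t in sorted(times):
        # Extend the current group if this detection is close enough
        # to the previous one; otherwise start a new group.
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Made-up detection times for illustration only.
detections = [
    datetime(2023, 1, 1, 10, 0),
    datetime(2023, 1, 1, 10, 10),
    datetime(2023, 1, 1, 10, 25),
    datetime(2023, 1, 1, 12, 0),
]
groups = group_by_gap(detections)
print([len(g) for g in groups])  # prints [3, 1]
```

The gap threshold trades off notification volume against timeliness; a suitable value would need to be chosen with moderators and subscribers.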
Guidance from AI for Earth:
You may now request GPU access for your Azure account using the following steps. As a courtesy, please
only request the SKU/region you need and not ask for all SKUs/regions. Also, to fairly distribute the
limited number of GPUs that AI for Earth has reserved for grantees, we are limiting each grantee access
to only 12 additional cores of each SKU/region.
1. Go to the Azure Portal: https://portal.azure.com
2. Select Help + support
3. Select New support request
4. Select “Issue type” as "Service and subscription limits (quotas)" from the drop-down list
5. Select the subscription ID (make sure it’s the Subscription that you sent to us for whitelisting)
6. Quota type = Compute/VM (cores/vCPUs) subscription limit increases
7. Change the Support Method as appropriate
8. Set the Request Details for “Resource Manager” deployment model (the GPU SKUs are not
deployable via the Classic deployment model)
a. Select your Severity level,
b. Select the Deployment model,
c. Select the Location,
d. Select SKU Family (multiple selections are possible),
e. You can see the current quota limit,
f. Fill the required new quota limit
g. Click on “Save and continue”
9. Verify contact info and click on "Next: Review + create" to generate a request.
Please contact [email protected] if you have any difficulties with requesting access.