Orca training data
There are many recordings of killer whales available, but relative to other marine mammal species, there is a paucity of labeled data. For example, many toothed whale (Odontocete) species are included in the Mobysound archive, but Southern Resident Killer Whales are not yet among them (as of July 2020).
This page documents the growing array of labeled data specific to killer whale ecotypes, with a primary focus on Southern Resident Killer Whales, and a secondary focus on other ecotypes of the Northeast Pacific Ocean. Open data sources, including those provided by Orcasound member organizations, are listed first to promote collaboration. Closed data sources are listed in the hope that they become available to the open-source and open-data communities in the future, or are otherwise valuable as reference points.
Note: these data are for training models. For test data, please refer to the orca test data wiki page.
This section contains Orcasound data sets aimed at training machine learning models to detect and/or classify the signals of killer whales. The primary focus is on binary classification of Southern Resident Killer Whale (SRKW) calls (yes/no for any call type), but labels may also indicate SRKW call type, whistles, or clicks, as well as associations with pods (J, K, and/or L), matrilines, and -- in rare cases -- isolated individuals. There are also some resources related to Bigg's (transient) killer whales.
NOTE: These data cannot be accessed through a browser. Instead, note the URL and use the AWS Command Line Interface (CLI) in a terminal window to access the public files. See the Data access via AWS CLI page to learn more about the AWS CLI. Many of these data are aggregated within the Orcasound "Acoustic Sandbox" (a public S3 bucket).
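As a rough illustration of the access pattern described in the note above, the sketch below builds AWS CLI commands for anonymous access to a public S3 bucket. The `--no-sign-request` flag tells the AWS CLI not to sign requests, so no AWS credentials are needed for public buckets. The bucket name, prefix, and file name are hypothetical placeholders (not taken from this page); substitute the values from the URL you noted.

```python
import shlex

# Hypothetical placeholder: substitute the bucket name from the URL you noted.
BUCKET = "example-public-bucket"


def list_command(bucket, prefix=""):
    """Build an `aws s3 ls` command for anonymous access to a public bucket."""
    return ["aws", "s3", "ls", f"s3://{bucket}/{prefix}", "--no-sign-request"]


def download_command(bucket, key, dest="."):
    """Build an `aws s3 cp` command that copies one object to a local path."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest, "--no-sign-request"]


if __name__ == "__main__":
    # Print the commands so they can be pasted into a terminal; nothing is run here.
    print(shlex.join(list_command(BUCKET, "some/prefix/")))
    print(shlex.join(download_command(BUCKET, "some/prefix/file.wav")))
```

The functions only assemble the command; pass the returned list to `subprocess.run` if you prefer to execute it from Python, assuming the AWS CLI is installed.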
-
SRKW training data (in open-access Orcasound archives)
-
Labeled SRKW data
- Pod.Cast annotations
- OrcaAL (Orca Active Learning) annotations
- Labeling in process via orcaal.ai4orcas.net
- Un/labeled data and models within the orcagsoc Orcasound S3 bucket
- Wiki synopsis page for OrcaAL in prep...
-
Unlabeled continuous recordings
- Orcasound bioacoustic bouts
- Sept 27, 2017 (24 hrs of continuous 5-min WAV files, with 4 hours containing SRKW signals)
- 25-27 October, 1997: historic recordings of SRKWs from a 1997 event in Dyes Inlet. The recording was provided courtesy of Dr. John Ford from his archives, and was made some 25 years ago by staff of the Center for Whale Research (CWR) -- an organizational member of Orcasound. These raw audio data are being shared with permission from the CWR Research Director, Dr. Michael Weiss. WAV format via AWS S3 or Quilt.
-
Southern Resident Killer Whale (SRKW) call catalog (John Ford, 1989; supplemented by Rich Osborne)
- Orcasound archive w/o human narration
- Github repository archive FLAC, ogg, mp3 formats
-
SRKW echolocation clicks in Orcasound bouts
- Blog post highlighting part of a bout with many clicks
- You can search the column for "echolocation priority" in the Google sheet of Orcasound labeling candidates
- 192kHz recording of SRKW clicks (Lime Kiln State Park; 2007, when LK was an Orcasound node collecting data under CC license.)
-
Bigg's (transient) killer whale training data:
- Un-labeled
- Labeled
- 2018 Dec 02: Bigg's & humpback nighttime recording with labels; ~1 hr lossy recording with labels that may need to be simplified/standardized for training of a binary classifier...
- 2018 Dec 07: blog post with metadata and link to labeled raw data in .wav format; raw data with labels -- ~1 hour mp3 format, labeled in Audacity by Scott Veirs, labels may need to be simplified/standardized for training of a binary classifier...
- Bigg's call catalog
- John Ford contribution to Orcasound open-access data project of transient call types (T1,3,7,8) (Recorded by F. Thomsen on August 25 1996 near Numas I., Queen Charlotte Strait with many calls from T014, T015)
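The labeled Bigg's recordings listed above were annotated in Audacity, and their labels may need to be simplified before training a binary classifier. The sketch below shows one way that could be done, assuming the labels were exported as a standard Audacity label track (tab-separated start time, end time, and label text, with spectral-range lines beginning with a backslash). The keyword list and the sample labels are hypothetical; the actual label vocabulary in these files is not documented here.

```python
# Hypothetical keywords that mark a label as "killer whale present".
KW_KEYWORDS = ("bigg", "transient", "orca")


def parse_audacity_labels(text):
    """Parse Audacity label-track text into (start_s, end_s, label) tuples."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the optional spectral-range lines ("\\\tlow\thigh").
        if not line.strip() or line.startswith("\\"):
            continue
        parts = line.split("\t")
        label = parts[2].strip() if len(parts) > 2 else ""
        rows.append((float(parts[0]), float(parts[1]), label))
    return rows


def to_binary(rows, keywords=KW_KEYWORDS):
    """Collapse free-text labels to 1 (killer whale) or 0 (anything else)."""
    return [
        (start, end, int(any(k in label.lower() for k in keywords)))
        for start, end, label in rows
    ]


if __name__ == "__main__":
    # Made-up sample in Audacity's export layout, for illustration only.
    sample = "0.5\t2.25\tBiggs call T1\n10.0\t12.0\thumpback song\n"
    for row in to_binary(parse_audacity_labels(sample)):
        print(row)
```

Keyword matching is a deliberately simple choice; an explicit mapping from each distinct label to a class would be safer once the label vocabulary of a given file has been reviewed.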
-
Prospects for additional Orcasound training (& testing) data:
- Pod.Cast candidates for additional rounds of annotation
- Current listener log
- General KW ecotypes
- Watkins Marine Mammal Library (WHOI), global killer whale ecotypes (1960-1993)
- Labeled data from the Watkins Marine Mammal Library's killer whale tapes (see Pod.Cast round 1)
- West Coast Transient or Bigg's ecotype
- U.S. Navy recording of transients in Dabob Bay, 2005
- ~42 minutes of vocalization, echolocation, percussives
- AIFF and MP3 raw data and preliminary labels
-
Mobysound odontocete annotated data set with clicks of different categories labeled, and some "whistles" (pulsed calls?)
- Unger metadata Word document (via Google doc that allows public comments)
- label categories: all click types; burst pulse; buzz; click trains
- annotation by Sara Heimlich and/or Shaari Unger... but Triton was used for "whistles" and it's unclear whether click annotation was manual or automated.
- SRKWs
- Orca Behavior Institute (Monika Wieland), historic data from cabled Lime Kiln State Park
- NOAA (Brad Hanson, Marla Holt, Candice Emmons): autonomous recorders on outer coast WA and DTAG deployments on SRKWs
- ONC (Kristen Kanes, Science open data set in 2020?), cabled arrays on outer BC shelf (Barkley Canyon; and Georgia Strait? Early versions were not specific to ecotype?)
- DFO (James Pilkington), mostly autonomous recorders on outer coast BC (mostly clips? may be specific to ecotype)
- SMRU/TWM (Jason Wood), some labeled by Alex Harris (30,000 general KWs; 30,000 non-KWs)
- J17s (only matriline present), recording by All Aboard Sailing
- JASCO (David Hannay? Ruth Joy?), 5 second clips
- NRKWs
- OrcaLab (Paul Spong, Helena Symonds), cabled near-shore hydrophones in Johnstone Strait, B.C.
- Orchive (data archive by Steve Ness at UVic)
- OrcaSPOT (Bergler et al. ML effort published in 2019)
- OrcaSPOT repo (Python code)
- OrcaSPOT publication (2019)
- Rachel Cheng suggested referring to Steven Ness's thesis for a description of the Orchive data set. Christian Bergler used ~424 labeled pulsed calls from Paul & Helena to compare the unsupervised clustering method with supervised classification (2019). They were told that those labeled calls were used at OrcaLab to train volunteers to recognize NRKW signals. More details about the call types can be found in the paper. Unfortunately, those labeled calls cover only matrilines frequently sighted in Johnstone Strait and are not a complete snapshot of the NRKW vocal repertoire. (We wondered, but have not yet clarified, which clans and matrilines are represented in those labels.)
- [Cetacean Research Technology recordings of Springer](https://www.cetaceanresearch.com/sounds/springer-sounds.html) (A-pod juvenile A73, aka Springer) by Joe Olson. The 19 January 2002 recordings were made between Seattle and Vashon Island near the ferry lanes, when she was isolated from her natal pod in Puget Sound (believed to be the first recordings ever of an individual wild killer whale). Exactly five years after her successful return to the NRKW community on 14 July 2002, Joe recorded A73 again -- this time on 14 July 2007 in Johnstone Strait, accompanied by other members of her extended family belonging to the A8, A11, and A12 subpods. (2023 archive of CRT page and audio files)
- David Bain recordings of A73 in Puget Sound during winter/spring, 2002?
- Pacific Wild unlabeled archive (Soundcloud), cabled near-shore hydrophones in central B.C., near Bella Bella.
- Alaska residents
- OrcaCNN (Dan Olsen), many recordings from autonomous and boat-based hydrophone recording systems
- Bigg's (transients)
- No labeled data (to our knowledge)
- Raw data sources:
- Alaskan transients (via Dan Olsen and Hannah Myers)
- Many recordings of AT1s (only 7 individuals left; unique sounding calls relative to other transients)
- Gulf of Alaska transients (need to be digitized)
- California recordings (ecotype uncertain unless in a sub-section)
- MARS mooring recordings (Monterey Bay, CA)
- 2/7/2019: 15-sec clip w/overlapping excitement calls and possibly a separate call from a single animal
- Antarctic ecotypes
- McMurdo Station
- 1/25/2018: "orcas 1km away"
- direct link to raw mp3 recording
- 2/15/2019: 100-minute YouTube recording of "killer whales" -- comments indicate the pod was ~1-2 km away, likely foraging for Antarctic toothfish.