ROS2 nodes and utilities for audio streams.
For ROS1, please see the `ros1` branch.
Author(s): Marc-Antoine Maheux
The following steps explain how to set up the library on Ubuntu.

Install the dependencies:
```bash
sudo apt-get install cmake build-essential gfortran texinfo libasound2-dev libpulse-dev libgfortran-*-dev
```
```bash
sudo pip install -r requirements.txt
```
or
```bash
sudo pip3 install -r requirements.txt
```
Then, initialize the submodules:
```bash
git submodule update --init --recursive
```
This node captures the sound from an ALSA or PulseAudio device and publishes it to a topic.
Parameters:
- `backend` (string): The backend to use (`alsa` or `pulse_audio`). The default value is `alsa`.
- `device` (string): The device to capture (e.g., `hw:CARD=1,DEV=0` or `default` for ALSA, or `alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input` for PulseAudio). The default value is `default`.
- `format` (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is `signed_16`.
- `channel_count` (int): The device channel count. The default value is `1`.
- `sampling_frequency` (int): The device sampling frequency. The default value is `16000`.
- `frame_sample_count` (int): The number of samples in each frame. The default value is `1024`.
- `merge` (bool): Whether to merge the channels. The default value is `false`.
- `gain` (double): The gain to apply. The default value is `1.0`.
- `latency_us` (int): The capture latency in microseconds. The default value is `64000`.
- `channel_map` (array of strings): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must only be set when using the PulseAudio backend. The default value is `[]`.
- `queue_size` (int): The publisher queue size. The default value is `1`.

Published topics:
- `audio_out` (audio_utils_msgs/AudioFrame): The captured sound.
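For reference, a capture node configured with these parameters can be started with `ros2 run` and parameter overrides. This is only a sketch: the package and executable names (`audio_utils`, `capture_node`) are assumptions and may differ in your build, while the parameter names come from the list above.

```bash
# Hypothetical package/executable names; adjust to your workspace.
# Captures 2 channels at 16 kHz from the default ALSA device and merges them.
ros2 run audio_utils capture_node --ros-args \
  -p backend:=alsa \
  -p device:=default \
  -p format:=signed_16 \
  -p channel_count:=2 \
  -p sampling_frequency:=16000 \
  -p frame_sample_count:=1024 \
  -p merge:=true \
  -p gain:=1.0
```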
This node receives sound from a topic and plays it on an ALSA or PulseAudio device.
Parameters:
- `backend` (string): The backend to use (`alsa` or `pulse_audio`). The default value is `alsa`.
- `device` (string): The device to use (e.g., `hw:CARD=1,DEV=0` or `default` for ALSA, or `alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input` for PulseAudio). The default value is `default`.
- `format` (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is `signed_16`.
- `channel_count` (int): The device channel count. The default value is `1`.
- `sampling_frequency` (int): The device sampling frequency. The default value is `16000`.
- `frame_sample_count` (int): The number of samples in each frame. The default value is `1024`.
- `latency_us` (int): The playback latency in microseconds. The default value is `64000`.
- `channel_map` (array of strings): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must only be set when using the PulseAudio backend. The default value is `[]`.
- `queue_size` (int): The subscriber queue size. The default value is `1`.

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound to play.
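As a sketch, the playback side can be wired to a capture topic with a remapping. The executable name `playback_node` and the source topic name are assumptions; the parameters and the `audio_in` topic come from the lists above.

```bash
# Hypothetical executable name; plays frames received on /captured_audio.
ros2 run audio_utils playback_node --ros-args \
  -r audio_in:=/captured_audio \
  -p backend:=pulse_audio \
  -p format:=signed_16 \
  -p channel_count:=1 \
  -p sampling_frequency:=16000 \
  -p frame_sample_count:=1024
```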
This node estimates the song tempo and detects if the beat is in the current frame.
Parameters:
- `sampling_frequency` (int): The device sampling frequency. The default value is `44100`.
- `frame_sample_count` (int): The number of samples in each analysed frame. It must be a divisor of `oss_fft_window_size`. The default value is `128`.
- `oss_fft_window_size` (int): The onset strength signal window size. It must be greater than or equal to `frame_sample_count`. The default value is `1024`.
- `flux_hamming_size` (int): The flux Hamming window size used to calculate the onset strength signal. The default value is `15`.
- `oss_bpm_window_size` (int): The onset strength signal window size used to calculate the BPM value. The default value is `1024`.
- `min_bpm` (double): The minimum valid BPM value. The default value is `50`.
- `max_bpm` (double): The maximum valid BPM value. The default value is `180`.
- `bpm_candidate_count` (int): The number of cross-correlations to perform to find the best BPM. The default value is `10`.

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1.

Published topics:
- `bpm` (std_msgs/Float32): The tempo in BPM (beats per minute) for each frame.
- `beat` (std_msgs/Bool): Indicates whether a beat occurs in the current frame.
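One possible way to try the detector is to feed it a mono capture topic and watch the published tempo. The executable name `beat_detector_node` and the topic remapping are assumptions; the parameters are the ones documented above.

```bash
# Hypothetical executable name; expects a mono 44.1 kHz stream on audio_in.
ros2 run audio_utils beat_detector_node --ros-args \
  -r audio_in:=/captured_audio \
  -p sampling_frequency:=44100 \
  -p frame_sample_count:=128 \
  -p oss_fft_window_size:=1024 \
  -p min_bpm:=50.0 \
  -p max_bpm:=180.0

# In another terminal, inspect the estimated tempo and beat flags.
ros2 topic echo /bpm
ros2 topic echo /beat
```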
This node performs voice activity detection with Silero VAD. The models folder contains the model trained by Silero VAD. The license of the model is MIT.
Parameters:
- `silence_to_voice_threshold` (double): The threshold to detect voice activity when silence was previously detected. The default value is `0.5`.
- `voice_to_silence_threshold` (double): The threshold to detect silence when voice activity was previously detected. It must be lower than `silence_to_voice_threshold`. The default value is `0.4`.
- `min_silence_duration_ms` (double): The minimum silence duration in ms. The default value is `500`.

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1. The sampling frequency must be 16000 Hz. The frame sample count must be a multiple of 512.

Published topics:
- `voice_activity` (audio_utils_msgs/VoiceActivity): The voice activity detection result.
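The constraints above (mono, 16000 Hz, frame sample count a multiple of 512) suggest a pipeline like the following sketch. The executable name `vad_node` and the input topic name are assumptions.

```bash
# Hypothetical executable name; audio_in must be mono, 16 kHz,
# with a frame sample count that is a multiple of 512 (e.g., 512 or 1024).
ros2 run audio_utils vad_node --ros-args \
  -r audio_in:=/captured_audio \
  -p silence_to_voice_threshold:=0.5 \
  -p voice_to_silence_threshold:=0.4 \
  -p min_silence_duration_ms:=500.0

# Inspect the detection result.
ros2 topic echo /voice_activity
```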
This node converts the format of an audio topic.
Parameters:
- `input_format` (string): The input audio format (see audio_utils_msgs/AudioFrame).
- `output_format` (string): The output audio format (see audio_utils_msgs/AudioFrame).

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound topic to convert.

Published topics:
- `audio_out` (audio_utils_msgs/AudioFrame): The converted sound.
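For illustration, a conversion from `signed_16` to another format could be launched as below. The executable name `format_conversion_node`, the topic names, and the `float` format string are assumptions; check audio_utils_msgs/AudioFrame for the exact format names supported.

```bash
# Hypothetical executable and output format name; remaps both topics.
ros2 run audio_utils format_conversion_node --ros-args \
  -r audio_in:=/captured_audio \
  -r audio_out:=/converted_audio \
  -p input_format:=signed_16 \
  -p output_format:=float
```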
This node resamples an audio topic.
Parameters:
- `input_format` (string): The input audio format (see audio_utils_msgs/AudioFrame).
- `output_format` (string): The output audio format (see audio_utils_msgs/AudioFrame).
- `channel_count` (int): The device channel count.
- `input_sampling_frequency` (int): The input sampling frequency.
- `output_sampling_frequency` (int): The output sampling frequency.
- `input_frame_sample_count` (int): The number of samples in each input frame.
- `dynamic_input_resampling` (bool, default `false`): If `true`, the input sampling information (format, sampling frequency and frame sample count) is dynamically adjusted to match the received frames. In this mode, `input_format`, `input_sampling_frequency` and `input_frame_sample_count` are not required, but they can be provided to avoid a recomputation if the initial input sampling information is known.

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound topic to resample.

Published topics:
- `audio_out` (audio_utils_msgs/AudioFrame): The resampled sound.
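A sketch of a 48 kHz to 16 kHz resampling configuration is shown below. The executable name `resampling_node` and the topic names are assumptions; the parameters are those listed above.

```bash
# Hypothetical executable name; resamples a mono signed_16 stream
# from 48 kHz (1024-sample frames) down to 16 kHz.
ros2 run audio_utils resampling_node --ros-args \
  -r audio_in:=/captured_audio \
  -r audio_out:=/resampled_audio \
  -p input_format:=signed_16 \
  -p output_format:=signed_16 \
  -p channel_count:=1 \
  -p input_sampling_frequency:=48000 \
  -p output_sampling_frequency:=16000 \
  -p input_frame_sample_count:=1024
```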
This node splits a multichannel audio topic into several mono audio topics.
Parameters:
- `input_format` (string): The input audio format (see audio_utils_msgs/AudioFrame).
- `output_format` (string): The output audio format (see audio_utils_msgs/AudioFrame).
- `channel_count` (int): The device channel count.

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound topic to split.

Published topics:
- `audio_out_0` (audio_utils_msgs/AudioFrame): The first channel sound.
- `audio_out_1` (audio_utils_msgs/AudioFrame): The second channel sound.
- ...
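For example, splitting a 4-channel stream would publish `audio_out_0` through `audio_out_3`. The executable name `split_channel_node` and the input topic name are assumptions in this sketch.

```bash
# Hypothetical executable name; splits a 4-channel stream into
# audio_out_0 .. audio_out_3 mono topics.
ros2 run audio_utils split_channel_node --ros-args \
  -r audio_in:=/captured_audio \
  -p input_format:=signed_16 \
  -p output_format:=signed_16 \
  -p channel_count:=4
```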
This node writes the raw sound data to a file.
Parameters:
- `output_path` (string): The output file path.

Subscribed topics:
- `audio_in` (audio_utils_msgs/AudioFrame): The sound topic to write.
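A minimal usage sketch, assuming the executable is named `raw_file_writer_node` and that a capture topic already exists; the raw samples received on `audio_in` are written to the file at `output_path`.

```bash
# Hypothetical executable name; writes the raw samples to /tmp/capture.raw.
ros2 run audio_utils raw_file_writer_node --ros-args \
  -r audio_in:=/captured_audio \
  -p output_path:=/tmp/capture.raw
```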
IntRoLab - Intelligent / Interactive / Integrated / Interdisciplinary Robot Lab