add linguistics #94

felixbur · 2023-11-09T13:19:32Z

Nkululeko could be multimodal if a `transcript` field is added to the audio files; linguistic feature extractors could then be added to the `feature_sets`.
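A minimal sketch of what "multimodal" could look like at the feature level. It assumes early fusion by simple concatenation, which the issue itself does not specify. Acoustic and linguistic vectors are keyed by file name, and files missing either modality are dropped. `fuse_features` is a hypothetical helper, not part of Nkululeko.

```python
from typing import Dict, List

Vector = List[float]


def fuse_features(
    acoustic: Dict[str, Vector], linguistic: Dict[str, Vector]
) -> Dict[str, Vector]:
    """Early fusion (hypothetical helper): concatenate per-file vectors.

    Only files present in both dictionaries are kept, so a file without a
    transcript (and therefore without linguistic features) is skipped.
    """
    fused = {}
    for file_name in sorted(acoustic.keys() & linguistic.keys()):
        fused[file_name] = list(acoustic[file_name]) + list(linguistic[file_name])
    return fused


if __name__ == "__main__":
    acoustic = {"1.wav": [0.1, 0.2], "2.wav": [0.3, 0.4]}
    linguistic = {"1.wav": [1.0, 0.0, 0.5]}  # 2.wav has no transcript yet
    print(fuse_features(acoustic, linguistic))
```

Concatenation keeps the two feature sets independent, so an existing acoustic extractor could be paired with a linguistic one. The trade-off is that files without a transcript drop out of the fused set.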

bagustris · 2023-11-10T02:06:03Z

Some datasets already include transcriptions (I have skipped them so far, since I didn't think they would be needed). They can be added as an additional column in the CSV or audformat data. If there is no transcription, we can use a Hugging Face model (such as Whisper) to generate transcripts during each dataset's pre-processing. The linguistic feature extractor then processes the text in the transcript column (I propose `transcript` as the column header) to generate word embeddings as linguistic features.

Using speech together with its transcription is useful for detecting cognitive decline such as Alzheimer's disease.
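A minimal sketch of the pre-processing step described above. It assumes a CSV with `file`, `label`, and `transcript` columns, and it fills rows without a transcript using a Hugging Face `transformers` ASR pipeline with a Whisper checkpoint (`openai/whisper-small` is an illustrative choice). `make_whisper_transcriber`, `fill_missing_transcripts`, and `process_csv` are hypothetical helpers, not Nkululeko code. The transcriber is passed in as a plain function, so the filling logic does not depend on a particular ASR backend.

```python
import csv
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def make_whisper_transcriber(model_name: str = "openai/whisper-small") -> Callable[[str], str]:
    """Wrap a Hugging Face ASR pipeline as a function: audio path -> text.

    The import is deferred so the rest of this sketch runs without transformers.
    """
    from transformers import pipeline  # pip install transformers (ffmpeg decodes files)

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda wav_path: asr(wav_path)["text"].strip()


def fill_missing_transcripts(rows: List[Row], transcribe: Callable[[str], str]) -> List[Row]:
    """Return copies of the rows, filling empty or absent 'transcript' values via ASR."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        if not (new_row.get("transcript") or "").strip():
            new_row["transcript"] = transcribe(new_row["file"])
        filled.append(new_row)
    return filled


def process_csv(in_path: str, out_path: str, transcribe: Callable[[str], str]) -> None:
    """Read a dataset CSV, fill missing transcripts, and write the result."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    if "transcript" not in fieldnames:
        fieldnames.append("transcript")  # proposed column header
    rows = fill_missing_transcripts(rows, transcribe)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `process_csv("data.csv", "data_with_transcripts.csv", make_whisper_transcriber())` would leave existing transcripts untouched and only run ASR on the gaps. This assumes the `file` paths resolve from the working directory. Whisper processes 30-second windows, so longer recordings may need the pipeline's chunking option (`chunk_length_s`); check the `transformers` documentation for the installed version.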

bagustris · 2024-10-22T01:41:32Z

If there is a `transcript` column in the CSV file (assuming the data is in CSV format), this can easily be solved with the SpeechBrain package:

https://speechbrain.readthedocs.io/en/latest/API/speechbrain.wordemb.transformer.html

For example:

| file | label | transcript |
|---|---|---|
| 1.wav | happy | "I am happy" |
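A minimal sketch of a linguistic feature extractor for the `transcript` column. It assumes SpeechBrain's `TransformerWordEmbeddings` from the module linked above, wrapping a Hugging Face BERT model (`bert-base-uncased` and `layers=4` are illustrative choices that follow the SpeechBrain documentation example). SpeechBrain returns one vector per word, so this sketch mean-pools them into one fixed-length vector per utterance. Mean pooling is an assumption here, not something decided in this issue. `make_speechbrain_embedder`, `utterance_embedding`, and `extract_linguistic_features` are hypothetical helpers, and the constructor arguments should be checked against the installed SpeechBrain version.

```python
import csv
from typing import Callable, Dict, List

Vector = List[float]
WordEmbedder = Callable[[str], List[Vector]]


def make_speechbrain_embedder(model_name: str = "bert-base-uncased") -> WordEmbedder:
    """Return a function mapping a sentence to one embedding per word.

    Imports are deferred so the pooling logic below runs without these packages.
    """
    from transformers import AutoModel, AutoTokenizer
    from speechbrain.wordemb.transformer import TransformerWordEmbeddings

    tokenizer = AutoTokenizer.from_pretrained(model_name, return_tensors="pt")
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    word_emb = TransformerWordEmbeddings(model=model, layers=4, tokenizer=tokenizer)
    # embeddings() returns a (num_words, hidden_size) tensor; convert to nested lists.
    return lambda sentence: word_emb.embeddings(sentence).tolist()


def utterance_embedding(transcript: str, embed_words: WordEmbedder) -> Vector:
    """Mean-pool per-word vectors into one fixed-length utterance vector."""
    word_vectors = embed_words(transcript)
    if not word_vectors:
        raise ValueError(f"no words to embed in transcript {transcript!r}")
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def extract_linguistic_features(csv_path: str, embed_words: WordEmbedder) -> Dict[str, Vector]:
    """Map each file listed in the CSV to the pooled embedding of its transcript."""
    features = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            features[row["file"]] = utterance_embedding(row["transcript"], embed_words)
    return features
```

With the CSV above, `extract_linguistic_features("data.csv", make_speechbrain_embedder())` would map `1.wav` to a single vector for "I am happy". That fixed length is what a classifier downstream would expect. Rows with an empty transcript raise an error rather than producing a meaningless vector.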

felixbur · 2024-10-22T07:49:38Z

To me, it seems most intuitive to extend the predict module with:

- text transcript
- speaker ID

thereby solving #167.

bagustris changed the title ~~add linuguistics~~ add linguistics Nov 10, 2023
