add linguistics #94

Open
felixbur opened this issue Nov 9, 2023 · 3 comments

@felixbur
Owner

felixbur commented Nov 9, 2023

Nkululeko could become multimodal if a transcript field were added to the audio files; linguistic feature extractors could then be added to the feature_sets.

@bagustris
Collaborator

Some datasets already include transcriptions (I have skipped them so far, since I didn't think they would be needed). A transcription can be added as an additional column in the CSV or audformat data. If there is no transcription, we can use a Hugging Face model (such as Whisper) to generate transcripts during pre-processing of each dataset. The "linguistic feature extractor" would then process the text in the transcript column (the name I propose for the column header) to generate word embeddings (linguistic features).

Using speech together with its transcription is useful for detecting degenerative conditions such as Alzheimer's.
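
A minimal sketch of the pre-processing step described above, assuming a CSV with a file column: rows without a transcript get one from a transcriber function. The helper names (add_transcripts, transcribe_csv, make_whisper_transcriber) and the model name openai/whisper-tiny are my own choices for illustration, not part of Nkululeko. The Whisper part needs the transformers package (and ffmpeg to decode audio files), so it is imported lazily.

```python
import csv


def add_transcripts(rows, transcribe):
    """Return copies of the rows with a 'transcript' value filled in where missing.

    rows: list of dicts, each with at least a 'file' key.
    transcribe: any callable mapping an audio file path to a text string.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # do not modify the caller's rows
        if not new_row.get("transcript"):
            new_row["transcript"] = transcribe(new_row["file"]).strip()
        out.append(new_row)
    return out


def transcribe_csv(in_path, out_path, transcribe):
    """Read a CSV, fill the 'transcript' column, and write the result."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    rows = add_transcripts(rows, transcribe)
    fieldnames = list(rows[0].keys()) if rows else ["file", "transcript"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def make_whisper_transcriber(model_name="openai/whisper-tiny"):
    """Build a transcriber from a Hugging Face ASR pipeline (assumed setup)."""
    # Imported here so the functions above work without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda path: asr(path)["text"]
```

Usage would be something like transcribe_csv("data.csv", "data_with_transcript.csv", make_whisper_transcriber()), where the file names are placeholders. Existing transcripts are left untouched, so datasets that already ship transcriptions are not re-transcribed.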

@bagustris changed the title from "add linuguistics" to "add linguistics" on Nov 10, 2023
@bagustris
Collaborator

If the CSV file has a transcript field/column (assuming the format is CSV), this can easily be solved with the speechbrain package.

https://speechbrain.readthedocs.io/en/latest/API/speechbrain.wordemb.transformer.html

e.g.,

file   label  transcript
1.wav  happy  "I am happy"
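
To make the idea concrete, here is a rough sketch of how a linguistic feature extractor could consume such a table, with the example row written as comma-separated text. Everything named here (extract_linguistic_features, toy_embedding) is hypothetical. The toy embedding is only a hashed bag-of-words stand-in so the sketch runs without extra packages; a real extractor would plug in a proper word embedding model (for example from speechbrain) through the embed argument.

```python
import csv
import hashlib
import io


def toy_embedding(text, dim=8):
    """Stand-in embedding: hashed bag-of-words counts, NOT a real word embedding."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Map each word to a fixed bucket and count it there.
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec


def extract_linguistic_features(csv_text, embed=toy_embedding):
    """Map each file name to a feature vector computed from its transcript."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["file"]: embed(row["transcript"]) for row in reader}


# The example row from above, as CSV text.
EXAMPLE = 'file,label,transcript\n1.wav,happy,"I am happy"\n'

features = extract_linguistic_features(EXAMPLE)
print(features["1.wav"])
```

Passing the embedder as an argument keeps the CSV handling separate from the choice of embedding model, so the same loop would work for any model that turns a sentence into a vector.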

@felixbur
Owner Author

To me, it seems most intuitive to extend the predict module with

  • text transcript
  • speaker id,
    thereby solving #167
