Add benchmarking audio model #39

Open

4 tasks

sangttruong opened this issue Sep 8, 2024 · 0 comments

sangttruong commented Sep 8, 2024 •

edited by martinakaduc

To add the new audio modality, we take the following steps.

Construct a test dataset for Audio Question-Answering. Data that can be used is available here
Modify the API calling interface for Speech-LLM. The suggested API data format is as follows

messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this audio?"},
                {
                    "type": "audio_url",
                    "audio_url": {
                        "url": "https://st.com/audio.wav",
                    }
                },
            ],
        }
    ],

or

messages=[
       {
           "role": "user",
           "content": [
               {"type": "text", "text": "What's in this audio?"},
               {
                   "type": "audio_b64_json",
                   "audio_b64_json": {
                       "b64_json": "Base64EncodedAudio",
                   }
               },
           ],
       }
   ],

Review the inference server code for the speech models and edit it if needed
Run 1-2 experiments with the created datasets and models
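
As a rough illustration of the two suggested message formats, here is a minimal Python sketch that builds each payload. The helper names `build_audio_url_message` and `build_audio_b64_message` are hypothetical and not part of this project. The key names follow the suggestion in this issue, written without trailing periods, which is an assumption about the intended spelling. The inference server may expect something different once its code has been reviewed.

```python
import base64
from typing import Any


def build_audio_url_message(question: str, audio_url: str) -> list[dict[str, Any]]:
    """Build a single-turn `messages` list that references the audio by URL.

    Hypothetical helper: mirrors the "audio_url" format suggested in the issue.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "audio_url", "audio_url": {"url": audio_url}},
            ],
        }
    ]


def build_audio_b64_message(question: str, audio_bytes: bytes) -> list[dict[str, Any]]:
    """Build a single-turn `messages` list that embeds the audio as Base64.

    Hypothetical helper: mirrors the "audio_b64_json" format suggested in the
    issue, assuming the keys are meant to have no trailing period.
    """
    # Base64 output is pure ASCII, so decoding to str is safe for JSON transport.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "audio_b64_json", "audio_b64_json": {"b64_json": encoded}},
            ],
        }
    ]
```

The URL variant keeps requests small but requires the server to be able to fetch the file; the Base64 variant is self-contained at the cost of a larger request body. Either list can be passed as the `messages` argument of the API call.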

sangttruong assigned ledong0110
