Skip to content

Power your product with real-time and async AI transcription, translation and audio intelligence add-ons using a single API.

Notifications You must be signed in to change notification settings

j5py/gladia-home-assignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gladia Home Assignement


Instructions

I have reported the instructions provided in Instructions.md.


Results

Task 1: Technical Understanding and API Integration


API Integration

No need to know React, just run the app and browse!


  1. Load the React app locally

    git clone [email protected]:j5py/gladia-home-assignment.git
    
  2. Install the app

    cd gladia-home-assignment; npm install
    
  3. Run the app

    npm start
    
  4. Interact with the app that should now have automatically opened on localhost:3000

  5. Review the code written to implement the API



Technical Write-Up

How to Generate a Bullet Point Summary of an Audio File

The following Shell commands can be executed from a Unix/Linux terminal. The syntax may need to be adapted for Command Prompt, for example, on Windows OS. However, you can use Git Bash (especially from a Visual Studio Code terminal). Using curl and a terminal facilitates testing, the documentation also provides examples in Python, JavaScript, PHP, Go, and Java.


Quick Overview of the Necessary Steps
  1. Upload the file to Gladia and get its URL in return
  2. Request Gladia to proceed with the transcription of the file at the given URL
  3. A new URL allows you to obtain a dataset, including the text of the transcription

Upload Your File to Gladia
curl --request POST \
  --url https://api.gladia.io/v2/upload \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-gladia-key: <your_gladia_api_key>' \
  --form audio=@<your_audio_file>
Placeholder Example Comment
<your_gladia_api_key> Get a key in less than a minute and for free after creating your account
<your_audio_file> ./audio.wav In case the audio file is in the current directory from which you run curl

After a short delay for the request to be processed, you will receive a JSON dataset, and one key particularly interests us here: audio_url. Note the corresponding value for the next step.


Request Gladia to Process the Transcription of the Uploaded File
curl --request POST \
  --url https://api.gladia.io/v2/transcription \
  --header 'Content-Type: application/json' \
  --header 'x-gladia-key: <your_gladia_api_key>' \
  --data '{
    "audio_to_llm": true,
    "audio_to_llm_config": {
      "prompts": [
        "Extract the key points from the transcription as bullet points"
      ]
    },
    "audio_url": "<your_url_received_after_last_upload>"
  }'
JSON dataset Comment
audio_to_llm This parameter allows the subsequent configuration point to be taken into account
audio_to_llm_config Edit the content of the prompts list (or array) as comma-separated strings

Remember to edit the placeholders, but retain the double quotes in the --data option to ensure it matches the JSON format. Check cURL description of Create Transcription to see an example of a custom prompt request, and Audio to LLM for several prompts in a single request. Finally, this time it's the value of result_url that you need to note from the response.


Check that the Requested Transcription is Available
curl --request GET \
  --url <your_result_url> \
  --header 'x-gladia-key: <your_gladia_api_key>'

Note that the value of the key status may initially return queued or processing. In those case, you will need to repeat the request until you obtain done in a JSON dataset that may vary in size depending on the length of the uploaded audio.


Filter the Dataset to Obtain the Summary

First, let's continue with curl to complete the process. This request is identical to the previous one, but it allows for filtering of a potentially large result.

curl --request GET \
  --url <your_result_url> \
  --header 'x-gladia-key: <your_gladia_api_key>' | \
  grep -o '"response":"[^"]*' | \
  grep -o '[^:]*$' | \
  tr -d '"'

Note that this command will only filter the first occurrence of response corresponding to the first prompt. Subsequent occurrences will be ignored if your previous request contained more than one prompt (in this case, you might consider jq).


Snippet Description
\ Allows you to split a long command across multiple lines
| Pipes output to the next command
grep -o '"response":"[^"]*' | captures everything from "response":" up to the next double quote
grep -o '[^:]*$' | Captures the content after the last colon
tr -d '"' Removes double quotes from the output

Now with JavaScript, you can also request the same URL as in the previous curl command to achieve a more immediate reading of the response access. You can open Chrome DevTools and paste your edited version of the code below to execute it from your browser's console.

fetch('<your_result_url>', {
    method: 'GET',
    headers: { 'x-gladia-key': '<your_gladia_api_key>' }
}).then(
    jsonString => jsonString.json()
).then(
    jsObject => console.log(jsObject.result.audio_to_llm.results[0].results.response)
)

Knowing that the access can be read with <receivedObject>.result.audio_to_llm.results[0].results.response in JavaScript helps us understand that the value we are looking for is part of a much larger structure, which is why it can be tedious to find using the raw result of curl. In the case where your previous request contained more than one prompt, you will need to iterate through the results in this way, for example: <receivedObject>.result.audio_to_llm.results.forEach(<item> => ...



Task 2: Customer Support Simulation

Mock Tickets


Ticket 1

"I keep getting a 401 Unauthorized error when trying to access the API."

Re:

Good Day,

Thank you for your message. I'm glad to assist you with this matter.

I would first recommend checking if the x-gladia-key value used in your request exists and fully matches one of the API keys listed on your Account page. A simple visual check can help spot if the key is altered, but it's good practice to use the 'copy to clipboard' feature and paste the key into your request code to ensure you have the correct value.

Please let me know if this allows you to solve the issue. If you continue to receive the 401 status code, I will first need you to provide me with a few details about your technical environment, such as the code snippet of the request and/or a screenshot of your test.


Kind regards



Ticket 2

"I receive an error when trying some of my files: 413 file too large. Why is that and what can I do?"

Re:

Good Day,

Thank you for contacting support. I'm here to help you with your issue.

This is the expected behavior of the Gladia API for performance reasons. Currently, the API has limitations of 135 minutes and 500 MB in size, but we provide you with a few tools for splitting audio files. You need to initiate the setup of one of these tools based on your environment. Please do not hesitate to confirm when it is resolved, or reach out if you encounter any issues with the configuration.


Kind regards



Ticket 3

"I'm seeing a significant delay in receiving the transcription results. Is there a way to speed this up?"

Re:

Good Day,

Thank you for reaching out to us. We appreciate your inquiry.

We can provide high concurrency based on your needs. If you are experiencing a significant delay in receiving results, it is likely due to handling high volumes of requests that exceed the default values. We can offer you additional computational capacity to manage a large number of requests or operations simultaneously.

Could you please share your availability for a meeting to discuss your usage requirements and evaluate the solution?


Kind regards



Ticket 4

"Do you support Chinese?"

Re:

Good Day,

Thanks for reaching out! I'm here to help you with any questions you may have.

Absolutely! Chinese, identified by the ISO 639 language code zh, is one of the available options under Create Transcription > Body > language and is also one of the possible values of target_languages in the examples accessible through Translation > Sample code.

Please let me know if you are making progress. I am available to assist you if you encounter any difficulties. You may also refer to the complete list of supported languages for any future needs.


Kind regards



Ticket 5

"We received complaints from our customers about some very inaccurate transcriptions yesterday, can you check what happened?"

Re:

Good Day,

I appreciate you getting in touch. Let's work together to resolve your concern.

Could you please specify which functionality of Gladia or API key is subject to inaccurate transcriptions, and could you provide us with any records of the requests and responses?


Here are some ways to fine-tune the configuration by functionality to prevent inaccuracy:

  • Speech Recognition

    • Multiple languages detection : Is not activated by default and allows switching the transcription language, but strong accents may cause misinterpretation.
    • Custom vocabulary and Custom spelling can significantly enhance the accuracy of transcriptions in specific contexts, such as professional environments where the use of acronyms and internal expressions is commonplace.
  • Live Speech Recognition

  • Speaker Diarization

    • Speaker diarization may also improve accuracy by determining the identity of each speaker and attributing their statements accordingly.

If it is complex for you to specify the context of the feedback observed, we could schedule a meeting to identify it at your convenience.


Kind regards



Task 3: Technical Documentation and Communication

Documentation Review

The added section to enhance the original documentation is italicized.


Audio to LLM

Harness the power of the LLM command prompt to access a range of processing capabilities beyond transcription, all in a single request. In addition to speech-to-text, you will fully leverage the value of its content. Ask any question or analysis, as you would do with an assistant.

You only need to write the different treatments you wish to apply to the transcription, with the prompts listed as comma-separated values in the prompts list (or array). In addition to the complete transcription, you will obtain a list of results: each will contain the prompt sent to the LLM and its associated output.


The Audio to LLM feature applies your own prompts to the audio transcription.

import requests
import time


def make_fetch_request(url, headers, method='GET', data=None):
    if method == 'POST':
        response = requests.post(url, headers=headers, json=data)
    else:
        response = requests.get(url, headers=headers)
    return response.json()


gladia_key = "YOUR_GLADIA_API_TOKEN"
request_data = {
  "audio_url": "YOUR_AUDIO_URL",
  "audio_to_llm": True,
  "audio_to_llm_config": {
    "prompts": [
      "Extract the key points from the transcription as bullet points",
      "Generate a title from this transcription"
    ]
  }
}
gladia_url = "https://api.gladia.io/v2/transcription/"

headers = {
    "x-gladia-key": gladia_key,
    "Content-Type": "application/json"
}

print("- Sending initial request to Gladia API...")
initial_response = make_fetch_request(
    gladia_url, headers, 'POST', request_data)

print("Initial response with Transcription ID:", initial_response)
result_url = initial_response.get("result_url")

if result_url:
    while True:
        print("Polling for results...")
        poll_response = make_fetch_request(result_url, headers)

        if poll_response.get("status") == "done":
            print("- Transcription done: \n")
            audio_to_llm_results = poll_response.get("result", {}).get(
                "audio_to_llm", {})
            print(audio_to_llm_results)
            break
        else:
            print("Transcription status:", poll_response.get("status"))
        time.sleep(1)

With this code, your output will look like this:

{
  "success": true,
  "is_empty": false,
  "results": [
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Extract the key points from the transcription as bullet points",
        "response": "The main entities key points from the transcription are:\n- ..."
      },
      "exec_time": 1.7726809978485107,
      "error": null
    },
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Generate a title from this transcription",
        "response": "The Great Title"
      },
      "exec_time": 1.7832809978258485,
      "error": null
    }
  ],
  "exec_time": 6.127103805541992,
  "error": null
}

You’ll find your custom prompts results of your audio under the results key.



About

Power your product with real-time and async AI transcription, translation and audio intelligence add-ons using a single API.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published