Based on OpenAPI docs
GET /v1/ping
Returns an OK status to indicate that the API is up and running.
GET /v1/models
Returns a list of available models. Example response:
{
"object": "list",
"models": [
{
"id": "ggml-large-v3",
"object": "model",
"path": "ggml-large-v3.bin",
"created": 1722090121
},
{
"id": "ggml-medium-q5_0",
"object": "model",
"path": "ggml-medium-q5_0.bin",
"created": 1722081999
}
]
}
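For reference, the model IDs can be pulled out of this response with a few lines of Python (the payload below is the example response above, reproduced verbatim):

```python
import json

# The example /v1/models response from above, verbatim.
response_body = """
{
  "object": "list",
  "models": [
    {"id": "ggml-large-v3", "object": "model", "path": "ggml-large-v3.bin", "created": 1722090121},
    {"id": "ggml-medium-q5_0", "object": "model", "path": "ggml-medium-q5_0.bin", "created": 1722081999}
  ]
}
"""

data = json.loads(response_body)
model_ids = [m["id"] for m in data["models"]]
print(model_ids)  # ['ggml-large-v3', 'ggml-medium-q5_0']
```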
POST /v1/models
POST /v1/models?stream={bool}
The request should be an application/json, multipart/form-data, or application/x-www-form-urlencoded request with the following fields:
{
"path": "ggml-large-v3.bin"
}
Downloads a model from a remote Hugging Face repository. If the optional stream argument is true, download progress is streamed back to the client as a series of text/event-stream events. If the model is already present, a 200 OK status is returned; if the model was newly downloaded, a 201 Created status is returned. Example streaming response:
event: ping
event: progress
data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":10159256}
event: progress
data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":21895036}
event: progress
data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":33540592}
event: ok
data: {"id":"ggml-medium-q5_0","object":"model","path":"ggml-medium-q5_0.bin","created":1722411778}
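A client can track download progress by parsing this event stream. Below is a minimal sketch of a text/event-stream parser applied to progress events like those above (the helper name parse_sse is my own, not part of this API; per the SSE format, events are separated by blank lines):

```python
import json

def parse_sse(body):
    """Split a text/event-stream body into (event, data) pairs.
    Events are terminated by a blank line, per the SSE format."""
    events, event, data = [], None, None
    for line in body.splitlines() + [""]:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
        elif line == "" and event is not None:
            events.append((event, data))
            event, data = None, None
    return events

stream = (
    "event: progress\n"
    'data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":10159256}\n'
    "\n"
    "event: progress\n"
    'data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":33540592}\n'
    "\n"
)

for name, payload in parse_sse(stream):
    p = json.loads(payload)
    print(f"{name}: {100 * p['completed'] / p['total']:.1f}%")
```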
DELETE /v1/models/{model-id}
Deletes a model by its ID. If the model is deleted, a 200 OK status is returned.
This endpoint transcribes media files into text, in the language of the media file.
POST /v1/audio/transcriptions
POST /v1/audio/transcriptions?stream={bool}
The request should be a multipart/form-data request with the following fields:
{
"model": "<model-id>",
"file": "<binary data>",
"language": "<language-code>",
"response_format": "<response-format>"
}
Transcribes audio into the input language.
file
(required) The audio file object (not file name) to transcribe. This can be audio or video, and the format is auto-detected. The "best" audio stream is selected from the file, and the audio is converted to 16 kHz mono PCM format during transcription.
model
(required) ID of the model to use. The model should have been downloaded previously.
language
(optional) The language of the input audio in ISO-639-1 format. If not set, then the language is auto-detected.
response_format
(optional, defaults to json). The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
If the optional stream argument is true, the segments of the transcription are returned as a series of text/event-stream events. Otherwise, the full transcription is returned in the response body.
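Because the request is multipart/form-data, the body can be built with the standard library alone. A minimal sketch (the field values and file bytes are illustrative; a real client would typically use an HTTP library's multipart support instead):

```python
import io
import uuid

def build_multipart(fields, file_name, file_bytes):
    """Encode form fields plus a 'file' part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        (
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
    )
    buf.write(file_bytes + b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", buf.getvalue()

content_type, body = build_multipart(
    {"model": "ggml-medium-q5_0", "response_format": "json"},
    "sample.wav",
    b"\x00\x01",  # placeholder bytes standing in for real audio
)
```

The returned content_type string would be sent as the request's Content-Type header.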
Example streaming response:
event: ping
event: task
data: {"task":"translate","language":"en","duration":62.6155}
event: ping
event: segment
data: {"id":0,"start":0,"end":14.2,"text":" What do you think about new media like Facebook, emails and cell phones?"}
event: segment
data: {"id":1,"start":14.2,"end":18.2,"text":" The new media make our life much easier."}
event: segment
data: {"id":2,"start":18.2,"end":23,"text":" You can get in touch with people much faster than before."}
event: ok
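The segment events can be stitched into a full transcript on the client side. A short sketch using the segment payloads from the example above:

```python
import json

# The "data:" payloads of the segment events above.
segment_events = [
    '{"id":0,"start":0,"end":14.2,"text":" What do you think about new media like Facebook, emails and cell phones?"}',
    '{"id":1,"start":14.2,"end":18.2,"text":" The new media make our life much easier."}',
    '{"id":2,"start":18.2,"end":23,"text":" You can get in touch with people much faster than before."}',
]

segments = [json.loads(e) for e in segment_events]
transcript = "".join(s["text"] for s in segments).strip()
print(transcript)
```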
This is the same as transcription (above), except that the language parameter is always set to 'en' to translate the audio into English.
POST /v1/audio/translations
POST /v1/audio/translations?stream={bool}
To diarize an English-language audio file, use the following endpoint:
POST /v1/audio/diarize
POST /v1/audio/diarize?stream={bool}
The segments returned include a "speaker_turn" field, which indicates that the segment begins a new speaker's turn. Diarization requires a separate download of a diarization model.
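Given the "speaker_turn" flag on each segment, a client can group consecutive segments into per-speaker turns. A sketch with made-up segment data (the texts and IDs below are illustrative, not real API output):

```python
# Illustrative diarized segments; "speaker_turn" marks the start of a new speaker.
segments = [
    {"id": 0, "speaker_turn": True,  "text": " Hello, how are you?"},
    {"id": 1, "speaker_turn": False, "text": " I'm fine."},
    {"id": 2, "speaker_turn": True,  "text": " Great to hear."},
]

turns, current = [], []
for seg in segments:
    if seg["speaker_turn"] and current:
        turns.append("".join(current).strip())
        current = []
    current.append(seg["text"])
if current:
    turns.append("".join(current).strip())

print(turns)
```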