Based on OpenAPI docs
GET /v1/ping
Returns an OK status to indicate the API is up and running.
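A minimal health-check sketch using only the Python standard library. The base URL (for example `http://localhost:8080`) is an assumption, since the address the server listens on is not stated here; `ping_url` and `is_up` are illustrative helper names.

```python
import urllib.error
import urllib.request


def ping_url(base_url: str) -> str:
    """Build the health-check URL; a trailing slash on base_url is tolerated."""
    return base_url.rstrip("/") + "/v1/ping"


def is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /v1/ping answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(ping_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or a non-2xx HTTP error.
        return False
```

Calling `is_up("http://localhost:8080")` before issuing other requests is one way to fail fast when the server is not reachable.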
GET /v1/models
Returns a list of available models. Example response:
{
  "object": "list",
  "models": [
    {
      "id": "ggml-large-v3",
      "object": "model",
      "path": "ggml-large-v3.bin",
      "created": 1722090121
    },
    {
      "id": "ggml-medium-q5_0",
      "object": "model",
      "path": "ggml-medium-q5_0.bin",
      "created": 1722081999
    }
  ]
}
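As a sketch of how a client might consume this response, the helper below pulls the model IDs out of the body. It assumes the body is a single JSON object with a `models` array whose entries carry an `id`, as the example suggests; `model_ids` is an illustrative name.

```python
import json


def model_ids(body: str) -> list[str]:
    """Return the IDs of all models listed in a GET /v1/models response body."""
    payload = json.loads(body)
    # Tolerate a missing "models" key by treating it as an empty list.
    return [model["id"] for model in payload.get("models", [])]
```

The returned IDs are the values expected in the `model` field of the transcription requests and in the path of `DELETE /v1/models/{model-id}`.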
POST /v1/models
POST /v1/models?stream={bool}
The request should be an application/json, multipart/form-data or application/x-www-form-urlencoded request with the following fields:
"path": "ggml-large-v3.bin"
Downloads a model from the remote Hugging Face repository. If the optional stream argument is true, the progress is streamed back to the client as a series of text/event-stream events.
If the model was already present, a 200 OK status is returned. If the model was newly downloaded, a 201 Created status is returned. Example streaming response:
event: ping
event: progress
data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":10159256}
event: progress
data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":21895036}
event: progress
data: {"status":"downloading ggml-medium-q5_0.bin","total":539212467,"completed":33540592}
event: ok
data: {"id":"ggml-medium-q5_0","object":"model","path":"ggml-medium-q5_0.bin","created":1722411778}
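The following sketch shows one way a client could request a streamed download and read the progress events. Several things are assumptions: the base URL, the spelling `stream=true` for the boolean query value, and that events may or may not be separated by blank lines (the parser accepts both). `download_request`, `parse_sse` and `percent` are illustrative names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple


def download_request(base_url: str, path: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/models?stream=true with a JSON body."""
    url = base_url.rstrip("/") + "/v1/models?stream=true"  # "true" is assumed
    body = json.dumps({"path": path}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def _decode(chunks: list) -> Optional[dict]:
    """Join the data lines of one event and decode them as JSON."""
    return json.loads("\n".join(chunks)) if chunks else None


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[dict]]]:
    """Yield (event, data) pairs; data is None for events without a payload."""
    event: Optional[str] = None
    data: list = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            if event is not None:  # a new event starts: flush the pending one
                yield event, _decode(data)
            event, data = line[len("event:"):].strip(), []
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event is not None:  # blank line ends an event
            yield event, _decode(data)
            event, data = None, []
    if event is not None:
        yield event, _decode(data)


def percent(progress: dict) -> float:
    """Completed share of a progress event, in percent."""
    total = progress.get("total") or 0
    return 100.0 * progress["completed"] / total if total else 0.0
```

With a live server, the response object returned by `urllib.request.urlopen(download_request(...))` can be iterated line by line, decoding each line to text before passing it to `parse_sse`; the final `ok` event carries the model object.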
DELETE /v1/models/{model-id}
Deletes a model by its ID. If the model is deleted, a 200 OK status is returned.
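A small sketch of building the delete request with the standard library. The base URL is an assumption, and `delete_model_request` is an illustrative name; the model ID is percent-encoded so that unusual characters cannot alter the path.

```python
import urllib.request
from urllib.parse import quote


def delete_model_request(base_url: str, model_id: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /v1/models/{model-id}."""
    url = f"{base_url.rstrip('/')}/v1/models/{quote(model_id, safe='')}"
    return urllib.request.Request(url, method="DELETE")
```

Passing the result to `urllib.request.urlopen` sends the request; a response status of 200 then indicates the model was deleted.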
This endpoint transcribes media files into text, in the language of the media file.
POST /v1/audio/transcriptions
POST /v1/audio/transcriptions?stream={bool}
The request should be a multipart/form-data request with the following fields:
"model": "<model-id>",
"file": "<binary data>",
"language": "<language-code>",
"response_format": "<response-format>",
Transcribes audio into the input language.
file (required): The audio file object (not the file name) to transcribe. This can be audio or video, and the format is auto-detected. The "best" audio stream is selected from the file, and the audio is converted to 16 kHz mono PCM format during transcription.
model (required): ID of the model to use. The model must have been downloaded previously.
language (optional): The language of the input audio in ISO-639-1 format. If not set, the language is auto-detected.
response_format (optional, defaults to json): The format of the transcript output, one of json, text, srt, verbose_json, or vtt.
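The fields above can be packed into a multipart/form-data body by hand when no HTTP library is available. This is a minimal sketch using only the standard library; `encode_multipart` is an illustrative name, and the `application/octet-stream` part type is an assumption that relies on the server auto-detecting the media format.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], file_name: str,
                     file_bytes: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type) for a form with text fields and one "file" part."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Plain text fields such as model, language and response_format.
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The binary media goes into the "file" field.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The body and content type can then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, where `url` points at `/v1/audio/transcriptions` on the server.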
If the optional stream argument is true, the segments of the transcription are returned as a series of text/event-stream events. Otherwise, the full transcription is returned in the response body.
Example streaming response:
event: ping
event: task
data: {"task":"translate","language":"en","duration":62.6155}
event: ping
event: segment
data: {"id":0,"start":0,"end":14.2,"text":" What do you think about new media like Facebook, emails and cell phones?"}
event: segment
data: {"id":1,"start":14.2,"end":18.2,"text":" The new media make our life much easier."}
event: segment
data: {"id":2,"start":18.2,"end":23,"text":" You can get in touch with people much faster than before."}
event: ok
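To illustrate the segment fields, the sketch below turns already-decoded segment events into SubRip (SRT) text on the client side. It assumes `start` and `end` are seconds, as the example values suggest; `srt_timestamp` and `segments_to_srt` are illustrative names, and this is not a description of how the server produces its own srt output.

```python
from typing import Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used by SRT."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[dict]) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Because each segment arrives as its own event, a client can also display the text incrementally instead of waiting for the final `ok` event.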
This is the same as transcription (above), except that the language parameter is always set to 'en' so that the audio is translated into English.
POST /v1/audio/translations
POST /v1/audio/translations?stream={bool}
To diarize an English-language audio file, use the following endpoint:
POST /v1/audio/diarize
POST /v1/audio/diarize?stream={bool}
The segments returned include a "speaker_turn" field, which indicates that the segment starts with a new speaker. This endpoint requires a separate download of a diarization model.
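One way a client might use "speaker_turn" is to merge consecutive segments into per-speaker turns. The sketch below assumes the field is a boolean that is true on the first segment of each new speaker and absent or false otherwise; the exact type is not specified here, and `group_by_turn` is an illustrative name.

```python
from typing import Iterable, List


def group_by_turn(segments: Iterable[dict]) -> List[str]:
    """Join segment texts into one string per speaker turn."""
    turns: List[List[str]] = []
    for seg in segments:
        # Start a new turn on a speaker change, or for the very first segment.
        if seg.get("speaker_turn") or not turns:
            turns.append([])
        turns[-1].append(seg["text"].strip())
    return [" ".join(texts) for texts in turns]
```

The result only separates turns; it does not identify which speaker is talking, since no speaker label is described for the segments.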