The Azure AI Model Inference API exposes a common set of capabilities for foundational models, which developers can use to consume predictions from a diverse set of models in a uniform and consistent way. Developers can talk to different models deployed in Azure AI Studio without changing the underlying code they use.
Microsoft now ships its own SDK for AI model inference, covering models hosted on MaaS (Models as a Service) and MaaP (Models as a Platform).
The Python and JavaScript versions are out; C# will be released next.
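As an illustration, here is a minimal Python sketch using the azure-ai-inference package to run a chat completion against a deployed model; the endpoint URL and API key are placeholders to replace with your own deployment's values.

```python
# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a serverless API deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

# Because the API is uniform, the same call works regardless of which
# model is deployed behind the endpoint.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain what a foundational model is in one sentence."),
    ],
    temperature=0.7,
    max_tokens=256,
)

print(response.choices[0].message.content)
```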
JavaScript samples are available as well.
The SDK uses the REST API documented here.
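For a sense of what the SDK does under the hood, here is a sketch of calling the chat completions route directly with requests; the api-version value and bearer-style authorization header are assumptions to verify against the REST reference for your deployment type.

```python
import requests

ENDPOINT = "https://<your-endpoint>.inference.ai.azure.com"  # placeholder
API_KEY = "<your-api-key>"  # placeholder

# Assumed route and api-version; check the REST reference for current values.
response = requests.post(
    f"{ENDPOINT}/chat/completions",
    params={"api-version": "2024-05-01-preview"},
    headers={
        "Authorization": f"Bearer {API_KEY}",  # assumed key-based auth scheme
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello."},
        ],
        "max_tokens": 128,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```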
The Azure AI Model Inference API is available for Phi-3 models in the following deployment types:
- Models deployed to serverless API endpoints:
- Models deployed to managed inference:
The API is compatible with Azure OpenAI model deployments.
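Because of that compatibility, the same client can in principle point at an Azure OpenAI deployment. The sketch below is assumption-laden: the deployment URL shape and api-version are placeholders to verify against your resource.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Assumed Azure OpenAI deployment URL shape and api-version; verify both
# against your resource before relying on them.
client = ChatCompletionsClient(
    endpoint="https://<your-resource>.openai.azure.com/openai/deployments/<deployment-name>",
    credential=AzureKeyCredential("<your-api-key>"),
    api_version="2024-06-01",
)

response = client.complete(messages=[UserMessage(content="Hello!")])
print(response.choices[0].message.content)
```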
Note
The Azure AI Model Inference API is available in managed inference (Managed Online Endpoints) for models deployed after June 24, 2024. To take advantage of the API, redeploy your endpoint if the model was deployed before that date.
The following section describes some of the capabilities the API exposes. For a full specification of the API, view the reference section.
The API specifies how developers can consume predictions for the following modalities (a short sketch follows the list):
- Get info: Returns the information about the model deployed under the endpoint.
- Text embeddings: Creates an embedding vector representing the input text.
- Text completions: Creates a completion for the provided prompt and parameters.
- Chat completions: Creates a model response for the given chat conversation.
- Image embeddings: Creates an embedding vector representing the input text and image.
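To make the first two modalities concrete, here is a minimal sketch using the SDK's EmbeddingsClient; the endpoint and key are placeholders for your own deployment's values.

```python
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(
    endpoint="https://<your-embeddings-endpoint>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),  # placeholder
)

# Get info: ask the endpoint which model is deployed behind it.
info = client.get_model_info()
print(info.model_name, info.model_type, info.model_provider_name)

# Text embeddings: embed a batch of input strings in one call.
result = client.embed(input=["The quick brown fox", "jumps over the lazy dog"])
for item in result.data:
    print(f"vector of {len(item.embedding)} dimensions")
```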