Semantic Kernel - Using JSON Mode with Azure OpenAI

With the release of Semantic Kernel version 1.21.1, developers can leverage OpenAI's JSON mode to get structured responses directly from AI models. For Azure OpenAI users, however, there is a catch: JSON mode requires API version 2024-08-01-preview or later, and the Azure OpenAI connector in Semantic Kernel doesn't provide a straightforward way to specify the API version when registering the AzureOpenAIChatCompletionService.

In this blog post, we'll work around this limitation by implementing a custom HTTP message handler that rewrites the API version on the fly. We'll walk through the code step by step so you can integrate OpenAI's JSON mode into your Azure OpenAI projects using Semantic Kernel. Credit goes to Luis Mañez, who originally posted this workaround.

Step-by-Step Implementation

1. Define the Data Model

First, we'll define a data model that represents the structured response we expect from the AI model.

internal record Joke
{
    public string JokeContent { get; set; } = string.Empty;
    public string Explanation { get; set; } = string.Empty;
}

Explanation:

  • JokeContent: The main content of the joke.
  • Explanation: An explanation of the joke.

2. Create the Custom API Version Handler

We'll implement a custom HTTP handler that modifies the API version in outgoing requests.

using System.Web; // HttpUtility lives here

public class ApiVersionHandler : DelegatingHandler
{
    private const string ApiVersionKey = "api-version";
    private const string NewApiVersion = "2024-08-01-preview";

    public ApiVersionHandler() : base(new HttpClientHandler())
    {
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request,
        CancellationToken cancellationToken)
    {
        var uriBuilder = new UriBuilder(request.RequestUri!);
        var query = HttpUtility.ParseQueryString(uriBuilder.Query);

        // If there is no 'api-version' query parameter, pass the request through unchanged
        if (query[ApiVersionKey] == null)
            return await base.SendAsync(request, cancellationToken);

        // Update the 'api-version' to the new version
        query[ApiVersionKey] = NewApiVersion;
        uriBuilder.Query = query.ToString();
        request.RequestUri = uriBuilder.Uri;

        // Proceed with the modified request
        return await base.SendAsync(request, cancellationToken);
    }
}

Explanation:

  • Constants: We define ApiVersionKey and NewApiVersion for clarity and maintainability.
  • Constructor: Calls the base DelegatingHandler constructor with a new HttpClientHandler.
  • SendAsync Method:
    • Parse the Request URI: Converts the request URI into a UriBuilder for manipulation.
    • Modify Query Parameters:
      • Checks whether the api-version parameter exists; requests without it pass through unchanged.
      • Updates its value to 2024-08-01-preview.
      • Reconstructs the request URI with the updated query string.
    • Proceed with the Request: Calls base.SendAsync to continue processing the request.
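The rewriting logic is easy to sanity-check in isolation. The sketch below applies the same UriBuilder/ParseQueryString steps to a sample request URI (the resource name is a placeholder):

```csharp
using System.Web;

// A placeholder Azure OpenAI request URI with an older API version
var uriBuilder = new UriBuilder(
    "https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01");

var query = HttpUtility.ParseQueryString(uriBuilder.Query);
if (query["api-version"] != null)
{
    // Same steps the handler performs on every outgoing request
    query["api-version"] = "2024-08-01-preview";
    uriBuilder.Query = query.ToString();
}

// The rewritten URI now carries api-version=2024-08-01-preview
Console.WriteLine(uriBuilder.Uri);
```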

3. Configuring Azure OpenAI Chat Completion with the HTTP Handler

We'll configure the Semantic Kernel to use our custom HTTP handler and the Azure OpenAI service.

using JsonModeDemo;
using Microsoft.Extensions.Configuration;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Text.Json;

// Build configuration from user secrets
var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();

// Create the kernel builder
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();

// Create an HttpClient with the custom ApiVersionHandler
HttpClient httpClient = new HttpClient(new ApiVersionHandler());

// Register the Azure OpenAI Chat Completion service
kernelBuilder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",
    endpoint: configuration["AzureOpenAI:Endpoint"]!,
    apiKey: configuration["AzureOpenAI:Key"]!,
    httpClient: httpClient);

// Build the kernel
Kernel kernel = kernelBuilder.Build();

Explanation:

  • Configuration: Loads the Azure OpenAI Endpoint and Key from user secrets.
  • Kernel Builder: Initializes the kernel builder.
  • Custom HttpClient: Uses the ApiVersionHandler to ensure all requests have the updated API version.
  • Service Registration:
    • deploymentName: Specifies the Azure OpenAI deployment to use.
    • endpoint and apiKey: Credentials for the Azure OpenAI service.
    • httpClient: Injects our custom HttpClient with the API version handler.
  • Kernel Initialization: Builds the kernel, ready to process AI requests.
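The user-secrets values referenced above can be set from the project directory. The resource name and key below are placeholders for your own values:

```shell
dotnet user-secrets init
dotnet user-secrets set "AzureOpenAI:Endpoint" "https://<your-resource>.openai.azure.com/"
dotnet user-secrets set "AzureOpenAI:Key" "<your-api-key>"
```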

4. Configure Prompt Execution Settings

We'll specify that we expect a structured JSON response matching our Joke model.

OpenAIPromptExecutionSettings promptExecutionSettings = new OpenAIPromptExecutionSettings()
{
    ResponseFormat = typeof(Joke)
};

Explanation:

  • ResponseFormat: Informs the AI model that we expect the response in a format that can be deserialized into a Joke object.

5. Invoke the AI Model and Process the Response

We'll prompt the AI model to write a joke about elephants and handle the structured response.

FunctionResult elephantJoke = await kernel.InvokePromptAsync(
    "Write a joke about elephants",
    new(promptExecutionSettings)
);

// Deserialize the AI response into a Joke object
Joke? jokeResult = JsonSerializer.Deserialize<Joke>(elephantJoke.ToString());

// Output the joke and explanation
Console.WriteLine($"Joke: {jokeResult?.JokeContent}");
Console.WriteLine($"Explanation: {jokeResult?.Explanation}");

Explanation:

  • InvokePromptAsync:
    • Prompt: "Write a joke about elephants".
    • Settings: Uses the promptExecutionSettings specifying the expected response format.
  • Deserialization: Converts the JSON response into a Joke object using JsonSerializer.
  • Output: Prints the joke content and explanation to the console.

Testing the Implementation

When you run the code, you should receive a structured joke about elephants, including an explanation. For example:

Joke: Why don't elephants use computers? Because they're afraid of the mouse!
Explanation: The joke plays on the double meaning of "mouse"—a computer accessory and an animal that elephants are stereotypically afraid of.

This is a silly example, of course, but JSON mode is useful in plenty of real scenarios, RAG applications being a prime example: you could use structured output in your RAG prompts to get answers in a format like this:

record Answer
{
    public string Content { get; set; } = string.Empty;
    public List<Uri> Sources { get; set; } = new();
}

Conclusion and Acknowledgements

By implementing a custom HTTP handler to modify the API version, we've enabled the use of OpenAI's JSON mode with Azure OpenAI and Semantic Kernel. This solution provides a way to access the latest features without waiting for official updates to the Azure OpenAI connector.

Credit to Luis Mañez, who wrote the article introducing this custom HTTP handler-based workaround about a year ago. Now that JSON mode support has landed in Semantic Kernel, it seemed like a good time to remind people of it. :)