Zero to AI Hero, Part 3: Unleashing the Power of Agents in Semantic Kernel
This article will guide you through building agents with Semantic Kernel, focusing on the key components and offering practical examples.
As I promised in Part 2 (Understanding Plugins in Semantic Kernel), it is time to build something substantial with Semantic Kernel. If you are new to Semantic Kernel and want to dive head-first into the code, I highly recommend starting with Part 1 (Jumpstart Your Journey With Semantic Kernel) of this series. There is a lot of theory out there, but in these articles we explore the core concepts through a GitHub sample you can easily download and play with.
I wanted to use an image of Agent Smith from The Matrix, but I couldn't find one that isn't copyrighted. So, DALL-E 3 to the rescue.
Semantic Kernel’s agents aren’t just your typical AI assistants — they’re the multitasking powerhouses that bring advanced automation to your fingertips. By leveraging AI models, plugins, and personas, these agents can perform complex tasks that go beyond mere question-answering and light automation. This article will guide you through building agents with Semantic Kernel, focusing on the key components and offering practical examples to illustrate how to create an agent that plans a trip using various plugins.
In this part, we will start looking into AI agents, expand on our example from Part 2, and plan an entire day trip with our newly minted Agent.
What Are Agents in Semantic Kernel?
Agents in Semantic Kernel are intelligent orchestrators designed to handle complex tasks by interacting with multiple plugins and AI models. They work like a highly organized manager who knows exactly which team members (plugins) to call upon and when to get the job done. Whether it’s planning a road trip, providing weather updates, or even helping you pack for a vacation, agents can combine all these functionalities into a cohesive, efficient flow.
Fundamental Building Blocks of an Agent
- AI Models: The core decision-making unit of an agent. AI models can be large language models (LLMs), such as OpenAI's GPT-4 or Mistral AI's models, or small language models (SLMs), such as Microsoft's Phi-3. The models interpret user input and generate appropriate responses or actions.
- Plugins: We explored these in Part 2. These specialized tools allow the agent to perform actions like data retrieval, computation, or API communication. Think of plugins as the agent’s Swiss Army knife, each tool ready for a specific purpose. Simply put, plugins are just existing code callable by an agent.
- Plans: Plans define the flow of tasks the agent should follow. They map out each step the agent takes, determining which plugins to activate and in what sequence. We haven't discussed plans yet; we will go over them in this article.
- Personas: A persona is simply the agent's role in a given context. In the general AI world, it is often called a meta prompt or system prompt. These instructions set the tone for the Agent and give it ground rules for what to do when in doubt.
- Memory: Memory helps agents retain information across interactions, allowing them to maintain context and remember user preferences. In other words, a simple chat history is part of memory, giving the agent the context of the conversation. Even if you answer an Agent's question with a simple "yes," the Agent can tie your "yes" to the rest of the conversation and understand what you are answering, much like a human would.
Agents have a few more, smaller components, such as connectors; we will omit them here to focus on what matters.
It’s Time to Plan for Our Spontaneous Day Trip
Let's build an agent capable of planning a day trip by car. Where I live, I have access to the mountains in the Poconos, the Jersey Shore beaches, and the great city of New York, all within an hour or two's drive. I want to build an Agent capable of planning my entire day trip, considering the weather, what to pack, whether my car is fully charged, etc. Let's dive head-first into the code for our Agent.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.ComponentModel;
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "<YOUR_DEPLOYMENT_NAME>",
    endpoint: "<YOUR_ENDPOINT>",
    apiKey: "<YOUR_AZURE_OPENAI_API_KEY>"
);
builder.Plugins.AddFromType<TripPlanner>();       // <----- New in Part 3: TripPlanner. Let's add it to the Kernel
builder.Plugins.AddFromType<TimeTeller>();        // <----- The same plugin from Part 2
builder.Plugins.AddFromType<ElectricCar>();       // <----- The same plugin from Part 2
builder.Plugins.AddFromType<WeatherForecaster>(); // <----- New plugin. We don't want to end up at the beach in the rain, right?
var kernel = builder.Build();
IChatCompletionService chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
ChatHistory chatMessages = new ChatHistory("""
You are a friendly assistant who likes to follow the rules. You will complete required steps
and request approval before taking any consequential actions. If the user doesn't provide
enough information for you to complete a task, you will keep asking questions until you have
enough information to complete the task.
""");
while (true)
{
    Console.Write("User > ");
    chatMessages.AddUserMessage(Console.ReadLine()!);
    // Let the model call our plugins on its own whenever it decides they are needed
    OpenAIPromptExecutionSettings settings = new() { ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions };
    var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
        chatMessages,
        executionSettings: settings,
        kernel: kernel);
    Console.Write("Assistant > ");
    // Stream the results
    string fullMessage = "";
    await foreach (var content in result)
    {
        Console.Write(content.Content);
        fullMessage += content.Content;
    }
    Console.WriteLine("\n--------------------------------------------------------------");
    // Add the message from the agent to the chat history
    chatMessages.AddAssistantMessage(fullMessage);
}
public class TripPlanner // <------------ Trip planner. An expert on planning trips
{
    [KernelFunction]
    [Description("Returns the steps required to plan a one-day trip to a destination by electric car.")]
    [return: Description("The list of steps needed to plan a one-day trip by electric car")]
    public async Task<string> GenerateRequiredStepsAsync(
        Kernel kernel,
        [Description("A 2-3 sentence description of a good place to go today")] string destination,
        [Description("The time of the day to start the trip")] string timeOfDay)
    {
        // Prompt the LLM to generate a list of steps to complete the task.
        // {{$destination}} and {{$timeOfDay}} are Semantic Kernel template variables filled from the
        // arguments below; the string is deliberately not C#-interpolated, so those arguments are actually used.
        var result = await kernel.InvokePromptAsync("""
            I'm going to plan a short one day vacation to {{$destination}}. I would like to start around {{$timeOfDay}}.
            Before I do that, can you succinctly recommend the top 2 steps I should take in a numbered list?
            I want to make sure I don't forget to pack anything for the weather at my destination and my car is sufficiently charged before I start the journey.
            """, new() {
            { "destination", destination },
            { "timeOfDay", timeOfDay }
        });
        // Return the plan back to the agent
        return result.ToString();
    }
}
public class TimeTeller // <------------ Time teller plugin. An expert on time, peak and off-peak periods
{
    [KernelFunction]
    [Description("This function retrieves the current time.")]
    [return: Description("The current time.")]
    public string GetCurrentTime() => DateTime.Now.ToString("F");
    [KernelFunction]
    [Description("This function checks if the current time is off-peak.")]
    [return: Description("True if the current time is off-peak; otherwise, false.")]
    public bool IsOffPeak() => DateTime.Now.Hour < 7 || DateTime.Now.Hour >= 21;
}
public class WeatherForecaster // <------------ Weather plugin. An expert on weather. Can tell the weather at a given destination
{
    [KernelFunction]
    [Description("This function retrieves the weather at the given destination.")]
    [return: Description("The weather at the given destination.")]
    public string GetTodaysWeather([Description("The destination to retrieve the weather for.")] string destination)
    {
        // <--------- This is where you would call a fancy weather API to get the weather for the given <<destination>>.
        // We are just simulating random weather here.
        string[] weatherPatterns = { "Sunny", "Cloudy", "Windy", "Rainy", "Snowy" };
        Random rand = new Random();
        return weatherPatterns[rand.Next(weatherPatterns.Length)];
    }
}
public class ElectricCar // <------------ Car plugin. Knows the state and condition of the electric car. Can also charge the car.
{
    private bool isCarCharging = false;
    private int batteryLevel = 0;
    private CancellationTokenSource? source;
    // Mimic charging the electric car using a periodic timer: one percentage point every 5 seconds.
    private async Task AddJuice(CancellationToken token)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(5));
        try
        {
            while (await timer.WaitForNextTickAsync(token))
            {
                batteryLevel++;
                if (batteryLevel == 100)
                {
                    isCarCharging = false;
                    Console.WriteLine("\rBattery is full.");
                    return;
                }
                Console.Write("\rCharging {0}%", batteryLevel);
            }
        }
        catch (OperationCanceledException)
        {
            // StopCharging() cancelled the token: charging ends quietly instead of faulting the background task.
        }
    }
    [KernelFunction]
    [Description("This function checks if the electric car is currently charging.")]
    [return: Description("True if the car is charging; otherwise, false.")]
    public bool IsCarCharging() => isCarCharging;
    [KernelFunction]
    [Description("This function returns the current battery level of the electric car.")]
    [return: Description("The current battery level.")]
    public int GetBatteryLevel() => batteryLevel;
    [KernelFunction]
    [Description("This function starts charging the electric car.")]
    [return: Description("A message indicating the status of the charging process.")]
    public string StartCharging()
    {
        if (isCarCharging)
        {
            return "Car is already charging.";
        }
        else if (batteryLevel == 100)
        {
            return "Battery is already full.";
        }
        // Create the token source and set the flag *before* starting the background task,
        // so an immediate StopCharging() call always has a token to cancel.
        source = new CancellationTokenSource();
        isCarCharging = true;
        var token = source.Token;
        _ = Task.Run(() => AddJuice(token));
        return "Charging started.";
    }
    [KernelFunction]
    [Description("This function stops charging the electric car.")]
    [return: Description("A message indicating the status of the charging process.")]
    public string StopCharging()
    {
        if (!isCarCharging)
        {
            return "Car is not charging.";
        }
        isCarCharging = false;
        source?.Cancel();
        return "Charging stopped.";
    }
}
We will dissect the code later. For now, let's ask our Agent to plan our day trip for us.
Kinda cool, isn't it? We didn't tell the Agent we wanted to charge the electric car. We only told the Agent to plan a trip; it knows intuitively that:
- The electric car needs to be charged, and
- The weather needs to be checked.
Cool, indeed!
We have a small charging simulator built on .NET's PeriodicTimer. It is irrelevant to SK, but it gives a live update on the console, showing that charging is ongoing and the battery level is rising. As you can see in the screenshot below, I asked the Agent to stop charging the car when the battery level reached 91%, which is sufficient for the trip.
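To make the timer part a little more concrete, here is a minimal, standalone sketch of the same PeriodicTimer pattern, assuming .NET 6 or later. The ChargeAsync function is hypothetical (it is not in the article's repo): it adds one percentage point per tick until it reaches a target level, and it treats cancellation of the token as the "stop charging" signal rather than as an error.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not from the sample: raise the battery level by one point per
// timer tick until it reaches targetLevel or the token is cancelled ("stop charging").
static async Task<int> ChargeAsync(int startLevel, int targetLevel, TimeSpan period, CancellationToken token)
{
    int level = startLevel;
    using var timer = new PeriodicTimer(period);
    try
    {
        // Check the target first so we never wait for a tick we don't need.
        while (level < targetLevel && await timer.WaitForNextTickAsync(token))
        {
            level++;
            Console.Write("\rCharging {0}%", level);
        }
    }
    catch (OperationCanceledException)
    {
        // Cancellation is the normal way to stop early, so it is not treated as a failure.
    }
    Console.WriteLine();
    return level;
}

// Usage: charge from 85% with a short tick and stop at 91%, like in the screenshot.
int charged = await ChargeAsync(85, 91, TimeSpan.FromMilliseconds(10), CancellationToken.None);
Console.WriteLine($"Stopped at {charged}%");
```

Cancellation, rather than a boolean flag, is what actually ends the timer loop here; the ElectricCar plugin's StopCharging relies on the same idea when it cancels its token source.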
Did you also notice something interesting? When I first asked, I only said to plan a trip to the beach. I didn't mention when I was planning to go or which beach. The Agent was aware of this and asked clarifying questions to fill in those gaps. This is where the persona, memory, and the planner come into the picture. Let's start dissecting the code, beginning with the Planner.
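The clarifying questions come from the model following its persona, not from any code we wrote. Still, the underlying logic is easy to picture: GenerateRequiredStepsAsync needs both a destination and a timeOfDay, so anything still missing has to be asked for first. The deterministic sketch below spells that out; NextQuestion and the question texts are hypothetical and not part of Semantic Kernel or the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for what the model does on its own: list the inputs the
// planner function needs, then ask about the first one that is still unknown.
var requiredInputs = new (string Name, string Question)[]
{
    ("destination", "Which beach would you like to go to?"),
    ("timeOfDay", "What time would you like to start the trip?"),
};

// Returns the next question to ask, or null when everything is known.
string? NextQuestion(IReadOnlyDictionary<string, string> known) =>
    requiredInputs
        .Where(input => !known.ContainsKey(input.Name))
        .Select(input => input.Question)
        .FirstOrDefault();

// "Plan a trip to the beach" names neither a specific beach nor a start time.
var knownSoFar = new Dictionary<string, string>();
Console.WriteLine(NextQuestion(knownSoFar) ?? "Ready to plan!");
knownSoFar["destination"] = "the Jersey Shore";
knownSoFar["timeOfDay"] = "8 AM";
Console.WriteLine(NextQuestion(knownSoFar) ?? "Ready to plan!");
```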
Planner: The Manager of Everything
Think of a planner as a manager of some sort. It can identify the course of action, or "simple steps," to achieve what the user wants. In the above example, the planner identifies two steps.
- Check the weather and pack accordingly: This is where the WeatherForecaster plugin comes into play later.
- Ensure the car is ready for the trip: This is where the ElectricCar plugin comes into play later.
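The TripPlanner prompt asks the model for its steps "in a numbered list," so the plan comes back as plain text that the Agent simply reads. If you ever want to work with those steps in code (for example, to log or display them), a small parser is enough. The sketch below is a hypothetical helper, not part of Semantic Kernel or the sample, and the sample plan text is made up; real model output will vary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extract the steps from a numbered list such as "1. Do X".
// Lines that don't start with "<number>." are ignored (intros, closing remarks).
static List<string> ParseNumberedSteps(string planText)
{
    var steps = new List<string>();
    var numbered = new Regex(@"^\s*\d+\.\s+(.+)$");
    foreach (var line in planText.Split('\n'))
    {
        var match = numbered.Match(line.TrimEnd('\r'));
        if (match.Success)
        {
            steps.Add(match.Groups[1].Value.Trim());
        }
    }
    return steps;
}

// Usage with made-up planner output.
string samplePlan = """
    Here are the top 2 steps:
    1. Check the weather at the beach and pack accordingly.
    2. Make sure your car is sufficiently charged before you leave.
    """;
foreach (var step in ParseNumberedSteps(samplePlan))
{
    Console.WriteLine($"- {step}");
}
```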
public class TripPlanner // <------------ Trip planner. An expert on planning trips
{
    [KernelFunction]
    [Description("Returns the steps required to plan a one-day trip to a destination by electric car.")]
    [return: Description("The list of steps needed to plan a one-day trip by electric car")]
    public async Task<string> GenerateRequiredStepsAsync(
        Kernel kernel,
        [Description("A 2-3 sentence description of a good place to go today")] string destination,
        [Description("The time of the day to start the trip")] string timeOfDay)
    {
        // Prompt the LLM to generate a list of steps to complete the task.
        // {{$destination}} and {{$timeOfDay}} are Semantic Kernel template variables filled from the
        // arguments below; the string is deliberately not C#-interpolated, so those arguments are actually used.
        var result = await kernel.InvokePromptAsync("""
            I'm going to plan a short one day vacation to {{$destination}}. I would like to start around {{$timeOfDay}}.
            Before I do that, can you succinctly recommend the top 2 steps I should take in a numbered list?
            I want to make sure I don't forget to pack anything for the weather at my destination and my car is sufficiently charged before I start the journey.
            """, new() {
            { "destination", destination },
            { "timeOfDay", timeOfDay }
        });
        // Return the plan back to the agent
        return result.ToString();
    }
}
Look at the parameters of the GenerateRequiredStepsAsync KernelFunction. Besides the Kernel, it takes a destination and a timeOfDay. Both are necessary to plan the trip: without knowing where and when, there can be no trip. Now, take a closer look at the prompt.
This is where we tell the planner what we want to plan:
- A day trip
- To the given destination
- Starting at the specified time
- In my electric car, which must be sufficiently charged
- With everything packed for the weather at the destination
Now our Agent knows through the planner that we need to come up with steps to satisfy all of these to plan the trip. The Agent is also aware of available plugins and has the authority to invoke them to provide me with a pleasant trip.
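Conceptually, the planner prompt is a template with two holes, destination and timeOfDay, that are filled in before the text is sent to the model. The hypothetical RenderTemplate function below mimics only that variable-substitution part, using the {{$name}} placeholder style of Semantic Kernel's default prompt templates. It is an illustration under that assumption, not SK's template engine, which can also call functions from inside a template.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical mini-renderer for illustration only: it understands just {{$name}}
// placeholders (optionally padded with spaces) and nothing else.
static string RenderTemplate(string template, IReadOnlyDictionary<string, string> arguments) =>
    Regex.Replace(template, @"\{\{\s*\$(\w+)\s*\}\}", match =>
    {
        string name = match.Groups[1].Value;
        // Fail loudly on a missing value instead of silently sending an incomplete prompt.
        return arguments.TryGetValue(name, out var value)
            ? value
            : throw new ArgumentException($"No value supplied for '{name}'.");
    });

string prompt = RenderTemplate(
    "I'm going to plan a short one day vacation to {{$destination}}. I would like to start around {{$timeOfDay}}.",
    new Dictionary<string, string> { ["destination"] = "the Jersey Shore", ["timeOfDay"] = "8 AM" });
Console.WriteLine(prompt);
```

Keeping the values separate from the template text is also why passing arguments is preferable to pasting model-supplied strings straight into the prompt.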
Persona: Who Am I?
This is where we tell the Agent who it is. The Agent's persona is important because it helps the model stay in character and tells it how to act on the user's instructions: what to do in a dilemma, which steps to take before an action, and so on. In short, personas define the ground rules of an Agent's behavior.
ChatHistory chatMessages = new ChatHistory("""
You are a friendly assistant who likes to follow the rules. You will complete required steps
and request approval before taking any consequential actions. If the user doesn't provide
enough information for you to complete a task, you will keep asking questions until you have
enough information to complete the task.
""");
Here, we clearly define the character and role of our Agent. We told it that it:
- Is a friendly assistant
- Follows the given rules
- Completes the required steps
- Asks for approval before taking any consequential actions
- Keeps asking questions if the user doesn't provide enough information
Iterations and Memory
A new ChatHistory instance is created with the meta prompt/persona instruction as its first message. As the user's inputs and the LLM's responses are appended to it, this history serves as the context memory of the conversation. This helps the Agent choose the correct action based on the context derived from the conversation history.
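To see why replaying the whole history matters, here is a stripped-down, hypothetical model of it: a list of (Role, Content) pairs that starts with the persona as the system message. In the sample, Semantic Kernel's ChatHistory plays this role; the conversation turns below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified chat history: the persona goes first, as the system message.
var history = new List<(string Role, string Content)>
{
    ("system", "You are a friendly assistant who likes to follow the rules."),
};

// Made-up turns: the bare "yes" only makes sense next to the question before it.
history.Add(("user", "Plan a day trip to the beach."));
history.Add(("assistant", "The battery is at 40%. Shall I start charging the car?"));
history.Add(("user", "yes"));

// Every turn, the whole list is sent to the model, so "yes" arrives together with
// the question it answers. Sending only the latest message would lose that context.
string RenderForModel(IEnumerable<(string Role, string Content)> turns) =>
    string.Join("\n", turns.Select(turn => $"{turn.Role}: {turn.Content}"));

Console.WriteLine(RenderForModel(history));
```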
while (true)
{
    Console.Write("User > ");
    chatMessages.AddUserMessage(Console.ReadLine()!);
    OpenAIPromptExecutionSettings settings = new() { ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions };
    var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
        chatMessages,
        executionSettings: settings,
        kernel: kernel);
    Console.Write("Assistant > ");
    // Stream the results
    string fullMessage = "";
    await foreach (var content in result)
    {
        Console.Write(content.Content);
        fullMessage += content.Content;
    }
    Console.WriteLine("\n--------------------------------------------------------------");
    // Add the message from the agent to the chat history
    chatMessages.AddAssistantMessage(fullMessage);
}
As you can see, we set ToolCallBehavior to ToolCallBehavior.AutoInvokeKernelFunctions. This gives our Agent the authority to invoke plugins whenever it decides they are needed. Each user input and each model response is added to chatMessages, which sets the context for further interactions. When I say, "That's enough charging," the Agent knows from the earlier conversation that the car is currently charging. Here, the Agent's memory is nothing more than the chat history. Retrieved data (the fancy RAG part) can also serve as memory, but we won't touch on that for now.
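AutoInvokeKernelFunctions hides a loop that, roughly, works like this: the model answers with a request to call a named function with some arguments, Semantic Kernel runs that function, and the result goes back into the conversation before the model continues. The sketch below simulates only the lookup-and-invoke step with a plain dictionary of delegates. It is a conceptual illustration under that assumption, not Semantic Kernel's implementation; the "Plugin.Function" key format and the canned weather result are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: function name -> delegate that takes string arguments.
var battery = 40;
var functions = new Dictionary<string, Func<IReadOnlyDictionary<string, string>, string>>
{
    ["ElectricCar.GetBatteryLevel"] = _ => battery.ToString(),
    ["WeatherForecaster.GetTodaysWeather"] = args => $"Sunny at {args["destination"]}",
};

// Run the function the model asked for; report unknown names instead of crashing.
string InvokeRequestedFunction(string name, IReadOnlyDictionary<string, string> args) =>
    functions.TryGetValue(name, out var function)
        ? function(args)
        : $"Error: no function named '{name}' is registered.";

// Pretend the model requested the weather at the destination; the result is
// appended to the history so the model can use it in its next reply.
var requestArgs = new Dictionary<string, string> { ["destination"] = "the Jersey Shore" };
var history = new List<string>
{
    "tool: " + InvokeRequestedFunction("WeatherForecaster.GetTodaysWeather", requestArgs),
};
Console.WriteLine(history[0]);
```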
Plugins: The Robotic Arms
We have already discussed plugins in detail in Part 2. We have added a WeatherForecaster plugin to the mix to help us plan the trip. In a real-world scenario, we would call a real weather API to get the actual weather; for this example, picking a random weather pattern suffices. We have also added a batteryLevel variable to our ElectricCar plugin, which lets us simulate charging with a simple timer. We won't get into the details of each plugin here; please revisit Part 2 for a deeper understanding of how plugins work.
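If you later swap the random weather for a real API, it helps to keep the random choice replaceable and testable. One minimal way to do that, assuming .NET 6 or later for Random.Shared, is to pass the Random instance in. PickWeather is a hypothetical variation on GetTodaysWeather, not code from the sample.

```csharp
using System;

// Hypothetical variation on the sample's WeatherForecaster: the caller supplies the
// Random instance, so tests can use a fixed seed and production can use a shared one.
string[] weatherPatterns = { "Sunny", "Cloudy", "Windy", "Rainy", "Snowy" };

string PickWeather(Random random) => weatherPatterns[random.Next(weatherPatterns.Length)];

// Random.Shared is a thread-safe instance reused across calls.
Console.WriteLine(PickWeather(Random.Shared));
```

With a fixed seed such as new Random(42), the sequence of picks is repeatable, which is handy in tests; a call to a real weather service would eventually replace PickWeather altogether.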
As usual, this article includes a working GitHub sample. Clone the code and enjoy playing with it.
Wrap Up
We have started harnessing the power of Semantic Kernel. Once we start mixing plugins with a persona, a planner, and memory, the resulting Agents can automate tasks, ask leading questions, take actions on your behalf, get confirmation before executing essential tasks, and more. Agents in Semantic Kernel are not just tools; they're dynamic assistants that combine the power of AI, plugins, and orchestrated plans to solve complex problems. By understanding their building blocks (AI models, plugins, plans, personas, memory, and connectors), you can create competent agents tailored to your specific needs. The possibilities are vast, from managing travel plans to automating tedious tasks, making Semantic Kernel a powerful ally in your AI toolkit.
What's Next?
Now that we have connected all the pieces of the Semantic Kernel puzzle through Part 1, Part 2, and this article, it is time to start thinking beyond a console application. In the following parts of this series, we will add an Agent to an ASP.NET Core API and use dependency injection to create more than one kernel instance to help us navigate our trip planning. We are not going to stop there. We will integrate Semantic Kernel with a locally downloaded Small Language Model (SLM) and make it work for us. Once that works, we aren't far from a .NET MAUI app that can do the AI dance without internet connectivity or GPT-4. I won't spoil the rest of the surprises; keep following this series to learn more!
Opinions expressed by DZone contributors are their own.