Spring AI ChatClient: Build an AI-Powered Travel Assistant with Java 21

📖 Spring AI ChatClient tutorial · ⏱ 18 min read · 🔧 Java 21 · ⭐ Spring AI 2.0.0-M4

Spring AI ChatClient tutorial project setup with Java 21 and Spring Boot

There’s a question every Java developer asks the moment they start exploring AI: do I really have to leave Spring for this?

📺 Video overview

Spring AI ChatClient — concepts covered in this article · @smartgnt

Python dominates the AI space. LangChain, LlamaIndex, FastAPI — the entire ecosystem gravitates around it. But if you’ve spent years building production backends with Spring Boot, switching ecosystems just to call an LLM API feels wrong. You’re not learning AI — you’re relearning infrastructure.

Spring AI exists to answer that question with a no. You stay in Spring. You use dependency injection, auto-configuration, and the fluent builder patterns you already know. The LLM becomes just another service.

In this tutorial, we’ll build a Travel Assistant REST API from scratch using Spring AI’s ChatClient. By the end, you’ll understand how to call LLMs synchronously and in streaming mode, how to deserialize structured JSON responses into nested Java records, how to maintain conversation memory across stateless HTTP requests, and how to run two AI models in parallel and compare their output — all within a standard Spring Boot application.

The complete source code is on GitHub: github.com/samedsam/spring-ai-travel-assistant

🎯 What you’ll learn

  • ✅ Call LLMs synchronously and in streaming mode using ChatClient
  • ✅ Deserialize JSON responses into nested Java records with .entity()
  • ✅ Maintain conversation memory across stateless HTTP requests
  • ✅ Configure multiple AI models (Groq + OpenAI) using mutate()
  • ✅ Run parallel model comparisons with Java 21 Virtual Threads
  • ✅ Scope memory precisely to avoid cross-feature contamination

What is Spring AI and Why Does It Matter?

Before writing a single line of code, let’s understand what Spring AI actually does and why it’s designed the way it is.

At its core, Spring AI is an abstraction layer. It sits between your application and the LLM providers — OpenAI, Groq, Anthropic, Ollama, and others. You write code against Spring AI’s interfaces. When you want to switch from OpenAI to Groq, you change a configuration property, not your business logic.

This is the same philosophy Spring has always had: hide the complexity of integration behind clean, portable abstractions.

The central object you’ll work with is ChatClient. Think of it as the AI equivalent of WebClient or RestClient — a fluent builder API for sending prompts and receiving responses. If you’ve used either of those, ChatClient will feel immediately familiar:

// WebClient — you already know this
webClient.get()
    .uri("/api/data")
    .retrieve()
    .bodyToMono(String.class);

// ChatClient — same pattern, different destination
chatClient.prompt()
    .user("Recommend a trip to Paris for 5 days")
    .call()
    .content();

The same idea: build the request fluently, execute it, extract the result.

Synchronous vs Streaming

ChatClient supports two execution modes:

Synchronous (.call()): The application waits until the LLM finishes generating the entire response, then returns it all at once. Simple, but the user sees a blank screen until it’s done — which can be 3 to 8 seconds for long responses.

Streaming (.stream()): Tokens arrive progressively as the LLM generates them. The first word appears in under a second. The user reads while the model is still writing. This is how ChatGPT renders its responses.

You choose the mode based on what the user needs. For a structured JSON itinerary you’ll process programmatically, synchronous is fine. For a long narrative the user will read, streaming is the right choice.

What Are Advisors?

An Advisor is an interceptor that runs before or after each LLM call. Spring AI ships with several built-in ones:

  • MessageChatMemoryAdvisor — automatically injects conversation history into every prompt, making the LLM stateful across multiple requests
  • SimpleLoggerAdvisor — logs requests and responses for debugging
  • ENABLE_NATIVE_STRUCTURED_OUTPUT — enables native JSON output mode on models that support it (strictly a flag passed through the advisor chain rather than an advisor class)

You can chain multiple Advisors. They run in order, transforming the request or the response. You’ll use the memory advisor in the /chat endpoint later in this tutorial.
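Conceptually, the advisor chain is just a list of request/response interceptors wrapped around the model call. The sketch below is a toy model in plain Java to illustrate the idea — the `Advisor` record and `execute` method here are illustrative stand-ins, not Spring AI's actual `Advisor` API:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of an advisor chain: each "advisor" may rewrite the prompt
// before the call and the answer after it. Names are hypothetical,
// not Spring AI's real interfaces.
public class AdvisorChainSketch {

    public record Advisor(UnaryOperator<String> beforeCall, UnaryOperator<String> afterCall) {}

    public static String execute(String prompt, List<Advisor> advisors, UnaryOperator<String> llm) {
        for (Advisor a : advisors)                       // pre-processing, in declared order
            prompt = a.beforeCall().apply(prompt);
        String answer = llm.apply(prompt);               // the actual model call
        for (Advisor a : advisors.reversed())            // post-processing, in reverse order
            answer = a.afterCall().apply(answer);
        return answer;
    }

    public static void main(String[] args) {
        // A "memory" advisor prepends history; a "logger" advisor passes through unchanged.
        Advisor memory = new Advisor(p -> "history: hi\n" + p, r -> r);
        Advisor logger = new Advisor(p -> p, r -> r);
        String out = execute("what's my name?", List.of(memory, logger),
                p -> "echo[" + p + "]");
        System.out.println(out);
    }
}
```

The real `MessageChatMemoryAdvisor` works on this same principle: it injects history into the request on the way in and records the new exchange on the way out.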

Now let’s build.


Project Setup

The Stack

  • Java 21
  • Spring Boot 4.0.5
  • Spring AI 2.0.0-M4 (Milestone — not yet GA at time of writing)
  • Groq API with llama-3.3-70b-versatile (fast, free tier available)
  • OpenAI API with gpt-4o
  • Maven

We’ll build five REST endpoints, each demonstrating a different ChatClient capability:

GET /api/travel/recommend   → simple text recommendation
GET /api/travel/itinerary   → structured JSON itinerary
GET /api/travel/stream      → streaming travel story
GET /api/travel/chat        → conversation with memory
GET /api/travel/compare     → Groq vs OpenAI side-by-side

Maven Dependencies

<properties>
    <java.version>21</java.version>
    <spring-ai.version>2.0.0-M4</spring-ai.version>
</properties>

<dependencies>
    <!-- Standard REST API -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webmvc</artifactId>
    </dependency>

    <!--
      WebFlux is required alongside WebMVC for streaming.
      Spring AI's Flux<String> needs the reactive stack at runtime,
      even in a servlet-based application. Both stacks can coexist.
    -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>

    <!--
      One starter covers both OpenAI and Groq,
      because Groq exposes an OpenAI-compatible API.
    -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>

    <!--
      Without this, @Validated and @NotBlank are silently ignored.
      No error, no validation — just no enforcement.
    -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

The Spring AI BOM manages all Spring AI artifact versions in one place, the same way Spring Cloud BOM works. You never specify individual Spring AI dependency versions.

Configuration

spring:
  application:
    name: travel
  ai:
    chat:
      client:
        # Disable the auto-configured ChatClient.Builder so we can define
        # multiple ChatClient beans ourselves in ChatClientConfig.
        # Caveat: once disabled, any class that injects ChatClient.Builder
        # directly fails at startup with
        # "No qualifying bean of type ChatClient$Builder available".
        enabled: false
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o

# Custom properties MUST live at the root level.
# Nesting them under spring.ai.groq causes
# "Could not resolve placeholder 'groq.api-key'" at startup.
groq:
  api-key: ${GROQ_API_KEY}
  base-url: https://api.groq.com/openai
  model: llama-3.3-70b-versatile

Set your API keys as environment variables before running:

export OPENAI_API_KEY=your_openai_key
export GROQ_API_KEY=your_groq_key

Project Structure — Package by Feature

com.smartgnt.travel/
├── common/          ← TravelRequest (shared by all features)
├── config/          ← ChatClientConfig (all ChatClient beans)
├── recommend/       ← RecommendService + RecommendController
├── itinerary/       ← ItineraryService + ItineraryController + records
├── stream/          ← StreamService + StreamController
├── chat/            ← ChatService + ChatController
├── compare/         ← CompareService + CompareController
└── TravelApplication.java

We organize by feature rather than by layer. Everything related to /itinerary lives in the itinerary/ package. When you need to understand or modify a feature, you open one folder, not three.


Configuring Multiple AI Models

The first real challenge with Spring AI is configuring multiple models in the same application. Our app uses both Groq and OpenAI, and the user can choose which one to call at runtime.

The key insight: Groq exposes an OpenAI-compatible API. So instead of adding a separate Groq starter, we reuse spring-ai-starter-model-openai and point it at Groq’s endpoint.

Spring AI provides a mutate() method on its model and API objects. It creates a copy of the auto-configured base object with only the properties you specify changed. Everything else — timeouts, retry logic, SSL configuration — is inherited.

@Configuration
public class ChatClientConfig {

    @Bean
    ChatClient groqChatClient(OpenAiChatModel baseChatModel,
                              OpenAiApi baseOpenAiApi,
                              @Value("${groq.api-key}") String apiKey,
                              @Value("${groq.base-url}") String baseUrl,
                              @Value("${groq.model}") String model) {

        // Start from the auto-configured OpenAI API and override only
        // the base URL and API key. Everything else stays the same.
        OpenAiApi groqApi = baseOpenAiApi.mutate()
                .baseUrl(baseUrl)
                .apiKey(apiKey)
                .build();

        OpenAiChatModel groqModel = baseChatModel.mutate()
                .openAiApi(groqApi)
                .defaultOptions(OpenAiChatOptions.builder().model(model).build())
                .build();

        return ChatClient.create(groqModel);
    }

    @Bean
    ChatClient openAiChatClient(OpenAiChatModel chatModel) {
        // OpenAI is already fully auto-configured — just wrap it in a ChatClient
        return ChatClient.create(chatModel);
    }
}

The bean method names groqChatClient and openAiChatClient matter. Spring uses them to resolve injection by name. When a service declares @Qualifier("groqChatClient"), Spring finds the bean whose method name matches.


The Shared Request Object

All five endpoints accept the same parameters: destination, budget, days, and an optional model selector. We define this once in common/ and reuse it everywhere.

// common/TravelRequest.java
public record TravelRequest(
        @NotBlank(message = "Destination required") String destination,
        @Min(value = 0, message = "Budget must be positive") int budget,
        @Min(value = 1, message = "Days must be at least 1") int days,
        String model) {
}

Java 16 — Records: A record is a concise, immutable data carrier. The compiler automatically generates the canonical constructor, accessors (use destination(), not getDestination()), equals(), hashCode(), and toString(). Perfect for objects that only hold data.
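To see exactly what the compiler generates, here's a stdlib-only version of the same shape (validation annotations omitted):

```java
public class RecordDemo {
    // Same fields as TravelRequest, without Bean Validation annotations.
    public record TravelRequest(String destination, int budget, int days, String model) {}

    public static void main(String[] args) {
        var req = new TravelRequest("Paris", 1000, 5, "groq");
        System.out.println(req.destination());   // accessor is destination(), not getDestination()
        System.out.println(req.equals(
                new TravelRequest("Paris", 1000, 5, "groq")));  // value-based equals()
        System.out.println(req);                 // generated toString(): TravelRequest[destination=Paris, ...]
    }
}
```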


Feature 1: Simple Text Recommendation

Let’s start with the simplest case: send a prompt, get a string back.

@Service
public class RecommendService {

    private final ChatClient openAiChatClient;
    private final ChatClient groqChatClient;

    public RecommendService(@Qualifier("groqChatClient") ChatClient groqChatClient,
                            @Qualifier("openAiChatClient") ChatClient openAiChatClient) {
        this.groqChatClient = groqChatClient;
        this.openAiChatClient = openAiChatClient;
    }

    public String ask(TravelRequest request) {
        // Select model at runtime based on request parameter
        ChatClient chat = "openai".equals(request.model())
                ? openAiChatClient
                : groqChatClient;

        return chat.prompt()
                .user(u -> u.text("""
                        You are a travel expert. Give a brief recommendation for a trip to {destination}
                        for {days} days with a budget of {budget} euros.
                        Include best time to visit, top 3 activities, and accommodation tips.
                        """)
                        .param("destination", request.destination())
                        .param("days", request.days())
                        .param("budget", request.budget()))
                .call()
                .content();
    }
}

The {destination}, {days}, {budget} placeholders in the prompt are resolved at runtime by .param(). Spring AI uses StringTemplate under the hood — no string concatenation, no injection risk.

Java 15 — Text Blocks: The """...""" syntax creates a multi-line string without escape sequences or concatenation. The opening """ must be followed immediately by a newline.
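Here's a minimal sketch of both rules together. The naive `String.replace` below only mimics placeholder filling for illustration — Spring AI's actual renderer is StringTemplate-based and this is not its API:

```java
public class TextBlockDemo {
    // Illustration only: naive substitution standing in for the template renderer.
    public static String render(String template, String destination, int days) {
        return template.replace("{destination}", destination)
                       .replace("{days}", String.valueOf(days));
    }

    public static void main(String[] args) {
        // The opening """ must be followed by a newline;
        // incidental leading indentation is stripped by the compiler.
        String template = """
                Recommend a trip to {destination} for {days} days.""";
        System.out.println(render(template, "Paris", 5));
    }
}
```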

Now the controller:

@RestController
public class RecommendController {

    private final RecommendService recommendService;

    public RecommendController(RecommendService recommendService) {
        this.recommendService = recommendService;
    }

    @GetMapping("/api/travel/recommend")
    public String generation(@Validated @ModelAttribute TravelRequest request) {
        return recommendService.ask(request);
    }
}

Two annotations worth understanding:

@ModelAttribute tells Spring to bind query parameters to the record. Without it, Spring can’t map ?destination=Paris&days=5&budget=1000 to a TravelRequest and the LLM receives empty values with no error.

@Validated activates Bean Validation on the record fields. It requires spring-boot-starter-validation in your pom.xml. Without that dependency, the annotation is present but does nothing.

curl "http://localhost:8080/api/travel/recommend?destination=Paris&days=5&budget=1000&model=groq"

Feature 2: Structured JSON Output

The previous endpoint returns raw text. What if you need a structured response you can process in code — a proper Java object with nested fields?

Spring AI’s .entity() method handles this. It generates a JSON schema from your class, appends it to the prompt, and deserializes the LLM’s JSON response into your Java object automatically.

First, define the output structure using nested records:

public record TimeSlot(String time, String activity, String description) {}

public record Day(Integer day, List<TimeSlot> timeSlots) {}

public record Itinerary(String destination, Integer numberOfDays, List<Day> days) {}

Three levels of nesting. Spring AI handles all of it.
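To make the target shape concrete, here's the same structure populated by hand — this is the kind of object graph `.entity()` has to produce from the model's JSON:

```java
import java.util.List;

public class ItineraryShapeDemo {
    public record TimeSlot(String time, String activity, String description) {}
    public record Day(Integer day, List<TimeSlot> timeSlots) {}
    public record Itinerary(String destination, Integer numberOfDays, List<Day> days) {}

    // Hand-built instance of the three-level structure (sample values).
    public static Itinerary sample() {
        return new Itinerary("Paris", 1, List.of(
                new Day(1, List.of(
                        new TimeSlot("09:00", "Louvre", "Skip-the-line morning visit")))));
    }

    public static void main(String[] args) {
        // Traversing the nested records with accessor chains:
        System.out.println(sample().days().get(0).timeSlots().get(0).activity());
    }
}
```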

@Service
public class ItineraryService {

    // ... constructor with @Qualifier injection ...

    public Itinerary ask(TravelRequest request) {
        ChatClient chat = "openai".equals(request.model())
                ? openAiChatClient : groqChatClient;

        return chat.prompt()
                .user(u -> u.text("""
                        You are a travel expert. Generate a detailed day-by-day itinerary
                        for a trip to {destination} for {days} days with a budget of {budget} euros.

                        For each day, provide a list of time slots with a specific time in HH:mm format,
                        the activity name, and a short description.

                        Return a structured response with the destination, number of days, and list of days.
                        """)
                        .param("destination", request.destination())
                        .param("days", request.days())
                        .param("budget", request.budget()))
                .call()
                .entity(Itinerary.class); // ← Spring AI generates JSON schema and deserializes
    }
}

For generic types like lists, use ParameterizedTypeReference:

.entity(new ParameterizedTypeReference<List<Itinerary>>() {})

One observation worth noting: LLMs don’t always match your field casing exactly. Groq returned timeslots (all lowercase) instead of timeSlots (camelCase). Deserialization still worked in this case, but plain Jackson is case-sensitive by default (ACCEPT_CASE_INSENSITIVE_PROPERTIES is off), so this leniency depends on how Spring AI’s output converter configures its ObjectMapper. Something to be aware of when debugging unexpected null fields.

curl "http://localhost:8080/api/travel/itinerary?destination=Paris&days=3&budget=800&model=groq"

Feature 3: Streaming with Server-Sent Events

For a 3-day itinerary, waiting 5 seconds for a synchronous response is acceptable. For a long narrative travel story, it’s not. The user stares at a blank screen while the model generates 500 words.

Streaming solves this. Tokens flow to the client as they’re generated.

@Service
public class StreamService {

    private final ChatClient groqChatClient;

    public StreamService(@Qualifier("groqChatClient") ChatClient groqChatClient) {
        this.groqChatClient = groqChatClient;
    }

    public Flux<String> ask(TravelRequest request) {
        return groqChatClient.prompt()
                .user(u -> u.text("""
                        You are a travel writer. Write a vivid travel story for a {days}-day trip
                        to {destination} with a budget of {budget} euros.
                        Describe each day like a journal entry.
                        """)
                        .param("destination", request.destination())
                        .param("days", request.days())
                        .param("budget", request.budget()))
                .stream()    // ← switch from call() to stream()
                .content();  // ← returns Flux<String> instead of String
    }
}

Reactor basics: Mono<T> is 0 or 1 value (like CompletableFuture). Flux<T> is 0 to N values over time. The LLM sends tokens progressively — Flux<String> models that stream directly.

The controller needs one critical annotation:

@RestController
public class StreamController {

    private final StreamService streamService;

    public StreamController(StreamService streamService) {
        this.streamService = streamService;
    }

    // produces = TEXT_EVENT_STREAM_VALUE is not optional.
    // Without it, the response Content-Type is text/plain.
    // The browser's native EventSource API requires text/event-stream
    // and will silently reject the stream without it.
    @GetMapping(value = "/api/travel/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> generation(@Validated @ModelAttribute TravelRequest request) {
        return streamService.ask(request);
    }
}

Test it in Postman — you’ll see tokens arriving progressively instead of waiting for a complete response.

curl -H "Accept: text/event-stream" \
  "http://localhost:8080/api/travel/stream?destination=Rome&days=4&budget=1200&model=groq"
Spring AI streaming SSE architecture showing Flux tokens from LLM to client

Feature 4: Conversation Memory

Here’s the fundamental problem with LLMs in a REST API: every HTTP request is independent. The LLM doesn’t remember what was said in previous requests. Tell it “my name is John” in request 1, ask “what’s my name?” in request 2 — it won’t know.

The solution is straightforward: include the conversation history in every request. Spring AI automates this entirely with ChatMemory.

Why Memory Scope Matters

The tempting approach is to add MessageChatMemoryAdvisor to the shared groqChatClient bean in ChatClientConfig. One line, every service gets memory.

But that would mean the /recommend endpoint starts accumulating history. A recommendation for Paris would get injected into the next Tokyo itinerary request. Conversations would grow unbounded. Behavior across unrelated features would become entangled.

The right approach is to scope the memory exactly where it’s needed, using mutate():

@Service
public class ChatService {

    private final ChatClient chatClientWithMemory;

    public ChatService(@Qualifier("groqChatClient") ChatClient groqChatClient) {
        ChatMemory chatMemory = MessageWindowChatMemory.builder().build();

        // mutate() creates a new ChatClient with the advisor added.
        // The shared groqChatClient bean is NOT modified.
        // Memory exists only inside ChatService.
        this.chatClientWithMemory = groqChatClient.mutate()
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String ask(String input, String conversationId) {
        return chatClientWithMemory.prompt()
                .user(input)
                // Pass the conversation ID to the advisor so it knows
                // which conversation history to retrieve and inject
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
                .call()
                .content();
    }
}

MessageWindowChatMemory keeps the last 20 messages per conversationId. It stores them in RAM (InMemoryChatMemoryRepository) — history is lost on restart. For production, switch to JdbcChatMemoryRepository.

The conversationId is a string you generate client-side. A UUID works fine. Without passing one, all requests share DEFAULT_CONVERSATION_ID = "default", meaning every user’s conversation is mixed together.

@RestController
public class ChatController {

    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @GetMapping("/api/travel/chat")
    public String generation(@RequestParam String input,
                             @RequestParam String conversationId) {
        return chatService.ask(input, conversationId);
    }
}

Test memory isolation with two conversation IDs:

# Session 1 — introduce yourself
curl "http://localhost:8080/api/travel/chat?input=My+name+is+John&conversationId=session-1"

# Session 1 — verify memory
curl "http://localhost:8080/api/travel/chat?input=What+is+my+name?&conversationId=session-1"
# → "Your name is John."

# Session 2 — completely separate history
curl "http://localhost:8080/api/travel/chat?input=What+is+my+name?&conversationId=session-2"
# → "I don't know your name..."
Spring AI chat memory conversation isolation with conversationId and mutate pattern

Feature 5: Parallel Model Comparison

The last endpoint sends the same question to both Groq and OpenAI simultaneously and returns both responses. This is useful to compare quality, style, and speed between models.

The naive sequential approach:

// Avoid this — total time = groq time + openai time (~6.5 seconds)
String groq = groqChatClient.prompt(input).call().content();
String openAi = openAiChatClient.prompt(input).call().content();

Measured after warm-up: roughly 6.5 seconds. The right approach uses Java 21 Virtual Threads:

@Service
public class CompareService {

    private final ChatClient groqChatClient;
    private final ChatClient openAiChatClient;

    // ... constructor ...

    public Map<String, String> ask(String input) throws ExecutionException, InterruptedException {
        // try-with-resources closes the executor automatically.
        // Without it, the executor is never shut down and leaks resources.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {

            var groqFuture = executor.submit(
                    () -> groqChatClient.prompt(input).call().content()
            );
            var openAiFuture = executor.submit(
                    () -> openAiChatClient.prompt(input).call().content()
            );

            return Map.of(
                    "groq", groqFuture.get(),
                    "openai", openAiFuture.get()
            );
        }
    }
}

Measured result: approximately 3.5 seconds — roughly 2× faster, because both calls happen simultaneously.

Java 21 — Virtual Threads: Virtual threads are lightweight, JVM-managed threads. When one blocks waiting for an HTTP response, it unmounts from the underlying OS thread, freeing it for other work. You can run thousands of concurrent blocking I/O operations with a fraction of the OS threads you’d otherwise need. For LLM API calls — which are pure network I/O — this is the modern default.

Java 7 — try-with-resources: Works with any AutoCloseable. ExecutorService became AutoCloseable in Java 19, so executor.close() is called automatically when the block exits, even on exception.
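You can verify the parallelism claim with a stdlib-only sketch: two simulated blocking calls of 200 ms each complete in roughly 200 ms total rather than 400, using the same executor pattern as CompareService:

```java
import java.util.concurrent.Executors;

public class ParallelDemo {
    // Stand-in for a blocking LLM HTTP call.
    static String slowCall(String name) throws InterruptedException {
        Thread.sleep(200);
        return name + "-done";
    }

    public static long timeBoth() {
        long start = System.nanoTime();
        // One virtual thread per task; the executor is closed
        // automatically when the try block exits.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var a = executor.submit(() -> slowCall("groq"));
            var b = executor.submit(() -> slowCall("openai"));
            a.get();
            b.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }

    public static void main(String[] args) {
        System.out.println("both calls finished in ~" + timeBoth() + " ms");
    }
}
```

Both 200 ms sleeps overlap, so total elapsed time stays close to the duration of a single call.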

curl "http://localhost:8080/api/travel/compare?input=Best+time+to+visit+Tokyo"
Groq vs OpenAI parallel comparison using Spring AI ChatClient and Java 21 Virtual Threads

Pitfalls Worth Knowing Before You Ship

Building this project surfaced a set of errors that aren’t obvious from the documentation. Here are the ones worth your attention:

spring.ai.chat.client.enabled=false breaks ChatClient.Builder injection
If any class injects ChatClient.Builder directly and you set this property, Spring throws No qualifying bean of type 'ChatClient$Builder' available at startup. Inject OpenAiChatModel instead and build ChatClient manually.

Builders (and mutate()) replaced direct constructors in Spring AI 2.x
new OpenAiChatModel(api, options) and new OpenAiApi(baseUrl, apiKey) don’t compile; the public constructors were removed. Use OpenAiApi.builder()...build() and OpenAiChatModel.builder()...build(), or mutate() an auto-configured instance.

Custom properties nested under spring.ai fail silently at startup
spring.ai.groq.api-key causes Could not resolve placeholder. Custom properties belong at the root of application.yml.

@ModelAttribute is not optional on GET endpoints with record parameters
Without it, Spring doesn’t bind query string parameters to the record. The prompt runs with null/zero values and the LLM returns generic output. No exception is thrown.

produces = MediaType.TEXT_EVENT_STREAM_VALUE is required for SSE
Without it, the response Content-Type is text/plain. Postman works anyway (it’s permissive), but the browser’s native EventSource API won’t. Set it on every streaming endpoint.

Memory scope must be intentional
Adding MessageChatMemoryAdvisor to a shared ChatClient bean affects every service that uses it. Use mutate() to scope memory to the service that needs it.


Running the Project

Clone the repository and run:

git clone https://github.com/samedsam/spring-ai-travel-assistant
cd spring-ai-travel-assistant
export GROQ_API_KEY=your_groq_key
export OPENAI_API_KEY=your_openai_key
mvn spring-boot:run

All five endpoints are immediately available. Start with /recommend for the simplest test, then work your way through /itinerary, /stream, /chat, and /compare.


What’s Next

This is Part 1 of the Spring AI series on smartgnt.com.

Part 2 — RAG (Retrieval Augmented Generation): Feed the LLM a PDF document and let it answer questions about the content. Spring AI handles embedding, vector storage, and document retrieval. The model responds based on your data, not its training data.

Part 3 — Tool Calling: Let the LLM decide when to invoke Java methods. Annotate a method with @Tool, and the model calls it dynamically when it determines it needs that information. Context-aware behavior without hardcoded conditional logic.


Frequently Asked Questions

What is Spring AI ChatClient?

ChatClient is Spring AI’s fluent API for communicating with LLMs. It provides the same builder pattern as WebClient and RestClient, supporting both synchronous responses with .call() and reactive streaming with .stream(). It abstracts away the provider so the same code works with OpenAI, Groq, Anthropic, or any supported model.

How do I configure multiple models in Spring AI?

Set spring.ai.chat.client.enabled=false to disable the auto-configured ChatClient.Builder. Define separate @Bean methods for each model. Use mutate() on OpenAiApi and OpenAiChatModel to create model variants without rebuilding from scratch. Inject with @Qualifier.

How does conversation memory work in Spring AI?

MessageWindowChatMemory stores the last N messages (default 20) per conversation. Attach it to a ChatClient using MessageChatMemoryAdvisor. Send a unique conversationId with each request via .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId)). Spring AI retrieves and prepends the history to every prompt automatically.

Why do I need WebFlux in a WebMVC application?

Spring AI exposes streaming as Flux<String> from Project Reactor, which needs the reactive stack on the classpath at runtime. WebMVC and WebFlux can coexist in the same application: add spring-boot-starter-webflux alongside spring-boot-starter-webmvc.

What is the difference between call() and stream() in Spring AI?

call() waits for the complete LLM response before returning. stream() returns a Flux<String> where tokens arrive as they are generated. Use stream() when you want real-time output visible to the user. Use call() when you need the full response for processing, storage, or structured deserialization with .entity().


Found this Spring AI ChatClient tutorial useful? Follow SmartGNT for Part 2 (RAG) and Part 3 (Tool Calling). Questions or issues? Drop them in the comments below.