Spring and Spring Boot are both part of the larger Spring ecosystem, but they serve different purposes and offer different features. Here are the key differences between them:

1. Purpose and Focus
  • Spring Framework: The core Spring Framework provides comprehensive infrastructure support for developing Java applications. It focuses on providing a wide range of functionalities, such as dependency injection, aspect-oriented programming, transaction management, and more. It is modular, meaning you can use only the parts you need for your application.
  • Spring Boot: Spring Boot is built on top of the Spring Framework and is designed to simplify the process of creating stand-alone, production-grade Spring applications. It aims to minimize configuration and setup time by offering default configurations and embedded servers.

2. Configuration
  • Spring Framework: Requires explicit configuration, written as XML, Java-based configuration classes, or annotations. Developers define beans and application settings themselves.
  • Spring Boot: Reduces the need for manual configuration through auto-configuration and convention over configuration. It uses sensible defaults and annotations to automatically configure the application based on the dependencies present in the classpath.
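
To make "based on the dependencies present in the classpath" concrete, here is a minimal plain-Java sketch of the idea: check whether a class can be loaded, then pick a default accordingly. It is not Spring Boot's implementation, which is considerably richer. The class name ClasspathDetectionSketch and the chooseServer helper are hypothetical, and the two server class names are assumed to be the usual Tomcat and Jetty entry points.

```java
// Conceptual sketch only. Spring Boot's real auto-configuration also handles
// ordering, user overrides, and many other conditions. All names here are
// hypothetical.
class ClasspathDetectionSketch {

    // True if the named class can be loaded (without initializing it).
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false,
                    ClasspathDetectionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Pick an embedded server based on what is on the classpath.
    // The class names are assumptions about the libraries' entry points.
    static String chooseServer() {
        if (isPresent("org.apache.catalina.startup.Tomcat")) return "tomcat";
        if (isPresent("org.eclipse.jetty.server.Server")) return "jetty";
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("Embedded server: " + chooseServer());
    }
}
```

Adding or removing a library changes the outcome without touching any configuration file, which is the effect auto-configuration aims for.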

3. Setup and Initialization
  • Spring Framework: Setting up a Spring application involves creating and configuring a lot of boilerplate code and configuration files. You need to manually set up the application context and configure dependencies.
  • Spring Boot: Simplifies the setup process by providing starter dependencies (starter POMs) and a simplified project structure. It also includes embedded servers, so you can run your application as a stand-alone Java application.

4. Embedded Servers
  • Spring Framework: A web application typically runs on an external servlet container or application server (like Tomcat, Jetty, or JBoss). Developers package the application and deploy it to that server themselves.
  • Spring Boot: Comes with embedded servers (Tomcat, Jetty, or Undertow), allowing you to run your application directly from the command line without needing to deploy it to an external server. This makes development, testing, and deployment easier and faster.

5. Production-ready Features
  • Spring Framework: Does not include built-in production-ready features. Developers need to add and configure additional tools and libraries for monitoring, health checks, and metrics.
  • Spring Boot: Provides production-ready features such as health checks, metrics, and monitoring endpoints through the Spring Boot Actuator module, along with preconfigured logging. Once the Actuator starter is on the classpath, these features need minimal configuration.

In summary, while the core Spring Framework provides the foundational tools and infrastructure for building applications, Spring Boot streamlines the process, offering default configurations and embedded servers to create stand-alone, production-ready applications quickly and easily.
RAG with Spring AI: Make the Model Answer From Your Own Data — Hungry Coders


In the first post we got a ChatClient talking to a model in a few lines. But that model only knows what it was trained on. It has never seen your product docs, your internal wiki, or last week's incident reports. Ask it about any of that and it will either say "I don't know" or — worse — confidently make something up.

RAG fixes this. And in Spring AI, it's not a new framework to learn. It's one advisor you attach to the same ChatClient you already built.

The one idea — RAG = "look it up, then answer." You retrieve the relevant chunks of your data, paste them into the prompt as context, and let the model answer from that. Spring AI's QuestionAnswerAdvisor does the retrieve-and-paste step for you.

This post builds directly on the first one. Same Java 21+, same Spring AI 1.1.x. By the end you'll have an endpoint that answers questions using documents you loaded — and you'll understand the three moving parts well enough to swap any of them.

Why the model needs your data pasted in

An LLM is frozen at training time. It has no live access to your database and no memory of your business. There are only two ways to give it knowledge it doesn't have: fine-tune it (expensive, slow, and stale the moment your data changes), or hand it the relevant facts at question time inside the prompt.

RAG is the second option, done well. The trick is the "relevant" part — you can't paste your entire wiki into every prompt; it won't fit, and it would cost a fortune in tokens. So you store your data as embeddings (vectors that capture meaning), and at question time you pull back only the handful of chunks most similar to what the user asked. That's a similarity search, and it's the whole game.
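
If "similarity search" sounds abstract, here is a minimal sketch of what it boils down to: score every stored vector against the query vector with cosine similarity and keep the best one. The three-dimensional vectors and the SimilaritySketch class are invented for illustration. Real embeddings come from an embedding model and have far more dimensions, and a real vector store uses an index instead of a full scan.

```java
import java.util.Map;

// Toy illustration only: the vectors below are made up, not real embeddings.
class SimilaritySketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). 1.0 means same direction.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the key of the stored vector most similar to the query.
    static String closest(Map<String, double[]> store, double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : store.entrySet()) {
            double score = cosine(e.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "refund policy", new double[] {0.9, 0.1, 0.0},
                "support hours", new double[] {0.1, 0.9, 0.1},
                "rate limits", new double[] {0.0, 0.2, 0.9});
        // Pretend this is the embedding of "can I get my money back?"
        double[] query = {0.8, 0.2, 0.1};
        System.out.println(closest(store, query)); // prints "refund policy"
    }
}
```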

The three moving parts

Every RAG setup in Spring AI is the same three pieces. Learn these names and the rest is wiring:

  • EmbeddingModel — turns text into vectors. Auto-configured by your starter, just like the chat model.
  • VectorStore — stores those vectors and runs similarity search. Can be an in-memory store for demos or a real database (PGVector, Redis, Pinecone) in production.
  • QuestionAnswerAdvisor — the glue. It intercepts the prompt, searches the VectorStore, and appends the results as context before the model sees it.

Notice what's not on that list: any change to your controller's mental model. You're still calling chatClient.prompt().user(...).call(). You're just adding an advisor.

Step 1: Add the vector-store advisor dependency

The advisor lives in its own module. Add it alongside the OpenAI starter from the first post:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

For Gradle:

implementation "org.springframework.ai:spring-ai-advisors-vector-store"

The BOM you already imported manages the version, so there's no version number here. That's the whole point of the BOM.

Step 2: A vector store you don't have to install

You don't need to stand up a database to learn RAG. Spring AI ships SimpleVectorStore — an in-memory store that's perfect for a first pass. Declare it as a bean and hand it the auto-configured EmbeddingModel:

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore.builder(embeddingModel).build();
}

Dev vs prod: SimpleVectorStore keeps everything in memory and disappears on restart. That's fine for learning and demos. For production you swap this one bean for PGVector, Redis, or another store — and because everything downstream talks to the VectorStore interface, nothing else in your code changes. Same portability lesson as swapping chat providers in the last post.

Step 3: Load some documents

Before the model can answer from your data, the data has to be in the store. In real life you'd read PDFs or pull from a database; here we'll add a few documents by hand so the flow is clear. Spring AI embeds and stores them for you (long documents would need to be split into chunks first, but these one-liners don't):

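import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class KnowledgeController {

    private final VectorStore vectorStore;

    public KnowledgeController(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Embeds the sample documents and writes them to the vector store.
    @PostMapping("/load")
    public String load() {
        var docs = List.of(
            new Document("Our refund window is 14 days from purchase."),
            new Document("Enterprise plans include 24/7 priority support."),
            new Document("API rate limit on the free tier is 60 requests per minute.")
        );
        vectorStore.add(docs);
        return "Loaded " + docs.size() + " documents";
    }
}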

Call POST /load once. Behind that single vectorStore.add(...) call, each document is run through the embedding model and stored as a vector. You never touch the math.
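
To picture what happens behind add(...), here is a toy store that does the same two things: run each text through an embedder, then keep the text and its vector together. ToyVectorStore and its fake embedder (text length and word count) are hypothetical stand-ins. This is not how SimpleVectorStore is implemented, and a real embedding model returns learned vectors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy stand-in for a vector store. All names are hypothetical.
class ToyVectorStore {

    // A stored chunk: the original text plus its vector.
    record Entry(String text, double[] vector) {}

    private final Function<String, double[]> embedder;
    private final List<Entry> entries = new ArrayList<>();

    ToyVectorStore(Function<String, double[]> embedder) {
        this.embedder = embedder;
    }

    // Embed each text and keep it alongside its vector.
    void add(List<String> texts) {
        for (String text : texts) {
            entries.add(new Entry(text, embedder.apply(text)));
        }
    }

    List<Entry> entries() {
        return List.copyOf(entries);
    }

    public static void main(String[] args) {
        // Fake embedder: [character count, word count].
        ToyVectorStore store = new ToyVectorStore(
                t -> new double[] {t.length(), t.split("\\s+").length});
        store.add(List.of(
                "Our refund window is 14 days from purchase.",
                "Enterprise plans include 24/7 priority support."));
        System.out.println("Stored " + store.entries().size() + " entries");
    }
}
```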

Step 4: Wire the advisor and ask a question

Now the payoff. Attach a QuestionAnswerAdvisor to the ChatClient and ask something only your documents know:

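import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder builder, VectorStore vectorStore) {
        // Every prompt sent through this client is augmented with retrieved context.
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}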

Hit /ask?question=How long do I have to get a refund? and the model answers "14 days" — not because it was trained on your policy, but because the advisor found that document, pasted it into the prompt, and the model read it. Ask something your documents don't cover and a well-configured RAG setup will tell you it doesn't know, instead of inventing an answer.
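
The "pasted it into the prompt" step is less magical than it sounds. Here is a plain-Java sketch of roughly what an augmented prompt looks like once the retrieved chunks are in hand. The template wording and the PromptAugmentationSketch class are invented for illustration; the advisor's actual template differs and is configurable.

```java
import java.util.List;

// Plain-Java stand-in for the "retrieve, paste, answer" step.
// This is NOT Spring AI's implementation; the template text is made up.
class PromptAugmentationSketch {

    // Builds one prompt containing the retrieved chunks followed by the question.
    static String augment(String question, List<String> retrievedChunks) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer using only the context below. ")
          .append("If the context does not contain the answer, say you don't know.\n\n")
          .append("Context:\n");
        for (String chunk : retrievedChunks) {
            sb.append("- ").append(chunk).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(augment(
                "How long do I have to get a refund?",
                List.of("Our refund window is 14 days from purchase.")));
    }
}
```

The instruction to admit ignorance is what lets the model decline questions your documents don't cover, and it only works as well as the retrieved context is relevant.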

Read the controller again. It's the exact shape from the first post. The only new line is .defaultAdvisors(...). That's RAG.

Tuning what gets retrieved

The default advisor searches every document and pulls the closest matches. In a real corpus you'll want more control — how many chunks, and how similar they must be to count. The builder takes a SearchRequest for exactly this:

QuestionAnswerAdvisor.builder(vectorStore)
    .searchRequest(SearchRequest.builder()
            .topK(4)
            .similarityThreshold(0.75)
            .build())
    .build();

topK caps how many chunks get pasted in (more context costs more tokens and can dilute the answer). similarityThreshold filters out weak matches so junk doesn't end up in your prompt. These two knobs fix most "the answers are vague" complaints.
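
Here is a small sketch of how the two knobs interact, using hand-written scores instead of a real search. RetrievalFilterSketch and the Scored record are hypothetical. The assumption that a chunk passes when its score is greater than or equal to the threshold is mine, so check it against your store's behavior.

```java
import java.util.Comparator;
import java.util.List;

// Illustrates threshold-then-topK selection on made-up similarity scores.
class RetrievalFilterSketch {

    record Scored(String text, double score) {}

    // Drop weak matches, sort best-first, then keep at most topK.
    static List<Scored> select(List<Scored> candidates, int topK, double threshold) {
        return candidates.stream()
                .filter(c -> c.score() >= threshold) // assumed inclusive comparison
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        List<Scored> candidates = List.of(
                new Scored("refund window", 0.91),
                new Scored("support plans", 0.62),
                new Scored("returns process", 0.80),
                new Scored("rate limits", 0.30));
        // With a 0.75 threshold only two chunks survive, even though topK is 4.
        select(candidates, 4, 0.75).forEach(s -> System.out.println(s.text()));
    }
}
```

Note that the threshold can leave you with fewer than topK chunks, or none at all, which is usually what you want when nothing in the store is relevant.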

If RAG answers feel off, it's almost always retrieval, not the model. Check what chunks came back before you blame the LLM.

When to graduate to the production pipeline

QuestionAnswerAdvisor is the zero-config, single-store path — ideal for getting RAG working. When your pipeline needs more — metadata filtering, joining results from multiple stores, rewriting the query before retrieval — Spring AI offers RetrievalAugmentationAdvisor (in the spring-ai-rag module), a composable pipeline you assemble from modular pieces.

Don't reach for it on day one. Start with QuestionAnswerAdvisor, ship something that works, and move up only when a real requirement pushes you there. That's the same discipline you'd apply to any abstraction in Spring.

Where this fits in the series

This is post two in our Spring AI track, and it builds on the same ChatClient foundation as the first post.

RAG is the feature that turns "a chatbot that sounds smart" into "a system that knows your business." It's also the one most teams want first. Get this pattern solid and you've covered the majority of real-world AI backend work.

