Stop me if you’ve heard a requirement like this before:

“We want our bot to answer questions like a human, but only use the exact wordings from the knowledge base.”

One of the main reasons chatbot systems adopt Retrieval-Augmented Generation (RAG) is to generate original text grounded in facts. Businesses in regulated industries like healthcare or finance, however, are understandably wary of hallucinations or poorly worded answers. In such fields, even minor inaccuracies can lead to legal or reputational risks, such as being misconstrued as offering medical diagnoses or financial advice.

Luckily, this is not a complete impasse. The amount of creativity given to AI in a chatbot is not all-or-nothing; in fact, there are ways for chatbots to get the benefits of a knowledge base without the drawbacks of answering with LLM-generated text.

A diagram showing the possible tactics on the creativity-control spectrum


    Traditional RAG: Full Creativity

    A diagram showing the traditional RAG process

    In the traditional RAG approach, documents containing relevant business information are chunked, embedded, and stored in a vector database. When a user submits a query, their message is vectorized and compared for similarity with stored chunks. The most similar ones are retrieved and sent to an LLM along with the user’s query, grounding the model’s response in real-world facts.
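
As a rough sketch of that flow, here is what the retrieve-then-generate step might look like. The `embed` and `generate` helpers are hypothetical stand-ins for whatever embedding model and LLM client you actually use:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for your embedding model (e.g., a hosted embeddings API)."""
    ...

def generate(prompt: str) -> str:
    """Placeholder for your LLM client."""
    ...

def rag_answer(query: str, chunks: list[str], chunk_vectors: np.ndarray, k: int = 3) -> str:
    """Traditional RAG: retrieve the k most similar chunks, then let the LLM write."""
    q = embed(query)
    # Cosine similarity between the query vector and every stored chunk vector.
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    top_chunks = [chunks[i] for i in np.argsort(sims)[::-1][:k]]
    prompt = (
        "Answer the user's question using the facts below.\n\n"
        "Facts:\n" + "\n".join(f"- {c}" for c in top_chunks) +
        f"\n\nQuestion: {query}"
    )
    # Full creative control stays with the LLM.
    return generate(prompt)
```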

    Pros:

    • Quick to implement: This is the standard playbook for RAG and has abundant resources and tools available to get started.
    • Scales effortlessly: Once data is embedded, the process requires no additional manual intervention.
    • Highly conversational: LLMs craft tailored, natural-sounding responses to each user query.

    Cons:

    • Expensive and slow: The LLM will generate answers for every query, increasing response latency and operational costs.
    • Risk of irrelevant context: Retrieved chunks may include false positives, throwing off the generated response.
    • Inconsistent tone: The LLM determines tone and style, which may not align with the brand voice.

    Middle Ground: Prompt Engineering

    A diagram showing the RAG process with a different LLM prompt

    LLMs can be prompted to wrap retrieved facts with light conversational elements, while emphasizing that the facts must not be altered. This keeps responses natural while retaining control over accuracy.
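
For example, the generation prompt might be adjusted along these lines (the wording here is illustrative, not a battle-tested template):

```python
PROMPT_TEMPLATE = """You are a support assistant for our company.
Answer the user's question using ONLY the facts listed below.
You may add a short greeting or transition, but the facts themselves
must be repeated verbatim. Do not reword, summarize, speculate, or
add any information that is not in the facts.

Facts:
{facts}

User question: {question}"""

def build_prompt(facts: list[str], question: str) -> str:
    # The retrieved chunks are injected as a bulleted list of facts.
    return PROMPT_TEMPLATE.format(
        facts="\n".join(f"- {fact}" for fact in facts),
        question=question,
    )
```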

    Pros:

    • Low complexity: Minimal changes to a traditional RAG setup, requiring only an adjusted prompt.
    • Balanced output: Conversational responses stay close to the facts.
    • Fast to iterate: Experimenting with different prompts is straightforward.

    Cons:

    • Latency persists: Responses still rely on LLMs for generation.
    • Tone risks: Retrieved facts might still feel out of place within the broader conversation.
    • False positives: Irrelevant retrieved chunks can still confuse the LLM.

    Middle Ground: Answers as Facts

    A diagram showing using answers as chunks for RAG

Rather than chunking raw facts, this approach populates the knowledge base with pre-written answers. For instance, instead of storing a raw fact like:
    > “The store is open from 10:00 AM to 7:00 PM, Monday through Friday, except on holidays.”

    The chunk might instead say:
    > “We’re open from 10 AM to 7 PM, Monday through Friday, excluding holidays.”
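
In practice this changes what you index, not how you retrieve. A minimal sketch, again assuming a hypothetical `embed` helper and a simple in-memory store:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for your embedding model."""
    ...

# Each chunk IS a customer-facing answer (e.g., exported from an FAQ or
# help center article), so whatever retrieval returns is already written
# in the brand voice.
prewritten_answers = [
    "We're open from 10 AM to 7 PM, Monday through Friday, excluding holidays.",
    "You can return any item within 30 days as long as you have a receipt.",
]

store = [(embed(answer), answer) for answer in prewritten_answers]
```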

    Pros:

    • Pre-defined voice: Responses will be more conversational and adhere to brand guidelines.
    • Reuse existing content: FAQs and help center articles can be reused as the backbone of the knowledge base.
    • User-friendly answers: Responses feel natural and relatable to the user.

    Cons:

    • High initial effort: Writing and formatting the pre-defined answers is time-intensive.
    • Still requires LLMs: Latency and costs remain a factor if generation is still used.
    • False positives: Irrelevant retrieved chunks can still confuse the LLM, especially if answers contradict each other.

    Middle Ground: Topic Classification

    A diagram showing the RAG process with a topic classification step

Here, the bot classifies each user query as “sensitive” or “non-sensitive” after retrieving the knowledge base chunks. Sensitive queries can bypass the LLM, with the retrieved facts sent directly to the user or only minimally rephrased. For less sensitive topics, the bot can allow more creativity in the response.
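
Here is a sketch of the routing, with a deliberately crude keyword check standing in for the classification step; `retrieve`, `generate`, and `creative_prompt` are hypothetical helpers along the lines of the earlier sketches:

```python
# Hypothetical helpers: vector search and LLM client, as in earlier sketches.
def retrieve(query: str) -> list[str]: ...
def generate(prompt: str) -> str: ...
def creative_prompt(chunks: list[str], query: str) -> str: ...

SENSITIVE_KEYWORDS = {"diagnosis", "medication", "dosage", "refund", "lawsuit", "tax"}

def is_sensitive(query: str) -> bool:
    # Deliberately crude classifier: keyword matching. In practice this is
    # often a trained classifier or a cheap LLM call.
    return any(word in query.lower() for word in SENSITIVE_KEYWORDS)

def respond(query: str) -> str:
    chunks = retrieve(query)
    if is_sensitive(query):
        # Bypass the LLM: the top retrieved fact goes to the user as written.
        return chunks[0]
    # Non-sensitive topics keep the full conversational treatment.
    return generate(creative_prompt(chunks, query))
```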

    Pros:

    • Flexible control: Retains creativity for safe topics while tightly controlling sensitive ones.
    • Context-aware: Balances accuracy and conversational flow based on the query’s importance.
• Efficient scaling: The LLM handles non-sensitive queries without additional manual content work.

    Cons:

    • Setup complexity: Requires careful manual categorization of sensitive topics.
    • Tone shifts: Responses for sensitive topics may abruptly differ in style, creating a jarring user experience.
• Diminished engagement: Sensitive answers may feel overly formal or rigid, and users can get frustrated if hand-written answers fail to address their questions.

    Full Control: Question Matching

    A diagram showing question matching with vector similarity

Instead of embedding facts or answers, hundreds of representative user questions are embedded and stored in a vector store. Crucially, each question carries its hand-written answer as metadata so that, when a stored question similar to the user’s question is found, the hand-written answer is returned to the user as-is. The initial collection of user questions can come from existing data or be generated by an LLM.
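
A minimal sketch of the lookup, again with a hypothetical `embed` helper. Nothing is generated: the stored answer is returned exactly as written, and a similarity threshold guards against answering questions the knowledge base doesn’t actually cover:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for your embedding model."""
    ...

# Hand-written Q&A pairs; the answer travels with the question as metadata.
qa_pairs = [
    ("What time do you open?",
     "We're open from 10 AM to 7 PM, Monday through Friday, excluding holidays."),
    ("Can I return something I bought?",
     "You can return any item within 30 days as long as you have a receipt."),
]

question_vectors = np.stack([embed(question) for question, _ in qa_pairs])

def match_answer(user_question: str, threshold: float = 0.85) -> str | None:
    q = embed(user_question)
    sims = question_vectors @ q / (
        np.linalg.norm(question_vectors, axis=1) * np.linalg.norm(q)
    )
    best = int(np.argmax(sims))
    # Below the threshold, return None and fall back to something safe
    # (a clarifying question, a human handoff, a default reply).
    return qa_pairs[best][1] if sims[best] >= threshold else None
```

Because no text is generated, the failure modes reduce to a bad match or no match, both of which are straightforward to log and review.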

    Pros:

    • Absolute accuracy: No hallucination risk — answers will always be given as-written.
    • Preserves tone: Responses align with brand voice and guidelines.
    • Cost-effective: Fast and inexpensive, as no text generation is involved.

    Cons:

    • Resource-heavy: Requires significant manual effort to create and maintain Q&A pairs.
    • Hard to scale: Adding new queries or refining answers is time-intensive.
    • Lacks flexibility: Responses are static and might feel disconnected from the conversation context.

    Which is best?

It depends entirely on your use case! No bot sits at an absolute extreme - every use case involves some mix of creativity and control.

    AI generated image of a customer support chatbot

    Support bots tend to be more controlled than creative bots

Chatbots that provide customer support, whether for IT equipment or airline tickets, tend to skew toward controlled responses. Users want the answer to their question quickly and directly, and don’t care whether the bot is funny or clever. Often these bots get the best results from wrapping existing facts in a conversational veneer before passing them to the user.

    HR, Finance, and Healthcare are very controlled

Chatbots that serve users in these industries have to be very careful about how they answer questions, because their parent companies could be held liable for any misinformation. Often these bots will have a subject matter expert like a CPA, a medical doctor, or an HR specialist write all the content for the bot. In these cases, AI is used more to direct the conversation than to generate content.

    AI generated image of HR, Finance, and healthcare industries
    AI generated image of a sales chatbot

    Sales bots are more creative

    When chatbots are making product recommendations or trying to pitch users on a service, more creativity is beneficial. It allows them to use context from the conversation to make the pitches more relevant and personalized to the end user.

    The decision is yours

    Ultimately, your chatbot’s place on the creativity-control spectrum will depend on your goals, audience, and risk tolerance. If you need guidance in finding the best approach for your chatbot, book a free 20-minute consultation with me!

    ❤️

    Gordy