
Another Crazy Day in AI: How to Teach Models to Say “I Don’t Know”

Another Crazy Day in AI: An Almost Daily Newsletter

Hello, AI Enthusiasts.


If your brain's a bit fried by now, Google Research just gave us something fresh to chew on. They’re rethinking RAG systems—not just for relevance but for sufficiency. The takeaway? It’s not enough for AI to find facts. It needs enough of the right ones to actually answer (or know when not to).


Meanwhile, marketing pros are hitting the brakes on over-AI’d content. And if you’re on TikTok, your still photos are about to start moving. Literally.


Wednesdays used to be slow. Not anymore.


Here's another crazy day in AI:

  • Why RAG systems still hallucinate

  • Avoiding the AI marketing trap

  • TikTok introduces AI Alive to animate your photos

  • Some AI tools to try out


TODAY'S FEATURED ITEM: When AI Should Say Nothing


A robotic scientist in a classic white lab coat labeled 'AI Scientist' stands beside a human scientist whose coat reads 'Human Scientist'; the human looks toward the robot.

Image Credit: Wowza (created with Ideogram)


What if your AI model could tell you when it shouldn’t answer?


Researchers Cyrus Rashtchian and Da-Cheng Juan propose a new way to evaluate retrieval-augmented generation (RAG) systems—not just by how relevant the retrieved context is, but by whether it’s sufficient. Their paper, “Sufficient Context: A New Lens on Retrieval Augmented Generation Systems,” was presented at ICLR 2025 and dives into why RAG models hallucinate, how to classify when context is actually enough to answer a question, and what we can do to reduce false outputs.


They introduce the concept of “sufficient context”—the idea that the retrieved information must include everything needed to produce a correct answer. Using a new automatic rating method and a selective generation approach, they examine when it might be better for a model to skip answering altogether. Interestingly, their work shows that providing more context can sometimes make things worse—leading to hallucinations when the extra information isn’t quite right.
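If you’re curious what that rating step could look like in practice, here’s a minimal sketch in Python. To be clear, this is our illustration, not the paper’s implementation: call_llm is a hypothetical stand-in for whatever LLM client you use, and the prompt wording is our own.

# Minimal sketch of an LLM-based "sufficiency" autorater.
# Assumptions: call_llm is a hypothetical stand-in for your own
# LLM client, and the prompt wording below is ours, not the paper's.

AUTORATER_PROMPT = """\
Question: {question}

Retrieved context:
{context}

Does the context contain all the information needed to answer the
question? Answer with exactly one word: SUFFICIENT or INSUFFICIENT.
"""

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's client."""
    raise NotImplementedError

def context_is_sufficient(question: str, context: str) -> bool:
    """Ask the rater model whether the context fully supports an answer."""
    reply = call_llm(AUTORATER_PROMPT.format(question=question, context=context))
    return reply.strip().upper().startswith("SUFFICIENT")

The appeal of this setup is that it needs no ground-truth answers: the rater only judges the question against the retrieved text.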


Source: Google Research

What they discovered about how RAG systems really behave:

  • Relevance ≠ Sufficiency: Context may be on-topic but still not have the information needed to answer a question accurately.

  • Introducing the “autorater”: An LLM-based tool classifies context as sufficient or insufficient with 93%+ accuracy—no ground truth answers needed.

  • Top models still struggle: Even the best models (Gemini, GPT, Claude) tend to hallucinate when context is insufficient, instead of just saying “I don’t know.”

  • Smaller models hallucinate more: Open-source models often fail even when context is technically sufficient.

  • More context, more confidence... more hallucinations: Adding context increases the risk of confident, wrong answers—especially in models like Gemma.

  • Selective generation helps: Combining model confidence with sufficiency ratings reduces hallucinations without sacrificing too many correct answers (a sketch of this rule follows this list).

  • Dataset matters: Datasets like FreshQA, with human-curated supporting docs, provide more sufficient context than others like HotPotQA.
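As promised above, here’s what a selective-generation rule might look like, paired with the autorater sketch from earlier. Again, this is illustrative rather than the paper’s exact method: the confidence score, the 0.7 threshold, and the abstention rule are all our assumptions.

from typing import Optional

def selective_generate(answer: str,
                       confidence: float,
                       context_sufficient: bool,
                       threshold: float = 0.7) -> Optional[str]:
    """Return the model's answer, or None to abstain ("I don't know").

    Combines the two signals the paper highlights: the model's own
    confidence and the autorater's sufficiency label. The specific
    rule and threshold here are illustrative assumptions.
    """
    if not context_sufficient and confidence < threshold:
        return None  # better to say nothing than to hallucinate
    return answer

# Example: insufficient context plus shaky confidence -> abstain.
result = selective_generate("Paris", confidence=0.4, context_sufficient=False)
print(result if result is not None else "I don't know")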


Source: Google Research

The implications are far-reaching for anyone working with RAG pipelines or trying to improve trust in AI-generated responses. Instead of focusing only on document relevance or retrieval hit rates, the paper encourages a closer look at whether the input truly supports a reliable answer. This shift in evaluation could influence how future systems are trained, optimized, and judged in real-world use.


For builders, this opens up a practical takeaway: sometimes, less is more. Knowing when not to answer—or when the retrieved context doesn’t cut it—might be just as important as knowing the right thing to say. As more teams deploy AI assistants into high-stakes settings, understanding and applying the idea of “sufficient context” could be key to reducing costly errors and improving user trust.




Read the full blog here.

Read the full paper here.

OTHER INTERESTING AI HIGHLIGHTS:


Avoiding the AI Marketing Trap

/Olivia Bunescu, Senior Associate Editor, on Multi-Housing News


As AI takes center stage in modern marketing strategies, experts warn against over-reliance. While AI tools streamline workflows and spark ideas, they can’t replace human intuition, emotional intelligence, or strategic thinking. From mishandling negative reviews to generating tone-deaf content, the risks of letting AI steer your messaging are real. Marketers are encouraged to treat AI as a creative collaborator—not the lead driver.



Read more here.


TikTok Introduces AI Alive To Animate Your Photos

/TikTok Newsroom


TikTok’s newest feature, AI Alive, transforms still images into dynamic video stories with atmospheric motion and expressive effects—no editing experience required. Integrated directly into the Story Camera, this creative tool brings static moments to life through subtle animation and sound, unlocking fresh storytelling possibilities for everyday users. Safety and transparency are built in, including visible AI-generated labels and behind-the-scenes content checks. It's another step in TikTok’s push to democratize creative tools for its global user base.





Read more here.

Source: TikTok

SOME AI TOOLS TO TRY OUT:


  • Fluig – Turn documents and ideas into diagrams instantly with AI.

  • Willow Voice – Fast, accurate AI dictation that works across any app.

  • Gleo AI – Rehearse tough conversations and get feedback on your speaking style.


That’s a wrap on today’s Almost Daily craziness.


Catch us almost every day—almost! 😉

EXCITING NEWS:

The Another Crazy Day in AI newsletter is on LinkedIn!!!



Wowza, Inc.

Leveraging AI for Enhanced Content: As part of our commitment to exploring new technologies, we used AI to help curate and refine our newsletters. This enriches our content and keeps us at the forefront of digital innovation, ensuring you stay informed with the latest trends and developments.








Copyright Wowza, Inc. 2025