Another Crazy Day in AI: How to Know If Your AI Model Will Succeed — or Fail
- Wowza Team
- May 13
- 4 min read

Hello, AI Enthusiasts.
Made it through the day? Here’s a little something to wind down with that isn’t more email replies.
Microsoft researchers say it’s not enough to know how AI performs—we need to know why. Their new system aims to predict AI behavior across tasks and make evaluation more human-readable.
Meanwhile, cybersecurity just got bumped from the #1 spot in tech budgets. Generative AI has officially taken the wheel.
And while we’re on the topic of who’s in the driver’s seat… economists are calling out the U.S. auto industry for stalling where it should’ve sped up.
That’s the latest. You’re (almost) up to speed.
Here's another crazy day in AI:
Microsoft advances AI evaluation methods
Generative AI now a bigger budget item than cybersecurity
The economics behind America’s auto market
Some AI tools to try out
TODAY'S FEATURED ITEM: Predicting AI Performance Before Deployment

Image Credit: Wowza (created with Ideogram)
What if we could predict AI success before testing?
As AI systems take on increasingly complex and critical roles, simply knowing whether a model performs well is no longer enough. We need to understand why it performs the way it does — and anticipate how it might behave on new, unfamiliar tasks.
In a recent post on the Microsoft Research Blog, Lexin Zhou and Xing Xie share insights from their groundbreaking study, “General Scales Unlock AI Evaluation with Explanatory and Predictive Power.” Supported by Microsoft’s Accelerating Foundation Models Research (AFMR) program, this research proposes a new way to evaluate AI models that goes beyond surface-level scores — aiming instead to predict future performance and explain results in human terms. The full study is co-authored by a broad team of researchers across Microsoft, Cambridge, and international institutions.
Instead of relying on traditional benchmarks, the team created ADeLe — an ability-based evaluation system that assesses what a task demands cognitively and compares it to a model’s skillset.
A closer look at what the study introduces:
A framework grounded in 18 cognitive and knowledge-based scales, adapted from human-centered assessment practices
A method for rating task difficulty and linking it to a model’s capabilities through structured evaluation
Profiles of AI model “abilities” that help explain how — and why — certain systems perform better on specific tasks
An evaluation of 15 large language models, highlighting patterns in reasoning, abstraction, and subject knowledge
Findings that challenge the completeness of some popular benchmarks, many of which test only narrow difficulty ranges or mix in unrelated demands
A predictive system that forecasts model performance on new tasks with around 88% accuracy
The ADeLe framework brings something new to the table: a way to make sense of AI performance in terms we can interpret. Rather than averaging a model’s score across a benchmark and calling it done, this approach builds a fuller picture — one that recognizes the range of skills a task might require and matches them with what the model is actually equipped to do. That context matters, especially when these systems are being used in more complex or consequential environments.
The team’s work doesn’t aim to replace traditional benchmarks, but to strengthen how we understand them. In doing so, it opens up new possibilities for more thoughtful evaluation — not just comparing models against one another, but assessing whether any given model is suited for a specific use case. As AI systems continue to scale and diversify, that kind of clarity could help researchers and practitioners make better decisions about development, safety, and deployment.
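For readers who like to see the idea in code, here is a rough, purely illustrative sketch of the intuition behind ability-based prediction: rate a task's demands on a few cognitive scales, compare them to a model's estimated ability levels, and turn the gap into a predicted chance of success. The scale names, numbers, and scoring formula below are our own toy assumptions for illustration, not the 18 scales or the actual predictor from the Microsoft paper.

```python
from math import exp

# Hypothetical ability profile for a model: for each scale, the demand level
# (0-10) at which we assume the model's success rate falls to about 50%.
model_abilities = {"reasoning": 6.5, "abstraction": 5.0, "domain_knowledge": 7.0}

# Hypothetical demand ratings for a new, unseen task on the same scales.
task_demands = {"reasoning": 7.0, "abstraction": 4.0, "domain_knowledge": 5.5}

def success_probability(abilities, demands, slope=1.0):
    """Toy estimate of P(success): each scale contributes a logistic term that
    is close to 1 when ability comfortably exceeds demand and close to 0 when
    demand far exceeds ability; the terms are multiplied together."""
    p = 1.0
    for scale, demand in demands.items():
        margin = abilities[scale] - demand
        p *= 1.0 / (1.0 + exp(-slope * margin))
    return p

print(f"Predicted chance of success: {success_probability(model_abilities, task_demands):.2f}")
```

The point of the sketch is only to show why this kind of profile is more interpretable than a single benchmark average: you can see which scale (here, the reasoning demand slightly above the model's level) is dragging the prediction down.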
Read the full blog here.
Read the full paper here.
OTHER INTERESTING AI HIGHLIGHTS:
Generative AI Now a Bigger Budget Item Than Cybersecurity
/John K. Waters, on Campus Technology
A new global survey by AWS finds that generative AI has leapfrogged cybersecurity in 2025 tech budget priorities. With 90% of organizations now exploring or deploying generative AI, the technology is moving from experimental to essential. While security still plays a vital role in AI governance, the shift highlights how companies are racing to integrate generative tools into core workflows. Notably, nearly half of respondents are already using generative AI in production environments.
Read more here.
The Economics Behind America’s Auto Market
/Andrey Fradkin and Seth Benzell, on Substack at Empiricrafting
Is the U.S. auto industry serving consumer welfare — or corporate margins? In this podcast episode of Justified Posteriors, economists Seth Benzell and Andrey Fradkin dig into new research analyzing decades of pricing, competition, and innovation in the American auto sector. Their conversation raises questions about market power, pricing strategies, and whether more competition could have accelerated progress. It’s a data-driven reflection on how structure shapes outcomes in major industries.
Read more here.
SOME AI TOOLS TO TRY OUT:
DeckSpeed – Create personalized slides from conversations—no templates needed.
rehearsal.so – Practice real conversations with AI and compete with friends.
FirstQuadrant – Automates your sales process, from follow-ups to closing.
That’s a wrap on today’s Almost Daily craziness.
Catch us almost every day—almost! 😉
EXCITING NEWS:
The Another Crazy Day in AI newsletter is on LinkedIn!!!

Leveraging AI for Enhanced Content: As part of our commitment to exploring new technologies, we use AI to help curate and refine our newsletters. This enriches our content and keeps us at the forefront of digital innovation, ensuring you stay informed with the latest trends and developments.