Most AI is asserted.
We make it measured.
Empirical AI is a research and consulting studio for teams building systems that reason about the world. We bring measurement, evaluation, and honest uncertainty to claims that are usually made on faith.
The approach
Why an evidence-first studio, in a field that runs on confident claims.
The hard part of AI was never building a demo. It's knowing whether the thing actually works — and being honest about where it stops.
Every model ships with a story about what it can do. Most of those stories are untested. The gap between a convincing demo and a system you can depend on is exactly the gap we work in.
We treat an AI claim the way the rest of science treats any claim: as something to be measured, stress-tested, and reported with its error bars intact. That means designing the right experiment, not just running a benchmark, and telling you what we found, including the parts that don't flatter the model.
It's a quieter pitch than most. It also happens to be the one that holds up.
What we do
Four ways we put evidence under AI systems and the decisions around them.
System & model evaluation
Reproducible evals for the models and pipelines you're shipping. We design the measurements that match your real task — not just the leaderboard ones — and tell you where the system holds and where it breaks.
World-model research
Applied research on systems that learn the dynamics of an environment: what they internally represent, how far their predictions stay reliable, and where the picture quietly falls apart.
AI strategy, grounded
Decisions about where AI earns its place and where it doesn't — argued from your data and tested assumptions, not vendor decks. A clear read on what's worth building now.
Diligence & second opinions
Independent assessment of AI claims, vendors, and models for teams and investors who need the real picture before they commit capital or a roadmap to it.
On world models
The research thesis the studio is built around.
A world model is an AI system's internal picture of how its environment behaves — the thing that lets it predict what happens next and plan against it. It's one of the most consequential ideas in AI right now, and one of the hardest to evaluate.
The reason is simple: a model can predict the next moment beautifully and still drift into nonsense a few steps out. The interesting question isn't whether it can predict; it's how far the prediction holds before reality and the model part ways. That horizon is measurable. Most teams never measure it.
That's our lane: finding where a system's understanding of the world actually ends, and making that boundary legible to the people betting on it.
The studio
Empirical AI is the first brand of 37 Digital.
37 Digital is a San Francisco studio building focused digital ventures. The name comes from a coordinate: 37.7749° N, the latitude of the city it's built in. It's a reminder of the operating principle: start from a fixed, real point, and measure outward from there.
Empirical AI is the studio's first practice. Future work will carry its own names, but the same standard: build something real, measure whether it works, and tell the truth about what the measurement shows.
For now, one person, a clear thesis, and a strong bias toward evidence. If that's the kind of partner you've been looking for, the door's open.
Building something that reasons about the world — or trying to tell whether someone else's system really does?
Tell us what you're working on. We'll tell you, plainly, whether and how we can help.
matt@37digital.co