Empirical AI  ·  A 37 Digital studio  ·  San Francisco 37.7749° N

Most AI is asserted.
We make it measured.

Empirical AI is a research and consulting studio for teams building systems that reason about the world. We bring measurement, evaluation, and honest uncertainty to claims that are usually made on faith.

Start a conversation  ·  evidence before opinion
fig.01  ·  model reliability vs. prediction horizon (illustrative, with honest uncertainty)  ·  axes: prediction horizon → reliability  ·  regions: observed, extrapolated
§ 01

The approach

Why an evidence-first studio, in a field that runs on confident claims.

The hard part of AI was never building a demo. It's knowing whether the thing actually works — and being honest about where it stops.

Every model ships with a story about what it can do. Most of those stories are untested. The gap between a convincing demo and a system you can depend on is exactly the gap we work in.

We treat AI the way the rest of science treats a claim: as something to be measured, stress-tested, and reported with its error bars intact. That means designing the right experiment — not just running a benchmark — and telling you what we found, including the parts that don't flatter the model.

It's a quieter pitch than most. It also happens to be the one that holds up.

§ 02

What we do

Four ways we put evidence under AI systems and the decisions around them.

01 / EVALUATION

System & model evaluation

Reproducible evals for the models and pipelines you're shipping. We design the measurements that match your real task — not just the leaderboard ones — and tell you where the system holds and where it breaks.

02 / RESEARCH

World-model research

Applied research on systems that learn the dynamics of an environment: what they internally represent, how far their predictions stay reliable, and where the picture quietly falls apart.

03 / STRATEGY

AI strategy, grounded

Decisions about where AI earns its place and where it doesn't — argued from your data and tested assumptions, not vendor decks. A clear read on what's worth building now.

04 / DILIGENCE

Diligence & second opinions

Independent assessment of AI claims, vendors, and models for teams and investors who need the real picture before they commit capital or roadmap to it.

§ 03

On world models

The research thesis the studio is built around.

A world model is an AI system's internal picture of how its environment behaves — the thing that lets it predict what happens next and plan against it. It's one of the most consequential ideas in AI right now, and one of the hardest to evaluate.

The reason is simple: a model can predict the next moment beautifully and still drift into nonsense a few steps out. The interesting question isn't "can it predict?" It's "how far does the prediction hold before reality and the model part ways?" That horizon is measurable. Most teams never measure it.
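
As a minimal sketch of what measuring that horizon can look like, assume a model that is rolled forward on its own outputs and compared step by step against ground truth. Everything here is hypothetical and chosen for illustration: the toy dynamics (the world grows 10% per step, the model assumes 8%), the function names `rollout` and `reliable_horizon`, and the absolute-error tolerance of 0.10. A real evaluation would pick the error metric and tolerance to match the task.

```python
from typing import Callable, List, Sequence


def rollout(step: Callable[[float], float], x0: float, n_steps: int) -> List[float]:
    """Apply `step` repeatedly, feeding each output back in as the next input."""
    states = []
    x = x0
    for _ in range(n_steps):
        x = step(x)
        states.append(x)
    return states


def reliable_horizon(
    predicted: Sequence[float], actual: Sequence[float], tolerance: float
) -> int:
    """Count the leading steps whose absolute error stays within `tolerance`."""
    horizon = 0
    for p, a in zip(predicted, actual):
        if abs(p - a) > tolerance:
            break  # first step where model and world part ways
        horizon += 1
    return horizon


# Toy dynamics (assumed for illustration, not taken from any real system).
def world_step(x: float) -> float:
    return 1.10 * x


def model_step(x: float) -> float:
    return 1.08 * x


truth = rollout(world_step, 1.0, 10)
preds = rollout(model_step, 1.0, 10)
horizon = reliable_horizon(preds, truth, tolerance=0.10)
print(horizon)  # 3
```

In this toy case the first-step error is only about 0.02, yet the rollout leaves the 0.10 tolerance at step four, so the measured horizon is three steps. A small one-step error says little about how far the prediction holds.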

That's our lane: finding where a system's understanding of the world actually ends, and making that boundary legible to the people betting on it.

figure  ·  model vs. ground truth along the prediction horizon: where the model and the world diverge
§ 04

The studio

Empirical AI is the first brand of 37 Digital.

37 °N
37 Digital
San Francisco
Latitude  ·  37.7749° N
Practice 01  ·  Empirical AI
Est.  ·  2026

37 Digital is a San Francisco studio building focused digital ventures. The name is a coordinate — 37.7749° N, the latitude of the city it's built in. It's a reminder of the operating principle: start from a fixed, real point, and measure outward from there.

Empirical AI is the studio's first practice. Future work will carry its own names, but the same standard: build something real, measure whether it works, and tell the truth about what the measurement shows.

For now, one person, a clear thesis, and a strong bias toward evidence. If that's the kind of partner you've been looking for, the door's open.

§ 05  ·  Contact

Building something that reasons about the world — or trying to tell whether someone else's system really does?

Tell us what you're working on. We'll tell you, plainly, whether and how we can help.