Kosta’s Blog

Why I Started Typedef: Data Infrastructure for AI and Agentic Systems

I've always loved building things from zero. The kind where you're not just writing code but figuring out everything: why it matters, who it's for, how to bring a team together, and how to turn it into a product and a company.

Two years ago I started working on something new based on a strong conviction that data, although already critical, was about to become exponentially more important.

And I couldn't have started at a better time.

Since then, LLMs have become mainstream and data has stopped being plumbing for dashboards and reports. Data is now the raw material that, coupled with models, becomes the product itself.

It feels like a reset. Like the early days of the Internet: flaky, fascinating, and full of opportunity. Except now, instead of figuring out how to move bits across the wire, we're figuring out how to feed meaning into context windows.

We're at a point where we're learning how to turn these new technologies into real systems and production-grade tools, and discovering new primitives, new problems to solve, and new roles to step into.

But I realized something early on: this new world can't be built with the tools that shaped the last one.

If we want to make the promise of AI real, we have to think from first principles and build from scratch.

We built Typedef from scratch but we didn't build it alone. We stand on the shoulders of an incredible open source ecosystem: tools like Apache Arrow, Apache DataFusion, Polars, DuckDB, LanceDB and many others have made this possible.

They gave us the building blocks to focus on what's new, while standing on what's solid.

Equipped with that foundation, we've spent the past year heads down building Typedef.

Typedef is an AI-native data engine that unifies inference, search, and data processing into a single system.

Our goal is to accelerate the maturity of these new technologies. To bring AI workloads the same level of stability, performance and composability that made modern software infrastructure possible.

And that's more important than ever. Because in this moment, we're not just building solutions, we're still discovering the problems. Problems we couldn't even see before, because we didn't have the tools to explore them.

Which means we need to move fast, learn fast, and when the solution is right, get it to production-grade fast too.

And just as important: we want to make these tools accessible, not just to more developers, but to models too. That's why we've been careful to build on paradigms people already know.

If you know PySpark, Pandas, or Python, you already know how to use Typedef.

And guess what?

Models are pretty good at speaking those languages too.

We’re live today.

If you're building with LLMs or building for them, I'd love for you to check it out.

Typedef makes it easy to run inference pipelines, power semantic search, and process unstructured data, all in one engine.

It’s built for real-world use cases like semantic processing at scale, customer support automation, agent-based workflows, data labeling, OLAP+LLM workloads, content moderation, and more.
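As a rough illustration, here is the shape of that kind of pipeline: ordinary data transforms mixed with an inference step in one pass. This is not Typedef's actual API; the function names are hypothetical and the model call is stubbed out with a keyword check standing in for a real LLM.

```python
def mock_classify(text: str) -> str:
    """Stand-in for an LLM inference call (e.g. sentiment labeling)."""
    return "negative" if "broken" in text.lower() else "positive"

def run_pipeline(tickets):
    # 1. Ordinary data processing: drop empty rows, normalize text.
    cleaned = [
        {"id": t["id"], "body": t["body"].strip()}
        for t in tickets if t["body"]
    ]
    # 2. Inference step: enrich each row with a model-derived label.
    labeled = [{**t, "sentiment": mock_classify(t["body"])} for t in cleaned]
    # 3. Aggregate the labels, as you would for a report or a router.
    counts = {}
    for row in labeled:
        counts[row["sentiment"]] = counts.get(row["sentiment"], 0) + 1
    return labeled, counts

tickets = [
    {"id": 1, "body": "The export button is broken again"},
    {"id": 2, "body": "Love the new dashboard!"},
    {"id": 3, "body": ""},
]
labeled, counts = run_pipeline(tickets)
print(counts)  # {'negative': 1, 'positive': 1}
```

The point of a unified engine is that the inference step above stops being a one-off script glued to a data job and becomes just another operator in the plan, optimized and scaled alongside the rest.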

typedef.ai

GitHub

We’re just getting started. Feedback, contributions, and ideas are all welcome.

Let’s build the new stack, together.

#AI infrastructure #AI workloads #DuckDB #LLM #Pandas #Polars #PySpark #Typedef #agentic systems #data processing #data stack #inference engine #open source #semantic search #startup launch