Kostas Pardalis

Posts

  • Why I Started Typedef: Data Infrastructure for AI and Agentic Systems

    I've always loved building things from zero. The kind where you're not just writing code but figuring out everything: why it matters, who it's for, how to bring a team together, and how to turn it into a product and a company. Two years ago I started working on something new, based on a strong conviction that data, although already critical, was about to become exponentially more important.

    [... 551 words]

    AI infrastructure startup Typedef
  • How a Snowflake announcement explains dbt Labs' licensing change

    Snowflake announced it would offer dbt Core as a native Snowflake feature. This prompted dbt Labs to modify dbt Fusion's licensing to gain more leverage over its IP. The partnership between these companies has been symbiotic. Snowflake benefits from tools like dbt because they drive workloads inside Snowflake, while dbt Labs gained customer access and revenue through Snowflake's ecosystem.

    [... 210 words]

    dbt Snowflake data business
  • Batch Inference, Type Systems, and Why Cortex AISQL Got Me Excited

    Snowflake's Cortex AISQL announcement got me excited, and I want to explain why. It represents a paradigm shift in integrating large language models into data systems as structured, composable functions rather than opaque tools. Distilling LLM capabilities into five well-defined operators that address 80% of use cases signals meaningful progress. This approach prioritizes reproducibility and composability over raw prompt flexibility.

    [... 188 words]

    AI Snowflake SQL inference
  • Designing the Ideal Synthetic Data Generation Pipeline for LLMs

    Robust, maintainable, expressive and composable pipelines are critical for scaling synthetic data generation. This post advocates for abstractions that reduce boilerplate, for avoiding ad-hoc scripts, and for leveraging dataframe APIs with structured document representations. The concrete example involves fine-tuning a smaller model using synthetic QA pairs generated from SEC corporate reports by a frontier model, maintaining quality while reducing inference costs.

    [... 260 words]

    AI synthetic-data LLM data-engineering
  • DX ∪ UX = U ∧ DX ∩ UX = ∅

    User Experience is primarily concerned with guiding the user to a desired outcome in the optimal way, optimizing for time and margin of error. That's why the term journey is heavily used within the context of UX design. Developer Experience, on the other hand, is not about guiding the user but about designing the right abstractions and choosing what part of the system complexity to expose to the developer.

    [... 572 words]

    product DX UX
  • Why you should keep an eye on Apache DataFusion and its community.

    On June 24, 2024, the first San Francisco Bay Area DataFusion meetup happened. I had the opportunity to help with the organization of the event and also attend. The event had a lot of content from six different companies. These companies ranged from startups to scale-ups and big Fortune 500 companies. Leaving the event, I felt I had experienced something significant, and I want to share it with you.

    [... 983 words]

    data databases datafusion
  • A glimpse into the future of data processing infrastructure.

    Three weeks ago, VeloxCon took place in San Jose. The event was a great opportunity for people who are interested in execution engines and data processing at scale to learn about the current state of the project. Most importantly, though, it was an amazing opportunity to get a glimpse of what the future of data processing will be like. From what we saw at the event, this future is very exciting!

    [... 2,168 words]

    data databases velox
  • MLOps is Mostly Data Engineering.

    After a few years and with the hype gone, it has become apparent that MLOps overlaps more with Data Engineering than most people believed. Let’s see why and what that means for the MLOps ecosystem. MLOps is a relatively recent term. A quick search on Google Trends reveals that people started searching for the term around the end of 2019.

    [... 2,494 words]

    MLOps Opinion data engineering