Forma Engineering Blog
Why Forma?
Traditional databases weren't built for the AI era. When your AI agent outputs 12 fields today and 30 fields tomorrow, waiting 3-7 days for DDL approval isn't an option.
Forma solves this with a modern take on the EAV pattern:
| Problem | Traditional DB | Forma |
|---|---|---|
| New field | ALTER TABLE (days) | JSON Schema update (seconds) |
| Schema change | Downtime required | Zero downtime |
| AI output | Manual adaptation | Direct JSON Schema mapping |
| N+1 queries | 101 round-trips | 1 round-trip |
| Historical data | Same table, same cost | Cold storage on S3 |
English Series
A three-part engineering blog series explaining how Forma solves the challenges of building flexible, high-performance data storage for AI applications.
Series Introduction: From EAV to Zero-Dirty-Read Lakehouse
What is Forma and What Problems Does It Solve?
Three posts explain a flexible data storage engine designed for the AI era. Start here for an overview of Forma's architecture and the three core problems it solves.
Part 1: Why EAV is the Most Underrated Data Model for AI
JSON Schema + Hot Table = AI-Ready Infrastructure
JSON Schema isn't just a validation tool; it's the core of AI-ready infrastructure. Learn how to build the pipeline: AI output → instant validation → zero-DDL storage.
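As a rough illustration of the "AI output → validation → zero-DDL storage" flow, here is a minimal Python sketch. It is not Forma's implementation: the schema, the field names, and the helpers `validate` and `to_eav_rows` are hypothetical, and the validator covers only a tiny subset of JSON Schema (required fields and three primitive types).

```python
from typing import Any

# Hypothetical schema for an AI agent's output; field names are illustrative.
SCHEMA = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "score": {"type": "number"},
        "verified": {"type": "boolean"},
    },
}

# Maps JSON Schema type names to Python types (small subset only).
_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate(doc: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the doc is valid."""
    errors = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in doc
    ]
    for key, value in doc.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown field: {key}")
            continue
        type_ok = isinstance(value, _TYPES[spec["type"]])
        # bool is a subclass of int in Python, so reject it for "number".
        if spec["type"] == "number" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            errors.append(f"{key}: expected {spec['type']}")
    return errors


def to_eav_rows(entity_id: str, doc: dict[str, Any]) -> list[tuple[str, str, Any]]:
    """Flatten a validated document into (entity, attribute, value) rows."""
    return [(entity_id, attr, value) for attr, value in doc.items()]


# Adding a field is a schema update, not an ALTER TABLE.
SCHEMA["properties"]["summary"] = {"type": "string"}
doc = {"name": "Acme", "score": 0.9, "summary": "new field"}
rows = to_eav_rows("e1", doc) if not validate(doc, SCHEMA) else []
```

In this sketch the storage layout never changes when a field is added: a new attribute simply becomes another row, which is the property the EAV pattern is chosen for.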
Part 2: Killing N+1
How One SQL Trick Cut Our Latency by 40x
We reduced database round-trips from 101 to 1 and latency from 1000 ms to 25 ms, a 40x speedup (97.5% lower latency). The technique combines a PostgreSQL CTE with JSON_AGG.
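The round-trip reduction can be sketched in a self-contained way with Python's built-in `sqlite3` module. This is an assumption-laden stand-in, not the query from the post: the `entities` and `attrs` tables are hypothetical, and SQLite's `json_group_object` plays the role that `JSON_AGG` (or `json_object_agg`) would play in PostgreSQL. It requires a SQLite build with the JSON functions enabled.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE attrs (entity_id INTEGER, name TEXT, value TEXT);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?)",
                 [(i, "lead") for i in range(1, 101)])
conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)",
                 [(i, name, f"{name}-{i}")
                  for i in range(1, 101) for name in ("email", "city")])


def fetch_n_plus_one(conn):
    """One query for the ids plus one per entity: 101 round-trips for 100 entities."""
    round_trips = 1
    ids = [row[0] for row in conn.execute("SELECT id FROM entities ORDER BY id")]
    result = {}
    for entity_id in ids:
        round_trips += 1
        rows = conn.execute(
            "SELECT name, value FROM attrs WHERE entity_id = ?", (entity_id,)
        ).fetchall()
        result[entity_id] = dict(rows)
    return result, round_trips


# The CTE selects the page of entities; the aggregate folds each entity's
# attribute rows into one JSON object, so everything returns in one statement.
SINGLE_QUERY = """
WITH page AS (
    SELECT id FROM entities ORDER BY id LIMIT 100
)
SELECT page.id, json_group_object(attrs.name, attrs.value)
FROM page
JOIN attrs ON attrs.entity_id = page.id
GROUP BY page.id
ORDER BY page.id
"""


def fetch_single(conn):
    """Same result in a single round-trip."""
    rows = conn.execute(SINGLE_QUERY).fetchall()
    return {entity_id: json.loads(blob) for entity_id, blob in rows}, 1


slow_result, slow_trips = fetch_n_plus_one(conn)
fast_result, fast_trips = fetch_single(conn)
```

Both functions return the same dictionary; only the number of statements sent to the database differs. With an in-memory database the latency gap is negligible, so this sketch shows the query shape rather than the 1000 ms to 25 ms figure.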
Part 3: Zero Dirty Reads Lakehouse
Building a Trustworthy Lakehouse with DuckDB
PostgreSQL handles "the present," DuckDB + Parquet handles "the past." Learn how Anti-Join + Dirty Set mechanisms ensure zero dirty reads.
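One plausible reading of "Anti-Join + Dirty Set" is sketched below in plain Python. The details are assumptions, not taken from the post: the dirty set is taken to be the ids changed in the hot store since the last cold export, and `federated_read` is a hypothetical helper. In a real deployment the anti-join would run in DuckDB over Parquet files rather than over Python lists.

```python
def federated_read(cold_rows, hot_rows, dirty_ids):
    """Merge cold and hot rows without returning stale cold copies.

    cold_rows: rows from the last export (assumed to live in Parquet).
    hot_rows:  current rows for ids changed since that export (assumed PostgreSQL).
    dirty_ids: ids changed or deleted since the export (the assumed "dirty set").
    """
    # Anti-join: keep only cold rows whose id is NOT in the dirty set.
    clean_cold = [row for row in cold_rows if row["id"] not in dirty_ids]
    # Hot rows are authoritative for every dirty id that still exists.
    return sorted(clean_cold + hot_rows, key=lambda row: row["id"])


cold = [{"id": 1, "v": "old1"}, {"id": 2, "v": "old2"}, {"id": 4, "v": "old4"}]
hot = [{"id": 2, "v": "new2"}, {"id": 3, "v": "new3"}]
dirty = {2, 3, 4}  # id 2 updated, id 3 inserted, id 4 deleted since the export

merged = federated_read(cold, hot, dirty)
```

Under these assumptions the stale copy of id 2 and the deleted id 4 never reach the caller, because any id in the dirty set is served only from the hot store.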
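Chinese Series
Three engineering blog posts that explain a flexible data storage engine designed for the AI era.
Series Introduction: From EAV to a Zero-Dirty-Read Lakehouse
What Is Forma, and What Problems Does It Solve?
Three posts explain a flexible data storage engine designed for the AI era. Start here to learn about Forma's architecture and the three core problems it sets out to solve.
Part 1: Why EAV Is the Most Underrated Data Model of the AI Era
JSON Schema + Hot Table = AI-Ready Infrastructure
JSON Schema is not just a validation tool; it is the core of AI-ready infrastructure. The goal: AI output → instant validation → zero-DDL storage.
Part 2: Killing N+1
How One SQL Optimization Cut Latency from 1 Second to 25 Milliseconds
We reduced the number of database queries from 101 to 1 and latency from 1000 ms to 25 ms. The secret is PostgreSQL's CTE + JSON_AGG.
Part 3: A Zero-Dirty-Read Serverless Lakehouse
How We Solved the Consistency Problem with DuckDB
PostgreSQL handles "the present"; DuckDB + Parquet handles "the past." The Anti-Join + Dirty Set mechanism ensures zero dirty reads in federated queries.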