Technical papers, engineering insights, and methodology deep-dives from the team building the future of AI document intelligence.
# fluidzero: Research

Technical papers, engineering insights, and methodology deep-dives from the fluidzero research team.

## Posts

### Citation-Grounded Retrieval for Enterprise Search

- Category: Citation Grounded Retrieval
- Date: 2026-02-27
- Author: Hrishikesh Kakkad
- URL: /research/citation-grounded-retrieval

The promise of AI-powered document search is simple: ask a question in natural language, get an accurate answer. The reality in enterprise settings is more demanding. An answer without a source is an opinion. In regulated industries (aerospace, pharmaceuticals, energy, finance), an unsourced claim from an AI system is worse than no answer at all.

### Multimodal Document Understanding at Scale

- Category: Multimodal Document Understanding
- Date: 2026-02-27
- Author: Hrishikesh Kakkad
- URL: /research/multimodal-document-understanding

Modern enterprises operate on documents that communicate through more than just text. Engineering specifications embed critical dimensions in CAD drawings. Financial reports convey trends through charts that resist tabular extraction. Safety manuals pair procedural text with annotated diagrams where the relationship between the two carries the meaning.

### Multi-Vector Embeddings Are Great. Until They're Not.

- Category: Muvera
- Date: 2026-02-27
- Author: Tilak Sharma
- URL: /research/muvera

*A deep dive into multi-vector embeddings, why they're brilliant, and why storing them naively will bankrupt your infrastructure.*

### Processing Heterogeneous Documents at 100GB+ Scale

- Category: Processing Heterogeneous Documents
- Date: 2026-02-27
- Author: Hrishikesh Kakkad
- URL: /research/processing-heterogeneous-documents

Enterprise document repositories are not curated datasets. They are decades of accumulated PDFs, scanned images, Word documents, spreadsheets, CAD exports, and legacy formats, often with inconsistent naming, duplicate versions, and no centralized metadata. When a customer tells us they have "about 100,000 documents," the reality is 100,000 files in 15+ formats, spanning 20 years, totaling anywhere from 50GB to 500GB.