The Hidden Economics of LLM Inference
Fine-tuning an open-source model to match frontier quality is the easy part; serving it cost-effectively is the real challenge.
How Hebbia measures agent quality at scale with a hybrid evaluation methodology.
Hebbia researchers leveraged classic signal processing techniques to build a text detection model smaller than most of the images it classifies.
We built a statistically rigorous, consensus-based framework for evaluating LLM outputs and used it to benchmark today’s leading models on the tasks that matter most to finance professionals.
We built a multi-agent system that goes beyond public web search to synthesize insights from any data source, including proprietary ones.
At the end of last year, we returned to the drawing board and redesigned Matrix Agent.
We built a distributed LLM request scheduler that intelligently routes billions of tokens per day across multiple providers so high-priority work always gets through, even under rate limits.
After pioneering semantic search and RAG, we found both fell short on the hardest questions, so we scrapped them and built a new information retrieval system from scratch.