Product | 08.07.25 | George Sivulka

The Next Edge in Finance: Reasoning with GPT‑5 & Hebbia

As OpenAI’s research partner for financial services agents, we’ve spent weeks challenging GPT-5 on the most complex investing and banking workflows.


As OpenAI’s leading research partner for financial analysis, we’ve spent the last several weeks testing GPT-5 against the most complex financial agent workflows. 

We’re excited to share our findings, and what the next generation of foundation models means for Wall Street and beyond.

Measuring AI-driven alpha

The best investors spot patterns others miss. The best AI systems should be held to the same bar. Hebbia running on GPT-5 is the first system we’ve seen that truly unlocks alpha.

To objectively measure a model’s originality, or its tendency toward novel idea generation, the Hebbia Applied Research team defined an “insightfulness” metric for evaluating LLMs on financial tasks. You can think of insightfulness as the ability to go beyond summarization and surface risks, opportunities, or strategic context that isn’t explicitly stated.

Against every other model we tested, GPT‑5 set a new state of the art on this measure.

Insightfulness measures whether the response goes beyond repeating existing information and surfaces implications, risks, opportunities, or strategic dynamics relevant to the prompt.

Financial Insightfulness measures whether the response goes beyond repeating existing information and surfaces implications, risks, opportunities, or strategic dynamics relevant to the prompt.

The first truly robust, truly agentic model

GPT-5’s capabilities extend far beyond novel pattern recognition. When we put it into Hebbia’s agentic finance environment, our experiments revealed the most robust agent foundation we’ve ever seen:

  • Juggling dozens of tools: Hebbia offers a rich financial agent sandbox, with connectors to S&P Capital IQ, FactSet, PitchBook, and dozens of data rooms and SharePoint file trees. GPT-5 was the first model to handle this complexity out of the box, picking the exact APIs, integrations, and data sources needed to complete financial tasks end-to-end.
  • Agentic “fault tolerance”: Rather than just following instructions, GPT‑5 is the first system we’ve seen that corrects its own work, and even corrects yours. It works out what you’re trying to do while exploring its environment, fills in missing information, and fixes errors, so your work keeps moving even when inputs aren’t perfect. In our tests it appeared to recover from its own hallucinations, even after going down a rabbit hole.
  • Analysis across multiple variables: GPT‑5 cuts through noise to find the key drivers of a business (such as unit volumes, pricing, and macro conditions) and uses them to build realistic financial projections. It builds upside, base, and downside scenarios where every assumption is traceable to real factors you can review and adjust.

How we’ve used it 

We stress-tested GPT‑5 on four real-world workflows that matter most to finance teams.

Create three-statement financial models 

With Hebbia, GPT-5 pulled together the richest financial model we’ve seen, populating assumptions with accurate data pulls from key sources like SEC filings, virtual data rooms, S&P Capital IQ, PitchBook, FactSet, and PDFs. You can spend more time reviewing assumptions and refining the logic rather than spending hours on data entry.

Forecast financials with assumptions

Unlike other models that focus only on high-level trends or single data points, GPT-5 can take into account multiple layers at once: company growth, industry-specific factors, market share shifts, pricing dynamics, macroeconomic conditions, and more. It uses all of these inputs to build full upside, downside, and base cases where the reasoning is transparent and you can adjust, challenge, and refine all of the assumptions and projections.  
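To make the idea of adjustable upside, base, and downside cases concrete, here is a minimal sketch in Python. It is not Hebbia’s or GPT-5’s actual method: it assumes revenue is simply units times price, and every name and number in it (Scenario, project_revenue, build_cases, the growth rates) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One named set of driver assumptions (all values are illustrative)."""
    name: str
    volume_growth: float  # assumed annual growth in units sold
    price_growth: float   # assumed annual change in average selling price


def project_revenue(base_units, base_price, scenario, years):
    """Project revenue per year as units * price under one scenario."""
    revenues = []
    units, price = base_units, base_price
    for _ in range(years):
        # Compound each driver separately so each assumption stays visible.
        units *= 1 + scenario.volume_growth
        price *= 1 + scenario.price_growth
        revenues.append(units * price)
    return revenues


# Hypothetical assumptions; a reviewer would replace these with sourced figures.
SCENARIOS = [
    Scenario("downside", volume_growth=-0.02, price_growth=0.00),
    Scenario("base", volume_growth=0.05, price_growth=0.02),
    Scenario("upside", volume_growth=0.10, price_growth=0.03),
]


def build_cases(base_units, base_price, years=3):
    """Return a dict mapping scenario name to its yearly revenue projection."""
    return {
        s.name: project_revenue(base_units, base_price, s, years)
        for s in SCENARIOS
    }


if __name__ == "__main__":
    for name, revenues in build_cases(1000.0, 10.0).items():
        print(name, [round(r, 2) for r in revenues])
```

Keeping each driver as its own named field is the point of the sketch: changing one assumption, such as the downside volume growth, changes only the case that depends on it, which is what makes a projection reviewable.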

Correct errors by understanding intent

We purposely threw GPT-5 a curveball: we asked it to build a model for a Norwegian company called “Autoshop,” which doesn’t exist. After crawling CapIQ tables for public Norwegian companies, some with names close to the one in the prompt, GPT-5 inferred that the intended company was “AutoStore” and corrected the user.
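For a rough feel of what resolving a near-miss name involves, the sketch below uses plain string similarity from Python’s standard library. This is an illustration only, not a description of how GPT-5 reached its answer, which also drew on context such as the company’s country. The function name suggest_company and the candidate list are hypothetical; in practice the candidates would come from a data source such as the CapIQ tables mentioned above.

```python
import difflib


def suggest_company(query, known_names, cutoff=0.6):
    """Return the known name most similar to `query`, or None if none is close.

    Matching is case-insensitive and based purely on character similarity.
    """
    # Map lowercased names back to their original spelling.
    lowered = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    # Illustrative candidate list, not an actual query result.
    candidates = ["AutoStore", "Equinor", "Telenor", "Norsk Hydro"]
    print(suggest_company("Autoshop", candidates))
```

String similarity alone would not be enough in a real workflow, since it cannot tell a typo from a genuinely different company. It only shows why a list of candidate names makes the correction tractable.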

Make polished presentations faster

Storytelling still requires human judgment. With GPT‑5 in Hebbia, you can turn any analysis into reports and slide decks that follow your template. You can spend time refining the story with GPT-5 and polishing the slide outputs rather than starting from scratch.

How you win with Hebbia

Finance is entering an era where reasoning is automated but judgment remains human.

GPT‑5 provides the reasoning, while Hebbia brings the data, integrations, agent environment, and workflows to use it.

The winners will be those who spend less time gathering information and more time thinking strategically. We’re excited to play a part in realizing this future.