As OpenAI’s leading research partner for financial analysis, we’ve spent the last several weeks testing GPT-5 against the most complex financial agent workflows.
We’re excited to share our findings, and what the next generation of foundation models means for Wall Street and beyond.
The best investors spot patterns others miss. The best AI systems should be held to the same bar. Hebbia running on GPT-5 is the first system we’ve seen that truly unlocks alpha.
To objectively measure a model’s originality, or its tendency towards novel idea generation, the Hebbia Applied Research team developed an “insightfulness” measure for evaluating LLMs on financial tasks. You can think of insightfulness as the ability to go beyond summarization to surface risks, opportunities, or strategic context that’s not explicitly stated.
Compared to every other model we tested, GPT‑5 redefines the state of the art.
Financial Insightfulness measures whether the response goes beyond repeating existing information and surfaces implications, risks, opportunities, or strategic dynamics relevant to the prompt.
GPT-5’s capabilities extend far beyond novel pattern recognition. When we put it into Hebbia’s agentic finance environment, our experiments revealed the most robust agent foundation we’ve ever seen.
We stress-tested GPT‑5 on four real-world workflows that matter most to finance teams.
With Hebbia, GPT-5 pulled together the richest financial model we’ve seen, populating assumptions with accurate data pulls from key data sources like SEC filings, Virtual Data Rooms, S&P Capital IQ, Pitchbook, FactSet, and PDFs. You can spend more time reviewing assumptions and refining the logic rather than spending hours on data entry.
Unlike other models that focus only on high-level trends or single data points, GPT-5 can take into account multiple layers at once: company growth, industry-specific factors, market share shifts, pricing dynamics, macroeconomic conditions, and more. It uses all of these inputs to build full upside, downside, and base cases where the reasoning is transparent and you can adjust, challenge, and refine all of the assumptions and projections.
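To make the idea of adjustable upside, downside, and base cases concrete, here is a minimal sketch of a scenario table driven by explicit assumptions. It is only an illustration and not Hebbia's or GPT-5's modeling approach: the `Scenario` class, the `project_revenue` helper, the single growth-rate assumption, and all of the numbers are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scenario:
    """One named case with a single, explicit assumption (hypothetical)."""
    name: str
    annual_growth: float  # e.g. 0.10 means 10% revenue growth per year


def project_revenue(base_revenue: float, scenario: Scenario, years: int) -> List[float]:
    """Compound base_revenue forward by the scenario's growth rate, one value per year."""
    return [
        round(base_revenue * (1 + scenario.annual_growth) ** year, 2)
        for year in range(1, years + 1)
    ]


# Placeholder assumptions for illustration only, not real company data.
SCENARIOS = [
    Scenario("downside", 0.02),
    Scenario("base", 0.08),
    Scenario("upside", 0.15),
]

projections: Dict[str, List[float]] = {
    s.name: project_revenue(100.0, s, years=3) for s in SCENARIOS
}

for name, path in projections.items():
    print(name, path)
```

Because each case is just a named set of assumptions, changing a growth rate and re-running the projection is all it takes to challenge a case; a real model would layer in many more drivers than this single rate.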
We purposely threw GPT-5 a curveball: we asked it to build a model for a Norwegian company called “Autoshop,” which doesn’t exist. After crawling CapIQ tables for public Norwegian companies, some of which more closely matched the prompt, GPT-5 inferred that the intended company was “AutoStore” and corrected the user.
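The name correction described above can be loosely approximated with plain string similarity. The sketch below is purely illustrative and is not how GPT-5 or Hebbia resolves company names: the `resolve_company` helper, the candidate list, and the 0.6 similarity cutoff are all assumptions made for this example, which relies only on Python's standard `difflib` module.

```python
from difflib import get_close_matches
from typing import Optional, Sequence


def resolve_company(
    query: str, known_names: Sequence[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the known name most similar to `query`, or None if nothing is close.

    Hypothetical helper for illustration; the cutoff is an arbitrary assumption.
    """
    # Compare case-insensitively, but return the name in its original casing.
    by_lower = {name.lower(): name for name in known_names}
    matches = get_close_matches(query.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


# Illustrative list standing in for a table of public Norwegian companies.
CANDIDATES = ["Equinor", "AutoStore", "Orkla", "Mowi"]

suggestion = resolve_company("Autoshop", CANDIDATES)
if suggestion is not None and suggestion != "Autoshop":
    print(f'No company named "Autoshop" found. Did you mean "{suggestion}"?')
```

Character-level similarity only catches near-miss spellings. The behavior described in the text also draws on context, such as the company's country, which a simple matcher like this does not use.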
Storytelling still requires human judgment. With GPT‑5 in Hebbia, you can turn any analysis into reports and slide decks that follow your template. You can spend time refining the story with GPT-5 and polishing the slide outputs rather than starting from scratch.
Finance is entering an era where reasoning is automated but judgment remains human.
GPT‑5 provides the reasoning, while Hebbia brings the data, integrations, agent environment, and workflows to use it.
The winners will be those who spend less time gathering information and more time thinking strategically—we’re excited to play a part in realizing this future.