What AI Models Does Harvey Use?

In this article, we’ll cover the main models Harvey uses and how they compare.

Last updated: Nov 21, 2025


Overview

To help you maximize your impact, Harvey leverages several advanced large language models (LLMs), each with unique strengths. When you ask Harvey for assistance, our multi-model system breaks the request down into sub-tasks, selects a model for each, and then synthesizes the outputs for you (see the illustrative sketch after the list below). We currently use a mix of the following models:

  • OpenAI GPT-5
  • OpenAI o3 model suite
  • OpenAI GPT-4.1 (4.1, 4.1-mini, 4.1-nano) and 4o model suite
  • Google Gemini 2.5 Pro model suite
  • Anthropic Sonnet/Opus 4 model suite
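
To make the orchestration above concrete, here is a minimal, purely illustrative sketch of how a multi-model pipeline like this can be structured. It is not Harvey's actual implementation; every function name, model name, and routing rule below is a hypothetical stand-in.

```python
# Hypothetical sketch of a multi-model pipeline: decompose a request into
# sub-tasks, route each to a suitable model, then synthesize the outputs.
# None of this reflects Harvey's real internals; names are illustrative.

from dataclasses import dataclass

@dataclass
class SubTask:
    description: str
    kind: str  # e.g. "drafting", "extraction", "reasoning"

# Illustrative routing table: sub-task kind -> model family
ROUTING = {
    "drafting": "gpt-4.1",
    "extraction": "claude-sonnet-4",
    "reasoning": "gpt-5",
}

def decompose(request: str) -> list[SubTask]:
    """Placeholder decomposition step; a real system would use a planner model."""
    return [SubTask(description=request, kind="reasoning")]

def run_model(model: str, task: SubTask) -> str:
    """Placeholder for a model call; a real system would call an LLM API here."""
    return f"[{model}] answer to: {task.description}"

def answer(request: str) -> str:
    sub_tasks = decompose(request)
    partials = [run_model(ROUTING[t.kind], t) for t in sub_tasks]
    # Synthesis step: combine partial outputs into a single response.
    return "\n".join(partials)

print(answer("Summarize the indemnification clauses in this agreement."))
```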

By default, Harvey's Auto mode will select models for you, but if you prefer to choose yourself, workspace admins can enable the Model Selector for individual users or across your workspace. Keep in mind that no single model is best for every task, so we recommend Auto.
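
As a rough illustration of the two scopes described above (per-user vs. workspace-wide), here is a hypothetical configuration sketch. None of these field names come from Harvey's actual settings; they are invented purely for illustration.

```python
# Hypothetical admin-side configuration for model selection. This is not
# Harvey's API; field names and values are invented for illustration only.

workspace_settings = {
    "model_selection": {
        "default_mode": "auto",          # Harvey picks the model per task
        "model_selector_enabled": True,  # admins may enable manual choice
        "enabled_for": ["user_123"],     # per-user list, or "all" for the workspace
    }
}

def effective_mode(user_id: str) -> str:
    cfg = workspace_settings["model_selection"]
    scope = cfg["enabled_for"]
    if cfg["model_selector_enabled"] and (scope == "all" or user_id in scope):
        return "manual"  # user may pick a model from the Model Selector
    return cfg["default_mode"]

print(effective_mode("user_123"))  # -> "manual"
print(effective_mode("user_456"))  # -> "auto"
```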

Notes on the latest models:

  • Gemini 3 Pro: We are working with the teams at Google and DeepMind to bring their Early Access model to General Availability, at which point it will be made available within Harvey.
  • Claude Sonnet 4.5 and GPT-5.1: Both are available via the Model Selector; GPT-5.1 is available only in the US and EU. We are currently testing these models to evaluate further integration.
  • GPT-5: Included in Auto mode for Assistant chat and draft-style tasks for US and EU customers. We will expand its integration in Auto mode as we continue testing the model for reliability at scale. If you prefer to use GPT-5 across product areas in your workspace now, it is available in the Model Selector in the US and EU.

Our Model Evaluation Process

Harvey's model evaluation methodology is comprehensive: we assess not only raw model performance, but also the safety and reliability of each model before including it in our system. The pillars of our model evaluation are BigLaw Bench, product performance, and unstructured evaluation.

After we evaluate a model, we revisit the evaluation throughout its lifecycle to ensure we’re offering our users the best functionality.
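
To illustrate how results from the three pillars might feed an inclusion decision, here is a simplified sketch that combines them into a weighted composite. The weights, threshold, and example scores are entirely hypothetical and are not Harvey's actual criteria.

```python
# Illustrative-only sketch of combining the three evaluation pillars named
# above (BigLaw Bench, product performance, unstructured evaluation) into a
# go/no-go decision. Weights, thresholds, and scores are all hypothetical.

PILLAR_WEIGHTS = {
    "biglaw_bench": 0.5,         # benchmark score, normalized to 0-1
    "product_performance": 0.3,  # task-level metrics from product evals, 0-1
    "unstructured_eval": 0.2,    # reviewer ratings, normalized to 0-1
}

MIN_COMPOSITE = 0.8  # hypothetical bar a model must clear for inclusion

def composite_score(pillar_scores: dict[str, float]) -> float:
    return sum(PILLAR_WEIGHTS[p] * pillar_scores[p] for p in PILLAR_WEIGHTS)

def include_model(pillar_scores: dict[str, float]) -> bool:
    return composite_score(pillar_scores) >= MIN_COMPOSITE

# Example: a candidate model's (made-up) pillar scores
print(include_model({
    "biglaw_bench": 0.89,
    "product_performance": 0.84,
    "unstructured_eval": 0.80,
}))  # -> True
```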

Model Comparison

To help you navigate the models we offer, we've put together a high-level comparison table, including availability by region.

Note: If the Model Selector is enabled in your workspace but you're not seeing a particular model, it may not be available in your region. Ask your Customer Success Manager to confirm what's available to you.

| Model | Developer | Model Release Date | Strengths | Weaknesses | BigLaw Bench Score | Knowledge Cut-off | Regional Availability |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-5.1 | OpenAI | November 12, 2025 | Legal reasoning; more detailed and better-structured outputs; instruction following | Pending further evaluation | 91.8% | September 2024 | US, EU |
| GPT-5 (reasoning) | OpenAI | August 7, 2025 | Analysis detail and quality; hard problem solving, particularly long-form writing and agentic behavior | Can overthink, giving overly complicated answers to straightforward problems; formatting, particularly structured use of headers and lists | 89.22% | September 2024 | US, EU (AU coming late 2025) |
| o3 | OpenAI | April 16, 2025 | Foundational knowledge and reasoning; planning and execution of agentic problems and hard tasks | Sometimes hallucinates, especially if pressed for details it is not well positioned to provide; can overthink straightforward problems | 84.13% | June 2024 | US, EU, AU |
| GPT-4.1 | OpenAI | April 14, 2025 | Drafts comprehensive, organized outputs; pulling out key information and workflow-specific tasks | Inconsistent quotation frequency and quality; occasionally over-prioritizes conciseness | 78.39% | June 2024 | US, EU, AU |
| Gemini 2.5 Pro | Google | March 25, 2025 | Drafts longer, detailed outputs; multi-step analysis and outputs | Can overthink straightforward problems; occasional misplaced or stiff tone | 85.02% | February 2025 | US, EU |
| Claude Sonnet 4 | Anthropic | May 22, 2025 | Strong output structure and organization; pulling out key information and workflow-specific tasks | Occasionally too wordy; can lack depth and thoroughness on lengthier questions | 81.37% | January 2025 | US, EU |
| Claude Opus 4 | Anthropic | May 22, 2025 | Strong formatting and clarity; grounding responses in underlying documents | Occasionally misses key factual or legal elements or over-simplifies; can be too rigid in formatting, particularly on tasks requiring specific formats | 82.70% | January 2025 | US, EU |
| Claude Sonnet 4.5 | Anthropic | September 29, 2025 | Pending further evaluation | Pending further evaluation | 89.55% | January 2025 | US, EU, AU |
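
For readers comparing regions, here is a small convenience sketch that filters the models above by regional availability. The data simply mirrors the table; the code itself is illustrative and not part of Harvey.

```python
# Region filter over the comparison table above. The mapping mirrors the
# table's Regional Availability column; the code is a convenience sketch.

REGIONS_BY_MODEL = {
    "GPT-5.1": {"US", "EU"},
    "GPT-5 (reasoning)": {"US", "EU"},  # AU coming late 2025
    "o3": {"US", "EU", "AU"},
    "GPT-4.1": {"US", "EU", "AU"},
    "Gemini 2.5 Pro": {"US", "EU"},
    "Claude Sonnet 4": {"US", "EU"},
    "Claude Opus 4": {"US", "EU"},
    "Claude Sonnet 4.5": {"US", "EU", "AU"},
}

def models_available(region: str) -> list[str]:
    return [m for m, regions in REGIONS_BY_MODEL.items() if region in regions]

print(models_available("AU"))
# -> ['o3', 'GPT-4.1', 'Claude Sonnet 4.5']
```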

Looking forward, the landscape of AI technology is continuously evolving, and so is Harvey. We constantly evaluate the latest models and their performance to ensure we are always providing the most advanced and effective solutions. Stay up to date on model availability and feature enhancements by following our Release Notes.

FAQs

  • Terminology
  • Security
  • Performance
  • Technical