What AI Models Does Harvey Use?

In this article, we’ll cover the main models Harvey uses and how they compare.

Last updated: Oct 8, 2025


Overview

To help you maximize your impact, Harvey uses several advanced large language models (LLMs), each with distinct strengths. When you ask Harvey for assistance, our multi-model system breaks the request into sub-tasks, selects a model to use, and then synthesizes the outputs for you. We currently use a mix of the following models:

  • OpenAI GPT-5
  • OpenAI o3 model suite
  • OpenAI GPT-4.1 (4.1, 4.1-mini, 4.1-nano) and GPT-4o model suites
  • Google Gemini 2.5 Pro model suite
  • Anthropic Claude Sonnet 3.7 and Claude Sonnet/Opus 4 model suite

By default, Harvey’s 'Auto' mode selects a model for you. If you prefer to choose yourself, workspace admins can enable Model Selector for individual users or across your workspace. Keep in mind that no single model is best for every task, so we recommend Auto.
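
For readers who think in code, the flow described above (split the request into sub-tasks, pick a model for each, combine the outputs, with an optional user-selected model taking precedence over Auto) can be pictured with a minimal sketch. This is an illustration only and not Harvey's implementation: every name here (`SubTask`, `split_request`, `choose_model`, `run`, the `model-a`/`model-b` labels, and the keyword-based routing) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]


@dataclass
class SubTask:
    kind: str    # illustrative labels only: "draft" or "extract"
    prompt: str


def split_request(request: str) -> List[SubTask]:
    """Toy decomposition: one sub-task per ';'-separated chunk."""
    parts = [p.strip() for p in request.split(";") if p.strip()]
    return [
        SubTask(kind="draft" if "draft" in p.lower() else "extract", prompt=p)
        for p in parts
    ]


def choose_model(task: SubTask, routing: Dict[str, str],
                 override: Optional[str] = None) -> str:
    """Auto-style lookup by sub-task kind; a user-selected model wins if given."""
    return override if override is not None else routing[task.kind]


def run(request: str, models: Dict[str, ModelFn], routing: Dict[str, str],
        override: Optional[str] = None) -> str:
    """Run each sub-task on its chosen model and join the outputs."""
    outputs = []
    for task in split_request(request):
        name = choose_model(task, routing, override)
        outputs.append(models[name](task.prompt))
    return "\n".join(outputs)


# Demo with fake models that simply tag the prompt they received.
DEMO_MODELS: Dict[str, ModelFn] = {
    "model-a": lambda p: f"[model-a] {p}",
    "model-b": lambda p: f"[model-b] {p}",
}
DEMO_ROUTING = {"draft": "model-a", "extract": "model-b"}
```

Calling `run("Draft a memo; pull out key dates", DEMO_MODELS, DEMO_ROUTING)` sends the first sub-task to `model-a` and the second to `model-b`, while passing `override="model-b"` mimics a user choosing a single model in Model Selector.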

Note on the latest models:

  • As of September 30, 2025, Claude Sonnet 4.5 is available to US and EU customers through the Model Selector. We’re currently testing the model to evaluate further integration.
  • As of October 6, 2025, GPT-5 is included in Auto mode for Assistant chat and draft-style tasks. We will expand its integration in Auto mode as we continue testing the model for reliability at scale. If you prefer to use GPT-5 across product areas in your workspace now, it is available in the Model Selector for US and EU users.

Our Model Evaluation Process

Harvey’s model evaluation methodology is comprehensive so that we understand not only raw model performance, but also the safety and reliability of each model before including it in our system. The pillars of our model evaluation are BigLaw Bench, product performance, and unstructured evaluation.

After we evaluate a model, we revisit the evaluation throughout its lifecycle to ensure we’re offering our users the best functionality.

Model Comparison

To help you navigate the models we offer, we’ve put together a high-level comparison.

Note: If Model Selector is enabled in your workspace but you’re not seeing a particular model, it may not be available in your region. Ask your Customer Success Manager to confirm what’s available to you.

GPT-5 (reasoning)

Developer: OpenAI
Model Release Date: August 7, 2025

Strengths:

  • Analysis detail and quality
  • Hard problem solving, particularly long-form writing and agentic behavior

Weaknesses:

  • Can overthink, providing overly complicated answers to straightforward problems
  • Formatting, particularly structured use of headers and lists

BigLaw Bench Score: 89.22%
Knowledge Cut-off: September 2024

o3

Developer: OpenAI
Model Release Date: April 16, 2025

Strengths:

  • Foundational knowledge and reasoning
  • Planning and execution of agentic problems and hard tasks

Weaknesses:

  • Sometimes hallucinates, especially if pressed for details it is not well positioned to provide
  • Can overthink straightforward problems

BigLaw Bench Score: 84.13%
Knowledge Cut-off: June 2024

GPT-4.1

Developer: OpenAI
Model Release Date: April 14, 2025

Strengths:

  • Drafts comprehensive and organized outputs
  • Pulling out key information and workflow-specific tasks

Weaknesses:

  • Inconsistent quotation frequency and quality
  • Occasionally over-prioritizes conciseness

BigLaw Bench Score: 78.39%
Knowledge Cut-off: June 2024

Gemini 2.5 Pro

Developer: Google
Model Release Date: March 25, 2025

Strengths:

  • Drafts longer, detailed outputs
  • Multi-step analysis and outputs

Weaknesses:

  • Can overthink straightforward problems
  • Occasional misplaced or stiff tone

BigLaw Bench Score: 85.02%
Knowledge Cut-off: February 2025

Claude Sonnet 4

Developer: Anthropic
Model Release Date: May 22, 2025

Strengths:

  • Strong output structure and organization
  • Pulling out key information and workflow-specific tasks

Weaknesses:

  • Occasionally too wordy
  • Can lack depth and thoroughness on lengthier questions

BigLaw Bench Score: 81.37%
Knowledge Cut-off: January 2025

Claude Sonnet 3.7

Developer: Anthropic
Model Release Date: February 24, 2025

Strengths:

  • Strong and comprehensive legal analyses
  • Impressive inclusion of quoted material

Weaknesses:

  • Occasionally too wordy
  • Periodic formatting errors

BigLaw Bench Score: 78.91%
Knowledge Cut-off: October 2024

Opus 4

Developer: Anthropic
Model Release Date: May 22, 2025

Strengths:

  • Strong formatting and clarity
  • Grounding responses in underlying documents

Weaknesses:

  • Occasionally misses key factual or legal elements or over-simplifies
  • Sometimes too rigid in formatting
  • Tasks requiring specific formats

BigLaw Bench Score: 82.70%
Knowledge Cut-off: January 2025

Claude Sonnet 4.5

Developer: Anthropic
Model Release Date: September 29, 2025

Strengths:

  • Pending further evaluation

Weaknesses:

  • Pending further evaluation

BigLaw Bench Score: 89.55%
Knowledge Cut-off: January 2025
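
If you want to work with these figures outside the article, one option is to capture them as simple records and sort or filter them. The snippet below is only a sketch: the `ModelRow` type and the `ranked` and `by_developer` helpers are hypothetical names introduced for illustration, and the scores are copied from the comparison above as of this article's last update.

```python
from typing import List, NamedTuple


class ModelRow(NamedTuple):
    name: str
    developer: str
    biglaw_bench: float  # BigLaw Bench Score, in percent


# Values copied from the comparison above.
ROWS: List[ModelRow] = [
    ModelRow("GPT-5 (reasoning)", "OpenAI", 89.22),
    ModelRow("o3", "OpenAI", 84.13),
    ModelRow("GPT-4.1", "OpenAI", 78.39),
    ModelRow("Gemini 2.5 Pro", "Google", 85.02),
    ModelRow("Claude Sonnet 4", "Anthropic", 81.37),
    ModelRow("Claude Sonnet 3.7", "Anthropic", 78.91),
    ModelRow("Opus 4", "Anthropic", 82.70),
    ModelRow("Claude Sonnet 4.5", "Anthropic", 89.55),
]


def ranked(rows: List[ModelRow]) -> List[ModelRow]:
    """Return rows sorted by BigLaw Bench Score, highest first."""
    return sorted(rows, key=lambda r: r.biglaw_bench, reverse=True)


def by_developer(rows: List[ModelRow], developer: str) -> List[ModelRow]:
    """Return only the rows from the given developer."""
    return [r for r in rows if r.developer == developer]
```

A ranking like `ranked(ROWS)` reflects benchmark score only. As noted earlier, no single model is best for every task, so the listed strengths and weaknesses still matter when choosing.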

Looking forward, the landscape of AI technology is continuously evolving, and so is Harvey. We are constantly evaluating the latest models and their performance to ensure we are always providing the most advanced and effective solutions. Stay up-to-date on model availability and feature enhancements by following our Release Notes.

