What AI Models Does Harvey Use?

In this article, we’ll cover the main models Harvey uses and how they compare.

Last updated: Nov 21, 2025


Overview

To help you maximize your impact, Harvey leverages several advanced large language models (LLMs), each with unique strengths. When you ask Harvey for assistance, our multi-model system breaks the request down into sub-tasks, selects a model for each, and then synthesizes the outputs for you (see the illustrative sketch after the list below). We currently use a mix of the following models:

  • OpenAI GPT-5
  • OpenAI o3 model suite
  • OpenAI GPT-4.1 (4.1, 4.1-mini, 4.1-nano) and 4o model suite
  • Google Gemini 2.5 Pro model suite
  • Anthropic Sonnet/Opus 4 model suite
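
To make the orchestration above concrete, here is a minimal, purely illustrative sketch of how a multi-model pipeline like this can be structured. It is not Harvey's actual implementation; every function name, model name, and routing rule below is a hypothetical stand-in.

```python
# Hypothetical sketch of a multi-model pipeline: decompose a request into
# sub-tasks, route each to a suitable model, then synthesize the outputs.
# None of this reflects Harvey's real internals; names are illustrative.

from dataclasses import dataclass

@dataclass
class SubTask:
    description: str
    kind: str  # e.g. "drafting", "extraction", "reasoning"

# Illustrative routing table: sub-task kind -> model family
ROUTING = {
    "drafting": "gpt-4.1",
    "extraction": "claude-sonnet-4",
    "reasoning": "gpt-5",
}

def decompose(request: str) -> list[SubTask]:
    """Placeholder decomposition step; a real system would use a planner model."""
    return [SubTask(description=request, kind="reasoning")]

def run_model(model: str, task: SubTask) -> str:
    """Placeholder for a model call; a real system would call an LLM API here."""
    return f"[{model}] answer to: {task.description}"

def answer(request: str) -> str:
    sub_tasks = decompose(request)
    partials = [run_model(ROUTING[t.kind], t) for t in sub_tasks]
    # Synthesis step: combine partial outputs into a single response.
    return "\n".join(partials)

print(answer("Summarize the indemnification clauses in this agreement."))
```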

By default, Harvey's Auto mode will select models for you, but if you prefer to choose yourself, workspace admins can enable the Model Selector for individual users or across your workspace. Keep in mind that no single model is best for every task, so we recommend Auto.
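
As a rough illustration of the two scopes described above (per-user vs. workspace-wide), here is a hypothetical configuration sketch. None of these field names come from Harvey's actual settings; they are invented purely for illustration.

```python
# Hypothetical admin-side configuration for model selection. This is not
# Harvey's API; field names and values are invented for illustration only.

workspace_settings = {
    "model_selection": {
        "default_mode": "auto",          # Harvey picks the model per task
        "model_selector_enabled": True,  # admins may enable manual choice
        "enabled_for": ["user_123"],     # per-user list, or "all" for the workspace
    }
}

def effective_mode(user_id: str) -> str:
    cfg = workspace_settings["model_selection"]
    scope = cfg["enabled_for"]
    if cfg["model_selector_enabled"] and (scope == "all" or user_id in scope):
        return "manual"  # user may pick a model from the Model Selector
    return cfg["default_mode"]

print(effective_mode("user_123"))  # -> "manual"
print(effective_mode("user_456"))  # -> "auto"
```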

Notes on the latest models:

  • Gemini 3 Pro: We are working with the teams at Google and DeepMind to bring their Early Access model to General Availability, at which point it will be made available within Harvey.
  • Claude Sonnet 4.5 and GPT-5.1: Both are available via the Model Selector; GPT-5.1 is available only in the US and EU. We are currently testing these models to evaluate further integration.
  • GPT-5: Included in Auto mode for Assistant chat and draft-style tasks for US and EU customers. We will expand its integration in Auto mode as we continue testing the model for reliability at scale. If you prefer to use GPT-5 across product areas in your workspace now, it is available in the Model Selector in the US and EU.

Our Model Evaluation Process

Harvey's model evaluation methodology is comprehensive: we assess not only raw model performance, but also the safety and reliability of each model before including it in our system. The pillars of our model evaluation are BigLaw Bench, product performance, and unstructured evaluation.

After we evaluate a model, we revisit the evaluation throughout its lifecycle to ensure we’re offering our users the best functionality.
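
To illustrate how results from the three pillars might feed an inclusion decision, here is a simplified sketch that combines them into a weighted composite. The weights, threshold, and example scores are entirely hypothetical and are not Harvey's actual criteria.

```python
# Illustrative-only sketch of combining the three evaluation pillars named
# above (BigLaw Bench, product performance, unstructured evaluation) into a
# go/no-go decision. Weights, thresholds, and scores are all hypothetical.

PILLAR_WEIGHTS = {
    "biglaw_bench": 0.5,         # benchmark score, normalized to 0-1
    "product_performance": 0.3,  # task-level metrics from product evals, 0-1
    "unstructured_eval": 0.2,    # reviewer ratings, normalized to 0-1
}

MIN_COMPOSITE = 0.8  # hypothetical bar a model must clear for inclusion

def composite_score(pillar_scores: dict[str, float]) -> float:
    return sum(PILLAR_WEIGHTS[p] * pillar_scores[p] for p in PILLAR_WEIGHTS)

def include_model(pillar_scores: dict[str, float]) -> bool:
    return composite_score(pillar_scores) >= MIN_COMPOSITE

# Example: a candidate model's (made-up) pillar scores
print(include_model({
    "biglaw_bench": 0.89,
    "product_performance": 0.84,
    "unstructured_eval": 0.80,
}))  # -> True
```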

Model Comparison

To help you navigate the models we offer, we've put together a high-level comparison table, including availability by region.

Note: If the Model Selector is enabled in your workspace but you're not seeing a particular model, it may not be available in your region. Ask your Customer Success Manager to confirm what's available to you.

| Model | Developer | Model Release Date | Strengths | Weaknesses | BigLaw Bench Score | Knowledge Cut-off | Regional Availability |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-5.1 | OpenAI | November 12, 2025 | Legal reasoning; more detailed and better-structured outputs; instruction following | Pending further evaluation | 91.8% | September 2024 | US, EU |
| GPT-5 (reasoning) | OpenAI | August 7, 2025 | Analysis detail and quality; hard problem solving, particularly long-form writing and agentic behavior | Can overthink, giving overly complicated answers to straightforward problems; formatting, particularly structured use of headers and lists | 89.22% | September 2024 | US, EU (AU coming late 2025) |
| o3 | OpenAI | April 16, 2025 | Foundational knowledge and reasoning; planning and execution of agentic problems and hard tasks | Sometimes hallucinates, especially if pressed for details it is not well positioned to provide; can overthink straightforward problems | 84.13% | June 2024 | US, EU, AU |
| GPT-4.1 | OpenAI | April 14, 2025 | Drafts comprehensive, organized outputs; pulling out key information and workflow-specific tasks | Inconsistent quotation frequency and quality; occasionally over-prioritizes conciseness | 78.39% | June 2024 | US, EU, AU |
| Gemini 2.5 Pro | Google | March 25, 2025 | Drafts longer, detailed outputs; multi-step analysis and outputs | Can overthink straightforward problems; occasional misplaced or stiff tone | 85.02% | February 2025 | US, EU |
| Claude Sonnet 4 | Anthropic | May 22, 2025 | Strong output structure and organization; pulling out key information and workflow-specific tasks | Occasionally too wordy; can lack depth and thoroughness on lengthier questions | 81.37% | January 2025 | US, EU |
| Claude Opus 4 | Anthropic | May 22, 2025 | Strong formatting and clarity; grounding responses in underlying documents | Occasionally misses key factual or legal elements or over-simplifies; can be too rigid in formatting, particularly on tasks requiring specific formats | 82.70% | January 2025 | US, EU |
| Claude Sonnet 4.5 | Anthropic | September 29, 2025 | Pending further evaluation | Pending further evaluation | 89.55% | January 2025 | US, EU, AU |
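
For readers comparing regions, here is a small convenience sketch that filters the models above by regional availability. The data simply mirrors the table; the code itself is illustrative and not part of Harvey.

```python
# Region filter over the comparison table above. The mapping mirrors the
# table's Regional Availability column; the code is a convenience sketch.

REGIONS_BY_MODEL = {
    "GPT-5.1": {"US", "EU"},
    "GPT-5 (reasoning)": {"US", "EU"},  # AU coming late 2025
    "o3": {"US", "EU", "AU"},
    "GPT-4.1": {"US", "EU", "AU"},
    "Gemini 2.5 Pro": {"US", "EU"},
    "Claude Sonnet 4": {"US", "EU"},
    "Claude Opus 4": {"US", "EU"},
    "Claude Sonnet 4.5": {"US", "EU", "AU"},
}

def models_available(region: str) -> list[str]:
    return [m for m, regions in REGIONS_BY_MODEL.items() if region in regions]

print(models_available("AU"))
# -> ['o3', 'GPT-4.1', 'Claude Sonnet 4.5']
```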

Looking forward, the landscape of AI technology is continuously evolving, and so is Harvey. We constantly evaluate the latest models and their performance to ensure we are always providing the most advanced and effective solutions. Stay up to date on model availability and feature enhancements by following our Release Notes.

FAQs

  • Terminology
  • Security
  • Performance
  • Technical