Understanding Context Windows and Token Limits

This article explains how tokens and context windows work across Harvey’s three key features: Assistant, Vault, and Workflows.

Last updated: Oct 29, 2025

Overview

Harvey leverages powerful AI models to generate high-quality outputs quickly and accurately. Like all large language models, Harvey operates within a token limit, which defines the size of its context window—how much information the AI can “see” at once when generating a response.

Instead of enforcing hard limits, Harvey recommends maximum document sizes to help maintain response quality. These recommendations are guidelines: you may exceed them and still receive satisfactory results.

Tokens and Context Windows

Note: To view all recommended limits at a glance, jump to the Quick Reference Table of Limits section.

Harvey processes text in tokens, which are small pieces of language, typically a word or part of a word. Exact counts depend on the model’s tokenizer, but for example:

“contract” = 1 token
“unbelievable” = 3 tokens
“Please summarize this contract.” = ~6–8 tokens

Harvey’s models also have a context window, which is the total number of tokens the AI can consider at once. This includes:

Your input (prompt or question)
Any documents or Vault content included
Prior conversation history in the same thread
Harvey’s response

For example, if the context window is ~150 pages, those 150 pages must hold the combined input and output. If the content exceeds the limit, older or less relevant information is dropped.
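To make the arithmetic concrete, here is a minimal Python sketch that treats the window as a single page budget shared by the prompt, attached documents, thread history, and the response. It measures everything in pages, as this article does; the function name `remaining_for_response` is hypothetical, and the 150-page window is only the illustrative figure above, not a Harvey setting.

```python
# Hypothetical sketch: every part of a request shares one context window.
# Sizes are in pages, as in this article; 150 is the illustrative window
# from the example above, not a documented Harvey setting.
CONTEXT_WINDOW_PAGES = 150


def remaining_for_response(prompt_pages: float, document_pages: float,
                           history_pages: float,
                           window_pages: float = CONTEXT_WINDOW_PAGES) -> float:
    """Pages left for the response; a negative value means the input alone overflows."""
    used = prompt_pages + document_pages + history_pages
    return window_pages - used


if __name__ == "__main__":
    # 1-page prompt + 120-page agreement + 20 pages of earlier thread = 141 pages used.
    print(remaining_for_response(prompt_pages=1, document_pages=120, history_pages=20))  # 9
```

Whatever the input consumes is no longer available for the answer, which is one reason very large inputs can lead to shorter responses.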

In short, tokens are the building blocks of text. The context window is the AI’s short-term memory, measured in tokens.

Assistant

Assistant’s context window is approximately 240 pages of text. This includes:

The prompt or question you enter
Any documents you attach
Conversation history in the thread (your previous questions + responses)

When the limit is reached, older parts of the conversation or documents are dropped automatically.
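The sketch below illustrates one simple oldest-first policy for fitting a thread into a ~240-page window. It is an assumption for illustration only: the article does not describe how Harvey decides what to drop, and the `fit_history` helper and the page sizes in the example are hypothetical.

```python
from collections import deque


def fit_history(turns: list[tuple[str, float]], budget_pages: float) -> list[tuple[str, float]]:
    """Drop the oldest (label, pages) entries until the remainder fits in budget_pages.

    Hypothetical oldest-first policy, for illustration only.
    """
    kept = deque(turns)
    total = sum(pages for _, pages in kept)
    while kept and total > budget_pages:
        _, pages = kept.popleft()  # the oldest entry is dropped first
        total -= pages
    return list(kept)


if __name__ == "__main__":
    thread = [
        ("Q1: summarize the lease + answer", 90),
        ("Q2: list renewal options + answer", 80),
        ("Q3: new question + attached amendment", 100),
    ]
    # 270 pages total against a ~240-page window: Q1 is dropped, 180 pages remain.
    for label, pages in fit_history(thread, budget_pages=240):
        print(label, pages)
```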

Tip: Break up long documents or ask focused questions about smaller sections instead of uploading everything at once.

Vault

Vault allows you to store and reference large volumes of data without directly consuming your entire context window.

Document Storage

Vault documents are indexed and stored independently. When you ask a question, Harvey retrieves only the most relevant excerpts and passes those into Assistant or Workflows.

You can reference lengthy contracts or filings without hitting token limits, as long as your query is focused. When working with large files, phrase your questions precisely so Harvey retrieves only the relevant passages.

Better: Ask, “What is the termination clause in Section 9?”
Less effective: Ask, “Summarize the whole contract.”
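A minimal sketch of why focused questions retrieve better: it ranks hypothetical contract excerpts by how many words they share with the query and returns the top matches. Keyword overlap is a deliberately simple stand-in; the article does not say how Harvey ranks Vault content, and the `words` and `top_excerpts` helpers are invented for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real relevance scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_excerpts(query: str, excerpts: list[str], k: int = 2) -> list[str]:
    """Return up to k excerpts that share the most words with the query."""
    q = words(query)
    scored = [(len(q & words(e)), i, e) for i, e in enumerate(excerpts)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # most overlap first; ties keep document order
    return [e for score, _, e in scored[:k] if score > 0]


if __name__ == "__main__":
    contract = [
        "Section 4. Payment terms are net 45 days.",
        "Section 9. Termination. Either party may terminate on 30 days' notice.",
        "Section 12. Governing law is New York.",
    ]
    print(top_excerpts("What is the termination clause in Section 9?", contract, k=1))
    print(top_excerpts("Summarize the whole contract.", contract))  # []
```

The focused question matches the termination excerpt directly, while the broad request shares no distinctive words with any excerpt, so this toy retriever has nothing to rank on.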

Review Tables

When querying Review Tables, Harvey applies the following optimizations to keep the content it processes within ~25 pages:

Column Optimization: For large tables, focuses on the most relevant columns
Row Filtering: Narrows down rows and cells that match your criteria
Smart Reasoning: Balances completeness with efficiency, allowing the model to reason effectively across complex tables

We're actively developing these algorithms to improve precision and performance, especially for large-scale reviews.

Tip: Be as specific as possible (e.g., filter by dates, amounts, or terms) to get the most accurate results.
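As an illustration of why specific filters help, this sketch narrows a hypothetical review table to the rows matching a date and value filter, and to the columns the question needs. The table contents and the `narrow_table` helper are invented for the example; Harvey’s actual column and row optimizations are not documented here. Fewer rows and columns means less text competing for the ~25-page budget.

```python
from datetime import date

# Hypothetical review table: one row per document, one key per column.
ROWS = [
    {"document": "MSA_Acme.pdf", "effective_date": date(2023, 3, 1),
     "contract_value": 250_000, "governing_law": "New York"},
    {"document": "NDA_Beta.pdf", "effective_date": date(2024, 6, 15),
     "contract_value": 0, "governing_law": "Delaware"},
    {"document": "SOW_Gamma.pdf", "effective_date": date(2024, 9, 30),
     "contract_value": 480_000, "governing_law": "California"},
]


def narrow_table(rows, columns, predicate):
    """Keep only rows where predicate(row) is true, and only the listed columns."""
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


if __name__ == "__main__":
    # "Contracts effective in 2024 or later worth more than $100,000"
    result = narrow_table(
        ROWS,
        columns=["document", "contract_value"],
        predicate=lambda r: r["effective_date"] >= date(2024, 1, 1)
        and r["contract_value"] > 100_000,
    )
    print(result)  # [{'document': 'SOW_Gamma.pdf', 'contract_value': 480000}]
```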

Workflows

Each block in a Workflow has the same suggested limit as an Assistant query: ~240 pages.

When referencing documents or Vault, only relevant snippets are passed into that block.
AI-generated outputs and variables can be passed between blocks, but each block runs independently within its own token limit.

Tip: When chaining multiple AI blocks, keep inputs focused and avoid repeatedly passing large documents unless necessary.
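A minimal sketch of the per-block rule, assuming each block’s input can be sized in pages. The `BlockInput` type, the `blocks_over_limit` helper, and the example workflow are hypothetical. Each block is checked against ~240 pages on its own, so the three-block workflow below (638 pages in total) is not a problem as a whole, but the block that re-passes a large upstream output exceeds its own limit.

```python
from dataclasses import dataclass

BLOCK_LIMIT_PAGES = 240  # suggested per-block limit from this article


@dataclass
class BlockInput:
    """Hypothetical description of what one Workflow block receives, in pages."""
    name: str
    prompt_pages: float
    snippet_pages: float          # relevant document or Vault snippets
    upstream_output_pages: float  # outputs/variables passed from earlier blocks

    def total(self) -> float:
        return self.prompt_pages + self.snippet_pages + self.upstream_output_pages


def blocks_over_limit(blocks: list[BlockInput], limit: float = BLOCK_LIMIT_PAGES) -> list[str]:
    """Check each block on its own; page counts are never summed across blocks."""
    return [b.name for b in blocks if b.total() > limit]


if __name__ == "__main__":
    workflow = [
        BlockInput("Extract key terms", 1, 200, 0),
        BlockInput("Draft summary", 1, 0, 5),
        BlockInput("Compare to playbook", 1, 230, 200),  # re-passes a large upstream output
    ]
    print(blocks_over_limit(workflow))  # ['Compare to playbook']
```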

Quick Reference Table of Limits

Product Area	Recommended Document Limit	Notes
Assistant	~240 pages	Includes prompt, docs, and history
Vault (Docs)	Unlimited storage	Only relevant excerpts count against the context window
Vault (Review Tables)	~25 pages	Columns/rows optimized automatically
Workflows (per block)	~240 pages	Each block runs independently
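The same table expressed as data, with a hypothetical `check_against_recommendation` helper that compares a planned input size with the recommendation for an area. The figures come from the table above; the helper itself is only an illustration and not part of Harvey.

```python
# The quick reference table as data. Vault document storage is omitted
# because only retrieved excerpts count against the context window.
RECOMMENDED_PAGES = {
    "Assistant": 240,
    "Vault (Review Tables)": 25,
    "Workflows (per block)": 240,
}


def check_against_recommendation(area: str, pages: float) -> str:
    """Compare a planned input size with the recommended page limit for that area."""
    limit = RECOMMENDED_PAGES.get(area)
    if limit is None:
        return f"{area}: no recommended page limit"
    if pages <= limit:
        return f"{area}: {pages} pages is within the ~{limit}-page recommendation"
    return f"{area}: {pages} pages exceeds the ~{limit}-page recommendation; quality may vary"


if __name__ == "__main__":
    print(check_against_recommendation("Assistant", 180))
    print(check_against_recommendation("Vault (Review Tables)", 60))
    print(check_against_recommendation("Vault (Docs)", 5000))
```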

What Happens If You Exceed the Context Window?

You can exceed the recommended document limits, but response quality may vary. The suggested limits are designed to maintain Harvey’s high standards for accuracy and completeness, ensuring the best possible results.

If limits are exceeded, Harvey automatically optimizes processing by:

Dropping older or less relevant parts of the conversation
Shortening or excluding portions of documents
Producing shorter or less detailed responses

If a response is incomplete or inconsistent, we recommend the following:

Simplify: Shorten your prompt or break it into smaller asks (e.g., ask a specific question first, then ask follow-up questions based on the output).
Reduce: Remove or narrow the documents you upload.
Split: Run long tasks across multiple requests or workflow blocks.
Use Vault: Store large files in Vault so Harvey retrieves only what’s relevant.
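For the Split suggestion, here is a sketch that divides a long document into page ranges to send as separate requests or Workflow blocks. The `split_into_requests` helper, the 200-page chunk size (chosen to leave room for the prompt and response within ~240 pages), and the optional overlap are assumptions for illustration, not Harvey features.

```python
def split_into_requests(total_pages: int, max_pages: int = 240,
                        overlap: int = 0) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges of at most max_pages.

    overlap repeats the last few pages of each range at the start of the next,
    so a clause that straddles a boundary is more likely to appear whole in one request.
    """
    if max_pages <= overlap:
        raise ValueError("max_pages must be larger than overlap")
    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + max_pages - 1, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break
        start = end + 1 - overlap  # step back by `overlap` pages for the next range
    return ranges


if __name__ == "__main__":
    # Leave headroom below ~240 pages for the prompt and the response.
    print(split_into_requests(600, max_pages=200))
    # [(1, 200), (201, 400), (401, 600)]
    print(split_into_requests(600, max_pages=200, overlap=10))
```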

FAQs

Why doesn’t Harvey use the full context window available for each model?

Table of Contents

Related Articles