Maat Legal Research Agent: How Domain-Specific RAG Pipelines Handle Citation Chains and Precedent Graphs

Legal research agents face a problem that generic RAG pipelines ignore: a citation must not only exist, it must actually support the argument. In competition law, where merger approvals and antitrust rulings hinge on precedent chains, hallucinated case references or keyword-matched citations that miss the legal holding create liability.

Maat is a ReAct agent built for competition law research. It orchestrates retrieval, web search, and clarification prompts to ground findings in official sources, surface rich inline citations, and handle the ambiguity inherent in legal queries. The architecture exposes patterns for any domain where correctness and provenance matter more than speed: compliance, medical literature, financial regulation.

Why General RAG Fails in Legal Domains

Competition law experts must review volumes of cases, decisions, and judicial reports to identify precedents and assess merger elements. General assistants (Claude, ChatGPT) and legal-specific models (SaulLM-7B, LegalGPT) fall short because they lack specialized domain expertise, provide insufficient official citations, or hallucinate competition law cases.

Legal research requires multi-hop reasoning across citation graphs. A ruling in Case A cites Case B, which distinguishes Case C. The agent must traverse this graph, validate each link, and surface the chain of authority.

Maat Architecture: ReAct Orchestration with Fallback Layers

Maat uses a ReAct loop to orchestrate tools corresponding to different research tasks. The agent iterates through reasoning steps, tool calls, and observations until it produces a grounded answer.

Core Components

Component	Purpose	Failure Mode
RAG retrieval	Grounds cases in official database (competition law corpus)	Database coverage gaps for recent rulings or niche jurisdictions
Web search fallback	Retrieves cases not in the database	Returns non-authoritative sources or paywalled content
Clarification prompts	Asks user to disambiguate ambiguous queries	User provides insufficient context, agent loops
Citation validator	Checks that retrieved case actually supports the legal argument	Validator relies on LLM interpretation, not formal logic
Inline citation renderer	Surfaces official citations with document links	Citation format varies by jurisdiction, requires normalization

Orchestration Flow

The agent iterates through a ReAct loop: (1) the LLM decides the next action based on the current state (RAG search, web fallback, or clarification prompt), (2) the agent executes the corresponding tool call, (3) it observes the result, and (4) it updates its internal state with findings and citations. Before any citation is rendered, a validation step checks that the retrieved holding supports the query. The loop terminates when the agent judges that it has sufficient grounded evidence or when it reaches a maximum iteration limit.
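The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `decide` policy stands in for the LLM, and the tool names (`rag_search`, `web_search`), the `AgentState` fields, and the `MAX_STEPS` cap are all hypothetical. The validation step is left out to keep the control flow visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MAX_STEPS = 5  # assumed iteration cap; the source does not give a number


@dataclass
class AgentState:
    query: str
    findings: List[str] = field(default_factory=list)
    trace: List[Tuple[str, str]] = field(default_factory=list)  # (action, observation)


Tool = Callable[[str], str]          # takes the query, returns an observation
Policy = Callable[[AgentState], str]  # stand-in for the LLM's next-action decision


def react_loop(state: AgentState, decide: Policy, tools: Dict[str, Tool]) -> AgentState:
    for _ in range(MAX_STEPS):
        action = decide(state)                      # (1) reason: choose next action
        if action == "finish":
            break
        observation = tools[action](state.query)    # (2) act: run the tool
        state.trace.append((action, observation))   # (3) observe: record the result
        if observation:                             # (4) update state with findings
            state.findings.append(observation)
    return state


def toy_policy(state: AgentState) -> str:
    """Try the database first, fall back to web search, stop once grounded."""
    if state.findings:
        return "finish"
    tried = {action for action, _ in state.trace}
    return "rag_search" if "rag_search" not in tried else "web_search"


toy_tools: Dict[str, Tool] = {
    "rag_search": lambda q: "",  # simulates a database coverage gap
    "web_search": lambda q: "Case X (official register)",
}

result = react_loop(AgentState("vertical merger precedents"), toy_policy, toy_tools)
print([action for action, _ in result.trace])
```

In the toy run the database returns nothing, so the policy falls back to web search and then stops. A real policy would be an LLM call that reads the trace and emits the next action.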

In practice, this requires integration with competition law databases (vector stores or BM25 indexes with reranking), web search APIs with domain authority filtering, and user prompt interfaces with timeout handling. Each tool call must handle network failures, rate limits, and malformed responses. Structured logging captures retrieval scores, LLM reasoning traces, and validation decisions for audit trails.
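One common way to handle transient tool failures is retry with exponential backoff. The sketch below assumes a hypothetical `ToolError` exception raised by tool wrappers on timeouts or rate limits; the function name, attempt count, and delays are illustrative choices, not details from the paper.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class ToolError(Exception):
    """Hypothetical error for transient tool failures (timeouts, rate limits)."""


def call_with_retry(
    tool: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `tool`, doubling the wait after each failure; return None if all attempts fail."""
    for attempt in range(max_attempts):
        try:
            return tool()
        except ToolError:
            if attempt == max_attempts - 1:
                return None  # give up so the agent can pick another action
            sleep(base_delay * 2 ** attempt)
    return None


# Toy tool that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky_search() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ToolError("rate limited")
    return "ok"


delays: List[float] = []
outcome = call_with_retry(flaky_search, sleep=delays.append)  # record delays instead of sleeping
print(outcome, delays)
```

Returning `None` rather than raising lets the orchestration loop treat an exhausted tool as an empty observation and move on to a fallback.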

Maat validates that the holding (the legal principle established by the case) supports the argument, which requires extracting structured elements from case documents: headnotes, majority opinions, dissents. The validation relies on LLM semantic judgment, not formal logical proof, and confidence scores reflect retrieval rank and LLM certainty, not legal validity. Because the LLM validator can still hallucinate support where none exists, the architecture layers multiple checks: retrieval from authoritative databases, citation format validation against official sources, and confidence scoring that flags low-certainty claims for human review. These layers mitigate but do not eliminate hallucination risk, so high-stakes decisions still require expert review.
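The layering can be illustrated with a small triage function. Everything here is an assumption made for the sketch: the `Candidate` fields, the 0.7 threshold, and the choice to take the minimum of the two scores are not taken from the paper.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.7  # assumed cutoff below which a claim goes to a human


@dataclass
class Candidate:
    citation: str
    in_official_db: bool     # did the citation resolve against the official source?
    retrieval_score: float   # 0..1, from the retriever
    llm_support: float       # 0..1, LLM-judged support for the argument


def triage(c: Candidate) -> str:
    """Return 'reject', 'human_review', or 'render' for a candidate citation."""
    if not c.in_official_db:
        return "reject"  # unverifiable citations are never rendered
    # Taking the minimum means one weak signal is enough to trigger review.
    confidence = min(c.retrieval_score, c.llm_support)
    return "render" if confidence >= REVIEW_THRESHOLD else "human_review"


print(triage(Candidate("Case A", True, 0.9, 0.8)))
print(triage(Candidate("Case B", True, 0.9, 0.4)))
print(triage(Candidate("Case C", False, 0.99, 0.99)))
```

Note that a high score never rescues a citation that fails the database check. The score measures retrieval rank and model certainty, not legal validity.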

Retrieval Chunking for Legal Document Structure

Legal documents have internal structure that generic chunking strategies destroy. A case includes:

Headnote: Summary of the legal issue and holding.
Majority opinion: The court’s reasoning.
Concurring opinions: Opinions by judges who agree with the outcome but for different reasons.
Dissents: Opinions by judges who disagree with the outcome, often cited in later cases that overturn precedent.

Chunking by token count or paragraph breaks loses this structure. Maat preserves it:

Section-aware chunking: Split documents at section boundaries (headnote, opinion, dissent).
Metadata tagging: Tag each chunk with its role (holding, reasoning, dissent).
Hierarchical retrieval: Retrieve headnotes first for relevance filtering, then retrieve full opinions for detailed analysis.
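The three steps above can be sketched as follows. The header strings, role labels, and function names are assumptions for illustration, since real case formats vary by jurisdiction. Keyword overlap stands in for a real retriever, and the sample text is toy content, not real cases.

```python
import re
from typing import Dict, List

# Assumed section headers mapped to assumed role tags.
SECTION_ROLES = {
    "HEADNOTE": "holding",
    "MAJORITY OPINION": "reasoning",
    "CONCURRING OPINION": "concurrence",
    "DISSENT": "dissent",
}
HEADER_RE = re.compile(
    r"^(" + "|".join(re.escape(h) for h in SECTION_ROLES) + r")[ \t]*$",
    re.MULTILINE,
)


def chunk_by_section(case_id: str, text: str) -> List[Dict[str, str]]:
    """Split at section headers and tag each chunk with its role.

    Text before the first recognized header is dropped in this sketch.
    """
    chunks = []
    matches = list(HEADER_RE.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        if body:
            chunks.append({"case_id": case_id, "section": m.group(1),
                           "role": SECTION_ROLES[m.group(1)], "text": body})
    return chunks


def hierarchical_retrieve(chunks: List[Dict[str, str]], query_terms: List[str]) -> List[Dict[str, str]]:
    """Stage 1: filter cases by headnote. Stage 2: return their remaining sections."""
    terms = {t.lower() for t in query_terms}
    relevant_cases = {
        c["case_id"] for c in chunks
        if c["role"] == "holding"
        and terms & set(re.findall(r"[a-z]+", c["text"].lower()))
    }
    return [c for c in chunks if c["case_id"] in relevant_cases and c["role"] != "holding"]


sample_a = """HEADNOTE
Exclusive dealing may foreclose rivals.
MAJORITY OPINION
The court applies a rule-of-reason analysis.
DISSENT
The market was defined too narrowly.
"""
sample_b = """HEADNOTE
Resale price maintenance is assessed case by case.
MAJORITY OPINION
The court declines to apply a per se rule.
"""

chunks = chunk_by_section("case-001", sample_a) + chunk_by_section("case-002", sample_b)
detailed = hierarchical_retrieve(chunks, ["foreclose"])
print([(c["case_id"], c["role"]) for c in detailed])
```

Because each chunk carries its role, a downstream validator can treat a dissent differently from a majority opinion instead of seeing undifferentiated text.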

This approach generalizes to any domain with structured documents: medical literature (abstract, methods, results), financial filings (risk factors, MD&A, footnotes), compliance reports (findings, recommendations, remediation).

Handling Temporal Precedent and Circuit Splits

Competition law has temporal dynamics that RAG pipelines must handle:

Newer rulings can override older ones: For example, a 2025 Supreme Court decision may overturn a 2018 circuit court ruling. The agent must rank by precedential weight, not just semantic similarity.
Circuit splits: Different appellate circuits reach conflicting conclusions on the same legal question. The agent must surface both sides and flag the split.
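A rough sketch of both ideas: a ranking score that combines similarity with court level and recency, and a helper that flags a split when circuits disagree. The court weights, the decay formula, the default year, and all names are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

COURT_WEIGHT = {"supreme": 3.0, "appellate": 2.0, "trial": 1.0}  # assumed weights


@dataclass
class Hit:
    case: str
    court: str
    year: int
    similarity: float  # semantic similarity from the retriever, 0..1


def precedential_score(hit: Hit, current_year: int = 2025, decay: float = 0.02) -> float:
    """Scale similarity by court level and a mild recency discount."""
    age = max(current_year - hit.year, 0)
    recency = 1.0 / (1.0 + decay * age)
    return hit.similarity * COURT_WEIGHT[hit.court] * recency


def flag_split(rulings: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group circuits by outcome; a non-empty result means the circuits disagree."""
    by_outcome: Dict[str, Set[str]] = {}
    for circuit, outcome in rulings:
        by_outcome.setdefault(outcome, set()).add(circuit)
    return by_outcome if len(by_outcome) > 1 else {}


hits = [
    Hit("older circuit ruling", "appellate", 2018, 0.90),
    Hit("newer Supreme Court ruling", "supreme", 2025, 0.80),
]
ranked = sorted(hits, key=precedential_score, reverse=True)
print([h.case for h in ranked])
print(flag_split([("2nd", "lawful"), ("9th", "unlawful")]))
```

The older ruling has the higher similarity score but ranks second once court level and recency are applied, which is the behavior the first bullet asks for.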

The paper abstracts temporal precedent handling as graph traversal. Production systems must distinguish between two types of precedent invalidation:

Overturned: A higher court explicitly reverses a lower court’s decision in the same case, or a later ruling directly rejects the legal principle the case established.
Superseded by: New legislation or a ruling in a different case establishes a conflicting legal standard that replaces the old precedent without directly overturning it.

Both types reduce a case’s precedential weight, but overturned cases are typically excluded entirely from citation chains, while superseded cases may still be cited for historical context or to show the evolution of legal standards. The agent would build a citation graph where edges represent “cites,” “overturns,” or “supersedes” relationships, then walk the graph to identify controlling authority and flag conflicts.
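A minimal sketch of such a graph, assuming the three edge types named above. The class and method names are hypothetical, and the traversal is a plain depth-first walk rather than anything specified in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RELATIONS = {"cites", "overturns", "supersedes"}


class CitationGraph:
    def __init__(self) -> None:
        # source case -> list of (relation, target case)
        self.edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, source: str, relation: str, target: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[source].append((relation, target))

    def status(self, case: str) -> str:
        """Return 'overturned', 'superseded', or 'good' based on incoming edges."""
        incoming = {rel for out in self.edges.values() for rel, tgt in out if tgt == case}
        if "overturns" in incoming:
            return "overturned"
        if "supersedes" in incoming:
            return "superseded"
        return "good"

    def authority_chain(self, start: str) -> List[Tuple[str, str]]:
        """Walk 'cites' edges depth-first, dropping overturned cases
        and labeling superseded ones so they can be shown as context."""
        chain, seen, stack = [], set(), [start]
        while stack:
            case = stack.pop()
            if case in seen:
                continue
            seen.add(case)
            state = self.status(case)
            if state == "overturned":
                continue  # excluded entirely from the chain
            chain.append((case, state))
            for rel, tgt in self.edges.get(case, []):
                if rel == "cites":
                    stack.append(tgt)
        return chain


g = CitationGraph()
g.add("A", "cites", "B")
g.add("A", "cites", "C")
g.add("D", "overturns", "B")
g.add("E", "supersedes", "C")
chain = g.authority_chain("A")
print(chain)
```

Case B drops out because it was overturned, while Case C stays in the chain with a "superseded" label, matching the distinction drawn above.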

Observability: Auditing Citation Chains

Legal experts must audit which documents influenced each conclusion. Maat surfaces:

Inline citations: Every claim links to the official case citation and document.
Confidence scores: Retrieval score and LLM confidence that the holding supports the argument.
Citation chain visualization: Graph showing how Case A cites Case B, which distinguishes Case C.

Observability hooks:

Retrieval logs: Record which documents were retrieved, their scores, and why they were included or excluded.
LLM reasoning traces: Log the LLM’s reasoning for why a holding supports the argument.
User feedback loop (recommended extension): Let experts flag incorrect citations, feeding corrections back into the retrieval model.
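A retrieval log hook might look like the sketch below, which writes one JSON record per retrieval decision using Python's standard `logging` module. The logger name `maat.audit`, the record fields, and the `ListHandler` used to capture output are assumptions for illustration.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger("maat.audit")  # hypothetical logger name
logger.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Collects formatted messages in memory; a real system would ship them to storage."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(record.getMessage())


def log_retrieval(query: str, doc_id: str, score: float, included: bool, reason: str) -> None:
    """Emit one JSON line describing why a document was kept or dropped."""
    entry: Dict[str, Any] = {
        "event": "retrieval",
        "query": query,
        "doc_id": doc_id,
        "score": round(score, 3),
        "included": included,
        "reason": reason,
    }
    logger.info(json.dumps(entry, sort_keys=True))


handler = ListHandler()
logger.addHandler(handler)
log_retrieval("vertical mergers", "case-042", 0.41234, False, "below score threshold")
parsed = json.loads(handler.lines[0])
print(parsed)
```

One JSON object per line keeps the audit trail easy to filter by query, document, or decision when an expert later asks why a case was excluded.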

This pattern applies to any high-stakes domain. Medical agents must show which studies support a treatment recommendation. Financial agents must trace which filings support a risk assessment.

Failure Modes and Mitigation

Scenario	Root Cause	Maat’s Response
Database coverage gaps	Recent rulings or niche jurisdictions not in corpus	Web search fallback, but filter for authoritative sources
Citation hallucination	LLM generates plausible but nonexistent case names	Validate all citations against official database before rendering
Keyword matching without semantic understanding	Retrieval returns cases that mention terms but do not support argument	LLM validation step checks that holding supports query
Ambiguous queries	User asks “What are the precedents for vertical mergers?” without specifying jurisdiction or time period	Clarification prompts force disambiguation before retrieval
Temporal ranking errors	Old case ranked higher than newer controlling authority	Precedent graph traversal and temporal boosting in ranking
Merger filing deadlines	Time pressure for rapid precedent lookup in multi-jurisdictional deals	Parallel retrieval across jurisdictions with priority queuing

Maat enables error auditing. Every citation links to the source. Every confidence score exposes uncertainty.
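The last row of the table, parallel retrieval across jurisdictions, can be approximated with `asyncio`. In this sketch the lookups run concurrently and the results come back in priority order. Sorting by priority is a simplification that stands in for a real priority queue, and `search_jurisdiction` is a hypothetical placeholder for a database or API call.

```python
import asyncio
from typing import Dict


async def search_jurisdiction(jurisdiction: str, query: str) -> str:
    """Placeholder for an I/O-bound lookup against one jurisdiction's corpus."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"{jurisdiction}: results for {query!r}"


async def parallel_lookup(query: str, priorities: Dict[str, int]) -> Dict[str, str]:
    """Query all jurisdictions concurrently; lower number means higher priority."""
    ordered = sorted(priorities, key=priorities.get)
    # gather() runs the coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(search_jurisdiction(j, query) for j in ordered))
    return dict(zip(ordered, results))


lookup = asyncio.run(parallel_lookup("vertical mergers", {"EU": 1, "US": 0, "UK": 2}))
print(list(lookup))
```

Total latency is roughly that of the slowest jurisdiction rather than the sum of all of them, which is what matters under a filing deadline.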

Performance: Case-Specific vs. Theoretical Tasks

The paper reports that Maat outperforms baseline assistants (Claude, ChatGPT, SaulLM-7B, LegalGPT) on case-specific tasks: identifying precedents, extracting holdings, validating citations. On theoretical questions (e.g., “What is the legal standard for market definition in merger analysis?”), Maat performs within range of the top baseline rather than ahead of it. The full paper gives the quantitative results, and the authors link a GitHub repository containing the dataset.

This split makes sense. Case-specific tasks require retrieval and citation validation, where RAG pipelines excel. Theoretical questions require synthesis and reasoning, where general LLMs have an edge. The agent’s value is grounding claims in official sources, not generating legal theory.

Technical Verdict

Maat’s architecture is a strong fit for high-stakes domains where source validation and temporal ranking are non-negotiable.

Use Maat’s architecture when:

Your domain has structured documents with internal hierarchy (cases, studies, filings).
Correctness and provenance matter more than speed.
Users must audit which sources influenced each conclusion.
Temporal dynamics (newer sources override older ones) affect ranking.
Generic RAG pipelines hallucinate or return keyword matches that do not support the argument.

Avoid when:

Your domain lacks authoritative sources or citation standards.
Speed matters more than correctness (e.g., conversational assistants).
Users do not need to audit source chains.
Documents lack internal structure that chunking strategies can preserve.

The patterns generalize beyond legal research. Compliance agents validating regulatory citations, medical agents grounding treatment recommendations in clinical trials, and financial agents tracing risk factors to SEC filings all face the same problem: retrieval is not enough. You must validate that the source supports the claim, preserve document structure, handle temporal precedent, and make the reasoning auditable.

Source Links

Primary source: Maat: The Agentic Legal Research Assistant for Competition Protection (arXiv:2605.27331v1)
PDF: https://arxiv.org/pdf/2605.27331v1.pdf