Testing MCP Servers Before Production: Contract Tests for Agent Tool Interfaces

Your MCP server works on your laptop. Tool calls return the right shapes, the client connects cleanly, the session behaves. Then you deploy it and a client reconnects after a network hiccup and the session state is gone. Or you scale to two instances and half the requests fail because session IDs resolve to the wrong process. Or someone sends two concurrent requests and the tool handler corrupts shared state.

Testing catches these before your users do. This is a testing playbook for TypeScript MCP servers built on the official SDK, focused on the failure modes that only appear when agents call your tools in production.

The demo-to-production gap

The official TypeScript SDK makes it easy to get something working. A few tool registrations, an McpServer instance, a transport, and you are serving. The problem is that “working” in the demo sense and “working” in the production sense are different things.

A demo exercises one happy path. Production exercises the edge cases that real clients produce:

Reconnects after network interruption
Concurrent tool calls from the same session
Malformed inputs that pass TypeScript but fail JSON-RPC validation
Slow downstream APIs that time out
Transport contract violations

None of those show up in a single manual run against your local instance. The gap is not a criticism of the SDK. It is a consequence of how easy the SDK makes it to build a server without thinking about what breaks it.

What actually breaks in production

Three categories fail most often.

Transport behavior. The SDK added Streamable HTTP support in version 1.10.0. Under this transport, the server exposes a single HTTP endpoint that handles both POST and GET. Clients POST their JSON-RPC messages, tool calls included, and can issue a GET to open a server-sent events stream for messages the server initiates. Tests that only exercise stdio miss this entirely.

Session state. In its stateful mode, the StreamableHTTPServerTransport tracks each client by session ID. If you store anything in process memory keyed by session ID, a restart or a second instance will drop it. Tests that do not simulate reconnects or multiple instances miss this.

Tool contracts. TypeScript type safety does not guarantee runtime JSON-RPC contract compliance. A tool handler can return a shape that satisfies the TypeScript compiler, for example through a cast to any or an untyped downstream response, yet violates the MCP tool result schema. The client then receives a result it cannot interpret and the agent loop breaks.
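To make the tool contract point concrete, here is a minimal sketch of a runtime check on a tool result. It assumes a simplified subset of the MCP tool result shape (a content array of text, image, or audio blocks) and is not the SDK's own schema; validateToolResult is a hypothetical helper name.

```typescript
// Hypothetical, simplified validator. It checks only a subset of the MCP
// tool result shape and does not replace the SDK's real schemas.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function validateToolResult(result: unknown): string[] {
  if (!isRecord(result)) return ['result must be an object'];
  const content = result.content;
  if (!Array.isArray(content)) return ['content must be an array'];

  const violations: string[] = [];
  content.forEach((block: unknown, index: number) => {
    if (!isRecord(block)) {
      violations.push(`content[${index}] must be an object`);
      return;
    }
    switch (block.type) {
      case 'text':
        if (typeof block.text !== 'string') {
          violations.push(`content[${index}].text must be a string`);
        }
        break;
      case 'image':
      case 'audio':
        if (typeof block.data !== 'string' || typeof block.mimeType !== 'string') {
          violations.push(`content[${index}] needs string data and mimeType`);
        }
        break;
      default:
        // Other block types exist in the protocol; this sketch does not cover them.
        violations.push(`content[${index}].type is not checked by this sketch`);
    }
  });

  if ('isError' in result && typeof result.isError !== 'boolean') {
    violations.push('isError must be a boolean when present');
  }
  return violations;
}
```

A result such as { content: [{ type: 'text', text: 72 }] } compiles once it has passed through a cast, but the check above reports that content[0].text must be a string.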

Contract testing pattern for MCP tools

A contract test validates the request/response shape without running a full agent loop. Here is what that looks like for a single tool:

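import { describe, it, expect } from 'vitest';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { z } from 'zod';

// McpServer has no callTool method of its own, so the tests drive it through
// a real Client over the SDK's linked in-memory transport pair.
async function connectClient(server: McpServer): Promise<Client> {
  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
  const client = new Client({ name: 'test-client', version: '1.0.0' });
  await Promise.all([
    server.connect(serverTransport),
    client.connect(clientTransport),
  ]);
  return client;
}

// callTool returns loosely typed content, so narrow it for the assertions
type TextBlock = { type: string; text: string };

describe('weather tool contract', () => {
  it('returns valid schema-compliant response', async () => {
    const server = new McpServer({
      name: 'test-server',
      version: '1.0.0',
    });

    // Register the tool
    server.tool(
      'get_weather',
      'Get current weather for a location',
      {
        location: z.string().describe('City name'),
      },
      async ({ location }) => {
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                location,
                temperature: 72,
                conditions: 'sunny',
              }),
            },
          ],
        };
      }
    );

    // Call the tool the way an agent's client would
    const client = await connectClient(server);
    const result = await client.callTool({
      name: 'get_weather',
      arguments: { location: 'San Francisco' },
    });

    // Validate response shape
    const content = result.content as TextBlock[];
    expect(content).toBeDefined();
    expect(content[0].type).toBe('text');

    const parsed = JSON.parse(content[0].text);
    expect(parsed).toHaveProperty('location');
    expect(parsed).toHaveProperty('temperature');
    expect(parsed).toHaveProperty('conditions');
  });

  it('rejects invalid input parameters', async () => {
    const server = new McpServer({
      name: 'test-server',
      version: '1.0.0',
    });

    server.tool(
      'get_weather',
      'Get current weather for a location',
      {
        location: z.string().min(1),
      },
      async ({ location }) => {
        return {
          content: [{ type: 'text', text: `Weather for ${location}` }],
        };
      }
    );

    // An empty location violates the schema. Depending on the SDK version,
    // that surfaces as a rejected request or as a result flagged with isError,
    // so the test accepts either.
    const client = await connectClient(server);
    const failed = await client
      .callTool({ name: 'get_weather', arguments: { location: '' } })
      .then((result) => result.isError === true, () => true);
    expect(failed).toBe(true);
  });
});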

This pattern catches schema drift at build time. If you change the tool’s return shape but forget to update the client code, the test fails before you deploy.
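One way to make that drift check explicit is to share a payload guard between the contract test and the client code. The sketch below assumes the three fields used in the get_weather example; WeatherPayload, isWeatherPayload, and parseWeatherText are hypothetical names introduced for illustration.

```typescript
// Hypothetical payload contract for the get_weather example.
interface WeatherPayload {
  location: string;
  temperature: number;
  conditions: string;
}

// Runtime guard: true only when every field exists with the expected type.
function isWeatherPayload(value: unknown): value is WeatherPayload {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.location === 'string' &&
    typeof candidate.temperature === 'number' &&
    typeof candidate.conditions === 'string'
  );
}

// Parse the text block of a tool result and fail loudly on drift.
function parseWeatherText(text: string): WeatherPayload {
  const parsed: unknown = JSON.parse(text);
  if (!isWeatherPayload(parsed)) {
    throw new Error('get_weather payload no longer matches WeatherPayload');
  }
  return parsed;
}
```

If the handler later renames temperature to temp, parseWeatherText throws in the test run instead of handing the client an object with a missing field.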

Mocking the transport layer

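Testing tool logic in isolation requires stubbing the transport. The SDK ships an InMemoryTransport for wiring a client and a server together in one process, but when you want to record and inspect the raw messages a server sends, a small hand-written transport is enough: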

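import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Transport } from '@modelcontextprotocol/sdk/shared/transport.js';
import type { JSONRPCMessage } from '@modelcontextprotocol/sdk/types.js';

class MockTransport implements Transport {
  private messageQueue: JSONRPCMessage[] = [];

  // The server assigns these callbacks when it connects
  onmessage?: (message: JSONRPCMessage) => void;
  onclose?: () => void;
  onerror?: (error: Error) => void;

  async start() {
    // No-op for testing
  }

  async close() {
    // Tell the server the connection is gone
    this.onclose?.();
  }

  async send(message: JSONRPCMessage) {
    // Record everything the server sends
    this.messageQueue.push(message);
  }

  // Deliver a message to the server as if a client had sent it
  receive(message: JSONRPCMessage) {
    this.onmessage?.(message);
  }

  getMessages() {
    return this.messageQueue;
  }

  clearMessages() {
    this.messageQueue = [];
  }
}

// Use in tests (inside an async test body)
const transport = new MockTransport();
const server = new McpServer({
  name: 'test-server',
  version: '1.0.0',
});

// Connect the mock transport
await server.connect(transport);

// Feed the server a request, then give its async handler a tick to respond
transport.receive({ jsonrpc: '2.0', id: 1, method: 'ping' });
await new Promise((resolve) => setTimeout(resolve, 0));

// Now you can inspect messages sent by the server
const messages = transport.getMessages();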

This lets you test the JSON-RPC message flow without spinning up HTTP or stdio. You can verify that the server sends well-formed responses, handles errors correctly, and respects the protocol state machine.
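What "well-formed" means for a recorded response can be pinned down with a small checker. This is a sketch based on the JSON-RPC 2.0 rules for response objects (a version string, an id, and exactly one of result or error); checkJsonRpcResponse is a hypothetical helper, not part of the SDK.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the response is well-formed.
function checkJsonRpcResponse(message: unknown): string[] {
  if (!isPlainObject(message)) return ['message must be an object'];

  const problems: string[] = [];
  if (message.jsonrpc !== '2.0') {
    problems.push('jsonrpc must be "2.0"');
  }

  const id = message.id;
  if (typeof id !== 'string' && typeof id !== 'number' && id !== null) {
    problems.push('id must be a string, number, or null');
  }

  // A response carries a result or an error, never both and never neither.
  const hasResult = 'result' in message;
  const hasError = 'error' in message;
  if (hasResult === hasError) {
    problems.push('exactly one of result or error must be present');
  }

  if (hasError) {
    const error = message.error;
    if (
      !isPlainObject(error) ||
      !Number.isInteger(error.code) ||
      typeof error.message !== 'string'
    ) {
      problems.push('error must carry an integer code and a string message');
    }
  }
  return problems;
}
```

In a test, run every message captured by the mock transport through this check and assert that each one returns an empty list.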

Testing session lifecycle and reconnects

Session state is one of the most common sources of production failures. Here is how to test it:

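import { describe, it, expect } from 'vitest';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { z } from 'zod';

describe('session state handling', () => {
  it('survives client reconnect', async () => {
    // The store lives outside the server instances, standing in for Redis or
    // a database. A Map created inside buildServer() would start empty on
    // every instance, which is the production failure this test guards against.
    const sessionData = new Map<string, Record<string, string>>();

    // Each call stands in for one server process
    function buildServer() {
      const server = new McpServer({
        name: 'test-server',
        version: '1.0.0',
      });

      // Tool that stores state
      server.tool(
        'store_value',
        'Store a value in session',
        {
          key: z.string(),
          value: z.string(),
        },
        async ({ key, value }, { sessionId }) => {
          if (!sessionId) {
            throw new Error('transport did not supply a session ID');
          }
          const entry = sessionData.get(sessionId) ?? {};
          entry[key] = value;
          sessionData.set(sessionId, entry);
          return { content: [{ type: 'text', text: 'stored' }] };
        }
      );
      return server;
    }

    // Connect a client to a fresh instance under a fixed session ID. Setting
    // sessionId on the in-memory transport by hand is a test shortcut; over
    // HTTP the transport assigns it.
    async function connect(sessionId: string) {
      const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
      serverTransport.sessionId = sessionId;
      const client = new Client({ name: 'test-client', version: '1.0.0' });
      await Promise.all([
        buildServer().connect(serverTransport),
        client.connect(clientTransport),
      ]);
      return client;
    }

    const sessionId = 'test-session-123';

    // First connection
    const client1 = await connect(sessionId);
    await client1.callTool({
      name: 'store_value',
      arguments: { key: 'foo', value: 'bar' },
    });

    // Simulate disconnect
    await client1.close();

    // Reconnect with same session ID, landing on a new instance
    const client2 = await connect(sessionId);
    await client2.callTool({
      name: 'store_value',
      arguments: { key: 'baz', value: 'qux' },
    });

    // State from the first connection should still be there
    expect(sessionData.get(sessionId)).toEqual({ foo: 'bar', baz: 'qux' });
  });
});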

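A test like this exposes the problem: session state held in process memory lives only as long as that one process. After a restart, or when a reconnect lands on a different instance, it is gone unless you persist it explicitly. The fix is to use a shared store (Redis, a database, or at minimum a file-backed cache).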
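The shared-store idea can be sketched without committing to a backend. The example below assumes a small asynchronous interface that a Redis or database client could sit behind; SessionStore, MapSessionStore, and ServerInstance are hypothetical names, and the Map-backed class is only a stand-in for a real external store.

```typescript
// Hypothetical storage contract; a Redis or database adapter would implement it.
interface SessionStore {
  get(sessionId: string): Promise<Record<string, string>>;
  set(sessionId: string, key: string, value: string): Promise<void>;
}

// Map-backed stand-in used here so the sketch runs without external services.
class MapSessionStore implements SessionStore {
  private readonly data = new Map<string, Record<string, string>>();

  async get(sessionId: string): Promise<Record<string, string>> {
    return { ...(this.data.get(sessionId) ?? {}) };
  }

  async set(sessionId: string, key: string, value: string): Promise<void> {
    const entry = this.data.get(sessionId) ?? {};
    entry[key] = value;
    this.data.set(sessionId, entry);
  }
}

// Stand-in for one server process that receives its store from outside.
class ServerInstance {
  private readonly store: SessionStore;

  constructor(store: SessionStore) {
    this.store = store;
  }

  storeValue(sessionId: string, key: string, value: string): Promise<void> {
    return this.store.set(sessionId, key, value);
  }

  readValues(sessionId: string): Promise<Record<string, string>> {
    return this.store.get(sessionId);
  }
}

async function demonstrateSharedStore() {
  const shared = new MapSessionStore();
  const instanceA = new ServerInstance(shared);
  const instanceB = new ServerInstance(shared);
  const isolated = new ServerInstance(new MapSessionStore());

  await instanceA.storeValue('test-session-123', 'foo', 'bar');

  return {
    seenBySharedInstance: await instanceB.readValues('test-session-123'),
    seenByIsolatedInstance: await isolated.readValues('test-session-123'),
  };
}
```

The instance that shares the store sees the value written through the other instance, while the instance with its own store sees nothing, which is the behavior a second replica with in-memory state would show.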

Concurrent tool call handling

Agents can issue multiple tool calls in parallel. If your tool handler mutates shared state, you need locking or you will corrupt data:

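import { describe, it, expect } from 'vitest';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';

describe('concurrent tool calls', () => {
  it('handles parallel calls without corruption', async () => {
    const server = new McpServer({
      name: 'test-server',
      version: '1.0.0',
    });

    let counter = 0;

    server.tool(
      'increment',
      'Increment a counter',
      {},
      async () => {
        const current = counter;
        // Simulate async work
        await new Promise(resolve => setTimeout(resolve, 10));
        counter = current + 1;
        return { content: [{ type: 'text', text: String(counter) }] };
      }
    );

    // Drive the server through a real client; McpServer has no callTool method
    const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
    const client = new Client({ name: 'test-client', version: '1.0.0' });
    await Promise.all([
      server.connect(serverTransport),
      client.connect(clientTransport),
    ]);

    // Fire 10 concurrent calls
    await Promise.all(
      Array.from({ length: 10 }, () =>
        client.callTool({ name: 'increment', arguments: {} })
      )
    );

    // Fails as written: every call reads counter before any call writes it back
    expect(counter).toBe(10);
  });
});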

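This test will fail without proper locking. Even in single-threaded Node.js, the await between reading and writing the counter lets the calls interleave. The fix is to use a mutex or atomic operations.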
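A mutex for this case does not need a library. The sketch below assumes a single Node.js process and serializes callers by chaining promises; Mutex, runExclusive, and increment are hypothetical names, and the counter mirrors the one in the test above.

```typescript
// Minimal promise-chain mutex: each task starts only after the previous one settles.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(task);
    // Swallow the outcome on the chain so one failed task does not block the rest.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

let counter = 0;
const counterLock = new Mutex();

// Same read, wait, write sequence as the racy handler, now inside the lock.
function increment(): Promise<number> {
  return counterLock.runExclusive(async () => {
    const current = counter;
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    counter = current + 1;
    return counter;
  });
}

async function runConcurrentIncrements(calls: number): Promise<number> {
  await Promise.all(Array.from({ length: calls }, () => increment()));
  return counter;
}
```

Wrapping the handler body in runExclusive makes the ten concurrent calls finish with a counter of 10, at the cost of running them one after another. This only protects state inside one process; state shared across instances needs atomic operations in the store itself.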

Testing HTTP transport specifics

If you use StreamableHTTPServerTransport, you need to test the HTTP contract:

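import { describe, it, expect } from 'vitest';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import type { AddressInfo } from 'node:net';
import request from 'supertest';
import express from 'express';

// Stateless setup: a fresh server and transport per request, no session IDs.
// The transport takes an options object and is handed each request explicitly.
function buildApp() {
  const app = express();
  app.use(express.json());

  app.all('/mcp', async (req, res) => {
    const server = new McpServer({
      name: 'test-server',
      version: '1.0.0',
    });

    server.tool('ping', 'Ping', {}, async () => ({
      content: [{ type: 'text', text: 'pong' }],
    }));

    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: undefined, // stateless mode
      enableJsonResponse: true, // answer POSTs with plain JSON instead of SSE
    });
    res.on('close', () => {
      void transport.close();
      void server.close();
    });

    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
  });

  return app;
}

describe('HTTP transport', () => {
  it('handles POST tool calls', async () => {
    const response = await request(buildApp())
      .post('/mcp')
      // The transport expects POSTs to accept both content types
      .set('Accept', 'application/json, text/event-stream')
      .send({
        jsonrpc: '2.0',
        method: 'tools/call',
        params: { name: 'ping', arguments: {} },
        id: 1,
      });

    expect(response.status).toBe(200);
    expect(response.body.result.content[0].text).toBe('pong');
  });

  it('handles GET for SSE streaming', async () => {
    // An SSE response never ends, so read only the headers and then abort.
    // This assumes the transport opens the stream in stateless mode; check
    // the behavior of the SDK version you deploy.
    const httpServer = buildApp().listen(0);
    const { port } = httpServer.address() as AddressInfo;
    const controller = new AbortController();

    const response = await fetch(`http://127.0.0.1:${port}/mcp`, {
      headers: { Accept: 'text/event-stream' },
      signal: controller.signal,
    });

    expect(response.status).toBe(200);
    expect(response.headers.get('content-type')).toContain('text/event-stream');

    controller.abort();
    httpServer.close();
  });
});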

These tests verify that the HTTP layer correctly routes POST to tool calls and GET to SSE streams.
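The request shape those tests rely on can also be written down as plain data, which makes it easy to reuse across test files. This is a sketch that assumes the header conventions of the Streamable HTTP transport (an Accept header listing both application/json and text/event-stream, plus Mcp-Session-Id once a session exists); buildToolCallRequest and the two interfaces are hypothetical names, not SDK exports.

```typescript
interface HttpRequestSpec {
  method: 'POST';
  path: string;
  headers: Record<string, string>;
  body: string;
}

interface ToolCallOptions {
  path: string;
  id: number;
  toolName: string;
  args: Record<string, unknown>;
  sessionId?: string; // omit for stateless servers or before initialization
}

// Build the HTTP request for a single tools/call JSON-RPC message.
function buildToolCallRequest(options: ToolCallOptions): HttpRequestSpec {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    Accept: 'application/json, text/event-stream',
  };
  if (options.sessionId !== undefined) {
    headers['Mcp-Session-Id'] = options.sessionId;
  }

  return {
    method: 'POST',
    path: options.path,
    headers,
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: options.id,
      method: 'tools/call',
      params: { name: options.toolName, arguments: options.args },
    }),
  };
}
```

A test can pass the resulting path, headers, and body to whichever HTTP client it uses, then repeat the call with the session header removed or altered to confirm the server rejects it in stateful mode.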

Failure mode comparison

Failure Mode	Unit Test Catches	Integration Test Catches	Contract Test Catches
Malformed JSON-RPC response	No	Sometimes	Yes
Schema drift between tool and client	No	No	Yes
Session state loss on reconnect	No	Yes	Yes
Concurrent call race conditions	No	Sometimes	Yes
Transport-specific bugs (HTTP vs stdio)	No	Yes	Yes
Tool parameter validation	Yes	Yes	Yes
Timeout handling	No	Yes	No

Contract tests sit between unit and integration tests. They validate the protocol boundary without requiring a full deployment.

Technical Verdict

Use contract tests for MCP servers when:

You expose tools to agents you do not control (external clients, multiple teams)
You need to catch schema drift before deployment
You run multiple instances or expect reconnects
Your tools mutate shared state

Skip them when:

You are prototyping and the tool interface is still changing rapidly
You have a single client and server in the same codebase (monorepo with shared types)
Your tools are pure functions with no state or side effects

The testing overhead is low. A contract test suite for a typical MCP server with five tools takes 30 minutes to write and runs in under a second. The payoff is catching production failures at build time instead of runtime.

If you deploy MCP servers to production, contract tests are not optional. They are the most direct way to validate the JSON-RPC boundary that TypeScript cannot check.
