HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

Every Python function you deploy as an LLM tool lives in two places. You write a FastAPI route for human clients and CI pipelines. Then you write an MCP tool registration for Claude, Cursor, or any other agent runtime. The business logic is identical. The plumbing is not.

You maintain two sets of routing, validation, serialization, streaming handlers, and schema definitions. When the underlying function changes, both representations drift. HarnessAPI proposes a skill-first framework that generates both from a single typed definition, eliminating the duplication tax.

The Dual-Representation Problem

Agent tools today require parallel infrastructure:

HTTP endpoint: FastAPI route with OpenAPI schema, SSE streaming, CORS middleware, and Swagger UI for human testing.
MCP tool registration: FastMCP handler with JSON-RPC schema, tool metadata, and agent-runtime compatibility.

Both wrap the same Python function. Both need Pydantic validation. Both need to handle streaming responses. But the machinery diverges:

Concern	HTTP Stack	MCP Stack
Routing	Path-based, HTTP verbs	JSON-RPC method names
Streaming	Server-Sent Events (SSE)	Chunked JSON arrays or blocking
Schema	OpenAPI 3.x	MCP tool metadata + JSON Schema
Auth	Bearer tokens, API keys	Runtime-managed credentials
Testing	Swagger UI, curl	Agent sandbox or manual JSON-RPC

When you update a function signature, you patch both stacks. When you add a new skill, you write two registrations. The boilerplate compounds.

How HarnessAPI Unifies Them

HarnessAPI treats a typed skill folder as the single source of truth. You write one Python module with Pydantic schemas. The framework generates:

A streaming HTTP endpoint with SSE support.
An interactive OpenAPI/Swagger UI.
A zero-configuration MCP tool registration.

All three are served from a single FastAPI process.

Skill-First Registration

You define a skill as a typed function in a folder:

# skills/weather.py
from pydantic import BaseModel, Field

class WeatherRequest(BaseModel):
    city: str = Field(..., description="City name")
    units: str = Field("metric", description="Temperature units")

class WeatherResponse(BaseModel):
    temperature: float
    conditions: str

async def get_weather(request: WeatherRequest) -> WeatherResponse:
    # Business logic here
    return WeatherResponse(temperature=22.5, conditions="Cloudy")

HarnessAPI scans the folder, extracts Pydantic schemas, and registers the function as both an HTTP route and an MCP tool. No decorators. No duplicate schema definitions.

Dual-Mode Content Negotiation

The same handler serves SSE-streaming and JSON-returning clients. The framework inspects the Accept header:

text/event-stream: Returns SSE chunks for real-time updates.
application/json: Returns a single JSON response.

This eliminates the need for separate streaming and non-streaming endpoints. The agent runtime decides which mode to use based on its capabilities.

Dynamic Code Generation for MCP

FastMCP’s inspection layer requires explicit type annotations. Naive closure-based registration loses Pydantic metadata. HarnessAPI solves this with dynamic code generation:

Reads Pydantic schemas from the skill module.
Generates a wrapper function with explicit type hints.
Registers the wrapper with FastMCP, preserving all metadata.

This ensures the MCP tool schema matches the HTTP OpenAPI schema exactly.

Architecture

HarnessAPI subclasses FastAPI, so you inherit the full middleware, dependency injection, and deployment ecosystem. The skill loader runs at startup:

Scan: Walks the skill folder, imports Python modules.
Extract: Parses Pydantic schemas and function signatures.
Generate: Creates HTTP routes with SSE support and MCP tool registrations.
Serve: Mounts both on a single FastAPI app.

The framework runs in a single process. No separate MCP server. No inter-process communication. The agent runtime connects via HTTP or JSON-RPC depending on the protocol it supports.

Streaming Flow

For SSE streaming:

Client sends Accept: text/event-stream.
HarnessAPI wraps the skill function in an async generator.
Each yield becomes an SSE event.
The agent runtime buffers or processes chunks as they arrive.

For blocking JSON:

Client sends Accept: application/json.
HarnessAPI awaits the full response.
Returns a single JSON object.

The skill function does not change. The framework handles serialization based on the client’s capabilities.

Boilerplate Reduction

The paper measures six representative skills using cloc. HarnessAPI reduces framework-facing boilerplate by 74% compared to a manually maintained dual-stack implementation (FastAPI server + FastMCP server).

The savings come from:

No duplicate route definitions.
No separate schema files.
No manual SSE handler logic.
No explicit MCP tool registration code.

You write the skill once. The framework generates the rest.

Deployment Shape

HarnessAPI runs as a single FastAPI process. You deploy it like any FastAPI app:

Local: uvicorn main:app --reload
Container: Standard Dockerfile with CMD ["uvicorn", "main:app"]
Serverless: Wrap in Mangum for AWS Lambda or Google Cloud Run.

The MCP tool is available at /mcp by default. The HTTP endpoints are at /skills/{skill_name}. The Swagger UI is at /docs.

Security Boundaries

Authentication is a shared concern. HarnessAPI does not enforce a specific auth model, but it provides hooks:

HTTP: Use FastAPI dependency injection for Bearer tokens or API keys.
MCP: The agent runtime manages credentials. The tool receives pre-authenticated requests.

You can enforce different policies for each mode:

from fastapi import Depends, HTTPException
from harnessapi import HarnessAPI

app = HarnessAPI()

async def verify_api_key(api_key: str = Header(...)):
    if api_key != "secret":
        raise HTTPException(status_code=403)
    return api_key

@app.skill("weather", dependencies=[Depends(verify_api_key)])
async def get_weather(request: WeatherRequest) -> WeatherResponse:
    # Only HTTP clients with valid API key can call this
    pass

The MCP tool bypasses the dependency if the agent runtime is trusted. You control the boundary.

Likely Failure Modes

Schema drift: If you change a Pydantic model without restarting the server, the OpenAPI schema and MCP tool metadata will be stale. HarnessAPI does not hot-reload schemas.
Streaming timeouts: SSE connections can hang if the skill function blocks. Use async I/O and set appropriate timeouts in the client.
Type annotation loss: If you use dynamic typing or Any, the MCP tool schema will be incomplete. Stick to explicit Pydantic models.
Middleware conflicts: HarnessAPI subclasses FastAPI, so middleware order matters. Test thoroughly if you add custom middleware.

Observability

HarnessAPI inherits FastAPI’s logging and metrics. You can add structured logging or OpenTelemetry tracing:

from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = HarnessAPI()
FastAPIInstrumentor.instrument_app(app)

The framework does not provide built-in observability for MCP tool calls. You will need to log tool invocations manually if you want per-call metrics.

Technical Verdict

Use HarnessAPI when:

You are deploying multiple skills that need both HTTP and MCP interfaces.
You want to eliminate duplicate schema definitions and routing logic.
You are comfortable with FastAPI’s deployment model and middleware ecosystem.
You need SSE streaming for real-time agent interactions.

Avoid HarnessAPI when:

You need fine-grained control over MCP tool registration (custom metadata, versioning).
Your skills require separate authentication policies for HTTP and MCP that cannot be expressed with FastAPI dependencies.
You are running a polyglot stack and cannot consolidate on Python.
You need hot-reloading of skill schemas without restarting the server.

The framework is most valuable when you have more than three skills. Below that threshold, the manual dual-stack approach is simpler. Above it, the boilerplate savings compound quickly.