The open versus closed model debate is becoming infrastructure noise. When LLaMA variants perform within 2% of GPT-4 on most benchmarks and edge devices can run 7B-parameter models at acceptable latency, the bottleneck shifts from model selection to workflow plumbing. The Practical AI podcast frames this transition clearly: agentic systems, workflows, and AI-driven infrastructure now define deployment success more than the choice of model provider.
The engineering problems change when models commoditize. You are no longer optimizing prompt templates and comparing API response times. You are building state machines that coordinate multiple agents, persist intermediate results, handle partial failures, and expose observability hooks across heterogeneous execution environments.
What Agentic Infrastructure Actually Means
Agentic systems are not chatbots with function calling. They are multi-step workflows where each step may invoke a model, call an external API, wait for human approval, or trigger another agent. The infrastructure must handle coordination across these steps, which introduces challenges that single-model inference never encounters.
The model itself becomes a swappable component. The coordination layer is where the complexity lives: how do you pass context between agents without losing state? How do you retry a failed tool call without duplicating side effects? How do you observe causality when execution spans edge devices and cloud services?
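One common answer to the retry question is an idempotency key: derive a stable key from the workflow, step, and arguments, and return the recorded result instead of re-running the tool. The sketch below is a minimal illustration under stated assumptions. `IdempotentExecutor` and `send_notification` are hypothetical names, and an in-memory dict stands in for a durable store.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class IdempotentExecutor:
    """Caches tool results by idempotency key so a retry skips the side effect."""

    def __init__(self) -> None:
        # Assumption: an in-memory dict stands in for a durable store.
        self._results: Dict[str, Any] = {}

    @staticmethod
    def make_key(workflow_id: str, step_id: str, args: Dict[str, Any]) -> str:
        # The same workflow, step, and arguments always produce the same key.
        payload = json.dumps([workflow_id, step_id, args], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, workflow_id: str, step_id: str,
            tool: Callable[..., Any], args: Dict[str, Any]) -> Any:
        key = self.make_key(workflow_id, step_id, args)
        if key in self._results:
            # Already executed: return the recorded result, do not call the tool.
            return self._results[key]
        result = tool(**args)
        self._results[key] = result
        return result


calls: List[str] = []


def send_notification(recipient: str) -> str:
    """Hypothetical tool with a side effect (recorded in `calls`)."""
    calls.append(recipient)
    return f"sent to {recipient}"


executor = IdempotentExecutor()
first = executor.run("wf-123", "step-1", send_notification, {"recipient": "ops"})
retry = executor.run("wf-123", "step-1", send_notification, {"recipient": "ops"})
```

The second call returns the cached result without invoking the tool again. A crash between the side effect and the write to the store can still cause a duplicate, so in practice the downstream API usually has to accept the key as well.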
Edge vs. Cloud Deployment Trade-offs
Physical AI and edge devices introduce new constraints. A manufacturing floor agent running on an NVIDIA Jetson cannot rely on round-trip API calls to OpenAI. It needs local inference, local state, and local tool execution. But it also needs to sync results back to a central orchestrator for audit trails and cross-site coordination.
| Deployment Mode | State Management | Tool Execution | Failure Recovery | Observability |
|---|---|---|---|---|
| Cloud-only | Centralized DB, strong consistency | API calls with retries | Replay from checkpoints | Full trace aggregation |
| Edge-only | Local SQLite or embedded KV | Direct hardware/API access | Limited replay, manual intervention | Local logs, periodic sync |
| Hybrid | Distributed state with sync protocol | Mixed (local + remote) | Partial replay, conflict resolution | Federated tracing with gaps |
Hybrid deployments are the most complex. You need a state synchronization protocol that handles network partitions, a tool registry that knows which tools are available locally versus remotely, and an observability stack that stitches together traces from edge nodes and cloud services.
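A tool registry for hybrid deployments can be sketched as a lookup that records where each tool runs and resolves it against current connectivity. This is an illustrative sketch, not a reference design. `ToolRegistry`, `ToolEntry`, and the tool names are hypothetical, and the prefer-local policy is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Location(Enum):
    LOCAL = "local"
    REMOTE = "remote"


@dataclass
class ToolEntry:
    name: str
    location: Location
    handler: Callable[..., object]


class ToolRegistry:
    """Tracks which tools exist locally and which need a network round-trip."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[Location, ToolEntry]] = {}

    def register(self, entry: ToolEntry) -> None:
        self._tools.setdefault(entry.name, {})[entry.location] = entry

    def resolve(self, name: str, online: bool) -> Optional[ToolEntry]:
        candidates = self._tools.get(name, {})
        # Assumed policy: prefer the local implementation when one exists.
        if Location.LOCAL in candidates:
            return candidates[Location.LOCAL]
        # Remote tools are only usable while the node is connected.
        if online and Location.REMOTE in candidates:
            return candidates[Location.REMOTE]
        return None


registry = ToolRegistry()
registry.register(ToolEntry("read_sensor", Location.LOCAL, lambda: 21.5))
registry.register(ToolEntry("generate_report", Location.REMOTE, lambda: "report"))

offline_choice = registry.resolve("generate_report", online=False)
online_choice = registry.resolve("generate_report", online=True)
```

When `resolve` returns `None`, the orchestrator has to decide whether to queue the step until connectivity returns or fail it outright.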
Workflow Orchestration Primitives
When models commoditize, the orchestration layer becomes the differentiation point. You need primitives that handle:
- Agent handoff: Passing context and control from one agent to another without losing state.
- Conditional branching: Routing based on tool call results or model outputs.
- Human-in-the-loop gates: Pausing execution for approval and resuming with updated context.
- Parallel execution: Running multiple agents concurrently and merging results.
- Retry and backoff: Handling transient failures without restarting the entire workflow.
These primitives exist in workflow engines like Temporal and Prefect, but they were not designed for agentic workloads. You end up writing custom adapters to serialize agent state, map tool calls to activities, and handle model-specific error codes.
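To make two of these primitives concrete, here is a minimal sketch of conditional branching and a human-in-the-loop gate as a small state machine. It is not how Temporal or Prefect model these concepts. The `Workflow` class, the step names, and the `PAUSE`/`DONE` sentinels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

PAUSE = "__pause__"  # sentinel: stop and wait for a human decision
DONE = "__done__"    # sentinel: workflow finished

Step = Callable[[Dict[str, Any]], str]


@dataclass
class Workflow:
    steps: Dict[str, Step]
    current: str
    context: Dict[str, Any] = field(default_factory=dict)
    status: str = "running"

    def run(self) -> str:
        # Each step returns the name of the next step, or a sentinel.
        while self.status == "running":
            outcome = self.steps[self.current](self.context)
            if outcome == DONE:
                self.status = "done"
            elif outcome == PAUSE:
                self.status = "waiting_approval"
            else:
                self.current = outcome
        return self.status

    def resume(self, approved: bool) -> str:
        if self.status != "waiting_approval":
            raise RuntimeError("workflow is not waiting for approval")
        # Resume with updated context; the gate step is re-evaluated.
        self.context["approved"] = approved
        self.status = "running"
        return self.run()


def draft(ctx: Dict[str, Any]) -> str:
    ctx["draft"] = f"report on {ctx['topic']}"
    # Conditional branching: only high-risk drafts go through the gate.
    return "review_gate" if ctx["risk"] == "high" else "publish"


def review_gate(ctx: Dict[str, Any]) -> str:
    if "approved" not in ctx:
        return PAUSE
    return "publish" if ctx["approved"] else "reject"


def publish(ctx: Dict[str, Any]) -> str:
    ctx["published"] = True
    return DONE


def reject(ctx: Dict[str, Any]) -> str:
    ctx["published"] = False
    return DONE


STEPS = {"draft": draft, "review_gate": review_gate,
         "publish": publish, "reject": reject}

wf = Workflow(STEPS, "draft", {"topic": "sales", "risk": "high"})
paused_status = wf.run()          # stops at the approval gate
final_status = wf.resume(True)    # a human approves, execution continues
```

Because `current`, `context`, and `status` are plain data, the paused workflow can be serialized and resumed on another machine, which is the property the custom adapters usually have to provide.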
Code Example: State Persistence for Agent Handoff
Here is a minimal example of persisting agent state between handoffs using a key-value store. This avoids losing context when an agent crashes or when execution moves from edge to cloud.
```python
import json

import redis


class AgentStateStore:
    """Persists per-agent workflow state in Redis as JSON."""

    def __init__(self, redis_client):
        self.redis = redis_client

    def save_state(self, workflow_id, agent_id, state):
        key = f"workflow:{workflow_id}:agent:{agent_id}"
        # ex=3600 sets a one-hour TTL; expired state cannot be resumed.
        self.redis.set(key, json.dumps(state), ex=3600)

    def load_state(self, workflow_id, agent_id):
        key = f"workflow:{workflow_id}:agent:{agent_id}"
        data = self.redis.get(key)
        return json.loads(data) if data else None

    def handoff(self, workflow_id, from_agent, to_agent, context):
        # Save current agent state
        self.save_state(workflow_id, from_agent, context)
        # Load next agent state (if resuming); saved values override the
        # incoming context, and the caller's dict is updated in place.
        next_state = self.load_state(workflow_id, to_agent)
        if next_state:
            context.update(next_state)
        return context


# Usage
store = AgentStateStore(redis.Redis(host="localhost", port=6379, db=0))
# partial_results is left empty here; a real workflow carries earlier outputs.
context = {"user_query": "analyze sales data", "partial_results": []}
context = store.handoff("wf-123", "analyst-agent", "report-agent", context)
```
This pattern works for simple handoffs but breaks down when you need transactional guarantees or when agents run on edge devices without Redis access. You need a state sync protocol that handles offline writes and conflict resolution.
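One simple conflict resolution rule is last-writer-wins on a per-key version counter, with the node ID as a deterministic tie-breaker. The sketch below is one possible policy under stated assumptions, not a full sync protocol. The `write` and `merge` helpers and the `(version, node_id, value)` layout are hypothetical, and plain dicts stand in for the edge and cloud stores.

```python
from typing import Any, Dict, Tuple

# Each replica maps key -> (version, node_id, value).
Entry = Tuple[int, str, Any]
Replica = Dict[str, Entry]


def write(replica: Replica, key: str, value: Any, node_id: str) -> None:
    """Record a local write, bumping the per-key version counter."""
    version = replica[key][0] + 1 if key in replica else 1
    replica[key] = (version, node_id, value)


def merge(a: Replica, b: Replica) -> Replica:
    """Merge two replicas; the higher (version, node_id) pair wins per key."""
    merged = dict(a)
    for key, entry in b.items():
        # Tuple comparison: version first, node_id breaks ties deterministically.
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


cloud: Replica = {}
write(cloud, "status", "queued", "cloud")
edge = merge({}, cloud)                        # edge syncs, then goes offline

write(edge, "status", "inspected", "edge")     # offline write on the edge
write(cloud, "status", "cancelled", "cloud")   # concurrent write in the cloud

synced = merge(edge, cloud)                    # reconnect and reconcile
```

Both concurrent writes carry version 2, so the tie-breaker decides and the losing write is silently discarded. That is acceptable for status flags and rarely acceptable for audit data, where you would keep both versions and resolve them explicitly.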
Open vs. Closed Models in Agentic Context
The open versus closed debate still matters for three specific reasons:
- Latency and cost at scale: Running LLaMA locally removes network round-trips and per-token API fees for high-throughput workflows, although you take on the hardware and operations cost yourself.
- Data residency: Regulated industries often cannot send data to third-party APIs, which makes local models a requirement rather than a preference.
- Customization: Open models can be fine-tuned for domain-specific tool use, which is often more dependable than steering a closed model through prompt engineering alone.
But these are deployment constraints, not model capability gaps. If your workflow can tolerate 200ms API latency and you are not in a regulated industry, the choice of model provider matters far less than the orchestration layer, which determines reliability, debuggability, and operational cost.
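Treating the provider as a deployment decision can be sketched as a small selection function over explicit constraints. Everything here is illustrative. `ModelBackend`, `LocalBackend`, `RemoteBackend`, and `DeploymentConstraints` are hypothetical names, the backends return canned strings instead of calling a model, and the 200 ms default simply mirrors the figure used above.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Minimal interface the orchestration layer depends on."""

    def complete(self, prompt: str) -> str: ...


class LocalBackend:
    """Hypothetical stand-in for a locally hosted open-weight model."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class RemoteBackend:
    """Hypothetical stand-in for a hosted API client."""

    def complete(self, prompt: str) -> str:
        return f"[remote] {prompt}"


@dataclass
class DeploymentConstraints:
    data_must_stay_on_site: bool
    latency_budget_ms: int
    expected_api_latency_ms: int = 200  # illustrative figure, not a measurement


def select_backend(constraints: DeploymentConstraints,
                   local: ModelBackend, remote: ModelBackend) -> ModelBackend:
    # Data residency rules out third-party APIs regardless of latency.
    if constraints.data_must_stay_on_site:
        return local
    # A budget tighter than the API round-trip also forces local inference.
    if constraints.latency_budget_ms < constraints.expected_api_latency_ms:
        return local
    return remote


local, remote = LocalBackend(), RemoteBackend()
regulated = select_backend(DeploymentConstraints(True, 1000), local, remote)
relaxed = select_backend(DeploymentConstraints(False, 1000), local, remote)
```

The rest of the workflow only sees `complete`, so swapping providers changes one constructor call and none of the orchestration code.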
Observability for Multi-Agent Workflows
Traditional APM tools do not handle agentic workflows well. You need to trace causality across:
- Model inference calls (local or remote)
- Tool executions (APIs, database queries, hardware commands)
- Agent handoffs (state transitions)
- Human approval gates (pauses and resumes)
OpenTelemetry can capture this if you instrument every agent and tool call, but you still need a custom trace aggregator that understands agent semantics. You want to answer questions like “why did the workflow retry three times?” and “which tool call caused the agent to switch strategies?”
A minimal observability stack for agentic workflows includes:
- Structured logging with workflow ID, agent ID, and step ID in every log line
- Distributed tracing with spans for each agent execution and tool call
- State snapshots at every handoff and retry
- Metrics for agent execution time, tool call success rate, and retry count
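The tracing item can be illustrated with a small in-process tracer that records a parent ID on every span, which is what lets you reconstruct causality afterwards. This is a stand-in for a real OpenTelemetry setup, not its API. The `Tracer` class and its field names are hypothetical, and it assumes single-threaded execution.

```python
import time
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class Tracer:
    """Collects spans for one workflow; nested spans record their parent."""

    def __init__(self, workflow_id: str) -> None:
        self.workflow_id = workflow_id
        self.spans: List[Dict[str, Any]] = []
        self._stack: List[str] = []  # IDs of currently open spans

    @contextmanager
    def span(self, kind: str, name: str, **attrs: Any) -> Iterator[Dict[str, Any]]:
        record: Dict[str, Any] = {
            "span_id": uuid.uuid4().hex,
            "parent_id": self._stack[-1] if self._stack else None,
            "workflow_id": self.workflow_id,
            "kind": kind,      # e.g. "agent", "tool_call", "handoff", "approval"
            "name": name,
            "attrs": attrs,
            "status": "ok",
        }
        self._stack.append(record["span_id"])
        start = time.monotonic()
        try:
            yield record
        except Exception as exc:
            # Mark the span as failed, then let the error propagate.
            record["status"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            record["duration_s"] = time.monotonic() - start
            self._stack.pop()
            self.spans.append(record)  # spans are stored in completion order


tracer = Tracer("wf-123")
with tracer.span("agent", "analyst-agent"):
    with tracer.span("tool_call", "fetch_sales", attempt=1):
        pass  # the tool would run here
```

Recording `attempt` as a span attribute is what makes a question like "why did the workflow retry three times?" answerable from the trace alone.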
Failure Modes and Recovery Strategies
Agentic workflows fail in ways that single-model inference does not. Common failure modes:
- Tool call schema mismatch: The model generates a tool call that does not match the expected schema.
- Partial execution: An agent completes some steps but crashes before finishing.
- State desync: Edge and cloud state diverge due to network partition.
- Infinite retry loops: An error classified as transient is actually persistent, so the retries it triggers never succeed.
Recovery strategies depend on the failure mode:
| Failure Mode | Detection Method | Recovery Strategy |
|---|---|---|
| Schema mismatch | Validate tool call before execution | Retry with schema hint in prompt |
| Partial execution | Checkpoint after each step | Resume from last checkpoint |
| State desync | Compare checksums on sync | Merge with conflict resolution rules |
| Infinite retry | Count retries, set max limit | Escalate to human or fallback agent |
You need to design these recovery paths upfront. Retrofitting them after deployment is expensive because you have to migrate state schemas and update agent logic.
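Two rows of the table, schema mismatch and infinite retry, can be combined into one recovery path: validate before execution, retry with a schema hint, and escalate once a retry limit is reached. The sketch below assumes a deliberately tiny schema. `REQUIRED_FIELDS`, `validate_tool_call`, `call_with_recovery`, and `fake_model` are hypothetical, and the model is a stub function, not a real client.

```python
from typing import Any, Callable, Dict, List

# Hypothetical schema: every tool call needs a tool name and an args object.
REQUIRED_FIELDS = {"tool": str, "args": dict}


def validate_tool_call(call: Any) -> List[str]:
    """Return a list of schema errors; an empty list means the call is valid."""
    if not isinstance(call, dict):
        return ["tool call must be an object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in call:
            errors.append(f"missing field '{name}'")
        elif not isinstance(call[name], expected):
            errors.append(f"field '{name}' must be {expected.__name__}")
    return errors


def call_with_recovery(generate: Callable[[str], Any], prompt: str,
                       max_attempts: int = 3) -> Dict[str, Any]:
    hint = ""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        call = generate(prompt + hint)
        errors = validate_tool_call(call)
        if not errors:
            return {"status": "ok", "call": call, "attempts": attempt}
        # Retry with a schema hint describing what was wrong.
        hint = "\nPrevious output was invalid: " + "; ".join(errors)
    # Retry limit reached: hand off to a human or fallback agent.
    return {"status": "escalated", "errors": errors, "attempts": max_attempts}


def fake_model(prompt: str) -> Dict[str, Any]:
    """Stub model: omits 'args' until it sees the schema hint."""
    if "invalid" in prompt:
        return {"tool": "lookup", "args": {"id": 7}}
    return {"tool": "lookup"}


recovered = call_with_recovery(fake_model, "find order 7")
escalated = call_with_recovery(lambda prompt: "not a tool call", "find order 7")
```

The bounded loop is the important part: every retry path needs a terminal state, and the escalation result carries the last validation errors so whoever picks it up knows why the workflow stopped.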
Technical Verdict
Use agentic infrastructure when:
- You have multi-step workflows that require coordination between multiple models or tools.
- You need to run agents on edge devices with intermittent connectivity.
- You are building systems where the model choice may change frequently and you want to avoid vendor lock-in.
- You need audit trails and observability for regulated or high-stakes decisions.
Avoid agentic infrastructure when:
- Your use case is a single-shot model inference with no state or tool calls.
- You do not have the engineering capacity to build and maintain orchestration, state management, and observability layers.
- Your workflow is still experimental and you have not validated that multi-step coordination provides value over simpler approaches.
The shift from model wars to workflow plumbing is not hype. It is a practical engineering reality. The model you choose matters less than the infrastructure you build around it. When LLaMA and GPT-4 converge in capability, the differentiation comes from how you orchestrate, observe, and recover from failures in multi-agent systems.