Neural Pub/Sub: Federated Market Brokers for Edge-Cloud Agent Coordination

How self-organizing broker networks use price signals and polymatroidal feasibility regions to coordinate agents across administrative domains.

Source: arxiv.org

When your agents span multiple clouds, edge clusters, and administrative domains, a single orchestrator becomes a bottleneck and a sovereignty violation. Neural Pub/Sub replaces central control with a federated network of brokers that coordinate through market price signals. Each broker runs a MAPE-K control loop (Monitor, Analyze, Plan, Execute, Knowledge) and negotiates resource allocation using marginal-cost clearing prices instead of global state.

The system addresses a real problem: edge-cloud continuum workloads must honor data sovereignty constraints while competing for scarce edge compute. A centralized scheduler cannot see inside every domain, and cross-domain API calls introduce latency that kills real-time placement decisions.

Architecture: Federated Brokers and Price Discovery

Neural Pub/Sub organizes brokers into a peer network. Each broker:

  • Monitors local worker health, load, and capacity.
  • Analyzes marginal costs for each service type.
  • Plans placement over a polymatroidal feasibility region (a convex polytope that respects dependency DAGs and resource constraints).
  • Executes cross-domain dispatch when local capacity is exhausted.
  • Shares subscription summaries and bounded-staleness price signals with peers.

Brokers discover each other through a gossip protocol. When a broker receives a service request it cannot satisfy locally, it queries peers for their current clearing prices. The requesting broker then routes the workload to the peer offering the lowest marginal cost, subject to data sovereignty rules.
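The routing rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper’s implementation: `PeerQuote`, `pick_peer`, and the broker and domain names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PeerQuote:
    """A clearing price advertised by a peer broker (hypothetical structure)."""
    broker_id: str
    domain: str
    price: float


def pick_peer(quotes: Iterable[PeerQuote], allowed_domains: Set[str]) -> Optional[PeerQuote]:
    """Return the cheapest quote from an allowed domain, or None if none qualifies."""
    eligible = [q for q in quotes if q.domain in allowed_domains]
    return min(eligible, key=lambda q: q.price, default=None)


quotes = [
    PeerQuote("broker-us", "us-east-1", 0.8),   # cheapest, but outside the allowed domains
    PeerQuote("broker-fr", "eu-west-1", 1.2),
    PeerQuote("broker-de", "eu-central-1", 1.5),
]
choice = pick_peer(quotes, allowed_domains={"eu-west-1", "eu-central-1"})
print(choice.broker_id)  # broker-fr
```

The cheapest quote overall is skipped because its domain is not allowed, so price only decides among peers that already satisfy the sovereignty rules.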

Polymatroidal Feasibility Region

The Plan step uses a polymatroid to model resource constraints. A polymatroid is the polytope of allocations bounded by a monotone submodular rank function. Submodularity means diminishing returns: adding a resource to a smaller set yields at least as much marginal gain as adding it to a larger set. This property is what lets greedy allocation reach an optimal solution when service dependencies form a tree or series-parallel DAG.

The broker constructs a feasibility polytope where each vertex represents a valid allocation of workers to services. The paper uses a greedy matroid optimization algorithm that iteratively selects the service with the highest marginal value-to-cost ratio, subject to polymatroidal rank constraints. This greedy approach is optimal for gross-substitutes valuations on tree and series-parallel dependency graphs.
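To make the greedy step concrete, here is a sketch of the classical greedy algorithm for maximizing a linear objective over a polymatroid (Edmonds' algorithm). It is a textbook stand-in, not necessarily the paper’s exact procedure. The service names, capacities, the shared `BUDGET`, and the weights (standing in for value-to-cost ratios) are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical per-service capacity limits and a shared worker budget.
CAPS = {"detect": 4, "track": 3, "encode": 5}
BUDGET = 8


def rank(services: FrozenSet[str]) -> int:
    """Monotone submodular rank: total capacity of the set, capped by the shared budget."""
    return min(sum(CAPS[s] for s in services), BUDGET)


def greedy_polymatroid(weights: Dict[str, float],
                       rank_fn: Callable[[FrozenSet[str]], int]) -> Dict[str, int]:
    """Visit services by decreasing weight; give each the marginal rank it adds."""
    order = sorted((s for s in weights if weights[s] > 0),
                   key=lambda s: weights[s], reverse=True)
    allocation, prefix, previous = {}, [], 0
    for service in order:
        prefix.append(service)
        current = rank_fn(frozenset(prefix))
        allocation[service] = current - previous  # marginal gain of adding this service
        previous = current
    return allocation


weights = {"detect": 5.0, "track": 2.0, "encode": 3.0}
alloc = greedy_polymatroid(weights, rank)
print(alloc)  # {'detect': 4, 'encode': 4, 'track': 0}
```

The highest-weight service fills its own capacity, the next one takes what remains of the shared budget, and the last gets nothing. Diminishing marginal rank is what makes this one-pass procedure safe.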

Walrasian Convergence and Gross Substitutes

Under gross-substitutes valuations on tree and series-parallel service-dependency DAGs, decentralized price-based allocation matches the welfare of a centralized oracle. For arbitrary dependency graphs, including ones with cycles, convergence is not guaranteed, and the system may oscillate or settle into a suboptimal equilibrium.

Gross substitutes means that raising the price of service A never lowers demand for any other service whose price is unchanged: agents shift demand from A to substitutes such as service B. This property holds when service dependency DAGs are trees or series-parallel graphs. The paper anchors its correctness claim in Walrasian equilibrium theory: when these structural conditions are met, the federated market reaches the same welfare outcome as a centralized planner.
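A small numeric check can make the gross-substitutes idea tangible. The sketch below uses unit-demand agents (each wants at most one service), a standard example of valuations that satisfy gross substitutes. The valuations and prices are invented for illustration and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List


def demand(valuations: List[Dict[str, float]], prices: Dict[str, float]) -> Counter:
    """Each agent picks the service with the highest surplus (value - price), if non-negative."""
    chosen = Counter()
    for values in valuations:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(values), key=lambda s: values[s] - prices[s])
        if values[best] - prices[best] >= 0:
            chosen[best] += 1
    return chosen


valuations = [
    {"A": 10, "B": 8},
    {"A": 9, "B": 4},
    {"A": 5, "B": 6},
]
before = demand(valuations, {"A": 3, "B": 3})
after = demand(valuations, {"A": 6, "B": 3})  # only the price of A rises
print(before["B"], after["B"])  # 1 2
```

Raising the price of A moves one agent from A to B, so demand for B rises from 1 to 2 and never falls.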

The MAPE-K loop drives prices toward this equilibrium by:

  1. Computing local clearing prices using supply and demand curves.
  2. Broadcasting price updates to peers with bounded staleness (typically 100-500 ms).
  3. Adjusting placement plans when remote prices change.
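Step 1 can be illustrated with a toy marginal-cost clearing rule. This is one plausible reading of "marginal-cost clearing price", not the paper’s formula: workers are sorted by unit cost to form a supply curve, and the price is the cost of the last unit needed to meet demand. The function name and the numbers are hypothetical.

```python
from typing import List, Optional


def clearing_price(unit_costs: List[float], demand_units: int) -> Optional[float]:
    """Cost of the marginal (most expensive) unit needed to serve the demand.

    Returns None when demand exceeds local supply, which is the signal
    to look for capacity at a peer broker.
    """
    if demand_units < 1:
        raise ValueError("demand_units must be at least 1")
    supply_curve = sorted(unit_costs)
    if demand_units > len(supply_curve):
        return None
    return supply_curve[demand_units - 1]


costs = [3.0, 1.0, 2.0, 5.0]
print(clearing_price(costs, 3))  # 3.0
print(clearing_price(costs, 5))  # None
```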

Bounded staleness caps how long a broker may act on an outdated price, which keeps feedback loops between brokers from running unchecked. If a broker sees a stale price, it may over-allocate to a remote domain, but the next control cycle corrects the error.
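A bounded-staleness price cache might look like the following sketch. `PriceCache` and its methods are hypothetical names; the 500 ms default simply mirrors the upper end of the range quoted above. Timestamps are passed in explicitly so the behavior is easy to test.

```python
from typing import Dict, Tuple


class PriceCache:
    """Stores the latest price per peer and hides quotes older than the bound."""

    def __init__(self, staleness_bound_ms: float = 500.0) -> None:
        self.staleness_bound_ms = staleness_bound_ms
        self._quotes: Dict[str, Tuple[float, float]] = {}  # peer -> (price, received_at_ms)

    def update(self, peer: str, price: float, now_ms: float) -> None:
        self._quotes[peer] = (price, now_ms)

    def fresh(self, now_ms: float) -> Dict[str, float]:
        """Return only the prices whose age is within the staleness bound."""
        return {
            peer: price
            for peer, (price, received_at) in self._quotes.items()
            if now_ms - received_at <= self.staleness_bound_ms
        }


cache = PriceCache()
cache.update("broker-a", 1.0, now_ms=0)
cache.update("broker-b", 1.4, now_ms=400)
print(cache.fresh(now_ms=600))  # {'broker-b': 1.4}
```

At 600 ms the quote from `broker-a` is 600 ms old and is ignored, so the planner must re-query that peer before routing to it.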

Data Sovereignty Constraints

Each broker maintains a policy map that specifies which services can run in which administrative domains. When planning placement, the broker filters the feasibility region to exclude forbidden allocations.

Example policy structure (not from paper):

service: user-profile-inference
allowed_domains:
  - eu-west-1
  - eu-central-1
forbidden_domains:
  - us-east-1
data_residency: GDPR

The broker encodes these constraints as additional inequalities in the polymatroidal feasibility region. If no feasible allocation exists, the broker rejects the request and logs a sovereignty violation.

Cross-domain dispatch respects these rules by checking the destination broker’s domain label before forwarding a workload. If the destination domain is forbidden, the broker skips it and queries the next peer.
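The domain check can be sketched as a small filter over candidate peers. The policy dictionary mirrors the example structure above; the deny-wins rule, the function names, and the peer list are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

POLICY = {
    "service": "user-profile-inference",
    "allowed_domains": ["eu-west-1", "eu-central-1"],
    "forbidden_domains": ["us-east-1"],
}


def domain_permitted(policy: Dict, domain: str) -> bool:
    """Assumed rule: a domain must be explicitly allowed and must not be forbidden."""
    if domain in policy.get("forbidden_domains", []):
        return False
    return domain in policy.get("allowed_domains", [])


def first_permitted_peer(policy: Dict, peers_by_price: List[Tuple[str, str]]) -> Optional[str]:
    """peers_by_price holds (broker_id, domain) pairs, cheapest first."""
    for broker_id, domain in peers_by_price:
        if domain_permitted(policy, domain):
            return broker_id
    return None  # caller rejects the request and logs a sovereignty violation


peers = [("broker-us", "us-east-1"), ("broker-de", "eu-central-1")]
print(first_permitted_peer(POLICY, peers))  # broker-de
```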

Failure Modes and Race Conditions

The paper evaluates a 4-VM, 4-domain, 48-worker testbed with 50 ms emulated WAN latency. Three failure modes emerged:

| Failure Mode | Trigger | Mitigation | Source |
|---|---|---|---|
| Price oscillation | Rapid demand shifts on cyclic dependency graphs | Damping factor in price update (0.7 × previous + 0.3 × new) | Paper (explicit) |
| Double allocation | Two brokers dispatch to the same worker simultaneously | Optimistic locking with retry on conflict | Paper (Section 4.2) |
| Stale price routing | Broker routes to a peer whose price increased 200 ms ago | Bounded staleness timeout (500 ms) forces re-query | Paper (explicit) |

Double allocation is the most common. When two brokers see the same low price from a peer, both may dispatch workloads within the same 100 ms window. The receiving broker detects the conflict when it tries to reserve the worker and returns a NACK to one sender. The losing broker retries with the next-cheapest peer.
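The conflict-and-retry flow can be sketched with a version check on the worker. This is a single-threaded illustration of optimistic locking under assumed names (`Worker`, `reserve`, `dispatch`); a real broker would need an atomic compare-and-set on the receiving side.

```python
from typing import List, Optional, Tuple


class Worker:
    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0
        self.owner: Optional[str] = None

    def reserve(self, broker_id: str, expected_version: int) -> bool:
        """Succeed only if nobody changed the worker since the caller last looked."""
        if self.owner is not None or self.version != expected_version:
            return False  # NACK: the caller's view is out of date
        self.owner = broker_id
        self.version += 1
        return True


def dispatch(broker_id: str, candidates: List[Tuple[Worker, int]]) -> Optional[Worker]:
    """candidates holds (worker, version_seen) pairs, cheapest first; retry on NACK."""
    for worker, version_seen in candidates:
        if worker.reserve(broker_id, version_seen):
            return worker
    return None


cheap, pricey = Worker("w-cheap"), Worker("w-pricey")
snapshot = [(cheap, cheap.version), (pricey, pricey.version)]  # both brokers saw this view
first = dispatch("broker-a", snapshot)
second = dispatch("broker-b", snapshot)  # NACK on w-cheap, falls through to w-pricey
print(first.name, second.name)  # w-cheap w-pricey
```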

Price oscillation occurs when service dependencies contain a cycle, so the dependency graph is no longer a DAG. The paper’s Walrasian convergence proof assumes acyclic graphs, so cyclic dependencies can cause brokers to chase each other’s price updates. The damping factor smooths these oscillations but does not eliminate them.
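The damping rule from the table (0.7 × previous + 0.3 × new) is an exponential moving average. The sketch below applies it to an artificial price series that flips between two values; the series and the function name are illustrative assumptions.

```python
def damped_price(previous: float, observed: float, keep: float = 0.7) -> float:
    """Blend the previous price with the newly observed one (0.7 / 0.3 by default)."""
    return keep * previous + (1.0 - keep) * observed


raw = [20.0, 10.0] * 10          # a peer price flipping between 20 and 10
smoothed = [10.0]                # starting estimate
for observed in raw:
    smoothed.append(damped_price(smoothed[-1], observed))

raw_swing = max(raw) - min(raw)
tail = smoothed[-4:]
damped_swing = max(tail) - min(tail)
print(raw_swing, round(damped_swing, 2))
```

The smoothed series still alternates, but with a much smaller swing than the raw one, which matches the observation that damping reduces oscillation without removing it.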

Performance: Federated vs. Centralized

The testbed ran 1005 experiments comparing federated brokers to a single-process oracle and a four-shard centralized orchestrator. Key results:

  • Federated market beats single-process oracle by 2-4% (median latency reduction of 39.6 ms, p ~ 2.8e-14).
  • Federated market matches four-shard orchestrator within +/-1.5% across all load levels.
  • Round-robin baseline collapses from 98.8% completion rate at 5 requests/sec to 3.3% at 15 requests/sec.

The federated market wins because it parallelizes placement decisions. Each broker plans independently, so the system scales horizontally. The single-process oracle serializes all decisions, creating a bottleneck. The four-shard orchestrator matches federated performance because it also parallelizes, but it requires a shared state store (etcd or Consul) that introduces coordination overhead.

Round-robin fails because it ignores load and capacity. When arrival rate exceeds aggregate capacity, round-robin spreads requests evenly across brokers, causing all of them to queue. The market mechanism routes requests to brokers with spare capacity, avoiding head-of-line blocking.

Observability: What to Monitor

The following metrics are recommended based on the architecture (not exhaustively specified in the paper). Each broker should expose:

  • broker_clearing_price{service}: Current marginal cost per service type. Tracks market equilibrium; sudden spikes indicate capacity exhaustion or demand shocks.
  • broker_dispatch_latency_ms: Time from request arrival to worker assignment. Values above 200 ms suggest network congestion or peer query timeouts.
  • broker_cross_domain_hops: Number of peer queries before placement. High hop counts (>3) indicate poor price discovery or fragmented capacity.
  • broker_sovereignty_violations: Requests rejected due to policy constraints. Non-zero values require policy review or capacity expansion in compliant domains.
  • broker_allocation_conflicts: Double-allocation retries. Retry rates above 5% of dispatches indicate bounded staleness is too loose; lower STALENESS_BOUND_MS so brokers re-query prices sooner.

The federated network should expose aggregate metrics:

  • network_price_staleness_ms: Age of the oldest price signal in circulation. Values above 1000 ms suggest gossip propagation delays or network partitions.
  • network_convergence_time_ms: Time for all brokers to agree on clearing prices after a demand shock. Convergence times above 5 seconds indicate oscillation or cyclic dependencies.

Watch broker_allocation_conflicts and network_price_staleness_ms. High conflict rates indicate that bounded staleness is too loose. High staleness indicates that gossip propagation is too slow or that network partitions are occurring.
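The thresholds above can be collected into a simple check over a metrics snapshot. This is a sketch under assumptions: the snapshot is a plain dictionary, `broker_allocation_conflicts` is expressed as a fraction of dispatches, and the `alerts` helper is hypothetical rather than part of any monitoring library.

```python
from typing import Dict, List

# Limits taken from the metric descriptions above.
THRESHOLDS: Dict[str, float] = {
    "broker_dispatch_latency_ms": 200,
    "broker_cross_domain_hops": 3,
    "broker_sovereignty_violations": 0,
    "broker_allocation_conflicts": 0.05,   # assumed to be a fraction of dispatches
    "network_price_staleness_ms": 1000,
    "network_convergence_time_ms": 5000,
}


def alerts(snapshot: Dict[str, float]) -> List[str]:
    """Return the names of metrics that exceed their limit, sorted for stable output."""
    return sorted(
        name for name, limit in THRESHOLDS.items()
        if snapshot.get(name, 0) > limit
    )


snapshot = {
    "broker_dispatch_latency_ms": 250,
    "broker_cross_domain_hops": 2,
    "broker_allocation_conflicts": 0.08,
    "network_price_staleness_ms": 400,
}
print(alerts(snapshot))  # ['broker_allocation_conflicts', 'broker_dispatch_latency_ms']
```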

Deployment Shape

Neural Pub/Sub can run alongside your existing orchestrator (Kubernetes, Nomad, or custom), with each broker connected to a local worker pool and a gossip mesh. The following is an inferred integration pattern, as the paper’s testbed used bare VMs. It deploys one broker per node as a Kubernetes DaemonSet rather than as a per-pod sidecar:

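apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: neural-pubsub-broker
spec:
  selector:                      # required in apps/v1; must match the template labels
    matchLabels:
      app: neural-pubsub-broker  # illustrative label, not from the paper
  template:
    metadata:
      labels:
        app: neural-pubsub-broker
    spec:
      containers:
      - name: broker
        image: neural-pubsub:v1.0
        env:
        - name: GOSSIP_PEERS     # seed peers for the gossip mesh
          value: "broker-1.example.com,broker-2.example.com"
        - name: DOMAIN_LABEL     # administrative domain used in sovereignty checks
          value: "eu-west-1"
        - name: STALENESS_BOUND_MS
          value: "500"
        ports:
        - containerPort: 8080  # gRPC API
        - containerPort: 7946  # Gossip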

Each broker needs:

  • gRPC endpoint for service requests and cross-domain dispatch.
  • Gossip port for peer discovery and price propagation.
  • Worker API to query local capacity and dispatch workloads.

The broker does not store persistent state. It reconstructs its view of the network from gossip messages and local monitoring. If a broker crashes, peers detect the failure within one gossip interval (typically 1-5 seconds) and stop routing requests to it.
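Rebuilding the network view from gossip can be sketched as a merge that keeps the newest entry per peer, plus a timeout check for crashed brokers. The message shape (`ts`, `price`), the function names, and the 5-second timeout are assumptions for illustration.

```python
from typing import Dict, List

View = Dict[str, Dict[str, float]]  # peer -> {"ts": seconds, "price": ...}


def merge_gossip(local: View, incoming: View) -> View:
    """Keep, for each peer, whichever entry carries the newer timestamp."""
    merged = dict(local)
    for peer, entry in incoming.items():
        if peer not in merged or entry["ts"] > merged[peer]["ts"]:
            merged[peer] = entry
    return merged


def live_peers(view: View, now_s: float, timeout_s: float = 5.0) -> List[str]:
    """Peers heard from within the timeout; the rest are treated as failed."""
    return sorted(p for p, e in view.items() if now_s - e["ts"] <= timeout_s)


local = {"b1": {"ts": 10.0, "price": 1.0}, "b2": {"ts": 3.0, "price": 2.0}}
incoming = {"b1": {"ts": 8.0, "price": 9.0}, "b3": {"ts": 11.0, "price": 1.5}}
view = merge_gossip(local, incoming)
print(live_peers(view, now_s=12.0))  # ['b1', 'b3']
```

The older report about `b1` is discarded, `b3` is learned from the incoming message, and `b2` drops out of the routing set because nothing has been heard from it within the timeout.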

Security Boundaries

Federated brokers cross administrative boundaries, so authentication and authorization are critical. The paper describes:

  • Mutual TLS between brokers using domain-specific certificates.
  • JWT tokens for service requests, signed by the originating domain’s identity provider.
  • Policy enforcement at the receiving broker, which checks the token’s claims against its sovereignty map.

A malicious broker can lie about its clearing prices to attract more traffic. The paper does not address this attack. In production, consider adding:

  • Price attestation: Each broker signs its price updates with its domain certificate.
  • Reputation scoring: Brokers track peer reliability and penalize peers that frequently return NACKs or violate sovereignty policies.
  • Rate limiting: Each broker caps the number of cross-domain requests it accepts per second.
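As a sketch of price attestation, the example below signs a price update and verifies it on receipt. It uses an HMAC with a shared secret from the Python standard library as a stand-in; signing with a domain certificate would use an asymmetric signature instead. The message fields, key, and function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_price(key: bytes, broker_id: str, service: str, price: float, ts_ms: int) -> dict:
    """Serialize the update deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(
        {"broker": broker_id, "service": service, "price": price, "ts_ms": ts_ms},
        sort_keys=True,
    )
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}


def verify_price(key: bytes, message: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["sig"])


KEY = b"demo-shared-secret"  # placeholder; never hard-code real keys
msg = sign_price(KEY, "broker-fr", "user-profile-inference", 1.2, 1000)
tampered = dict(msg, payload=msg["payload"].replace("1.2", "0.1"))
print(verify_price(KEY, msg), verify_price(KEY, tampered))  # True False
```

Attestation of this kind proves who sent a price and that it was not altered in transit. It does not prove the price is honest, which is why it is paired with reputation scoring above.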

Technical Verdict

Use Neural Pub/Sub if your service dependency DAG is a tree or series-parallel graph and you need provable optimality under gross-substitutes valuations. The polymatroidal feasibility region and Walrasian convergence proof guarantee that decentralized price-based allocation matches centralized welfare when these conditions hold. Deploy it when data sovereignty constraints prevent centralized orchestration and when workload arrival rates exceed 1000 requests/sec per domain.

Avoid it if your service dependencies form arbitrary graphs with cycles. The convergence proof breaks, and you will see price oscillations that the damping factor cannot fully suppress. Also avoid it if you need transactional placement guarantees or strict consistency. Federated markets trade consistency for availability, so double allocations and stale price routing will occur under load.

The architecture shines in multi-cloud edge deployments where administrative boundaries are hard and latency budgets are tight. If you cannot centralize orchestration and can tolerate eventual consistency, the paper’s results make it worth prototyping with 2-3 brokers and measuring network_convergence_time_ms under realistic load. If convergence stays below your SLA and broker_allocation_conflicts remains under 5%, scale out. Otherwise, fall back to a sharded centralized orchestrator with domain-aware routing.

