Federated Imputation: How Agents Fill Missing Data Across Misaligned Feature Schemas

Federated learning works when every client speaks the same language. Bank A trains on [credit_score, income, debt_ratio]. Bank B trains on the same three columns. You average gradients, ship a global model, and everyone benefits.

Real tabular data doesn’t cooperate. Bank A tracks [credit_score, income]. Bank B has [income, debt_ratio, employment_length]. Bank C logs [credit_score, employment_length, zip_code]. Parameter averaging (FedAvg) transfers almost nothing when feature overlap is sparse or nonexistent. You end up with a model that ignores half the signal in the consortium.

A new arXiv paper (2605.16099v1, May 2026) tackles this. FedHF-Impute separates structural unavailability (columns that don’t exist at a client) from conventional missingness (nulls in columns that do exist). It builds a shared feature graph across clients and uses message passing to propagate information through statistical relationships, even when no single client observes two features together.

This is infrastructure for cross-bank fraud detection, multi-institution credit scoring, and any federated ML pipeline where schemas drift or were never aligned in the first place.

The Schema Alignment Problem

Standard federated learning assumes a global feature vector. Every client computes gradients for the same parameters. When Client A has features [1, 2, 3] and Client B has [2, 3, 4], FedAvg can’t reconcile the weight updates for feature 1 (only A sees it) or feature 4 (only B sees it). There are three common workarounds, and each fails in its own way:

Zero-padding: Treat missing features as zeros. Gradients for unavailable features stay at zero, so those weights never update at clients that lack the column. The global model learns nothing about feature 4 from Client A.
Feature subsetting: Train separate models per feature subset. You lose cross-feature interactions and end up with a model zoo instead of a unified predictor.
Schema enforcement: Force all clients to adopt the union schema. Clients without a column report it as missing (null). Standard imputation methods (mean fill, KNN) run locally, but they can’t leverage cross-client patterns because data never leaves the client.
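The zero-padding failure is easy to verify by hand. For a linear model with squared loss, the gradient for each weight is the residual times that feature’s input, so a padded column contributes a zero gradient whatever the label is. A minimal sketch in plain Python; the model, the numbers, and the function name are illustrative assumptions, not taken from the paper:

```python
def squared_loss_gradient(weights, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to each weight."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    residual = prediction - y
    return [residual * xi for xi in x]

# Union schema: [feature_1, feature_2, feature_3, feature_4].
# Client A lacks feature_4, so it zero-pads that position.
weights = [0.5, -0.2, 0.1, 0.7]
client_a_row = [1.2, 0.4, -0.9, 0.0]  # last entry is padding, not an observation

grad = squared_loss_gradient(weights, client_a_row, y=0.2)
print(grad)  # the last component is zero: Client A never moves the feature_4 weight
```

The same holds for the first layer of a neural network, which is why averaging in Client A’s update only dilutes what Client B learned about feature 4.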

FedHF-Impute introduces a fourth path: a global feature graph that encodes statistical dependencies (correlations, mutual information) between features, even when those features are never co-observed at a single client.

Architecture: Feature Graphs and Message Passing

The core idea is to treat imputation as a graph problem. Nodes are features. Edges are statistical relationships. Message passing propagates information from observed features to missing ones, using the global graph as a routing table.

Training Flow

Local schema registration: Each client reports its available features to the coordinator. The coordinator builds a union schema and a bipartite mapping (client → feature subset).
Graph construction: The coordinator initializes a feature graph. Edges can be learned (via correlation on overlapping features across clients) or seeded with domain knowledge (e.g., income and credit_score are related in financial data).
Local imputation: Each client runs a graph neural network (GNN) locally. For missing features, the GNN aggregates messages from observed neighbors in the feature graph. The aggregation weights come from the global graph structure, which is synchronized across clients.
Federated update: Clients compute gradients on imputed data and send updates to the coordinator. The coordinator averages gradients (standard FedAvg) and updates the global feature graph and model weights.
Graph refinement: After each round, the coordinator refines edge weights in the feature graph based on imputation error signals from clients. Edges that don’t improve imputation accuracy get pruned.
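The first two steps and the pruning idea can be sketched without any ML framework. The version below assumes each client reports only pairwise Pearson correlations and row counts for the features it co-observes, and that the coordinator combines them with a row-weighted mean. Both choices are simplifying assumptions for illustration: the paper’s estimator may differ, and its pruning uses imputation error signals, for which a magnitude threshold stands in here. All function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def local_edge_report(columns):
    """Client side: correlation and row count for each co-observed feature pair."""
    return {
        (a, b): (pearson(columns[a], columns[b]), len(columns[a]))
        for a, b in combinations(sorted(columns), 2)
    }

def build_feature_graph(reports, min_abs_weight=0.0):
    """Coordinator side: row-weighted mean correlation per edge, weak edges pruned."""
    totals = {}
    for report in reports.values():
        for edge, (corr, n_rows) in report.items():
            weighted_sum, rows = totals.get(edge, (0.0, 0))
            totals[edge] = (weighted_sum + corr * n_rows, rows + n_rows)
    edges = {edge: s / rows for edge, (s, rows) in totals.items()}
    return {e: w for e, w in edges.items() if abs(w) >= min_abs_weight}

# Toy data, invented for illustration only.
bank_a = {"credit_score": [600, 650, 700, 750], "income": [30, 40, 50, 60]}
bank_b = {"income": [30, 45, 50, 70], "debt_ratio": [0.5, 0.4, 0.3, 0.2]}

# Step 1: schema registration and union schema.
client_schemas = {"bank_a": set(bank_a), "bank_b": set(bank_b)}
union_schema = sorted(set().union(*client_schemas.values()))

# Step 2: graph construction from client reports.
reports = {"bank_a": local_edge_report(bank_a), "bank_b": local_edge_report(bank_b)}
edges = build_feature_graph(reports, min_abs_weight=0.2)

print(union_schema)   # ['credit_score', 'debt_ratio', 'income']
print(sorted(edges))  # [('credit_score', 'income'), ('debt_ratio', 'income')]
```

No raw rows leave a client in this sketch, but unprotected correlations still leak information, so a real deployment would add noise before reporting them.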

Message Passing Mechanics

The GNN operates on the feature graph, where nodes are features, not on a graph of data records. For a missing feature f_missing at Client A:

# Sketch of one-hop, feature-level message passing.
# Assumes feature_graph is a dict of dicts {feature: {neighbor: edge_weight}}
# and embeddings maps each feature name to a NumPy vector of the same length.
import numpy as np

def impute_feature(f_missing, observed_features, feature_graph, embeddings):
    messages = []
    for f_neighbor, edge_weight in feature_graph.get(f_missing, {}).items():
        # Only neighbors this client actually observes can send a message
        if f_neighbor in observed_features:
            messages.append(edge_weight * embeddings[f_neighbor])

    if not messages:
        # No observed neighbors: fall back to the global prior embedding.
        # It has the same shape as an aggregated message, so callers
        # never have to handle a scalar here.
        return embeddings[f_missing]

    # Aggregate messages (mean here; sum or attention are alternatives)
    return np.mean(messages, axis=0)

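The key insight: Client A never observes f_missing, so it cannot estimate the relationship between f_missing and f_neighbor from its own data. If Client B observes both, the edge weight learned from B’s data reaches A through the global graph. Chains work as well: if Client B co-observes f_missing and some feature g, and Client C co-observes g and f_neighbor, the graph creates a transitive path for information flow even though no single client sees f_missing and f_neighbor together.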
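A transitive path is just a path in the co-observation graph, and its length tells you how many rounds of message passing a client needs before a missing feature can hear from something it observes. The sketch below uses a breadth-first search over hypothetical schemas in which zip_code and income are never co-observed anywhere. The helper names are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

def co_observation_graph(client_schemas):
    """Connect two features whenever at least one client observes both."""
    adjacency = {}
    for features in client_schemas.values():
        for f in features:
            adjacency.setdefault(f, set())
        for a, b in combinations(features, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def path_to_observed(adjacency, f_missing, observed):
    """Breadth-first search from f_missing to the nearest locally observed feature."""
    queue = deque([[f_missing]])
    visited = {f_missing}
    while queue:
        path = queue.popleft()
        for neighbor in sorted(adjacency.get(path[-1], ())):
            if neighbor in visited:
                continue
            if neighbor in observed:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None  # f_missing is disconnected from everything this client observes

# Hypothetical schemas: no client sees zip_code together with income.
schemas = {
    "bank_a": ["credit_score", "income"],
    "bank_b": ["income", "debt_ratio"],
    "bank_c": ["debt_ratio", "zip_code"],
}
adjacency = co_observation_graph(schemas)
path = path_to_observed(adjacency, "zip_code", observed={"credit_score", "income"})
print(path)  # ['zip_code', 'debt_ratio', 'income']
```

A two-edge path like this one means a single aggregation step is not enough at bank_a: debt_ratio has to be imputed from income first (or the GNN needs two layers) before zip_code receives any signal.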

Differential Privacy Constraints

In federated settings, clients often require differential privacy (DP). FedHF-Impute adds noise to gradients before aggregation (standard DP-SGD). The graph’s topology can be treated as public, since it records only which features are related, not raw data. Edge weights derived from client data are a different matter: they are statistics of private records and do need DP protection. The paper uses a noisy correlation estimator to compute edge weights under DP.

The trade-off: stronger DP guarantees (lower epsilon) degrade edge weight accuracy, which reduces imputation quality. The paper reports a 12% RMSE increase when moving from epsilon=10 to epsilon=1 on the SECOM dataset.
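The gradient side of this is the standard DP-SGD recipe: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, then average. A minimal stdlib sketch follows. The parameter names (max_norm, noise_multiplier) are conventional but chosen here for illustration, and the paper’s noisy correlation estimator for edge weights is not reproduced.

```python
import math
import random

def clip_to_norm(gradient, max_norm):
    """Scale the gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]

def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std is tied to the clipping norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
noisy_update = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_update)
```

The sketch does not compute epsilon. Translating a noise multiplier, sampling rate, and number of rounds into an (epsilon, delta) guarantee requires a privacy accountant, which is where the lower-epsilon, higher-noise trade-off described above comes from.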

Experimental Results

The paper tests FedHF-Impute on three datasets with simulated partial schema overlap:

Dataset	Clients	Features	Feature overlap	FedHF-Impute RMSE	FedAvg RMSE	RMSE reduction vs. FedAvg
SECOM	5	590	40%	0.142	0.194	26.9%
AirQuality	4	13	60%	0.089	0.097	8.4%
PhysioNet	8	37	70%	0.053	0.052	-0.3%

SECOM (semiconductor manufacturing) has high-dimensional, sparse features. AirQuality has moderate overlap. PhysioNet (ICU patient data) has high overlap, so FedAvg already works well there and FedHF-Impute is marginally worse (-0.3%). FedHF-Impute shines when overlap is low and feature relationships are strong.

The paper also compares against local imputation (KNN, mean fill) and centralized imputation (oracle with full data). FedHF-Impute closes 60-70% of the gap between local and centralized methods without violating federated constraints.
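Two different ratios are in play here, and it helps to keep them apart. The sketch below assumes the usual definitions: relative RMSE reduction against a baseline, and the fraction of the local-to-centralized gap that the federated method recovers. The paper’s exact formulas are not quoted in this article, so treat both functions as assumptions. The inputs to gap_closed are made-up numbers, not results from the paper.

```python
def relative_rmse_reduction(baseline_rmse, method_rmse):
    """Fraction by which the method lowers RMSE relative to the baseline."""
    return (baseline_rmse - method_rmse) / baseline_rmse

def gap_closed(local_rmse, federated_rmse, centralized_rmse):
    """Share of the local-to-centralized RMSE gap recovered by the federated method."""
    return (local_rmse - federated_rmse) / (local_rmse - centralized_rmse)

# SECOM row of the table above: FedAvg 0.194, FedHF-Impute 0.142.
print(round(relative_rmse_reduction(0.194, 0.142), 3))  # 0.268

# Hypothetical RMSEs chosen only to illustrate the second ratio.
print(round(gap_closed(local_rmse=0.20, federated_rmse=0.135, centralized_rmse=0.10), 2))  # 0.65
```

Recomputing from the rounded table entries gives about 26.8% for SECOM against the reported 26.9%. Small differences like this are what you would expect if the reported column was computed from unrounded RMSEs.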

Failure Modes

Weak feature relationships: If the feature graph has no strong edges (low correlation, low mutual information), message passing degrades to random noise. The paper shows a 40% RMSE increase when edge weights are randomized.
Schema drift: If clients add or remove features mid-training, the global graph needs recomputation. The paper doesn’t address online schema updates. You’d need a versioning layer (feature registry with schema hashes) and a graph diff protocol.
Adversarial clients: A malicious client can poison the feature graph by reporting false correlations. The paper assumes honest-but-curious clients. Defending against poisoning requires robust aggregation (e.g., Krum, trimmed mean) on edge weight updates.
Cold start: A new client whose features are entirely disjoint from the existing graph can’t benefit until at least one edge connects its features to the graph. Seeding edges from domain knowledge is one way to bridge that. In financial consortiums, pre-seed edges between income and credit_score (typical correlation 0.6 from public datasets like FICO research) before clients join. This reduces bootstrap time from 50 rounds to roughly 10 rounds in practice.
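For the adversarial case, a coordinate-wise trimmed mean is the simplest of the robust aggregators named above: for each edge, sort the reported weights, drop a fixed fraction from both ends, and average what remains. The sketch below is a generic trimmed mean applied to edge weights, not something the paper implements. The report format and the example values are hypothetical.

```python
def trimmed_mean(values, trim_fraction):
    """Drop the lowest and highest trim_fraction of values, then average the rest."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def aggregate_edge_weights(client_reports, trim_fraction=0.2):
    """Trimmed mean per edge over the clients that reported that edge."""
    by_edge = {}
    for report in client_reports:
        for edge, weight in report.items():
            by_edge.setdefault(edge, []).append(weight)
    return {edge: trimmed_mean(ws, trim_fraction) for edge, ws in by_edge.items()}

edge = ("credit_score", "income")
honest = [{edge: w} for w in (0.58, 0.61, 0.60, 0.62)]
poisoned = [{edge: -1.0}]  # one malicious client reports a false correlation

robust = aggregate_edge_weights(honest + poisoned, trim_fraction=0.2)
naive = sum(r[edge] for r in honest + poisoned) / 5
print(round(robust[edge], 3), round(naive, 3))  # 0.597 0.282
```

Trimming costs you one honest report per tail here, and it only helps while attackers are a minority of the clients reporting a given edge. Edges that few clients co-observe remain easy to poison.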

Deployment Shape

A production federated imputation pipeline needs:

Feature registry: Central service that tracks schema versions, feature metadata (type, range, nullability), and client-to-feature mappings. Clients register schemas on join and report schema changes.
Graph coordinator: Maintains the global feature graph, computes edge weights from aggregated statistics, and broadcasts graph updates to clients. Runs as a stateful service (not a stateless function such as a lambda) because the graph is mutable. For 1000+ features, prune edges whose absolute correlation falls below 0.2 to reduce coordinator memory from O(n²) to O(n·k), where k is the average degree (typically 10-20 in financial feature spaces).
Client runtime: Embeds the GNN imputation model, fetches the latest feature graph from the coordinator, and runs local training rounds. Needs GPU for large graphs (1000+ features).
Observability: Track imputation RMSE per feature, per client. Monitor graph edge weight drift (sudden changes indicate schema drift or data quality issues). Log DP budget consumption (epsilon, delta) per round.
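The registry’s versioning job can be prototyped in a few lines: hash a canonical serialization of each client’s schema, and diff two versions when the hash changes. This is a sketch under assumptions. The metadata fields (type, nullable) and function names are invented for illustration, and a production registry would persist hashes alongside the schema rows.

```python
import hashlib
import json

def schema_hash(schema):
    """Stable SHA-256 over feature names and metadata, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def schema_diff(old, new):
    """Features added, removed, or changed between two schema versions."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(f for f in set(old) & set(new) if old[f] != new[f]),
    }

v1 = {
    "income": {"type": "float", "nullable": True},
    "credit_score": {"type": "int", "nullable": False},
}
v2 = {
    "credit_score": {"type": "int", "nullable": True},  # nullability changed
    "income": {"type": "float", "nullable": True},
    "zip_code": {"type": "str", "nullable": False},     # new column
}

if schema_hash(v1) != schema_hash(v2):
    print(schema_diff(v1, v2))
    # {'added': ['zip_code'], 'removed': [], 'changed': ['credit_score']}
```

The diff is what the graph coordinator needs: added features become new nodes with no edges yet, removed features invalidate that client’s contribution to their edges, and changed features may warrant re-estimating the affected weights.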

Example stack:

Feature registry: PostgreSQL with JSONB columns for schema versioning
Graph coordinator: Python service with NetworkX for graph ops, Redis for graph state
Client runtime: PyTorch with PyG (PyTorch Geometric) for GNN inference
Orchestration: Kubernetes with StatefulSets for coordinator, DaemonSets for clients
Observability: Prometheus for metrics, Grafana for dashboards, OpenTelemetry for traces

When to Use This

FedHF-Impute makes sense when:

You have multiple data sources (clients) with overlapping but misaligned schemas.
Feature relationships are strong enough to support transitive imputation (empirical guidance: look for correlation above 0.3 or mutual information above 0.1 bits in your domain).
You can’t centralize data due to privacy, regulatory, or competitive constraints.
You need a unified model (not a model zoo) that uses all available features.
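The strength check in the second criterion can be run on any sample where two features are co-observed, whether at a single client or in a public proxy dataset. The sketch below computes Pearson correlation and a plug-in mutual information estimate in bits after equal-width binning. The bin count and helper names are assumptions, and the plug-in estimator is biased upward on small samples, so treat borderline results with suspicion.

```python
from collections import Counter
from math import log2, sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information_bits(xs, ys):
    """Plug-in mutual information estimate (bits) for two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def passes_screen(xs, ys, min_abs_corr=0.3, min_mi_bits=0.1, n_bins=4):
    """True if either the correlation or the binned MI clears its threshold."""
    corr = pearson(xs, ys)
    mi = mutual_information_bits(equal_width_bins(xs, n_bins), equal_width_bins(ys, n_bins))
    return abs(corr) > min_abs_corr or mi > min_mi_bits

xs = [0, 1, 2, 3, 4, 5, 6, 7]
print(passes_screen(xs, [2 * x for x in xs]))       # True: perfectly dependent
print(passes_screen([0, 0, 1, 1], [0, 1, 0, 1]))    # False: independent
```

Checking mutual information as well as correlation matters for relationships that are strong but nonlinear, where Pearson correlation alone can sit near zero.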

Skip it when:

Schemas are fully aligned. Use standard FedAvg.
Feature overlap is below 20%. The graph won’t have enough paths for message passing.
You can afford centralized training. Centralized imputation is simpler and more accurate.
Your features are independent (e.g., one-hot encoded categorical with no interactions). Message passing adds overhead without benefit.
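To apply the 20% rule you need a definition of overlap, which this article does not pin down. One reasonable assumption is the mean pairwise Jaccard overlap between client schemas, sketched below on the three bank schemas from the introduction. If the paper uses a different definition, the threshold will not carry over directly.

```python
from itertools import combinations

def jaccard_overlap(features_a, features_b):
    """Shared features divided by all features across the two clients."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_pairwise_overlap(client_schemas):
    """Average Jaccard overlap over every pair of clients."""
    pairs = list(combinations(client_schemas.values(), 2))
    return sum(jaccard_overlap(a, b) for a, b in pairs) / len(pairs)

schemas = {
    "bank_a": ["credit_score", "income"],
    "bank_b": ["income", "debt_ratio", "employment_length"],
    "bank_c": ["credit_score", "employment_length", "zip_code"],
}
print(round(mean_pairwise_overlap(schemas), 3))  # 0.233
```

Under this definition the three banks sit just above the 20% line, with every pair sharing exactly one feature.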

Technical Verdict

On SECOM (semiconductor manufacturing, 590 features at 40% overlap), FedHF-Impute cuts RMSE by about 27% relative to standard FedAvg, and the advantage disappears as overlap rises. The feature graph abstraction is elegant, and message passing works when feature relationships are strong. A gain of that size on a high-dimensional, sparse schema is meaningful for production ML in financial consortiums and cross-institution pipelines, with the caveat that none of the three benchmarks is financial data.

The weak points are schema drift (no online updates), adversarial robustness (no poisoning defense), and cold start (new clients need at least one shared feature). You’ll need to build a feature registry and graph versioning layer on top of the paper’s core algorithm.

Use this when you’re building cross-institution ML pipelines (fraud detection consortiums, credit scoring networks, healthcare data sharing) and schema alignment is politically or technically infeasible. Skip it if you can enforce a global schema or if your features are weakly correlated.