Self-Refining Topology Optimization: How LLM Agents Automate Engineering Design Decisions

Topology optimization generates material distributions that satisfy structural objectives and constraints. The numerical algorithms are well understood. The problem is that engineers must make dozens of decisions during the workflow: initial parameter values, when to adjust constraints, whether a converged design is physically feasible, and when to stop iterating. These decisions require domain expertise and block full automation.

TopOptAgents is a multi-agent system that automates both the design process and the decision-making loop. Six LLM-based agents collaborate through iterative self-refinement cycles that span problem formulation, validation, code generation, execution, and quality assessment. The agents correct errors and progressively improve both the optimization setup and the resulting design.

This is not code generation with a human in the loop. The agents manage the entire workflow, including decisions about when to re-run expensive finite element simulations versus tweaking parameters.

Agent Orchestration Flow

The system runs in cycles. Each cycle involves multiple agents with distinct responsibilities:

Problem Formulation Agent: Translates engineering requirements into optimization parameters (objective functions, constraints, boundary conditions).
Validation Agent: Checks the formulation for physical feasibility and numerical stability before execution.
Code Generation Agent: Produces executable optimization code from the validated formulation.
Execution Agent: Runs the simulation and captures results, errors, and convergence metrics.
Quality Assessment Agent: Evaluates the optimized structure against criteria not explicitly encoded in the objective function (manufacturability, stress concentrations, geometric feasibility).
Refinement Coordinator: Decides whether to accept the design, adjust parameters, or reformulate the problem.

The coordinator is the control loop. It tracks optimization history across iterations and decides whether the next action is a parameter adjustment (cheap) or a full reformulation and re-simulation (expensive).

State Management Across Iterations

Each cycle generates structured state:

Current parameter values (filter radius, penalty exponent, volume fraction).
Convergence history (objective value trajectory, constraint violations).
Design quality metrics (compliance, stress distribution, geometric features).
Error logs from previous attempts.

The agents serialize this state into a compact representation that fits within the LLM context window. The refinement coordinator uses this history to avoid repeating failed parameter combinations and to recognize when diminishing returns signal convergence.

Without persistent state, agents would repeat the same mistakes. The system uses a JSON-based state schema that captures numerical results, qualitative assessments, and decision rationale from each cycle.
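The text does not give the schema itself. Below is a minimal sketch of what such a JSON state could look like, built with Python dataclasses. The class and field names (`CycleRecord`, `StateRecord`, `tried_parameter_sets`) are hypothetical. The last method shows one way the coordinator could check whether a parameter combination has already been tried.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CycleRecord:
    # One refinement cycle: numbers, qualitative assessment, and rationale
    cycle: int
    parameters: dict          # e.g. filter_radius, penalty, volfrac (hypothetical keys)
    objective_history: list   # objective value per optimizer iteration
    constraint_violations: int
    quality_notes: str
    decision: str
    rationale: str


@dataclass
class StateRecord:
    requirements: str
    cycles: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, including those inside lists
        return json.dumps(asdict(self), separators=(",", ":"))

    def tried_parameter_sets(self) -> set:
        # Hashable view of every parameter combination already attempted
        return {tuple(sorted(c.parameters.items())) for c in self.cycles}
```

Keeping the rationale next to the numbers lets a later prompt see both what was tried and why.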

Tool Boundaries and Action Space

The agents do not directly manipulate finite element matrices or optimization algorithms. The tool boundary is drawn at the parameter level:

Agent Actions (Exposed to LLM):

Set numerical parameters (filter radius, penalty, convergence tolerance).
Adjust constraints (volume fraction, displacement limits).
Modify boundary conditions (load positions, support locations).
Trigger simulation execution.
Request design quality metrics.

Simulation Engine (Hidden from LLM):

Finite element assembly and solution.
Sensitivity analysis.
Optimization algorithm (SIMP, level set, etc.).
Mesh generation and refinement.

This separation keeps the agent action space manageable. The agents reason about high-level design decisions, not matrix operations. The simulation engine exposes a clean API that returns convergence metrics and design quality scores.
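One way to enforce this boundary is to have the LLM emit actions as structured data and validate them against an explicit schema before anything reaches the engine. The minimal sketch below does this. The action names, parameter names, and numeric bounds are illustrative assumptions, and boundary-condition actions are omitted for brevity.

```python
# Hypothetical action schema: names and bounds are illustrative, not from the source
ALLOWED_ACTIONS = {
    "set_parameter": {
        "filter_radius": (1.0, 5.0),
        "penalty": (1.0, 5.0),
        "tolerance": (1e-6, 1e-1),
    },
    "adjust_constraint": {
        "volume_fraction": (0.05, 0.95),
        "max_displacement": (0.0, float("inf")),
    },
    "run_simulation": {},
    "get_quality_metrics": {},
}


def validate_action(action: dict) -> list:
    """Return a list of problems; an empty list means the action may be executed."""
    name = action.get("name")
    if name not in ALLOWED_ACTIONS:
        return [f"unknown action: {name!r}"]
    bounds = ALLOWED_ACTIONS[name]
    problems = []
    for key, value in action.get("args", {}).items():
        if key not in bounds:
            problems.append(f"{name}: unknown argument {key!r}")
            continue
        lo, hi = bounds[key]
        # Reject non-numeric values and values outside the allowed range
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            problems.append(f"{name}: {key}={value!r} outside [{lo}, {hi}]")
    return problems
```

Returning the problems as text gives the agent something concrete to correct on its next attempt. A bare rejection would not.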

When to Re-Run vs. Adjust

The refinement coordinator uses heuristics to decide between cheap parameter adjustments and expensive re-simulations:

Parameter Adjustment (No Re-Simulation):

Convergence tolerance too tight (objective value oscillating).
Filter radius too small, causing checkerboard patterns (increase it slightly).
Penalty exponent too aggressive (reduce to improve convergence).

Full Re-Simulation Required:

Constraint violations in final design (volume fraction exceeded).
Physical infeasibility (disconnected structures, stress concentrations).
Objective function mismatch (optimizing for compliance but manufacturability is poor).
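The two lists above amount to a lookup from observed symptoms to a remedy class. The minimal sketch below uses hypothetical symptom labels. Any symptom from the second list forces the expensive path, regardless of what else is observed.

```python
from enum import Enum


class Remedy(Enum):
    ADJUST = "adjust"          # cheap: change a parameter, keep the formulation
    RESIMULATE = "resimulate"  # expensive: reformulate and run a full simulation


# Hypothetical symptom labels mirroring the two lists above
ADJUSTABLE = {
    "objective_oscillating": ("tolerance", "loosen"),
    "checkerboard": ("filter_radius", "increase"),
    "poor_convergence": ("penalty", "decrease"),
}
STRUCTURAL = {
    "volume_fraction_exceeded",
    "disconnected_structure",
    "stress_concentration",
    "objective_mismatch",
}


def classify(symptoms: set):
    """Any structural symptom forces a re-simulation; otherwise suggest tweaks."""
    if symptoms & STRUCTURAL:
        return Remedy.RESIMULATE, []
    tweaks = [ADJUSTABLE[s] for s in sorted(symptoms) if s in ADJUSTABLE]
    return Remedy.ADJUST, tweaks
```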

The coordinator tracks the cost of each decision. If three consecutive parameter adjustments fail to improve design quality, it triggers a reformulation. If reformulation fails twice, it escalates to a human review flag.
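The escalation rule can be sketched as a small counter-based state machine. The thresholds come from the text. The class name, the action strings, and the choice to retry reformulation directly after a failed one (rather than returning to parameter adjustments) are assumptions. Acceptance is decided elsewhere; here an improving step simply continues with cheap adjustments.

```python
class EscalationTracker:
    """3 failed adjustments -> reformulate; 2 failed reformulations -> human review."""

    def __init__(self, max_failed_adjustments=3, max_failed_reformulations=2):
        self.max_adj = max_failed_adjustments
        self.max_ref = max_failed_reformulations
        self.failed_adjustments = 0
        self.failed_reformulations = 0

    def next_action(self, last_action: str, improved: bool) -> str:
        """last_action is "adjust" or "reformulate"; improved means quality rose."""
        if improved:
            # Progress resets both failure counters
            self.failed_adjustments = 0
            self.failed_reformulations = 0
            return "adjust"
        if last_action == "reformulate":
            self.failed_reformulations += 1
            if self.failed_reformulations >= self.max_ref:
                return "human_review"
            return "reformulate"
        self.failed_adjustments += 1
        if self.failed_adjustments >= self.max_adj:
            self.failed_adjustments = 0
            return "reformulate"
        return "adjust"
```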

Failure Modes

Context Window Overflow: Long optimization runs with many iterations generate large state histories. The system compresses older cycles into summary statistics (mean objective value, constraint violation count) to stay within token limits.
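Below is a minimal sketch of that compression step. It assumes each cycle is a dict with `objective` and `violations` keys, and it keeps the five most recent cycles verbatim. That cutoff was chosen to match the compression point reported under Performance Characteristics; the real one may differ. The function name is hypothetical.

```python
from statistics import mean


def compress_history(cycles: list, keep_recent: int = 5) -> dict:
    """Keep the most recent cycles verbatim and summarize everything older."""
    # Compute the split index explicitly so keep_recent=0 behaves correctly
    split = max(len(cycles) - keep_recent, 0)
    older, recent = cycles[:split], cycles[split:]
    summary = None
    if older:
        summary = {
            "cycles_summarized": len(older),
            "mean_objective": mean(c["objective"] for c in older),
            "constraint_violation_count": sum(c["violations"] for c in older),
        }
    return {"summary": summary, "recent": recent}
```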

Premature Convergence: The quality assessment agent may approve a design that meets numerical criteria but has hidden flaws (sharp corners that concentrate stress). The system mitigates this by running multiple quality checks with different prompts and requiring consensus.
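The text does not say how consensus is computed. One simple reading is a vote over independent reviews with a configurable quorum, sketched below. In the real system each reviewer would be a quality-agent call with a different prompt; here plain Python callables stand in for those calls. The function name and the `quorum` parameter are hypothetical.

```python
from typing import Callable, List

# A reviewer maps a design description to True (approve) or False (reject)
Reviewer = Callable[[dict], bool]


def consensus_approve(design: dict, reviewers: List[Reviewer], quorum: float = 1.0) -> bool:
    """Approve only if at least a `quorum` fraction of reviews approve (1.0 = unanimous)."""
    if not reviewers:
        raise ValueError("need at least one reviewer")
    votes = [bool(review(design)) for review in reviewers]
    return sum(votes) / len(votes) >= quorum
```

Requiring unanimity by default matches the goal of catching hidden flaws: one dissenting check is enough to block approval.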

Infinite Refinement Loops: Agents can get stuck adjusting parameters without improving the design. The coordinator enforces a maximum cycle count (default 10) and a minimum improvement threshold (1% objective value change).
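The sketch below gives one reading of the two stopping rules. It assumes "1% objective value change" means the relative change between the final objective values of consecutive cycles. The function name and argument layout are hypothetical.

```python
def should_stop(objectives: list, max_cycles: int = 10, min_rel_change: float = 0.01) -> bool:
    """objectives holds one final objective value per completed cycle."""
    if len(objectives) >= max_cycles:
        return True  # hard cap on cycle count
    if len(objectives) < 2:
        return False  # nothing to compare yet
    prev, curr = objectives[-2], objectives[-1]
    # Relative change; the small floor avoids division by zero
    rel_change = abs(curr - prev) / max(abs(prev), 1e-12)
    return rel_change < min_rel_change
```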

Simulation Crashes: The execution agent captures error logs and passes them to the refinement coordinator. If the same error occurs twice, the coordinator reformulates the problem rather than retrying the same parameters.
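Detecting "the same error" requires some normalization, since logs from two runs rarely match byte for byte. The minimal sketch below assumes that masking numbers and hex addresses is enough to group repeats. The regex, the `CrashPolicy` class, and the `"retry"` action name are illustrative, not taken from the source.

```python
import hashlib
import re


def error_signature(log: str) -> str:
    """Mask volatile details (hex addresses, numbers) so reruns of one failure match."""
    normalized = re.sub(r"0x[0-9a-fA-F]+|\d+(\.\d+)?", "#", log.strip())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]


class CrashPolicy:
    def __init__(self):
        self.seen = {}  # signature -> occurrence count

    def on_crash(self, log: str) -> str:
        sig = error_signature(log)
        self.seen[sig] = self.seen.get(sig, 0) + 1
        # Second occurrence of the same error: stop retrying and reformulate
        return "reformulate" if self.seen[sig] >= 2 else "retry"
```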

Implementation Shape

class MaxCyclesExceeded(Exception):
    """Raised when no design is accepted within the cycle budget."""


class TopOptAgents:
    # Agent, RefinementCoordinator, and OptimizationState are assumed to be
    # defined elsewhere; only their call shapes are shown here.
    def __init__(self, llm_client, simulation_engine):
        self.formulation_agent = Agent(llm_client, role="formulation")
        self.validation_agent = Agent(llm_client, role="validation")
        self.code_gen_agent = Agent(llm_client, role="code_generation")
        self.execution_agent = Agent(llm_client, role="execution")
        self.quality_agent = Agent(llm_client, role="quality")
        self.coordinator = RefinementCoordinator(llm_client)
        self.sim_engine = simulation_engine
        self.state = OptimizationState()

    def run_cycle(self):
        # Formulate the problem from the requirements plus accumulated history
        formulation = self.formulation_agent.generate(self.state)

        # Validate before spending any simulation time
        validation = self.validation_agent.check(formulation)
        if not validation.passed:
            self.state.add_error(validation.issues)
            return self.coordinator.decide_next_action(self.state)

        # Generate code and run it against the simulation engine
        code = self.code_gen_agent.generate(formulation)
        result = self.execution_agent.run(code, self.sim_engine)

        # Assess quality and record the full cycle in state
        quality = self.quality_agent.evaluate(result.design)
        self.state.add_cycle(formulation, result, quality)

        # Coordinator returns an action string such as "accept" or "reformulate"
        return self.coordinator.decide_next_action(self.state)

    def optimize(self, requirements, max_cycles=10):
        self.state.initialize(requirements)
        for _ in range(max_cycles):
            action = self.run_cycle()
            if action == "accept":
                return self.state.final_design
            if action == "reformulate":
                self.state.reset_parameters()
            # Any other action (e.g. a parameter adjustment) continues refinement
        raise MaxCyclesExceeded(f"No accepted design after {max_cycles} cycles")

The coordinator’s decision logic is the critical path. It must balance exploration (trying new parameter combinations) with exploitation (refining promising designs).

Observability and Debugging

Each cycle logs:

Agent prompts and responses (for debugging reasoning errors).
Simulation execution time and memory usage.
Convergence metrics (objective value, constraint violations).
Quality assessment scores and rationale.
Coordinator decision and justification.

The logs are structured JSON, not free text. This allows automated analysis of agent behavior patterns. For example, tracking how often the quality agent rejects designs that meet numerical criteria reveals gaps in the objective function formulation.
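The sketch below writes structured logs as JSON Lines and computes the rejection rate described above. The field names `numeric_criteria_met` and `quality_approved` are assumptions. A real log record would also carry the prompts, timings, and metrics listed earlier.

```python
import json


def write_log(stream, record: dict) -> None:
    # One JSON object per line (JSON Lines) keeps logs machine-parseable
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def quality_override_rate(lines) -> float:
    """Fraction of numerically acceptable designs that the quality agent still rejected."""
    records = [json.loads(line) for line in lines if line.strip()]
    numeric_ok = [r for r in records if r["numeric_criteria_met"]]
    if not numeric_ok:
        return 0.0
    rejected = sum(1 for r in numeric_ok if not r["quality_approved"])
    return rejected / len(numeric_ok)
```

A persistently high override rate suggests the objective function is missing a criterion that the quality agent keeps enforcing by hand.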

Performance Characteristics

Metric	Value	Notes
Cycles per design	3-7	Well-covered problem classes converge faster
Simulation time per cycle	2-15 min	Depends on mesh size and algorithm
LLM calls per cycle	8-12	One per agent plus coordinator decisions
Context window usage	15-40k tokens	Grows with cycle count, compressed after cycle 5
Success rate (converged design)	78%	Measured on 50 test problems
Human intervention rate	12%	Triggered by max cycles or repeated failures
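Multiplying the ranges in the table gives rough per-design bounds. This treats the ranges as independent and takes their extremes, so it brackets the totals rather than estimating a typical run:

```latex
T_{\text{sim}} \in [\,3 \times 2,\; 7 \times 15\,]\ \text{min} = [\,6,\; 105\,]\ \text{min}
\qquad
N_{\text{LLM}} \in [\,3 \times 8,\; 7 \times 12\,] = [\,24,\; 84\,]
```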

The refinement loop adds the most value on problem classes with sparse literature coverage. When the pretrained LLM has limited prior exposure to a formulation, the iterative refinement loop compensates by learning from simulation feedback, whereas well-covered problem classes converge in fewer cycles to begin with.

Security Boundaries

The code generation agent produces executable Python. This creates a code injection risk. Mitigations:

Sandbox execution environment with no network access.
Whitelist of allowed imports (numpy, scipy, simulation engine API).
Static analysis of generated code before execution.
Resource limits (CPU time, memory, disk I/O).
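Below is a minimal sketch of the static-analysis step using Python's standard `ast` module. It rejects imports outside the whitelist and direct calls to a few dangerous builtins. `topopt_engine` is a hypothetical stand-in for the simulation engine's module name, and the blocked-call list is illustrative. A check like this is easy to evade (for example via `getattr`), so it complements the sandbox and resource limits rather than replacing them.

```python
import ast

# "topopt_engine" is a hypothetical name for the simulation engine API module
ALLOWED_MODULES = {"numpy", "scipy", "topopt_engine"}
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}


def check_generated_code(source: str) -> list:
    """Return violations from a static pass; an empty list lets the code enter the sandbox."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module and are rejected
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_MODULES:
                violations.append(f"line {node.lineno}: import of {name!r} not allowed")
        # Flag direct calls such as eval(...) by bare name
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            violations.append(f"line {node.lineno}: call to {node.func.id}() not allowed")
    return violations
```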

The agents do not have file system access outside the sandbox. Simulation results are passed back through a structured API, not by reading output files.

Deployment Shape

The system runs as a long-lived service with a job queue:

Engineer submits optimization requirements via API.
Job enters queue with priority and resource allocation.
Worker picks up job and initializes agent system.
Agents run refinement cycles until convergence or max cycles.
Final design and optimization history returned to engineer.

Each job runs in an isolated container with its own simulation engine instance. This prevents resource contention and allows parallel execution of multiple optimization jobs.
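The in-process stand-in below uses the standard `heapq` module to show how priority and submission order can determine which job a worker picks up next. A production service would use a real message broker. The `Job` and `JobQueue` names and the `cpu_cores` field are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                            # lower number = picked up sooner
    seq: int                                 # tie-breaker: FIFO within a priority level
    requirements: str = field(compare=False)
    cpu_cores: int = field(compare=False, default=4)


class JobQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, requirements: str, priority: int = 5, cpu_cores: int = 4) -> Job:
        job = Job(priority, next(self._counter), requirements, cpu_cores)
        heapq.heappush(self._heap, job)
        return job

    def next_job(self):
        # A worker would start an isolated container for the returned job
        return heapq.heappop(self._heap) if self._heap else None
```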

Technical Verdict

Use this approach when:

You have expensive numerical simulations that require expert parameter tuning.
The problem formulation is well-defined but the optimal parameters are not.
You need to explore design spaces where human intuition is unreliable.
You can tolerate 10-20% of jobs requiring human review.

Avoid this approach when:

Simulation runtime exceeds 30 minutes per cycle (feedback loop too slow).
The problem formulation is ambiguous or changes frequently (agents cannot learn stable patterns).
You need guaranteed convergence (the system is probabilistic, not deterministic).
Your simulation engine does not expose a clean parameter API (agents need structured tool boundaries).

The key insight is that agents can manage iterative refinement loops if you give them structured state, clear tool boundaries, and a coordinator that balances exploration with exploitation. The system works because topology optimization has well-defined success criteria (convergence, constraint satisfaction, quality metrics). Applying this pattern to less structured engineering domains will require more sophisticated quality assessment agents.