Most enterprise software still lives in desktop applications with no API surface. When you need an AI agent to interact with legacy Windows apps, you face a hard infrastructure problem: how do you orchestrate hundreds of concurrent desktop sessions, recover from UI changes, and debug failures that only reproduce on specific screen resolutions?
Minicor is a YC-backed platform that runs desktop RPA (Robotic Process Automation) at scale. The interesting part is not the AI wrapper but the plumbing required to make desktop automation reliable enough for production workloads.
The Desktop Automation Problem
Traditional RPA scripts break when:
- A vendor ships a UI update and button coordinates shift
- An unexpected dialog appears mid-workflow
- Screen resolution differs between dev and production VMs
- Windows updates change accessibility tree structure
- Network latency causes elements to load out of order
Most RPA tools handle this with XPath-style selectors or pixel-perfect image matching, and both are brittle: a shifted button or an unexpected dialog is enough to break them. Minicor’s approach uses a reflection agent that validates every action against the actual screen state before proceeding.
Architecture: From API Call to Desktop Session
Here’s the execution flow when you trigger a desktop automation:
- API ingress: Your system calls Minicor’s REST API with workflow ID and parameters
- Session allocation: Desktop service assigns a Windows VM from the pool
- Agent injection: Desktop agent process starts inside the VM session
- Workflow execution: Agent reads workflow definition, executes steps, validates each action
- State capture: Full video recording, screenshots, and execution context saved
- Cleanup: Session released back to pool or destroyed
The desktop service manages VM lifecycle. It handles:
- Pre-warming sessions to reduce cold-start latency
- Health checks to detect stuck or crashed agents
- Automatic failover when a VM becomes unresponsive
- License management for Windows Server instances
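The pre-warming and health-check responsibilities can be illustrated with a small in-memory pool. This is a minimal sketch under stated assumptions, not Minicor's implementation: `SessionPool`, `Session`, `warm_target`, and `heartbeat_timeout_s` are hypothetical names, and `_boot_session` stands in for whatever slow step (VM boot or RDS logon) actually creates a session.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class Session:
    session_id: str
    # Time of the last heartbeat received from the agent in this session.
    last_heartbeat: float = field(default_factory=time.monotonic)


class SessionPool:
    """Hypothetical pre-warmed session pool (illustrative only)."""

    def __init__(self, warm_target: int, heartbeat_timeout_s: float = 30.0):
        self.warm_target = warm_target
        self.heartbeat_timeout_s = heartbeat_timeout_s
        self._idle: Deque[Session] = deque()
        self._counter = 0

    def _boot_session(self) -> Session:
        # Placeholder for the slow step: booting a VM or logging on a session.
        self._counter += 1
        return Session(session_id=f"sess-{self._counter}")

    def top_up(self) -> None:
        # Pre-warm: keep `warm_target` idle sessions ready before requests arrive.
        while len(self._idle) < self.warm_target:
            self._idle.append(self._boot_session())

    def acquire(self) -> Session:
        # Serve from the warm pool when possible; fall back to a cold start.
        return self._idle.popleft() if self._idle else self._boot_session()

    def is_healthy(self, session: Session, now: Optional[float] = None) -> bool:
        # A session counts as stuck once its heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return (now - session.last_heartbeat) <= self.heartbeat_timeout_s
```

In this sketch a background loop would call `top_up()` periodically and replace any session for which `is_healthy` returns `False`; the 30-second timeout is an arbitrary illustrative default.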
Session Isolation Strategies
Running hundreds of concurrent Windows desktop sessions creates three hard problems:
VM sprawl: In the simplest model, each session gets its own Windows instance, which multiplies the number of VMs to manage. Minicor supports both dedicated VMs and Citrix environments. For cloud deployments, it uses Windows Server with Remote Desktop Services to run multiple isolated sessions on a single host.
Licensing: Windows Server Datacenter edition allows unlimited VMs on licensed hardware. For cloud, you pay per-core. The trade-off is cost vs. isolation strength.
State contamination: Desktop apps often write to registry, temp files, or shared directories. Minicor uses session-scoped user profiles that get wiped between runs. For apps that require persistent state, you can mount shared volumes with read-only base images.
| Isolation Strategy | Cost | Startup Time | Failure Blast Radius |
|---|---|---|---|
| Dedicated VM per session | High | 30-60s | Single session |
| RDS multi-session | Medium | 5-10s | All sessions on the host |
| Citrix published apps | Low | 2-5s | Shared app pool |
Element Detection: UI Automation vs. Vision
Minicor uses multiple detection strategies in a fallback chain:
UI Automation API (first choice): Windows provides IUIAutomation for accessibility. You get a structured tree of UI elements with properties like name, role, and bounding box. Fast and reliable when it works. Fails when apps use custom controls or render to canvas.
OCR fallback: When UI Automation can’t find an element, Minicor runs OCR on the screen region. Slower (200-500ms per region) but works on any visible text. Accuracy depends on font rendering and contrast.
Computer vision (last resort): For graphical elements with no text, you can use template matching or ML-based object detection. Latency jumps to 1-2 seconds. Brittle across different screen resolutions.
The reflection agent validates after each action. If you click a button and the expected screen state doesn’t appear, the agent retries with a different detection method or raises an error.
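The fallback chain combined with post-action validation might look roughly like the following. This is a sketch, assuming each detection strategy can be wrapped as a function that returns screen coordinates or `None`; the names `click_with_fallback`, `Detector`, and `ActionFailed` are invented for illustration and are not part of any Minicor API.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
# A detector takes a target description and returns click coordinates, or None.
Detector = Callable[[str], Optional[Point]]


class ActionFailed(Exception):
    """Raised when no detection method leads to the expected screen state."""


def click_with_fallback(
    target: str,
    detectors: Sequence[Tuple[str, Detector]],
    click: Callable[[Point], None],
    validate: Callable[[], bool],
) -> str:
    """Try detectors in order; return the name of the first one whose click validates."""
    for name, detect in detectors:
        point = detect(target)
        if point is None:
            continue  # this method could not find the element; try the next one
        click(point)
        if validate():
            return name  # the expected screen state appeared
    raise ActionFailed(f"no detection method produced a validated click on {target!r}")
```

The detectors would be passed in cost order (UI Automation, then OCR, then vision). Note that this sketch clicks again after a failed validation, which is only safe when the click itself is idempotent.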
State Recovery and Checkpointing
Desktop automations fail mid-workflow. The question is whether you restart from the beginning or resume from a checkpoint.
Minicor’s approach:
- Idempotent actions: Workflows should be safe to replay. If you’re filling a form, check if the field is already populated before typing.
- Explicit checkpoints: Mark steps as checkpoints in the workflow definition. On failure, the agent restarts from the last checkpoint.
- Session snapshots: For long-running workflows, you can snapshot the VM state. Expensive in storage (5-10 GB per snapshot) but useful for debugging.
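The idempotent form-filling idea from the first bullet can be sketched as a guard around the typing action. The callbacks `read_value`, `clear`, and `type_text` are hypothetical stand-ins for whatever the agent uses to read and drive a field; only the check-before-typing logic is the point here.

```python
from typing import Callable


def type_if_needed(
    read_value: Callable[[], str],
    clear: Callable[[], None],
    type_text: Callable[[str], None],
    desired: str,
) -> bool:
    """Type `desired` into a field only if it is not already there.

    Returns True if anything was typed, False if the field was already correct.
    """
    current = read_value()
    if current == desired:
        return False  # a replay after a crash lands here and does nothing
    if current:
        clear()  # partial or stale content would otherwise be appended to
    type_text(desired)
    return True
```

Replaying this step after a mid-workflow failure leaves the field unchanged when the first attempt already succeeded.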
Here’s a workflow snippet showing checkpoint usage:
workflow:
  - step: open_app
    action: launch
    path: "C:\\Program Files\\LegacyApp\\app.exe"
  - step: login
    action: type
    element: {name: "Username"}
    value: "{{username}}"
    checkpoint: true
  - step: navigate_to_orders
    action: click
    element: {name: "Orders"}
  - step: create_order
    action: click
    element: {name: "New Order"}
    checkpoint: true
  - step: fill_patient_id
    action: type
    element: {name: "Patient ID"}
    value: "{{patient_id}}"
    validate:
      screen_contains: "Patient: {{patient_name}}"
If fill_patient_id fails, the agent restarts from create_order instead of re-logging in.
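The restart rule can be sketched as a backward scan for the nearest checkpoint. This is an illustration under two assumptions that the text does not spell out: the checkpointed step itself is re-run on resume, and only checkpoints strictly before the failed step count, since the failed step never completed. `resume_index` is a hypothetical helper, not a Minicor function.

```python
from typing import Any, Mapping, Sequence


def resume_index(steps: Sequence[Mapping[str, Any]], failed_index: int) -> int:
    """Return the index of the step to restart from after a failure."""
    if not 0 <= failed_index < len(steps):
        raise IndexError(f"failed_index {failed_index} is out of range")
    # Walk backwards from the step just before the failure.
    for i in range(failed_index - 1, -1, -1):
        if steps[i].get("checkpoint", False):
            return i
    return 0  # no earlier checkpoint: restart from the beginning


# The workflow above, reduced to the fields this sketch needs.
steps = [
    {"step": "open_app"},
    {"step": "login", "checkpoint": True},
    {"step": "navigate_to_orders"},
    {"step": "create_order", "checkpoint": True},
    {"step": "fill_patient_id"},
]
restart = steps[resume_index(steps, failed_index=4)]["step"]  # "create_order"
```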
Observability Primitives
Debugging desktop automation requires more than logs. Minicor captures:
- Full video recording: Every session is recorded at 5 fps. Storage cost is ~50 MB per hour.
- Screenshot on error: When a step fails, the agent captures the screen and UI Automation tree.
- Execution trace: Timestamped log of every action, element detection attempt, and validation result.
- Slack alerts: Failures trigger notifications with video replay link and error context.
The video replay is the most useful debugging tool. You can see exactly what the agent saw when it failed, including transient dialogs or loading spinners that don’t appear in logs.
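The recording figures above (5 fps, roughly 50 MB per hour) make it easy to estimate retention cost. The fleet size, hours per day, and retention period below are illustrative inputs, not Minicor numbers.

```python
MB_PER_HOUR = 50  # approximate figure quoted above
FPS = 5           # recording frame rate quoted above


def recording_storage_gb(sessions: int, hours_per_day: float, retention_days: int) -> float:
    """Estimated storage in decimal GB for retained session recordings."""
    total_mb = MB_PER_HOUR * sessions * hours_per_day * retention_days
    return total_mb / 1000


frames_per_hour = FPS * 3600                              # 18000 frames
avg_kb_per_frame = MB_PER_HOUR * 1000 / frames_per_hour   # about 2.8 KB per frame

# Illustrative fleet: 100 sessions, 8 recorded hours per day, 30-day retention.
fleet_gb = recording_storage_gb(sessions=100, hours_per_day=8, retention_days=30)  # 1200.0
```

An average of under 3 KB per frame only works if most frames are nearly identical to the previous one, which is typical of desktop recordings where the screen is mostly static between actions.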
Deployment Shapes
Minicor supports three deployment models:
Cloud-hosted: Minicor manages the Windows VMs. You call the API, they handle scaling. Simplest but highest cost and potential data residency issues.
On-premise: You run the desktop service on your own Windows Server infrastructure. Minicor provides the agent binary and orchestration layer. Requires Windows Server licenses and VM management expertise.
Hybrid: Desktop agents run on-premise, but workflow definitions and observability data flow to Minicor’s cloud. Useful when you need to automate apps that can’t leave your network but want managed observability.
Likely Failure Modes
Windows updates: Automatic updates can restart VMs mid-workflow. Solution: Disable automatic updates and schedule maintenance windows, or use VM snapshots to roll back.
Screen resolution mismatches: Workflows developed on 1920x1080 fail on 1280x1024. Solution: Use relative positioning instead of absolute coordinates, or standardize VM screen resolution.
Antivirus interference: Desktop agents trigger heuristic detection because they simulate mouse and keyboard input. Solution: Whitelist the agent binary and add exclusions for workflow directories.
License exhaustion: Running out of Windows Server CALs or RDS licenses. Solution: Monitor license usage and implement session pooling with aggressive timeouts.
Network latency: Desktop apps that load data from remote servers can time out. Solution: Increase wait times in workflow definitions or use network-aware validation.
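For the network latency case, a common alternative to a longer fixed sleep is to poll for the expected state until a deadline. The helper below is a generic sketch of that pattern, not Minicor's validation mechanism; `wait_until` and its parameters are hypothetical, and the `clock` and `sleep` arguments exist only so the function can be tested without real delays.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

Compared with a fixed wait, this returns as soon as the element or screen text appears, so fast responses are not penalized and slow ones still get the full timeout.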
Technical Verdict
Use Minicor when:
- You need to integrate AI agents with legacy Windows desktop apps that have no API
- You’re automating workflows that run thousands of times per day and need reliability
- You have budget for Windows Server licensing and VM infrastructure
- You need full observability and video replay for compliance or debugging
Avoid when:
- The target application has a usable API (even a bad one is better than desktop automation)
- You’re automating workflows that run infrequently (manual execution is cheaper)
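- You can’t tolerate cold-start latency for session allocation (2-60 seconds depending on the isolation strategy, unless sessions are pre-warmed)
- Your workflows require sub-second response times (the OCR fallback alone adds 200-500ms per region, and vision-based detection adds 1-2 seconds)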
The hard parts of desktop RPA are session management, state recovery, and observability. Minicor exposes these as managed primitives instead of making you build them. The trade-off is cost and vendor lock-in vs. engineering time saved.