mech.app

Task Assets: Persistent Agent Workflows with Scheduling and State

How akm 0.8.0 task assets handle cron scheduling, environment injection, failure recovery, and observability for long-running agent workflows.

Source: dev.to

Most agent workflows are synchronous. You open a session, issue a prompt, wait for tool calls to resolve, then close the session. The agent’s clock runs when you run it.

Task assets in akm 0.8.0 flip that model. A task is a YAML file that defines what to run, when to run it, what environment it needs, and how long it’s allowed to take. Once registered, the OS scheduler calls akm tasks run <id> on a cron schedule. The agent executes, writes results to state.db, and exits. You check akm health or read logs to see what happened.

This is the infrastructure layer that makes continuous agent operation possible. The improve loop runs twice an hour because a task asset says it does. The Discord health report fires hourly because a task asset says it does. Neither requires an open terminal.

YAML Schema and Scheduling

Task assets live at <stash>/tasks/<id>.yml. The filename is the task ID. A minimal task looks like this:

schedule: "0 * * * *"
command: "akm improve --auto-accept 90"
enabled: true

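The schedule field is a standard five-field cron expression (minute, hour, day of month, month, day of week), so "0 * * * *" fires at minute 0 of every hour. The command field is the shell command to run. The enabled flag lets you pause a task without deleting it.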
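To make the schedule field concrete, the Python sketch below checks whether a cron expression fires at a given minute. It is only an illustration, because akm hands matching to the OS cron daemon. The matcher supports just *, single values, lists, ranges, and / steps. It also requires all five fields to match, whereas Vixie-style cron ORs day of month and day of week when both are restricted.

```python
from datetime import datetime

# Inclusive bounds for minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field such as '*', '5', '1-5', '*/30', or '0,30' into a set of values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if start < lo or end > hi or start > end or step < 1:
            raise ValueError(f"invalid cron field {field!r} for range {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """True if the five-field expression fires at the minute given by `when`.

    Simplification: every field must match (AND). Vixie-style cron instead ORs
    day of month and day of week when both are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    # isoweekday() is 1 (Monday) .. 7 (Sunday); cron uses 0 for Sunday.
    actual = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(
        value in expand_field(f, lo, hi)
        for f, value, (lo, hi) in zip(fields, actual, FIELD_BOUNDS)
    )


if __name__ == "__main__":
    print(cron_matches("0 * * * *", datetime(2024, 5, 1, 14, 0)))     # True
    print(cron_matches("*/30 * * * *", datetime(2024, 5, 1, 14, 30)))  # True
    print(cron_matches("0 * * * *", datetime(2024, 5, 1, 14, 30)))    # False
```

A schedule such as */30 * * * * expands to minutes {0, 30}. That is how a task runs twice an hour.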

Optional fields include:

  • timeout: Maximum execution time in seconds (default 300).
  • env: Key-value pairs injected into the task’s environment.
  • description: Human-readable summary for logs and health reports.
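The schema maps naturally onto a typed record. The following Python sketch assumes the YAML has already been parsed into a mapping, for example by a YAML library. It applies the documented 300-second timeout default and treats env and description as optional. The class name, the helper name, and the choice to require the three fields of the minimal example are all assumptions, and akm's own validation may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Mapping


@dataclass
class TaskAsset:
    id: str                  # the filename stem: <stash>/tasks/health.yml -> "health"
    schedule: str            # five-field cron expression
    command: str             # shell command to run
    enabled: bool
    timeout: int = 300       # seconds (documented default)
    env: dict[str, str] = field(default_factory=dict)
    description: str = ""


# Assumption: the three fields of the minimal example are required.
REQUIRED = ("schedule", "command", "enabled")


def task_from_mapping(path: Path, data: Mapping[str, Any]) -> TaskAsset:
    """Validate an already-parsed YAML mapping and apply defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"{path.name}: missing required field(s): {', '.join(missing)}")
    if not isinstance(data["enabled"], bool):
        raise ValueError(f"{path.name}: enabled must be true or false")
    timeout = int(data.get("timeout", 300))
    if timeout <= 0:
        raise ValueError(f"{path.name}: timeout must be a positive number of seconds")
    return TaskAsset(
        id=path.stem,
        schedule=str(data["schedule"]),
        command=str(data["command"]),
        enabled=data["enabled"],
        timeout=timeout,
        env={str(k): str(v) for k, v in (data.get("env") or {}).items()},
        description=str(data.get("description", "")),
    )
```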

The scheduler is the OS cron daemon. When you run akm tasks install, akm writes a crontab entry that calls akm tasks run <id> for each enabled task. This keeps the orchestration layer thin. No custom daemon, no polling loop, no process supervisor. The OS already knows how to run things on a schedule.
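Conceptually, installing tasks means rendering one crontab line per enabled task. The Python sketch below does that, and it replaces only a marked block so that unrelated crontab lines survive a reinstall. The post does not document the exact line format or any marker comments akm writes, so the markers, the TaskEntry type, and both function names are hypothetical.

```python
from typing import NamedTuple


class TaskEntry(NamedTuple):
    id: str
    schedule: str
    enabled: bool


# Hypothetical markers so a reinstall can replace only the lines it owns.
BEGIN = "# BEGIN akm tasks (managed)"
END = "# END akm tasks (managed)"


def render_crontab_block(tasks: list[TaskEntry], akm_path: str = "akm") -> str:
    """Return a crontab fragment with one 'akm tasks run <id>' line per enabled task."""
    lines = [BEGIN]
    for task in sorted(tasks, key=lambda t: t.id):
        if task.enabled:
            lines.append(f"{task.schedule} {akm_path} tasks run {task.id}")
    lines.append(END)
    return "\n".join(lines) + "\n"


def replace_managed_block(existing: str, block: str) -> str:
    """Swap the managed block inside an existing crontab, or append it if absent."""
    if BEGIN in existing and END in existing:
        head, _, rest = existing.partition(BEGIN)
        _, _, tail = rest.partition(END)
        return head + block + tail.lstrip("\n")
    separator = "" if existing.endswith("\n") or not existing else "\n"
    return existing + separator + block
```

Passing an absolute path as akm_path sidesteps cron's minimal PATH. Keeping the generated lines between markers makes reinstalling idempotent.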

Manual Invocation and Testing

You can invoke any task manually with akm tasks run <id>. This bypasses the schedule and runs the task immediately. It is useful for:

  • Testing a new task before enabling it.
  • Re-running a failed task without waiting for the next scheduled interval.
  • Triggering a task in response to an external event (webhook, CI pipeline, manual intervention).

Manual invocation uses the same environment injection, timeout enforcement, and logging as scheduled runs. The only difference is the trigger.
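Scheduled and manual runs differ only in their trigger, so a runner can take the trigger as a plain label and otherwise follow one code path. This Python sketch enforces the timeout with subprocess and keeps whatever output arrived before the deadline. The RunResult fields, the trigger labels, and the run_task name are hypothetical and do not describe akm's internals.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RunResult:
    trigger: str               # e.g. "schedule" or "manual" (hypothetical labels)
    exit_code: Optional[int]   # None when the run was killed for exceeding its timeout
    timed_out: bool
    stdout: str
    stderr: str
    duration_s: float


def _as_text(data) -> str:
    # TimeoutExpired.stdout/stderr are bytes (or None) even when text=True was requested.
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_task(argv: list[str], env: Mapping[str, str], timeout_s: float, trigger: str) -> RunResult:
    """Run one task identically whether cron or a human triggered it; only `trigger` differs."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, env=dict(env), capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child; keep whatever it printed so far.
        return RunResult(trigger, None, True, _as_text(exc.stdout), _as_text(exc.stderr),
                         time.monotonic() - start)
    return RunResult(trigger, proc.returncode, False, proc.stdout, proc.stderr,
                     time.monotonic() - start)
```

For a shell command string you would pass ["/bin/sh", "-c", command]. Killing the shell on timeout does not necessarily kill the processes it started, so a production runner would launch the child in its own session (start_new_session=True) and signal the whole process group.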

Environment Injection and Secrets

The env block in the task YAML injects environment variables at runtime:

env:
  DISCORD_WEBHOOK_URL: "${DISCORD_WEBHOOK_URL}"
  OPENAI_API_KEY: "${OPENAI_API_KEY}"

The ${VAR} syntax pulls from the parent shell’s environment when the task runs. This keeps secrets out of the YAML file. The task definition is checked into version control. The secrets live in .env or the system environment.

At runtime, akm merges the task’s env block with the existing environment and passes the result to the command subprocess. If a variable is missing, the task fails with a clear error before invoking the agent.
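The resolve-then-merge step can be sketched with Python's string.Template, whose substitute method raises KeyError for an unset name. That gives the fail-fast behavior described above. This sketch assumes two things: placeholders resolve against the parent environment, and the task's env block overrides inherited values. akm's real precedence and escaping rules may differ; Template, for instance, also accepts bare $VAR and treats $$ as a literal dollar sign.

```python
from string import Template
from typing import Mapping


class MissingEnvVar(Exception):
    """Raised before the command starts when a ${VAR} placeholder cannot be resolved."""


def resolve_task_env(task_env: Mapping[str, str], parent: Mapping[str, str]) -> dict[str, str]:
    """Expand ${VAR} placeholders in a task's env block and merge it over the parent env."""
    resolved: dict[str, str] = {}
    for key, raw in task_env.items():
        try:
            resolved[key] = Template(raw).substitute(parent)
        except KeyError as exc:
            raise MissingEnvVar(f"env.{key} references unset variable {exc.args[0]}") from None
    # Assumed precedence: task-level values override inherited ones.
    return {**parent, **resolved}
```

In practice the parent mapping would be os.environ, and the returned dict would be passed as the env of the command subprocess.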

State Management and Failure Recovery

Each task execution writes a record to state.db:

  • Task ID
  • Start timestamp
  • End timestamp
  • Exit code
  • Stdout/stderr capture
  • Token usage (if the command invokes an LLM)
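One way to hold these records is a single SQLite table. The schema below is a hypothetical sketch that uses Python's built-in sqlite3 module. The post does not describe akm's actual state.db layout, so the table, column, and function names are assumptions.

```python
import sqlite3
from typing import Optional

# Hypothetical layout; akm's real state.db schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_runs (
    id                INTEGER PRIMARY KEY,
    task_id           TEXT NOT NULL,
    started_at        REAL NOT NULL,  -- Unix seconds
    ended_at          REAL,           -- NULL while the run is still in progress
    exit_code         INTEGER,        -- NULL if the run was killed by its timeout
    stdout            TEXT,
    stderr            TEXT,
    prompt_tokens     INTEGER,        -- NULL when the command made no LLM calls
    completion_tokens INTEGER
);
CREATE INDEX IF NOT EXISTS task_runs_by_task ON task_runs (task_id, started_at);
"""


def open_state(path: str = "state.db") -> sqlite3.Connection:
    """Open (or create) the state database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn: sqlite3.Connection, task_id: str, started_at: float,
               ended_at: Optional[float], exit_code: Optional[int], stdout: str, stderr: str,
               prompt_tokens: Optional[int] = None,
               completion_tokens: Optional[int] = None) -> int:
    """Insert one execution record and return its row id."""
    with conn:  # commits on success, rolls back if the INSERT raises
        cur = conn.execute(
            "INSERT INTO task_runs (task_id, started_at, ended_at, exit_code, stdout, stderr,"
            " prompt_tokens, completion_tokens) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (task_id, started_at, ended_at, exit_code, stdout, stderr,
             prompt_tokens, completion_tokens),
        )
    return cur.lastrowid
```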

If a task fails (non-zero exit code), the next scheduled run proceeds anyway. There is no automatic retry logic. The task either succeeds or fails, and the result is logged. You decide whether to fix the task definition, adjust the schedule, or ignore the failure.

This design choice avoids retry storms. If a task fails because an API is down, retrying every minute makes the problem worse. If it fails because the YAML is broken, retrying won’t help. The human checks the log, fixes the root cause, and re-runs manually if needed.

For tasks that need idempotency (like the improve loop), the command itself handles state. The akm improve command checks state.db to see what was already processed and skips duplicates. The task asset just schedules the invocation.
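The skip-duplicates pattern can be sketched as "record a key only after its work succeeds, and skip keys already recorded." This Python sketch is hypothetical: the processed_items table, the item keys, and the handler are invented for illustration and are not how akm improve stores its state. It shows why a manual rerun after a failure redoes only the unfinished work.

```python
import sqlite3
import time
from typing import Callable, Iterable


def process_new_items(conn: sqlite3.Connection, keys: Iterable[str],
                      handle: Callable[[str], None]) -> list[str]:
    """Call handle(key) for each key not yet recorded; return the keys handled this run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_items ("
        " item_key TEXT PRIMARY KEY, processed_at REAL NOT NULL)"
    )
    handled = []
    for key in keys:
        seen = conn.execute(
            "SELECT 1 FROM processed_items WHERE item_key = ?", (key,)
        ).fetchone()
        if seen:
            continue  # finished by an earlier run
        handle(key)   # if this raises, the key stays unrecorded and a later run retries it
        with conn:
            conn.execute(
                "INSERT INTO processed_items (item_key, processed_at) VALUES (?, ?)",
                (key, time.time()),
            )
        handled.append(key)
    return handled
```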

Logging and Observability

Task output goes to two places:

  1. Stdout/stderr capture: Stored in state.db alongside the execution record.
  2. Structured logs: Written to <stash>/logs/tasks/<id>/<timestamp>.log.

The structured log includes:

  • Task ID and description
  • Start/end timestamps
  • Exit code
  • Environment variables (keys only, not values)
  • Token usage breakdown (prompt tokens, completion tokens, total cost)
  • Tool invocation counts (if the command uses an agent)
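A log writer that follows the documented <stash>/logs/tasks/<id>/<timestamp>.log layout might look like this Python sketch. It records environment variable names but never their values. The UTC timestamp format in the filename and the JSON encoding are assumptions, since the post does not specify akm's on-disk format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping, Optional


def write_task_log(stash: Path, task_id: str, description: str, started: datetime,
                   ended: datetime, exit_code: Optional[int], env: Mapping[str, str],
                   usage: Optional[dict] = None) -> Path:
    """Write one structured log file for a run and return its path."""
    # Filesystem-safe UTC timestamp, e.g. 20240501T140000Z (assumed naming scheme).
    stamp = started.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = stash / "logs" / "tasks" / task_id / f"{stamp}.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "task_id": task_id,
        "description": description,
        "started_at": started.isoformat(),
        "ended_at": ended.isoformat(),
        "exit_code": exit_code,
        "env_keys": sorted(env),   # names only: values may be secrets
        "usage": usage or {},      # e.g. prompt/completion tokens, cost, tool counts
    }
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```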

The akm health command aggregates task execution history and surfaces:

  • Tasks that failed in the last 24 hours
  • Tasks that exceeded their timeout
  • Tasks that consumed more tokens than expected
  • Tasks that haven’t run recently (schedule drift)
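The four checks reduce to simple queries over run history. This Python sketch assumes a hypothetical task_runs table, defined inside the block so that it stands alone. It also assumes two per-task thresholds that you would supply yourself: a token budget and a maximum allowed gap between runs. The real rules and thresholds used by akm health are not documented in the post.

```python
import sqlite3
import time
from typing import Mapping, Optional

# Hypothetical columns; exit_code is NULL for a run killed by its timeout
# (ended_at set) or still in progress (ended_at NULL).
SCHEMA = """CREATE TABLE IF NOT EXISTS task_runs (
    task_id TEXT NOT NULL, started_at REAL NOT NULL, ended_at REAL,
    exit_code INTEGER, prompt_tokens INTEGER, completion_tokens INTEGER)"""

DAY_S = 24 * 3600


def health_report(conn: sqlite3.Connection, token_budget: Mapping[str, int],
                  max_gap_s: Mapping[str, float],
                  now: Optional[float] = None) -> dict[str, list[str]]:
    """Group task IDs by the four anomaly types a health check might surface."""
    now = time.time() if now is None else now
    recent = conn.execute(
        "SELECT task_id, exit_code, ended_at, prompt_tokens, completion_tokens"
        " FROM task_runs WHERE started_at >= ?",
        (now - DAY_S,),
    ).fetchall()
    failed, timed_out, over_budget = set(), set(), set()
    for task_id, exit_code, ended_at, prompt, completion in recent:
        if exit_code is not None and exit_code != 0:
            failed.add(task_id)
        if exit_code is None and ended_at is not None:
            timed_out.add(task_id)
        budget = token_budget.get(task_id)
        if budget is not None and (prompt or 0) + (completion or 0) > budget:
            over_budget.add(task_id)
    # Schedule drift: no run at all, or the latest run is older than the allowed gap.
    last_run = dict(conn.execute("SELECT task_id, MAX(started_at) FROM task_runs GROUP BY task_id"))
    stale = {t for t, gap in max_gap_s.items() if t not in last_run or now - last_run[t] > gap}
    return {
        "failed_24h": sorted(failed),
        "timed_out_24h": sorted(timed_out),
        "over_token_budget_24h": sorted(over_budget),
        "stale": sorted(stale),
    }
```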

This is the observability hook for scheduled workflows. You don’t watch the agent run. You check the health report and investigate anomalies.

Discord Health Report Example

The worked example in the post is a Discord health report that runs hourly:

schedule: "0 * * * *"
command: "akm health --format discord | curl --fail -sS -X POST -H 'Content-Type: application/json' -d @- ${DISCORD_WEBHOOK_URL}"
enabled: true
env:
  DISCORD_WEBHOOK_URL: "${DISCORD_WEBHOOK_URL}"
timeout: 60
description: "Hourly health report to Discord"

The akm health --format discord command generates a JSON payload with task status, token usage, and error summaries. The curl invocation posts it to a Discord webhook. In that invocation, -d @- reads the request body from stdin, and --fail makes curl exit non-zero when the server answers with an HTTP error. The -sS pair hides the progress meter but still prints errors. The entire workflow is declarative. No custom code, no agent involvement, just shell composition.

If the webhook is down or rejects the payload, the task fails. The next run tries again. If the health command crashes, its error output is logged. The task asset doesn’t care what the command does. It just schedules it, enforces the timeout, and records the result.
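A POSIX shell pipeline reports the exit status of its last command. Assuming akm records the shell's exit status, a crash in the first stage of akm health --format discord | curl ... therefore does not by itself make the recorded exit code non-zero. The Python sketch below demonstrates this with sh. It then shows one alternative: run the producer and the consumer as separate steps and stop at the first failure. The run_two_stage helper is hypothetical and not part of akm.

```python
import subprocess


def pipeline_status(command: str) -> int:
    """Exit status that sh reports for a shell command line."""
    return subprocess.run(["sh", "-c", command]).returncode


def run_two_stage(producer: list[str], consumer: list[str], timeout_s: float = 60) -> int:
    """Run producer, then feed its stdout to consumer; return the first non-zero status."""
    first = subprocess.run(producer, capture_output=True, timeout=timeout_s)
    if first.returncode != 0:
        return first.returncode  # do not post an empty or half-written payload
    second = subprocess.run(consumer, input=first.stdout, timeout=timeout_s)
    return second.returncode


if __name__ == "__main__":
    print(pipeline_status("false | cat"))  # 0: the failure of `false` is masked
    print(pipeline_status("true | false"))  # 1: only the last stage counts
```

In bash, set -o pipefail achieves a similar effect within a single command line. The shell that runs the task command may be /bin/sh, however, where pipefail is not guaranteed to be available.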

Trade-Offs and Failure Modes

Aspect           | Task Assets                        | Alternative (e.g., GitHub Actions, Trigger.dev)
-----------------|------------------------------------|------------------------------------------------
Scheduling       | OS cron, local machine             | Cloud scheduler, event-driven
State            | SQLite state.db, local filesystem  | Managed database, cloud storage
Secrets          | Environment variables, .env        | Encrypted secrets manager
Observability    | Logs + akm health                  | Dashboards, APM, structured telemetry
Failure recovery | Manual re-run, no automatic retry  | Configurable retry policies
Cost             | Free (local compute)               | Metered (execution time, invocations)

Task assets are optimized for local, self-hosted workflows. If your agent runs on your laptop or a dedicated server, this is the simplest path. If you need distributed execution, multi-region failover, or centralized observability, you’ll need a managed orchestration layer.

Common failure modes:

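  • Missed runs: If the machine sleeps, shuts down, or reboots, cron entries may fire late or not at all, and cron does not reliably catch up on what it missed. anacron can catch up, but only for jobs with a period of a day or longer. For critical hourly tasks, use an always-on host, a systemd timer with Persistent=true, or a cloud scheduler.
  • Environment mismatch: cron starts jobs with a minimal environment (typically a short PATH and none of the variables exported by your shell profile), so tasks may fail because akm or a secret cannot be found. Running akm tasks run from your interactive shell inherits that shell’s environment and can hide the problem. Before enabling a task, also try it in a stripped environment, for example env -i HOME="$HOME" PATH=/usr/bin:/bin akm tasks run <id>.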
  • Timeout tuning: If a task occasionally exceeds its timeout, you’ll see partial results in the log. Increase the timeout or optimize the command.
  • Log bloat: Long-running tasks with verbose output can fill the log directory. Rotate logs or redirect output to /dev/null for low-value tasks.

Technical Verdict

Use task assets when you need scheduled, persistent agent workflows on a single machine or small cluster. The YAML schema is simple, the OS scheduler is reliable, and the observability hooks are sufficient for debugging. This is the right layer for continuous improvement loops, periodic health checks, and background data processing.

Avoid task assets if you need distributed execution, complex retry logic, or centralized monitoring. The local-first design doesn’t scale to multi-region deployments or high-availability requirements. For those cases, use a managed orchestration platform (Temporal, Prefect, Trigger.dev) and treat akm as a task executor, not a scheduler.

The sweet spot is workflows that run on your infrastructure, use your secrets, and report to your logs. Task assets make those workflows declarative, observable, and maintainable without adding a new runtime dependency.

