Author a workflow

A workflow is the list of stages a run walks. You pass it inline as the definition field when you submit a run, and pond executes the run from that snapshot.

Shape

{
  "stages": [
    { "key": "inventory", "name": "Inventory", "category": "scan",
      "executor": { "kind": "builtin", "name": "file_inventory" } },
    { "key": "analyze",   "name": "Analyze",   "executor": { "kind": "swarm", … } },
    { "key": "verify",    "name": "Verify",    "executor": { "kind": "shell", … } }
  ],
  "edges": [ { "from": "inventory", "to": "analyze" }, { "from": "analyze", "to": "verify" } ]
}
  • Stages run in array order unless edges declare a DAG, in which case they are topologically sorted and same-rank stages keep array order.
  • Each stage has a key (unique), a display name/category, and an executor. Before the first stage, pond runs an implicit fetch step that materializes your sources into a checkout.
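
The ordering rule above can be sketched in Python. This is a minimal illustration, not pond's actual scheduler: the function name stage_order is hypothetical, and it assumes "same-rank stages keep array order" means that whenever several stages are ready at once, the one that appears earliest in the stages array goes first.

```python
import heapq


def stage_order(definition: dict) -> list:
    """Return stage keys in a plausible execution order (illustrative sketch)."""
    keys = [stage["key"] for stage in definition["stages"]]
    if len(set(keys)) != len(keys):
        raise ValueError("stage keys must be unique")

    edges = definition.get("edges") or []
    if not edges:
        return keys  # no DAG declared: plain array order

    position = {key: i for i, key in enumerate(keys)}
    indegree = {key: 0 for key in keys}
    children = {key: [] for key in keys}
    for edge in edges:
        src, dst = edge["from"], edge["to"]
        if src not in position or dst not in position:
            raise ValueError(f"edge references unknown stage: {src} -> {dst}")
        children[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm. The heap holds array positions, so stages that are
    # ready at the same time are emitted in array order.
    ready = [position[k] for k in keys if indegree[k] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        key = keys[heapq.heappop(ready)]
        order.append(key)
        for child in children[key]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, position[child])

    if len(order) != len(keys):
        raise ValueError("edges contain a cycle")
    return order
```

Running a check like this before submitting a run catches duplicate keys, dangling edges, and cycles early, independent of whatever validation pond itself performs.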

Executors

noop

A placeholder / delay. { "kind": "noop", "ticks": 1, "interval_sec": 0 }.

builtin

In-process Python shipped with pond. { "kind": "builtin", "name": "file_inventory" }. Builtins (file inventory, TODO scan, …) need a checkout but no pool, and write neutral results.jsonl output.

shell

A subprocess on the pond host with cwd = the checkout, a restricted env, and a timeout. { "kind": "shell", "argv": ["semgrep", "--json", "."], "timeout_sec": 600 }. Good for static-analysis CLIs. (Real isolation for shell tools is a deployment overlay; for untrusted code prefer a swarm stage with a sandbox profile.)
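
To make the shell contract concrete, here is a rough Python sketch of what "a subprocess with cwd = the checkout, a restricted env, and a timeout" could look like. The function name run_shell_stage and the specific environment allowlist are assumptions for illustration; the source does not say which variables pond actually passes through.

```python
import os
import subprocess


def run_shell_stage(executor: dict, checkout: str) -> subprocess.CompletedProcess:
    """Run a shell executor's argv inside the checkout (illustrative only)."""
    # Assumption: "restricted env" is modeled as a small allowlist of variables.
    allowed = ("PATH", "HOME", "LANG")
    env = {name: os.environ[name] for name in allowed if name in os.environ}
    return subprocess.run(
        executor["argv"],                     # argv list, no shell interpolation
        cwd=checkout,                         # run from the checkout directory
        env=env,                              # only the allowlisted variables
        timeout=executor.get("timeout_sec"),  # raises TimeoutExpired when exceeded
        capture_output=True,
        text=True,
        check=False,                          # caller inspects returncode
    )
```

Passing argv as a list rather than a command string avoids shell quoting problems, which matches the argv form used in the example above. As the source notes, none of this amounts to real isolation.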

swarm — the AI agent

Assembled from config primitives (no fixed modes):

{
  "kind": "swarm",
  "prompt": "Investigate {{AREA_TITLE}} ({{GLOBS}}). {{WHY}}",
  "fanout": { "source": "map", "artifact": "map.json", "maxAreas": 8 },
  "parse":  { "fence": "result", "output": "results.jsonl", "shape": "jsonl" },
  "writesArtifact": "map.json",
  "sandbox": { "profile": "untrusted-code-write", "tier": "trusted",
               "backend": "firecracker", "allow_hosts": ["git.acme.com"],
               "overrides": { "image": "your/agent:latest", "memory": "4g" } }
}
  • prompt: inline agent instructions. When fanning out, {{AREA_TITLE}} / {{GLOBS}} / {{WHY}} are substituted per area.
  • fanout.source: none (one agent over the whole repo), map (one agent per area read from a prior stage’s artifact), or list (static areas).
  • parse: a product-neutral config spec for turning the agent’s reply into a file, of the form { fence, output, shape: "jsonl"|"json"|"edit", requireKeys, dedupeKey, writesArtifact }. The engine extracts ```<fence> JSON blocks and writes them to output. The neutral default (when parse is omitted) is fence result, written to results.jsonl. shape:"json" keeps a single object; shape:"edit" is for file-edit results. (Legacy strings "chain", "map", "review", "edit", and "result" still normalize to a config, for back-compat.)
  • sandbox: the isolation contract; required for swarm stages (fail-closed). See Sandbox a run.
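
The parse step described above can be approximated as follows. This is a sketch under stated assumptions, not the engine's code: the helper names are hypothetical, requireKeys is assumed to drop records that lack any listed key, dedupeKey is assumed to keep the first record per value, and malformed blocks are assumed to be skipped.

```python
import json
import re

TICKS = "`" * 3  # a literal triple backtick, built indirectly so this example nests safely


def extract_records(reply: str, parse=None) -> list:
    """Pull JSON out of fenced blocks in an agent reply (illustrative sketch)."""
    parse = parse or {}
    fence = parse.get("fence", "result")  # neutral default fence name
    pattern = re.compile(
        TICKS + re.escape(fence) + r"[ \t]*\n(.*?)\n[ \t]*" + TICKS, re.DOTALL
    )
    records = []
    for body in pattern.findall(reply):
        try:
            value = json.loads(body)
        except json.JSONDecodeError:
            continue  # assumption: malformed blocks are skipped, not fatal
        records.extend(value if isinstance(value, list) else [value])

    required = parse.get("requireKeys") or []
    dedupe_key = parse.get("dedupeKey")
    kept, seen = [], set()
    for record in records:
        if not isinstance(record, dict) or any(k not in record for k in required):
            continue  # assumed requireKeys semantics: drop incomplete records
        if dedupe_key is not None:
            marker = json.dumps(record.get(dedupe_key), sort_keys=True)
            if marker in seen:
                continue  # assumed dedupeKey semantics: first record wins
            seen.add(marker)
        kept.append(record)
    return kept


def to_jsonl(records: list) -> str:
    """Serialize records one JSON object per line, as in shape "jsonl"."""
    return "".join(json.dumps(r) + "\n" for r in records)
```

A sketch like this is mainly useful for testing a prompt locally: feed it a sample reply and confirm that the records you expect survive before wiring the stage into a workflow.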

A swarm stage is dispatched to a worker pool. The other executors run in-process.
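
Before dispatch, a fanned-out stage needs one prompt per area. The sketch below shows how the {{AREA_TITLE}} / {{GLOBS}} / {{WHY}} substitution might work. The area fields title, globs, and why are hypothetical (the source does not document the layout of map.json), and maxAreas is assumed to simply truncate the list.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(AREA_TITLE|GLOBS|WHY)\}\}")


def render_prompts(executor: dict, areas: list) -> list:
    """Expand a swarm prompt into one prompt per area (illustrative sketch)."""
    template = executor["prompt"]
    fanout = executor.get("fanout") or {}
    if fanout.get("source", "none") == "none":
        return [template]  # one agent over the whole repo

    limit = fanout.get("maxAreas")
    selected = areas if limit is None else areas[:limit]  # assumed cap semantics
    prompts = []
    for area in selected:
        # Hypothetical area shape: {"title": str, "globs": [str], "why": str}
        values = {
            "AREA_TITLE": area.get("title", ""),
            "GLOBS": ", ".join(area.get("globs", [])),
            "WHY": area.get("why", ""),
        }
        # Single-pass substitution, so placeholder-like text inside a value
        # is not expanded a second time.
        prompts.append(PLACEHOLDER.sub(lambda m: values[m.group(1)], template))
    return prompts
```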

Artifacts & hand-off

A stage writes structured output to its output dir; top-level *.jsonl / *.json files (e.g. results.jsonl) are promoted to the run’s artifact set, and run-scope artifacts (e.g. map.json) are read by later stages. Pond treats these artifacts as opaque — a consumer hook (app/stage_hooks.py) may turn one into domain objects. Pull artifacts via GET /v1/runs/{id}/artifacts.
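
As a rough illustration of pulling artifacts, the sketch below builds the GET /v1/runs/{id}/artifacts URL and reads JSONL content. The base URL is a placeholder, and the response format and any authentication are not specified in this document, so fetch_artifacts returns raw bytes rather than guessing at a schema. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def artifacts_url(base_url: str, run_id: str) -> str:
    """Build the artifacts endpoint URL for a run."""
    quoted = urllib.parse.quote(run_id, safe="")
    return f"{base_url.rstrip('/')}/v1/runs/{quoted}/artifacts"


def fetch_artifacts(base_url: str, run_id: str, timeout: float = 30.0) -> bytes:
    """GET the artifacts endpoint and return the raw response body.

    Assumption: no auth headers are needed; add them if your deployment
    requires them.
    """
    request = urllib.request.Request(artifacts_url(base_url, run_id), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def parse_jsonl(text: str) -> list:
    """Decode JSONL content such as results.jsonl, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Since pond treats artifacts as opaque, any interpretation of the decoded records is up to the consumer.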

See also: Integrate · Sandbox a run.