Documentation Index

Fetch the complete documentation index at: https://panopticon-cli.com/llms.txt

Use this file to discover all available pages before exploring further.

Specialist Agents

Work agents write code. Specialists turn that code into something you can actually merge. They review, test, click through the UI, resolve conflicts, and hand off to each other automatically; you just click Merge at the end.

[Image: Panopticon kanban board with specialists running]

Overview

A specialist is a focused agent with one job. It takes a completed piece of work, does that job, reports passed or failed, and either advances the work to the next stage or bounces it back to the work-agent with feedback.

Specialists are per-project and ephemeral: they spawn on demand against a workspace, do their job, and terminate. There is no global specialist pool to warm up, no long-lived tmux sessions to babysit, and nothing to initialize before you start working.

What makes them different from work agents:
  • Narrow scope. Each specialist has one responsibility and a purpose-built prompt.
  • Per-project. Spawned against a specific project’s workspace with the right tools and context.
  • Queued. If one specialist is busy, new tasks queue up and drain automatically.
  • Coordinated. Cloister handles handoffs between stages.

The Five Specialists

| Specialist | Purpose | Trigger |
| --- | --- | --- |
| review-agent | Code review before merge | Human clicks Review (dashboard) |
| test-agent | Runs the full test suite | Auto after review passes |
| inspect-agent | Per-status-change verification | Any specialist reports passed |
| uat-agent | Browser-based acceptance testing via Playwright | Auto after tests pass |
| merge-agent | All merges + conflict resolution | Human clicks Approve & Merge |
[Image: Specialists dashboard showing Cloister Deacon and per-project specialists]

Review Pipeline Flow

The happy path is a sequential handoff. A human kicks it off; the rest is automatic until the final merge click.
Human clicks "Review"


┌───────────────────┐
│   review-agent    │  logic, security, perf, style
└─────────┬─────────┘
          │ passed → queue test-agent
          │ failed → feedback to work-agent

┌───────────────────┐
│    test-agent     │  runs full test suite
└─────────┬─────────┘
          │ passed → queue uat-agent
          │ failed → feedback to work-agent

┌───────────────────┐
│    uat-agent      │  browser walks through ACs
└─────────┬─────────┘
          │ passed → ready-to-merge
          │ failed → feedback with screenshots

┌───────────────────┐
│   Human clicks    │  <— only human gate after Review
│ "Approve & Merge" │
└─────────┬─────────┘

┌───────────────────┐
│   merge-agent     │  merges, resolves conflicts, reruns tests
└───────────────────┘
inspect-agent runs alongside this pipeline rather than inside it: every time any specialist reports passed, inspect verifies the state transition against the spec before the next stage runs.

[Image: Model + specialist handoffs tracked over time]
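The sequential handoff above can be sketched in a few lines. This is an illustrative model, not Panopticon's actual code: the stage names mirror the diagram, but `nextStage` is a hypothetical helper.

```typescript
// Hypothetical sketch of the review pipeline's pass/fail routing.
type Stage = "review-agent" | "test-agent" | "uat-agent" | "ready-to-merge";
type Outcome = "passed" | "failed";

const PIPELINE: Stage[] = ["review-agent", "test-agent", "uat-agent", "ready-to-merge"];

// On "passed" the work advances to the next stage; on "failed" it
// bounces back to the work-agent with feedback.
function nextStage(current: Stage, outcome: Outcome): Stage | "work-agent" {
  if (outcome === "failed") return "work-agent";
  const i = PIPELINE.indexOf(current);
  return PIPELINE[Math.min(i + 1, PIPELINE.length - 1)];
}
```

A failed outcome at any stage routes back to the work-agent; inspect-agent sits outside this function because it observes transitions rather than owning a stage of its own.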

Review Agent

The review-agent is the gatekeeper. It reads the diff, the PRD, and the vBRIEF plan, and looks for:
  • Logic errors and missed edge cases
  • Security vulnerabilities (OWASP top 10, injection, auth bypass)
  • Performance issues (N+1 queries, unnecessary work, leaked resources)
  • Code quality and adherence to project conventions
Triggering review-agent:
# Via dashboard: click "Review" on an issue card
# Via CLI:
pan specialists wake review-agent --task "review PAN-655"
Review outcomes:
  • passed — queues test-agent automatically
  • failed — sends structured feedback to the work-agent, blocks the pipeline
  • skipped — not applicable (e.g., docs-only change)
Review can also run in convoy mode — multiple reviewers in parallel, each with a focused lens (security, performance, requirements). See the code-review convoy.

Test Agent

The test-agent runs every configured test suite for the project and analyzes the failures rather than just reporting pass/fail. What it does:
  1. Runs all configured suites (backend, frontend unit, integration, e2e)
  2. Diagnoses failures — flake vs. real regression vs. environmental
  3. Reports results with actionable, file:line-referenced feedback
  4. On pass, queues uat-agent (if enabled) or marks the work ready-to-merge
Test outcomes:
  • passed — advances to UAT or ready-to-merge
  • failed — feedback with failing test names and excerpts goes back to the work-agent
  • skipped — no test suite applies to this change
  • dispatch_failed — Cloister couldn’t launch the test run (infra issue, not a code failure)
Project configuration in ~/.panopticon/projects.yaml:
projects:
  myproject:
    name: "My Project"
    path: /home/user/projects/myproject
    tests:
      backend:
        command: "npm test"
        timeout: 300000
      frontend_unit:
        command: "cd frontend && npm run test:unit"
        timeout: 180000
      e2e:
        command: "npm run test:e2e"
        timeout: 600000
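For illustration, the four test outcomes above can be modeled as a roll-up over per-suite results. This is a sketch under assumed semantics (dispatch_failed outranks everything because it is an infra signal, not a code failure), not the test-agent's real logic.

```typescript
// Illustrative roll-up of per-suite results into one test-agent outcome.
type SuiteResult = "passed" | "failed" | "skipped" | "dispatch_failed";

function overallOutcome(results: SuiteResult[]): SuiteResult {
  // Infra problem: report it as such rather than blaming the code.
  if (results.some(r => r === "dispatch_failed")) return "dispatch_failed";
  // Any real failure blocks the pipeline.
  if (results.some(r => r === "failed")) return "failed";
  // Nothing applied to this change.
  if (results.every(r => r === "skipped")) return "skipped";
  return "passed";
}
```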

Inspect Agent

The inspect-agent performs per-status-change verification. Whenever any specialist reports a status transition (review passed, test passed, etc.), inspect re-reads the spec and checks that the new state actually matches what the PRD asked for. Why it matters:
  • Catches spec drift before it compounds across multiple stages
  • Prevents a review-agent that’s too generous from sliding a broken change through
  • Keeps the pipeline honest about the difference between “it compiled” and “it works”
Inspect is wired into /api/specialists/done — it fires on status change, not on a fixed schedule. Enable it per-project:
projects:
  myproject:
    specialists:
      inspect_agent:
        enabled: true

UAT Agent

The UAT (User Acceptance Testing) agent opens a real browser via Playwright MCP and walks through the acceptance criteria like a user would. What it does:
  1. Reads the PRD and the vBRIEF acceptance criteria
  2. Launches a browser against the project’s dev URL
  3. Executes each AC as a user flow
  4. Takes screenshots as evidence (pass and fail)
  5. On pass, marks ready-to-merge; on fail, sends screenshots + DOM snapshots back
UAT outcomes:
  • passed — all ACs verified, ready for human merge approval
  • failed — feedback includes the failing step, a screenshot, and the observed DOM
  • skipped — backend-only or otherwise UI-irrelevant change
projects:
  myproject:
    specialists:
      uat_agent:
        enabled: true
        dev_url: "https://myapp.localhost"

Merge Agent

The merge-agent handles every merge, not just conflicted ones. This is deliberate:
  • It sees every diff that flows through the pipeline, building context
  • When conflicts do occur, it already understands the codebase
  • Tests are always re-run post-merge, catching integration regressions
[Image: Inspector panel with merge-agent terminal streaming live]

Workflow:
  1. Pull latest main
  2. Analyze the incoming diff
  3. Merge the feature branch
  4. Resolve conflicts with AI, documenting each decision
  5. Re-run tests post-merge
  6. Commit the merge with a descriptive message
  7. Report results back to the dashboard
Triggering merge-agent:
# Via dashboard: click "Approve & Merge" on an issue card
# Via CLI:
pan specialists wake merge-agent --task "merge PAN-655"
The merge-agent prompt forbids force-push, requires test re-runs, and mandates that conflict resolution decisions are documented in the merge commit message.
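As a toy illustration of the force-push prohibition, a command blocklist might look like this. The real enforcement is in the specialist prompt and wake-path validation, not a regex filter; `FORBIDDEN` and `isAllowed` are hypothetical names.

```typescript
// Hypothetical guardrail: reject git commands that force-push.
const FORBIDDEN: RegExp[] = [/push\s+--force/, /push\s+-f\b/];

function isAllowed(cmd: string): boolean {
  return !FORBIDDEN.some(re => re.test(cmd));
}
```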

Queue Processing

Each specialist has a per-project task queue at ~/.panopticon/agents/{name}/hook.json, managed via the FPP (Fixed Point Principle) — borrowed from Gastown, inspired by Doctor Who: any runnable action is a fixed point and must resolve before the system can rest.
1. Task arrives (via API or handoff)


2. wakeSpecialistOrQueue() checks if specialist is busy

        ├── IDLE: wake specialist immediately with task

        └── BUSY: push task onto hook.json


3. On specialist completion (/api/specialists/done):

        ├── Cloister marks specialist idle
        ├── Drains the next queued task
        └── Wakes the specialist with it
Queue priority order: urgent > high > normal > low. The FPP watchdog notices when a specialist has pending hook work but is idle, and sends escalating nudges until the work resolves.

Agent Self-Requeue (Circuit Breaker)

After a human kicks off the first review, a work-agent can request re-review automatically when it thinks it has fixed the feedback:
pan work request-review MIN-123 -m "Fixed: added tests for edge cases"
The circuit breaker prevents infinite “fail → refix → re-review” loops:
  • First human Review click resets the counter to 0
  • Each pan work request-review increments it
  • After 7 automatic re-requests, the endpoint returns HTTP 429
  • A human must click Review in the dashboard to unstick it
API endpoint: POST /api/workspaces/:issueId/request-review
The constant lives at src/dashboard/server/routes/workspaces.ts:81 (MAX_AUTO_REQUEUE = 7).
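A minimal sketch of the counter logic: only MAX_AUTO_REQUEUE = 7 and the 429 status come from this page; the per-issue Map and the 202 success status are assumptions.

```typescript
// Hypothetical circuit-breaker counter for automatic re-review requests.
const MAX_AUTO_REQUEUE = 7;
const counters = new Map<string, number>();

// A human Review click resets the counter for that issue.
function humanReview(issueId: string): void {
  counters.set(issueId, 0);
}

// Each automatic re-request increments; past the limit, respond 429
// until a human clicks Review in the dashboard.
function requestReview(issueId: string): number {
  const n = (counters.get(issueId) ?? 0) + 1;
  counters.set(issueId, n);
  return n > MAX_AUTO_REQUEUE ? 429 : 202;
}
```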

Specialist Safeguards

Specialists are constrained to prevent them from corrupting the main project repo:
  1. Spawned at project root — the workspace directory is passed as task context, never by cd-ing into it blindly
  2. Never checkout branches — they work with whatever branch the workspace already has
  3. Workspace-first operations — pan workspace create <ISSUE-ID> is the only way to create new work
These are enforced in three layers:
  • Prompt — clear, repeated warnings in every specialist prompt template
  • Code — wake paths validate the target before spawning
  • Git hooks — scripts/git-hooks/post-checkout auto-reverts any checkout detected inside a specialist tmux session
See Workspace Commands for branch protection details.

Configuration

Specialist configuration lives in ~/.panopticon/cloister.toml:
[specialists.review_agent]
enabled = true
auto_wake = false

[specialists.test_agent]
enabled = true
auto_wake = true

[specialists.inspect_agent]
enabled = true

[specialists.uat_agent]
enabled = true

[specialists.merge_agent]
enabled = true
auto_wake = false

[model_selection.specialist_models]
review_agent   = "sonnet"
test_agent     = "haiku"
merge_agent    = "sonnet"
planning_agent = "opus"
Options:
  • enabled — whether the specialist runs at all
  • auto_wake — auto-wake on trigger vs. wait for an explicit wake signal
  • [model_selection.specialist_models] — per-specialist model override (haiku / sonnet / opus)
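As a sketch of how model selection could resolve (illustrative: `modelFor` and the fallback behavior are assumptions, while the table values come from the TOML example above):

```typescript
// Hypothetical resolver for [model_selection.specialist_models]:
// use the per-specialist override, else fall back to a default model.
const specialistModels: Record<string, string> = {
  review_agent: "sonnet",
  test_agent: "haiku",
  merge_agent: "sonnet",
  planning_agent: "opus",
};

function modelFor(specialist: string, fallback = "sonnet"): string {
  return specialistModels[specialist] ?? fallback;
}
```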

When to Disable Each Specialist

Specialists are powerful but not free — each one adds latency and API cost. Sensible tradeoffs:
  • review-agent — almost never disable. The one gate between work-agents and prod.
  • test-agent — disable only if your test suite is broken or prohibitively slow. Fix the suite instead.
  • inspect-agent — disable for projects without a PRD/spec culture; it has nothing to verify against.
  • uat-agent — disable for backend-only services or CLIs with no browser surface.
  • merge-agent — keep enabled even for conflict-free projects; it’s your integration test safety net.

Viewing Specialist Status

# All specialists with their current state
pan specialists list

# JSON output for scripting
pan specialists list --json

# What's sitting in a specialist's queue
pan specialists queue review-agent

# Reset a wedged specialist (clears session, starts fresh)
pan specialists reset review-agent

# Clear a stuck queue without touching the session
pan specialists clear-queue review-agent

# Tail an active run in real-time
pan specialists logs myproject review-agent --tail
[Image: Project health panel with per-project specialist roll-up]

You can also watch handoffs and costs flow through the dashboard as they happen.

[Image: Cost tracking across models and specialists]

See Also

  • Cloister — the lifecycle manager that coordinates specialists
  • Convoys — parallel specialist execution for code review
  • Agent Commands — full CLI reference for working with agents