AlchemyLab 1.2

Commands and timer hooks.

Commands

Project-level shell commands that agents run to check their work. You configure them per project. Agents can suggest new ones, but those need your approval before they're active.

Timer hooks

Hooks that fire on an interval instead of an event. Each timer can carry conditions: fire only when no specialists are busy, or only after a session has been idle for N minutes. Timers auto-pause on inactivity, and a new header indicator shows their status and controls.
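
As a rough sketch of how a timer gate can work (the type and field names here are our illustration, not AlchemyLab's actual API): a timer fires only when every attached condition holds.

```typescript
// Hypothetical timer-hook gate. Names like intervalMinutes and
// idleThresholdMinutes are assumptions for illustration.
type TimerCondition = "no-specialists-busy" | "session-idle";

interface TimerHook {
  intervalMinutes: number;
  conditions: TimerCondition[];
  idleThresholdMinutes?: number;
}

interface SessionState {
  busySpecialists: number;
  idleMinutes: number;
}

// The timer fires only when every attached condition holds.
function shouldFire(hook: TimerHook, state: SessionState): boolean {
  return hook.conditions.every((c) => {
    switch (c) {
      case "no-specialists-busy":
        return state.busySpecialists === 0;
      case "session-idle":
        return state.idleMinutes >= (hook.idleThresholdMinutes ?? 5);
    }
  });
}
```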

Bug fixes

  • Subagents sometimes ignored the user's model selection and ended up on the wrong provider.
  • Selected project was cleared on every login.
  • Request completion could use a stale session identity.
  • Auth validation logged noisy warnings on temporary network errors.
  • Adding or connecting a provider didn't take effect until restart.
  • Tool call rendering collapsed incorrectly for task-related operations.

AlchemyLab 1.1

Own runtime, smarter agents, two-phase request completion, and stuck session detection.

Two weeks of work on the internals. The biggest change is invisible: AlchemyLab no longer depends on an external runtime to run agents.

Own runtime

AlchemyLab 1.0 required Bun installed on your machine to run agent sessions. That's gone. We forked OpenCode and rebuilt the server layer to run on Node.js, which you almost certainly already have. The fork ships as a single npm package, about 285MB — down from 2.6GB.

The deeper reason for the fork wasn't size. Concurrent agent sessions worked before, but the workarounds were ugly. Owning the runtime let us do it cleanly.

The fork tracks upstream and we have a sync process to pull in updates without losing our patches.

Two-phase request completion

In 1.0, completing a request meant verifying acceptance criteria and merging the branch in one step. If the merge hit conflicts, the whole completion failed at once, with no way to tell a verification problem from a merge problem.

Now completion is split. When a coder finishes, the request moves to a verification stage where acceptance criteria are checked against the actual evidence. Only after that does the merge happen. If something fails, you know whether it was a criteria problem or a merge problem, and the request sits in a clear state instead of being half-done.
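
The split can be sketched as a small state machine. The state names below are illustrative, not AlchemyLab's actual internals:

```typescript
// Sketch of two-phase completion. State names are assumptions.
type RequestState =
  | "coding"
  | "verifying"
  | "merging"
  | "done"
  | "verification-failed"
  | "merge-conflict";

// Phase 1: when the coder finishes, check criteria against evidence.
function onCoderFinished(): RequestState {
  return "verifying";
}

function afterVerification(passed: boolean): RequestState {
  return passed ? "merging" : "verification-failed";
}

// Phase 2: the merge only runs after verification succeeded, so a
// failure here is unambiguously a merge problem, not a criteria one.
function afterMerge(conflict: boolean): RequestState {
  return conflict ? "merge-conflict" : "done";
}
```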

Quick actions

Not everything needs a full request. Specialists now have lightweight tools — quick write, quick replace, quick manage — that wrap a single file operation in its own micro-request. The change is still tracked and reversible, but there's no overhead of task decomposition for a one-line fix.
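
A minimal sketch of the idea, assuming invented names (`MicroRequest`, `quickWrite`): one file operation, recorded as its own micro-request so the change stays tracked and reversible.

```typescript
// Hypothetical quick-write wrapper; all names are illustrative.
interface MicroRequest {
  id: string;
  tool: "quick-write" | "quick-replace" | "quick-manage";
  file: string;
  previousContent: string | null; // kept so the change can be rolled back
}

const log: MicroRequest[] = [];
const files = new Map<string, string>();

function quickWrite(file: string, content: string): MicroRequest {
  const record: MicroRequest = {
    id: `qr-${log.length + 1}`,
    tool: "quick-write",
    file,
    previousContent: files.get(file) ?? null,
  };
  files.set(file, content);
  log.push(record); // no task decomposition, but still a record
  return record;
}
```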

Stuck session detection

Agent sessions sometimes hang. The model stops responding, or a tool call goes nowhere, and the session just sits there burning time. There's now a watchdog that detects this: if a session has no output and no active tool calls for too long, it surfaces in the UI so you can kill it or retry.
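
The check itself is simple. A minimal sketch, with field names assumed for illustration:

```typescript
// A session counts as stuck when it has produced no output and has no
// active tool calls for longer than a threshold. Names are assumptions.
interface AgentSession {
  lastOutputAt: number; // ms timestamp of the last model output
  activeToolCalls: number;
}

function isStuck(
  session: AgentSession,
  now: number,
  thresholdMs = 120_000,
): boolean {
  if (session.activeToolCalls > 0) return false; // a tool is still running
  return now - session.lastOutputAt > thresholdMs;
}
```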

Specialist memory

Specialists can now update their own description of what they know. When a specialist works on authentication code for the third time, it writes that down. Next time auth work comes in, the alchemist routes it to the right specialist instead of picking one at random.

Model selection

You can now switch models per session from the agent drawer. The system resolves which model to use from a stack: the agent type's default, the user's preference, and what's actually available from connected providers. If you want your specialists on a cheaper model and your alchemist on the flagship, that works.
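
One way to read that stack (an assumption about precedence, not a description of AlchemyLab's exact resolver): the user's preference wins over the agent type's default, and either only counts if a connected provider actually serves it.

```typescript
// Hypothetical model resolver: first candidate that a connected
// provider actually serves. Precedence order is our assumption.
function resolveModel(
  userPreference: string | undefined,
  agentTypeDefault: string,
  available: Set<string>,
): string | undefined {
  const candidates = [userPreference, agentTypeDefault];
  return candidates.find(
    (m): m is string => m !== undefined && available.has(m),
  );
}
```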

Provider configuration

You can now add providers directly from the settings panel. Browse a list of known providers or configure a custom endpoint. Disconnect and reconnect without restarting.

AlchemyLab is built with AlchemyLab. The 722 requests that went into this release were all run through the system.

Introducing AlchemyLab

An agentic coding IDE with agent hierarchy, full traceability, and rollback. In the browser, but your dev environment stays local.

You describe a change. Agents break it down, execute it across your codebase, and every step is recorded.

Agent hierarchy

Work flows through three layers:

  • Alchemist — the orchestrator. Reads your request, understands the codebase, delegates to specialists.
  • Specialists — hold deep context on specific parts of the codebase. Each specialist owns a set of tasks and decides how to implement them.
  • Coders — do the actual edits. Each coder works on an isolated git worktree, so parallel execution doesn't cause conflicts.

When coders finish, their branches merge back. The specialist reviews the result against the original task. If something's wrong, it goes back.

Beyond the core three, there are purpose-built agent types: research agents for codebase exploration, planners for architecture and design, history agents that trace changes back to their origin, testers that verify work against acceptance criteria, and a workflow architect that designs automation rules. Each type has a locked-down toolset. A research agent can read but not write, a tester can run commands but not edit files.

Traceability

Every request, task, tool call, and file change is a record with a unique ID. You can walk the full chain from a changed line, to the tool call that changed it, to the task that requested it, to the original request.

Git blame is integrated with the record system. Running blame on a file shows not just which commit changed each line, but which request and task caused it. This works because every agent commit is tagged with a record ID.
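
One plausible mechanism for that tagging (the trailer key `Record-Id` is our assumption; the text above only says commits are tagged with a record ID) is a git commit-message trailer, which blame can then resolve per commit:

```typescript
// Recover a record ID from a hypothetical "Record-Id" commit trailer.
function recordIdFromCommitMessage(message: string): string | null {
  const match = message.match(/^Record-Id:\s*(\S+)$/m);
  return match ? match[1] : null;
}
```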

The record view is how you review what agents did, approve changes, and understand decisions.

Collaborative planning

You talk through a feature or architecture change with a planner agent, and it builds the plan document in real time, section by section, so you can steer as it takes shape. You can annotate specific sections with inline comments, and the planner picks those up and revises.

When the plan is ready, the planner decomposes it into linked requests and offers a handoff to build mode. Requests carry a reference back to the plan, so implementation traces back to the design decision. Plans are versionable if you need to snapshot or restore, but the core value is the back-and-forth: you and the AI iterating on a design before any code gets written.

Agents can also generate journals, narrative summaries of completed requests covering what was done, what decisions were made, and what files were touched.

Acceptance criteria and evidence

Requests have typed acceptance criteria. When a coder completes work, it writes what it did to satisfy each criterion and how it verified the result. Testers can independently validate criteria by running configured test commands and reporting results.
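
The shapes involved might look like this; the kinds and field names are illustrative, not AlchemyLab's schema:

```typescript
// Hypothetical shapes for typed criteria and coder evidence.
interface AcceptanceCriterion {
  id: string;
  kind: "behavior" | "test-command" | "manual-check";
  description: string;
}

interface Evidence {
  criterionId: string;
  whatWasDone: string;
  howVerified: string;
}

// Criteria the coder has not yet written evidence for.
function unverified(
  criteria: AcceptanceCriterion[],
  evidence: Evidence[],
): string[] {
  const covered = new Set(evidence.map((e) => e.criterionId));
  return criteria.filter((c) => !covered.has(c.id)).map((c) => c.id);
}
```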

Rollback

Completed requests can be reverted with one action. Rollbacks follow LIFO ordering so dependencies unwind cleanly: you can't revert request #3 while request #4 is built on top of it; revert #4 first.
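
The LIFO rule is the same invariant a stack gives you. A minimal sketch:

```typescript
// Only the most recently completed request can be reverted, so
// dependencies unwind in reverse completion order.
function canRevert(completedStack: number[], requestId: number): boolean {
  return (
    completedStack.length > 0 &&
    completedStack[completedStack.length - 1] === requestId
  );
}

function revert(completedStack: number[], requestId: number): void {
  if (!canRevert(completedStack, requestId)) {
    throw new Error(`revert ${requestId} blocked: later requests build on it`);
  }
  completedStack.pop();
}
```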

File changes, git state, and records are all unwound together.

Agent communication

Agents coordinate through typed messages (reports, questions, handoffs), each with a sender, recipient, and status. Messages are stored and queryable, so the full coordination chain is visible.
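
A sketch of that message shape, with field names assumed for illustration, plus the kind of query it enables:

```typescript
// Hypothetical typed-message shape; names are assumptions.
type MessageKind = "report" | "question" | "handoff";
type MessageStatus = "pending" | "delivered" | "answered";

interface AgentMessage {
  id: string;
  kind: MessageKind;
  sender: string;
  recipient: string;
  status: MessageStatus;
  body: string;
}

// Because messages are stored, they are queryable, e.g. the open
// questions waiting on one agent:
function openQuestionsFor(messages: AgentMessage[], agent: string): AgentMessage[] {
  return messages.filter(
    (m) => m.kind === "question" && m.recipient === agent && m.status !== "answered",
  );
}
```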

Hooks, recipes, and workflows

Automation is built in layers. Hooks are event-triggered prompts: when a specific event occurs (a task completes, a session goes idle, a request is created), a hook can fire an agent with instructions. Recipes group related hooks together. Workflows bundle recipes into a named configuration you can activate per alchemist.
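
The three layers compose straightforwardly. A sketch with illustrative names (the event list matches the examples above; the field names are our assumptions):

```typescript
// Hooks fire on events, recipes group hooks, workflows bundle recipes.
type HookEvent = "task-completed" | "session-idle" | "request-created";

interface Hook {
  on: HookEvent;
  prompt: string; // instructions handed to an agent when the event fires
}

interface Recipe {
  name: string;
  hooks: Hook[];
}

interface Workflow {
  name: string;
  recipes: Recipe[];
}

// Which prompts fire for a given event under an active workflow?
function promptsFor(workflow: Workflow, event: HookEvent): string[] {
  return workflow.recipes
    .flatMap((r) => r.hooks)
    .filter((h) => h.on === event)
    .map((h) => h.prompt);
}
```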

A workflow architect agent can design these for you. Describe the behavior you want, and it builds the hooks, recipes, and workflow structure using purpose-built tools.

Tooling

Agents have no bash access.* This is a deliberate choice. Most AI coding tools give agents a shell and let them run wild. It works remarkably well. Until it doesn't, and something gets blown up beyond repair. The popular answer is to put everything in a container, which brings its own set of problems. We're making a different bet: that the future of AI tooling isn't bash commands from half a century ago. You can't reliably record what a bash command did, you can't undo it, and you can't constrain what the model decides to run. Some tools now use a second agent to check whether the first agent's commands are safe, but who checks the checker?

AlchemyLab replaces this with purpose-built tools: structured file edits, search, git blame wired into the request system, file recovery. Every tool call records exactly what it changed, so the audit trail is complete and rollback can actually unwind it.

The trade-off is that agents can't do arbitrary things. That's the point.

* Bash can be enabled per agent when the task calls for it. But it's off by default, and the built-in tools cover the vast majority of development work.

Local-first

AlchemyLab runs on your machine, built on OpenCode. Files stay on disk, all changes go through actual git commits. Metadata (requests, tasks, records, plans) lives on a server that enables the UI and multi-device access. Provider configuration uses the OpenCode system, so anything working in OpenCode works in AlchemyLab. Model selection is configurable per agent type.

The UI runs in the browser and is accessible from any device: phone, tablet, another computer. The browser is the IDE.

Current status

AlchemyLab is early and actively developed. The core loop — requests, tasks, agents, records, rollback — works. The hook and workflow system is functional but still being refined. Testing integration is in progress. AlchemyLab is built with AlchemyLab.

If you've gotten this far, you might as well try it.

Set up in under a minute

Create an account, install the CLI, and you're coding with agents.

1. Create your account

2. Install & run. Two commands. That's it.

$ npm install -g alchemylab
$ alchemy