AlchemyLab 1.2

Commands and timer hooks.

Commands

Project-level shell commands that agents run to check their work. You configure them per project. Agents can suggest new ones, but those need your approval before they're active.

Timer hooks

Hooks that fire on an interval instead of an event. Each timer can carry conditions: fire only when no specialists are busy, or only after a session has been idle for N minutes. Timers auto-pause on inactivity, and a new header indicator shows their status and controls.
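
As a rough sketch of how a timer gate can work (the type and field names here are our illustration, not AlchemyLab's actual API): a timer fires only when every attached condition holds.

```typescript
// Hypothetical timer-hook gate. Names like intervalMinutes and
// idleThresholdMinutes are assumptions for illustration.
type TimerCondition = "no-specialists-busy" | "session-idle";

interface TimerHook {
  intervalMinutes: number;
  conditions: TimerCondition[];
  idleThresholdMinutes?: number;
}

interface SessionState {
  busySpecialists: number;
  idleMinutes: number;
}

// The timer fires only when every attached condition holds.
function shouldFire(hook: TimerHook, state: SessionState): boolean {
  return hook.conditions.every((c) => {
    switch (c) {
      case "no-specialists-busy":
        return state.busySpecialists === 0;
      case "session-idle":
        return state.idleMinutes >= (hook.idleThresholdMinutes ?? 5);
    }
  });
}
```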

Bug fixes

  • Subagents sometimes ignored the user's model selection and ended up on the wrong provider.
  • Selected project was cleared on every login.
  • Request completion could use a stale session identity.
  • Auth validation logged noisy warnings on temporary network errors.
  • Adding or connecting a provider didn't take effect until restart.
  • Tool call rendering collapsed incorrectly for task-related operations.

AlchemyLab 1.1

Own runtime, smarter agents, two-phase request completion, and stuck session detection.

Two weeks of work on the internals. The biggest change is invisible: AlchemyLab no longer depends on an external runtime to run agents.

Own runtime

AlchemyLab 1.0 required Bun installed on your machine to run agent sessions. That's gone. We forked OpenCode and rebuilt the server layer to run on Node.js, which you almost certainly already have. The fork ships as a single npm package, about 285MB — down from 2.6GB.

The deeper reason for the fork wasn't size. Concurrent agent sessions worked before, but the workarounds were ugly. Owning the runtime let us do it cleanly.

The fork tracks upstream and we have a sync process to pull in updates without losing our patches.

Two-phase request completion

In 1.0, completing a request meant verifying acceptance criteria and merging the branch in one step. If the merge hit conflicts, the whole completion failed at once, with no way to tell a verification problem from a merge problem.

Now completion is split. When a coder finishes, the request moves to a verification stage where acceptance criteria are checked against the actual evidence. Only after that does the merge happen. If something fails, you know whether it was a criteria problem or a merge problem, and the request sits in a clear state instead of being half-done.
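
The split can be sketched as a small state machine. The state names below are illustrative, not AlchemyLab's actual internals:

```typescript
// Sketch of two-phase completion. State names are assumptions.
type RequestState =
  | "coding"
  | "verifying"
  | "merging"
  | "done"
  | "verification-failed"
  | "merge-conflict";

// Phase 1: when the coder finishes, check criteria against evidence.
function onCoderFinished(): RequestState {
  return "verifying";
}

function afterVerification(passed: boolean): RequestState {
  return passed ? "merging" : "verification-failed";
}

// Phase 2: the merge only runs after verification succeeded, so a
// failure here is unambiguously a merge problem, not a criteria one.
function afterMerge(conflict: boolean): RequestState {
  return conflict ? "merge-conflict" : "done";
}
```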

Quick actions

Not everything needs a full request. Specialists now have lightweight tools — quick write, quick replace, quick manage — that wrap a single file operation in its own micro-request. The change is still tracked and reversible, but there's no overhead of task decomposition for a one-line fix.
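
A minimal sketch of the idea, assuming invented names (`MicroRequest`, `quickWrite`): one file operation, recorded as its own micro-request so the change stays tracked and reversible.

```typescript
// Hypothetical quick-write wrapper; all names are illustrative.
interface MicroRequest {
  id: string;
  tool: "quick-write" | "quick-replace" | "quick-manage";
  file: string;
  previousContent: string | null; // kept so the change can be rolled back
}

const log: MicroRequest[] = [];
const files = new Map<string, string>();

function quickWrite(file: string, content: string): MicroRequest {
  const record: MicroRequest = {
    id: `qr-${log.length + 1}`,
    tool: "quick-write",
    file,
    previousContent: files.get(file) ?? null,
  };
  files.set(file, content);
  log.push(record); // no task decomposition, but still a record
  return record;
}
```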

Stuck session detection

Agent sessions sometimes hang. The model stops responding, or a tool call goes nowhere, and the session just sits there burning time. There's now a watchdog that detects this: if a session has no output and no active tool calls for too long, it surfaces in the UI so you can kill it or retry.
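
The check itself is simple. A minimal sketch, with field names assumed for illustration:

```typescript
// A session counts as stuck when it has produced no output and has no
// active tool calls for longer than a threshold. Names are assumptions.
interface AgentSession {
  lastOutputAt: number; // ms timestamp of the last model output
  activeToolCalls: number;
}

function isStuck(
  session: AgentSession,
  now: number,
  thresholdMs = 120_000,
): boolean {
  if (session.activeToolCalls > 0) return false; // a tool is still running
  return now - session.lastOutputAt > thresholdMs;
}
```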

Specialist memory

Specialists can now update their own description of what they know. When a specialist works on authentication code for the third time, it writes that down. Next time auth work comes in, the alchemist routes it to the right specialist instead of picking one at random.

Model selection

You can now switch models per session from the agent drawer. The system resolves which model to use from a stack: the agent type's default, the user's preference, and what's actually available from connected providers. If you want your specialists on a cheaper model and your alchemist on the flagship, that works.
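
One way to read that stack (an assumption about precedence, not a description of AlchemyLab's exact resolver): the user's preference wins over the agent type's default, and either only counts if a connected provider actually serves it.

```typescript
// Hypothetical model resolver: first candidate that a connected
// provider actually serves. Precedence order is our assumption.
function resolveModel(
  userPreference: string | undefined,
  agentTypeDefault: string,
  available: Set<string>,
): string | undefined {
  const candidates = [userPreference, agentTypeDefault];
  return candidates.find(
    (m): m is string => m !== undefined && available.has(m),
  );
}
```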

Provider configuration

You can now add providers directly from the settings panel. Browse a list of known providers or configure a custom endpoint. Disconnect and reconnect without restarting.

AlchemyLab is built with AlchemyLab. The 722 requests that went into this release were all run through the system.

Introducing AlchemyLab

An agentic coding IDE with agent hierarchy, full traceability, and rollback. In the browser, but your dev environment stays local.

You describe a change. Agents break it down, execute it across your codebase, and every step is recorded.

Agent hierarchy

Work flows through three layers:

  • Alchemist — the orchestrator. Reads your request, understands the codebase, delegates to specialists.
  • Specialists — hold deep context on specific parts of the codebase. Each specialist owns a set of tasks and decides how to implement them.
  • Coders — do the actual edits. Each coder works on an isolated git worktree, so parallel execution doesn't cause conflicts.

When coders finish, their branches merge back. The specialist reviews the result against the original task. If something's wrong, it goes back.

Beyond the core three, there are purpose-built agent types: research agents for codebase exploration, planners for architecture and design, history agents that trace changes back to their origin, testers that verify work against acceptance criteria, and a workflow architect that designs automation rules. Each type has a locked-down toolset. A research agent can read but not write, a tester can run commands but not edit files.

Traceability

Every request, task, tool call, and file change is a record with a unique ID. You can walk the full chain from a changed line, to the tool call that changed it, to the task that requested it, to the original request.

Git blame is integrated with the record system. Running blame on a file shows not just which commit changed each line, but which request and task caused it. This works because every agent commit is tagged with a record ID.
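
One plausible mechanism for that tagging (the trailer key `Record-Id` is our assumption; the text above only says commits are tagged with a record ID) is a git commit-message trailer, which blame can then resolve per commit:

```typescript
// Recover a record ID from a hypothetical "Record-Id" commit trailer.
function recordIdFromCommitMessage(message: string): string | null {
  const match = message.match(/^Record-Id:\s*(\S+)$/m);
  return match ? match[1] : null;
}
```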

The record view is how you review what agents did, approve changes, and understand decisions.

Collaborative planning

You talk through a feature or architecture change with a planner agent, and it builds the plan document in real time, section by section, so you can steer as it takes shape. You can annotate specific sections with inline comments, and the planner picks those up and revises.

When the plan is ready, the planner decomposes it into linked requests and offers a handoff to build mode. Requests carry a reference back to the plan, so implementation traces back to the design decision. Plans are versionable if you need to snapshot or restore, but the core value is the back-and-forth: you and the AI iterating on a design before any code gets written.

Agents can also generate journals, narrative summaries of completed requests covering what was done, what decisions were made, and what files were touched.

Acceptance criteria and evidence

Requests have typed acceptance criteria. When a coder completes work, it writes what it did to satisfy each criterion and how it verified the result. Testers can independently validate criteria by running configured test commands and reporting results.
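
The shapes involved might look like this; the kinds and field names are illustrative, not AlchemyLab's schema:

```typescript
// Hypothetical shapes for typed criteria and coder evidence.
interface AcceptanceCriterion {
  id: string;
  kind: "behavior" | "test-command" | "manual-check";
  description: string;
}

interface Evidence {
  criterionId: string;
  whatWasDone: string;
  howVerified: string;
}

// Criteria the coder has not yet written evidence for.
function unverified(
  criteria: AcceptanceCriterion[],
  evidence: Evidence[],
): string[] {
  const covered = new Set(evidence.map((e) => e.criterionId));
  return criteria.filter((c) => !covered.has(c.id)).map((c) => c.id);
}
```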

Rollback

Completed requests can be reverted with one action. Rollbacks follow LIFO ordering so dependencies unwind cleanly: you can't revert request #3 while request #4 is built on top of it; revert #4 first.
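
The LIFO rule is the same invariant a stack gives you. A minimal sketch:

```typescript
// Only the most recently completed request can be reverted, so
// dependencies unwind in reverse completion order.
function canRevert(completedStack: number[], requestId: number): boolean {
  return (
    completedStack.length > 0 &&
    completedStack[completedStack.length - 1] === requestId
  );
}

function revert(completedStack: number[], requestId: number): void {
  if (!canRevert(completedStack, requestId)) {
    throw new Error(`revert ${requestId} blocked: later requests build on it`);
  }
  completedStack.pop();
}
```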

File changes, git state, and records are all unwound together.

Agent communication

Agents coordinate through typed messages (reports, questions, handoffs), each with a sender, recipient, and status. Messages are stored and queryable, so the full coordination chain is visible.
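
A sketch of that message shape, with field names assumed for illustration, plus the kind of query it enables:

```typescript
// Hypothetical typed-message shape; names are assumptions.
type MessageKind = "report" | "question" | "handoff";
type MessageStatus = "pending" | "delivered" | "answered";

interface AgentMessage {
  id: string;
  kind: MessageKind;
  sender: string;
  recipient: string;
  status: MessageStatus;
  body: string;
}

// Because messages are stored, they are queryable, e.g. the open
// questions waiting on one agent:
function openQuestionsFor(messages: AgentMessage[], agent: string): AgentMessage[] {
  return messages.filter(
    (m) => m.kind === "question" && m.recipient === agent && m.status !== "answered",
  );
}
```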

Hooks, recipes, and workflows

Automation is built in layers. Hooks are event-triggered prompts: when a specific event occurs (a task completes, a session goes idle, a request is created), a hook can fire an agent with instructions. Recipes group related hooks together. Workflows bundle recipes into a named configuration you can activate per alchemist.
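
The three layers compose straightforwardly. A sketch with illustrative names (the event list matches the examples above; the field names are our assumptions):

```typescript
// Hooks fire on events, recipes group hooks, workflows bundle recipes.
type HookEvent = "task-completed" | "session-idle" | "request-created";

interface Hook {
  on: HookEvent;
  prompt: string; // instructions handed to an agent when the event fires
}

interface Recipe {
  name: string;
  hooks: Hook[];
}

interface Workflow {
  name: string;
  recipes: Recipe[];
}

// Which prompts fire for a given event under an active workflow?
function promptsFor(workflow: Workflow, event: HookEvent): string[] {
  return workflow.recipes
    .flatMap((r) => r.hooks)
    .filter((h) => h.on === event)
    .map((h) => h.prompt);
}
```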

A workflow architect agent can design these for you. Describe the behavior you want, and it builds the hooks, recipes, and workflow structure using purpose-built tools.

Tooling

Agents have no bash access.* This is a deliberate choice. Most AI coding tools give agents a shell and let them run wild. It works remarkably well. Until it doesn't, and something gets blown up beyond repair. The popular answer is to put everything in a container, which brings its own set of problems. We're making a different bet: that the future of AI tooling isn't bash commands from half a century ago. You can't reliably record what a bash command did, you can't undo it, and you can't constrain what the model decides to run. Some tools now use a second agent to check whether the first agent's commands are safe, but who checks the checker?

AlchemyLab replaces this with purpose-built tools: structured file edits, search, git blame wired into the request system, file recovery. Every tool call records exactly what it changed, so the audit trail is complete and rollback can actually unwind it.

The trade-off is that agents can't do arbitrary things. That's the point.

* Bash can be enabled per agent when the task calls for it. But it's off by default, and the built-in tools cover the vast majority of development work.

Local-first

AlchemyLab runs on your machine, built on OpenCode. Files stay on disk, all changes go through actual git commits. Metadata (requests, tasks, records, plans) lives on a server that enables the UI and multi-device access. Provider configuration uses the OpenCode system, so anything working in OpenCode works in AlchemyLab. Model selection is configurable per agent type.

The UI runs in the browser and is accessible from any device: phone, tablet, another computer. The browser is the IDE.

Current status

AlchemyLab is early and actively developed. The core loop — requests, tasks, agents, records, rollback — works. The hook and workflow system is functional but still being refined. Testing integration is in progress. AlchemyLab is built with AlchemyLab.

If you've gotten this far, you might as well try it.

Set up in under a minute

Create an account, install the CLI, and you're coding with agents.

1. Create your account

2. Install & run. Two commands. That's it.

$ npm install -g alchemylab
$ alchemy