Agents & Subagents.

Agents that delegate, not just chat

Design multi-agent hierarchies, keep context focused, and evaluate every level with built-in evals.

macOS · Windows · Linux

From idea to agent in 4 steps

01
Define tasks

Create a task for each role in your system—orchestrator, researcher, analyst—with a prompt and schema.

02
Add tools

Connect MCP tools, Kiln search tools, or other Kiln tasks as callable tools from the project Tools screen.

03
Compose agents

Pick tools and subagents per run, then save the config for repeat use.

04
Run & trace

Execute the agent, inspect every tool call and subagent invocation, then iterate on prompts and tool selection.

Agents in Kiln

Subagents: Levels of Autonomy

An agent is a Kiln task that loops autonomously: reasoning, calling tools, and deciding when it's done. A subagent is any Kiln task turned into a tool. The parent delegates a focused job, and the subagent runs in its own context. The same subagent can be reused across many parent workflows.

Focused Context Windows

Long-running agents accumulate context fast: web pages, API responses, intermediate reasoning. As the window fills, quality degrades and costs spike. Kiln subagents fix this structurally: each runs in its own context window, and only its final message returns to the parent.

Evaluate Each Level

Every agent and subagent can be evaluated independently. Tool-use evals check tool calls and parameters. LLM-as-Judge evals measure output quality. Run configurations lock model, prompt, and tools per subagent, so you can swap one variable and measure the impact across the whole system.
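To make "swap one variable" concrete, here is a minimal Python sketch of a locked configuration. The `RunConfig` class, its field names, and the model and tool names are hypothetical and chosen for illustration; this is not Kiln's file format or API.

```python
from dataclasses import FrozenInstanceError, dataclass, fields, replace
from typing import List, Tuple


# Hypothetical stand-in for a run configuration; not Kiln's actual schema.
@dataclass(frozen=True)
class RunConfig:
    model: str
    prompt: str
    tools: Tuple[str, ...]


def changed_fields(a: RunConfig, b: RunConfig) -> List[str]:
    """List the field names whose values differ between two configs."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


baseline = RunConfig(
    model="model-a",                      # placeholder model name
    prompt="Summarize the findings.",
    tools=("web_search",),                # placeholder tool name
)

# Swap exactly one variable; prompt and tools carry over unchanged.
variant = replace(baseline, model="model-b")
diff = changed_fields(baseline, variant)

# A frozen config cannot drift between runs.
try:
    baseline.model = "something-else"
    locked = False
except FrozenInstanceError:
    locked = True
```

Because only `model` differs between `baseline` and `variant`, any change in eval scores between the two runs can be attributed to the model swap.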

Everything you need for production agents

Kiln Tasks as Tools

Turn any task into a callable subagent for multi-actor orchestration.

MCP tool support

Connect local or remote MCP servers—APIs, databases, web search.

Autonomous looping

Agents loop through reasoning and tool calls until the job is done.

Run configurations

Lock model, prompt, and tools per subagent for reproducible runs.

ReAct pattern

Interleave reasoning and tool calls.

Full trace visibility

Inspect every tool call, subagent invocation, and message in one view.

Tool-use evals

Evaluate whether agents call the right tools with the right parameters.

Open source & local-first

MIT-licensed Python library, source-available app. Your project files stay on your machine.

Multi-agent systems before and after Kiln

Without Kiln
  • Wire together LangChain or custom glue code to orchestrate multiple agents—then maintain it.
  • Watch context windows bloat with irrelevant tool output until quality degrades and costs spike.
  • Evaluate the final answer and hope the intermediate steps are working correctly.

With Kiln
  • Compose orchestrators and specialists by turning any Kiln task into a callable subagent.
  • Subagent isolation manages context automatically—irrelevant data is dropped when a subtask ends.
  • Evaluate every level of the hierarchy independently with tool-use evals and spec-based scoring.

Frequently asked

What is the difference between an agent and a subagent in Kiln?

An agent is a Kiln task that loops autonomously — reasoning and calling tools until it's done. A subagent is any Kiln task turned into a tool another task can call. The same task can be both: agent in one workflow, subagent in another.
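The distinction can be sketched in a few lines of plain Python. Everything here is hypothetical and for illustration only: the `Task` class, the `policy` function that stands in for a model, and the `as_tool` method are assumptions, not Kiln's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# A policy stands in for the model. Given the message history it returns
# ("call", tool_name, tool_input) or ("done", final_answer, "").
Policy = Callable[[List[str]], Tuple[str, str, str]]


@dataclass
class Task:
    name: str
    policy: Policy
    tools: Dict[str, Tool] = field(default_factory=dict)
    max_steps: int = 8

    def run(self, prompt: str) -> str:
        """Agent loop: decide, call a tool, repeat until the policy says done."""
        history = [f"user: {prompt}"]
        for _ in range(self.max_steps):
            action, first, second = self.policy(history)
            if action == "done":
                return first
            result = self.tools[first](second)
            history.append(f"tool {first}: {result}")
        raise RuntimeError(f"{self.name} exceeded {self.max_steps} steps")

    def as_tool(self) -> Tool:
        """Expose this task as a tool, which is what makes it a subagent."""
        return self.run


def researcher_policy(history: List[str]) -> Tuple[str, str, str]:
    # Look the question up once, then answer with the tool result.
    if len(history) == 1:
        return ("call", "lookup", history[0].split(": ", 1)[1])
    return ("done", history[-1].split(": ", 1)[1], "")


def orchestrator_policy(history: List[str]) -> Tuple[str, str, str]:
    # Delegate one focused job to the researcher, then wrap up.
    if len(history) == 1:
        return ("call", "researcher", "capital of France")
    return ("done", "Answer: " + history[-1].split(": ", 1)[1], "")


facts = {"capital of France": "Paris"}
researcher = Task("researcher", researcher_policy, {"lookup": lambda q: facts.get(q, "unknown")})
orchestrator = Task("orchestrator", orchestrator_policy, {"researcher": researcher.as_tool()})

result = orchestrator.run("What is the capital of France?")
```

Here `researcher` is an agent when run directly and a subagent when the orchestrator calls it through `as_tool()`, which mirrors the "same task can be both" point above.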

How does context management work with subagents?

Each subagent runs in its own context window. When it completes, the full message history is discarded — only the final message returns to the parent. The parent's context stays small and focused, avoiding overflow and runaway costs.
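A rough Python sketch of the idea, assuming a hypothetical `run_subagent` function and made-up page data (neither comes from Kiln): the subagent builds up a large history of its own, but the parent only ever sees the last message.

```python
from typing import List


def run_subagent(job: str, pages: List[str]) -> str:
    """Hypothetical subagent: works in a fresh context, returns only its final message."""
    context = [f"job: {job}"]  # nothing is inherited from the parent
    for page in pages:
        context.append(f"tool output: {page}")  # bulky intermediate data
    context.append(f"summary of {len(pages)} pages for '{job}'")
    # Only the last message leaves this function; the rest of `context`
    # is discarded when the call returns.
    return context[-1]


parent_context = ["user: compare vendors"]
pages = ["x" * 5000 for _ in range(3)]  # stand-in for 15,000 characters of web content

parent_context.append(run_subagent("research vendor A", pages))

parent_messages = len(parent_context)
parent_chars = sum(len(message) for message in parent_context)
```

The subagent handled roughly 15,000 characters of tool output, while the parent's context grew by a single short message.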

Do I need to write code to build multi-agent systems?

No. Tasks, tools, and agent hierarchies are all built in the Kiln desktop app and saved to agent configuration files. The Python library runs the same configurations in production.

Can I connect agents I already built in LangChain, CrewAI, or custom code?

Yes. Wrap an existing agent as an MCP server (SDKs exist for Python, TypeScript, Go, Rust, and more) and Kiln can call it, evaluate it, and compare it against Kiln-native agents. No rewrite required.

What tools can my agents use?

Any MCP-compatible tool server (local or remote), Kiln Search Tools for RAG, and other Kiln tasks as subagents. Connect any custom MCP server through the UI.

How do I evaluate agent quality?

Tool-use evals verify the right tools were called with the right parameters. LLM-as-Judge evals measure output quality. Because every subagent is a standalone Kiln task, you can evaluate any level of the hierarchy.
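As an illustration of what a tool-use check compares, here is a minimal sketch in Python. The `ToolCall` class, the `score_tool_use` function, and the scoring rule (fraction of expected calls found with exactly matching parameters) are assumptions made for this example, not Kiln's eval implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolCall:
    name: str
    params: Dict[str, Any]


def score_tool_use(expected: List[ToolCall], actual: List[ToolCall]) -> float:
    """Fraction of expected calls present in the trace with the same name and params."""
    if not expected:
        return 1.0
    remaining = list(actual)  # copy so each traced call can match at most once
    hits = 0
    for want in expected:
        if want in remaining:  # dataclass equality compares name and params
            remaining.remove(want)
            hits += 1
    return hits / len(expected)


expected = [
    ToolCall("search", {"query": "kiln"}),
    ToolCall("fetch", {"url": "https://example.com"}),
]
trace = [
    ToolCall("search", {"query": "kiln"}),
    ToolCall("fetch", {"url": "https://example.org"}),  # right tool, wrong parameter
]

score = score_tool_use(expected, trace)
```

In this example the `search` call matches and the `fetch` call does not, so `score` is 0.5. A check like this catches a wrong parameter even when the final answer happens to look fine.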

Ship agents you can trace and evaluate.

Compose multi-agent hierarchies, manage context automatically, and evaluate every level.