New in Kiln: RAG Q&A Evals, Tool Use Evals, Reranking, Semantic Chunking, an MCP server, and more!

Contents

RAG Evals with Synthetic Q&A Data
Tool Use Evals
RAG Reranking
RAG Semantic Chunking
Kiln MCP Server
Kiln v0.23.0 App Release

Hi everyone 👋 - We just shipped some powerful features to Kiln as part of our v0.23.0 app release!

RAG Evals with Synthetic Q&A Data From Your Docs

Evaluating RAG, or agents that use RAG, is tricky. An LLM-as-judge doesn't have the knowledge in your documents, so it can't tell whether a response is correct. But giving the judge access to RAG biases the evaluation.

The solution: reference-answer evals. The judge compares results to a known correct answer. Building these datasets used to be a long manual process… until now.

Kiln now builds Q&A datasets for evals by iterating over your document store. The process is fully interactive and takes just a few minutes to generate hundreds of reference answers from your documents. Use it to evaluate RAG accuracy end-to-end, including whether your agent calls RAG at the right times with quality queries. Learn more in our docs.

Interactive Q&A dataset generation for RAG evaluations
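
To make the reference-answer idea concrete, here is a minimal Python sketch. It is not Kiln's implementation or API: `QAPair`, `build_judge_prompt`, `evaluate`, and the `answer_fn` / `judge_fn` callables are hypothetical names chosen for illustration, and the PASS/FAIL verdict format is an assumption. The point it shows is that the judge only ever sees the question, the reference answer, and the candidate answer, so it needs no access to your documents.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QAPair:
    """One eval item (hypothetical structure, not Kiln's data model)."""
    question: str
    reference_answer: str  # known-correct answer derived from a source document


def build_judge_prompt(pair: QAPair, candidate_answer: str) -> str:
    """Build a judge prompt that carries the reference answer with it."""
    return (
        "Decide whether the candidate answer agrees with the reference answer.\n"
        f"Question: {pair.question}\n"
        f"Reference answer: {pair.reference_answer}\n"
        f"Candidate answer: {candidate_answer}\n"
        "Reply with exactly PASS or FAIL."
    )


def evaluate(
    pairs: Sequence[QAPair],
    answer_fn: Callable[[str], str],  # the RAG system or agent under test
    judge_fn: Callable[[str], str],   # an LLM call that returns "PASS" or "FAIL"
) -> float:
    """Return the fraction of questions the judge marks as PASS."""
    if not pairs:
        return 0.0
    passed = 0
    for pair in pairs:
        candidate = answer_fn(pair.question)
        verdict = judge_fn(build_judge_prompt(pair, candidate))
        if verdict.strip().upper() == "PASS":
            passed += 1
    return passed / len(pairs)
```

In practice `answer_fn` would wrap the system being evaluated and `judge_fn` would wrap a call to a judge model. Both are left as plain callables here so the sketch stays independent of any particular provider.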

Tool Use Evals: The Right Tool at the Right Time

It doesn't matter how well a tool works if it isn't invoked when needed. Kiln now includes a new eval for tool use that checks:

A tool is called when needed
A tool isn't called when it shouldn't be
The parameters passed to the tool are correct
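
The three checks above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kiln's eval or its API: `ToolCall`, `ToolUseCase`, and `check_tool_use` are hypothetical names, and exact-match comparison of arguments is a simplification (a real eval might use a judge model for fuzzier matching).

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    """A tool invocation observed in a model's output (hypothetical structure)."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolUseCase:
    """What an eval case expects; expected_tool=None means no tool should be called."""
    expected_tool: Optional[str]
    expected_arguments: dict[str, Any] = field(default_factory=dict)


def check_tool_use(case: ToolUseCase, calls: list[ToolCall]) -> dict[str, bool]:
    """Score one eval case against the tool calls the model actually made."""
    if case.expected_tool is None:
        # No tool was needed: the only way to fail is to call one anyway.
        return {
            "called_when_needed": True,
            "not_called_when_unneeded": len(calls) == 0,
            "arguments_correct": True,
        }
    matching = [c for c in calls if c.name == case.expected_tool]
    return {
        "called_when_needed": len(matching) > 0,
        "not_called_when_unneeded": True,  # not applicable when a tool is expected
        # Exact match on arguments; assumed here for simplicity.
        "arguments_correct": any(
            c.arguments == case.expected_arguments for c in matching
        ),
    }
```

For example, a case expecting `get_weather` with `{"city": "Oslo"}` (a made-up tool) fails `arguments_correct` if the model calls it with a different city, and fails `called_when_needed` if the model answers without calling it at all.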

Read the Docs

Kiln v0.23.0 App Release

Visual Schema Builder: define complex tool schemas in an interactive UI
New Models: MiniMax M2, Kimi K2 Thinking, Qwen3 VL
And more: embedding models on OpenRouter, a fix for a Vertex fine-tuning bug, more deterministic OpenRouter routing, and dozens of other improvements
