Kiln Eval Builder & AI Assistant for Evals and Synthetic Data

Introducing Kiln Eval Builder

Building good evals is tedious. Writing judge prompts, creating synthetic data, and aligning to human preference can take 30+ minutes per eval, and the result may still be inaccurate.

Our new Kiln Eval Builder feature changes that. It is a guided assistant that walks you through the entire process in as little as 5 minutes.

Aligned to your judgement: Our alignment loop finds edge cases, compares LLM vs. human preference, and iterates until your judge matches how you would evaluate
Automatic synthetic data: Build robust datasets for evals and training as you go — ready when you save
No prompt engineering required: Describe your concerns in plain English; Kiln Eval Builder turns them into accurate, judge-able evals
Guided start-to-finish: Subject matter experts can build rigorous evals without managing each step or looping in data scientists
5x faster: Create a complete eval in ~5 minutes vs. 30+ minutes for traditional approaches
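To make the idea of aligning a judge to human preference concrete, here is a minimal sketch of how agreement between an LLM judge and a human rater could be measured. This is not Kiln's implementation or API: the `RatedExample` type, the pass/fail labels, and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RatedExample:
    """One model output rated by both a human and an LLM judge (hypothetical type)."""
    text: str
    human_pass: bool
    judge_pass: bool


def agreement_rate(examples: list[RatedExample]) -> float:
    """Fraction of examples where the judge's verdict matches the human's."""
    if not examples:
        raise ValueError("need at least one rated example")
    matches = sum(1 for e in examples if e.human_pass == e.judge_pass)
    return matches / len(examples)


def disagreements(examples: list[RatedExample]) -> list[RatedExample]:
    """Examples where judge and human differ; candidates for refining the judge."""
    return [e for e in examples if e.human_pass != e.judge_pass]


# Illustrative data only, not real results.
rated = [
    RatedExample("polite refusal", human_pass=True, judge_pass=True),
    RatedExample("off-topic answer", human_pass=False, judge_pass=False),
    RatedExample("correct but rude", human_pass=False, judge_pass=True),
    RatedExample("concise answer", human_pass=True, judge_pass=True),
]
```

In a loop like the one described above, the disagreeing examples are the interesting ones: they point at edge cases where the judge prompt may need to change before the next round of comparison.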

Try Kiln Eval Builder Now →

Kiln Eval Builder Walkthrough

Bonus Feature: Fine-tuning for Tool Calling

Our latest release also includes the ability to fine-tune a model for calling a specific set of tools. Compared with the base model, a fine-tuned model can learn to:

Recognize when to call each tool (and when not to)
Choose the right tool for each situation
Format tool calls correctly, reducing errors and making smaller, cheaper models viable

Together, this means you can improve agent performance and lower costs.
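As a rough illustration of what "formatting tool calls correctly" can mean in practice, the sketch below checks a model's tool call against a small set of tool definitions. It assumes tool calls arrive as a tool name plus a JSON string of arguments; the `TOOLS` table, the tool names, and `check_tool_call` are hypothetical and are not part of Kiln.

```python
import json

# Hypothetical tool definitions: tool name -> required argument names.
TOOLS = {
    "get_weather": {"city"},
    "add": {"a", "b"},
}


def check_tool_call(name: str, raw_arguments: str) -> list[str]:
    """Return a list of formatting problems; an empty list means the call is well formed."""
    if name not in TOOLS:
        return [f"unknown tool: {name}"]
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return ["arguments are not valid JSON"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    # Report each required argument that the model left out, in a stable order.
    missing = sorted(TOOLS[name] - set(args))
    return [f"missing argument: {m}" for m in missing]
```

Running a check like this over a batch of outputs from a base model and a fine-tuned model is one simple way to compare how often each produces malformed calls.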

Read the docs →

Want Our Help Building, Evaluating, and Optimizing Your AI System?

If you're part of an enterprise and are looking for help building better AI systems, please reach out! We're available to help companies improve their AI systems and collaboration workflows, build evals, and generate synthetic datasets.
