Kiln Eval Builder & AI Assistant for Evals and Synthetic Data

Introducing Kiln Eval Builder — a guided assistant that helps you build accurate evals, synthetic data, and judge prompts in minutes instead of hours.

Introducing Kiln Eval Builder

Building good evals is tedious. Writing judge prompts, creating synthetic data, and aligning to human preference can take 30+ minutes per eval, and the result may still be inaccurate.

Our new Kiln Eval Builder feature changes that. It is a guided assistant that walks you through the entire process in as little as 5 minutes.

  • Aligned to your judgement: Our alignment loop finds edge cases, compares LLM vs. human preference, and iterates until your judge matches how you would evaluate
  • Automatic synthetic data: Build robust datasets for evals and training as you go — ready when you save
  • No prompt engineering required: Describe your concerns in plain English; Kiln Eval Builder turns them into accurate, judge-able evals
  • Guided start-to-finish: Subject matter experts can build rigorous evals without managing each step or looping in data scientists
  • 5x faster: Create a complete eval in ~5 minutes vs. 30+ minutes for traditional approaches
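
To make the alignment idea concrete, here is a minimal sketch of comparing LLM judge verdicts against human verdicts. It is an illustration only, not Kiln's implementation: the `RatedExample` type, the pass/fail labels, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RatedExample:
    """Hypothetical record: one output rated by both a human and an LLM judge."""
    text: str
    human_pass: bool  # verdict from the human reviewer
    judge_pass: bool  # verdict from the LLM judge


def agreement_rate(examples):
    """Fraction of examples where the judge's verdict matches the human's."""
    if not examples:
        raise ValueError("need at least one rated example")
    matches = sum(1 for e in examples if e.human_pass == e.judge_pass)
    return matches / len(examples)


def disagreements(examples):
    """Examples where judge and human differ: candidate edge cases to review."""
    return [e for e in examples if e.human_pass != e.judge_pass]


# Made-up sample data for illustration.
examples = [
    RatedExample("Refund policy answer", True, True),
    RatedExample("Sarcastic reply", False, True),
    RatedExample("Off-topic joke", False, False),
    RatedExample("Polite refusal", True, True),
]

print(agreement_rate(examples))                  # 0.75
print([e.text for e in disagreements(examples)])  # ['Sarcastic reply']
```

In a loop like the one described above, the disagreements would be the cases worth feeding back into the judge prompt before measuring agreement again.
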

Kiln Eval Builder Walkthrough

Bonus Feature: Fine-tuning for Tool Calling

Our latest release also includes the ability to fine-tune a model for calling a specific set of tools. Fine-tuned models can improve on the base model by:

  • Learning when to call each tool (and when not to)
  • Choosing the right tool for each situation
  • Formatting tool calls correctly, which reduces errors and makes smaller, cheaper models viable

Together, this means you can improve agent performance and lower costs.
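
As a rough illustration of what "formatting tool calls correctly" can mean in practice, the sketch below checks a model's raw tool call against a small tool specification. The tool names, the `name`/`arguments` JSON shape, and the `check_tool_call` helper are assumptions for this example; they are not Kiln's API or data format.

```python
import json

# Hypothetical tool specs: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str},
    "add": {"a": (int, float), "b": (int, float)},
}


def check_tool_call(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the call is well formed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(call, dict):
        return ["call is not a JSON object"]

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be an object"]

    problems = []
    # Every required argument must be present with an accepted type.
    for arg, expected in TOOLS[name].items():
        if arg not in args:
            problems.append(f"missing argument: {arg}")
        elif not isinstance(args[arg], expected):
            problems.append(f"wrong type for argument: {arg}")
    # Arguments the tool does not define are also reported.
    for arg in args:
        if arg not in TOOLS[name]:
            problems.append(f"unexpected argument: {arg}")
    return problems


print(check_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))  # []
print(check_tool_call('{"name": "get_weather", "arguments": {}}'))  # ['missing argument: city']
```

Counting how often a model's calls pass a check like this, before and after fine-tuning, is one simple way to see whether formatting errors are going down.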

Read the docs →

Want Our Help Building, Evaluating, and Optimizing Your AI System?

If you're part of an enterprise and are looking for help building better AI systems, please reach out! We're available to help companies improve their AI systems and collaboration workflows, build evals, and generate synthetic datasets.