Feb 4, 2026

New AI Copilot for Evals & Synthetic Data

Introducing Kiln Specs — a guided copilot that helps you build accurate evals, synthetic data, and judge prompts in minutes instead of hours.

Introducing Kiln Specs with Copilot

Building good evals is tedious. Writing judge prompts, creating synthetic data, aligning to human preference... it can take 30+ minutes per eval, and still not be accurate.

Our new Kiln Specs feature changes that — a copilot that walks you through the entire process in as little as 5 minutes.

  • Aligned to your judgement: Our alignment loop finds edge cases, compares LLM vs. human preference, and iterates until your judge matches how you would evaluate
  • Automatic synthetic data: Build robust datasets for evals and training as you go — ready when you save
  • No prompt engineering required: Describe your concerns in plain English; the copilot turns them into accurate, judge-able evals
  • Guided start-to-finish: Subject matter experts can build rigorous evals without managing each step or looping in data scientists
  • 5x faster: Create a complete Spec in ~5 minutes vs. 30+ minutes for traditional evals

Try Kiln Specs Now →

Kiln Specs Walkthrough

Bonus Feature: Fine-tuning for Tool Calling

Our latest release also includes the ability to fine-tune a model for calling a specific set of tools. The fine-tuned models can improve over the base model by:

  • Learn when to call each tool (and when not to)
  • Choose the right tool for each situation
  • Format tool calls correctly — reducing errors and making smaller, cheaper models viable

Together, this means you can improve agent performance and lower costs.

Read the docs →

Want Our Help Building, Evaluating and Optimizing Your AI system?

If you're part of an enterprise and are looking for help building better AI systems, please reach out! We're available to help companies improve their AI systems, collaboration workflows, build evals, and generate synthetic datasets.

Get Kiln Updates in Your Inbox
Zero spam, unsubscribe at any time.