Introducing Kiln Specs with Copilot
Building good evals is tedious. Writing judge prompts, creating synthetic data, and aligning to human preference can take 30+ minutes per eval, and the result may still be inaccurate.
Our new Kiln Specs feature changes that — a copilot that walks you through the entire process in as little as 5 minutes.
- Aligned to your judgement: Our alignment loop finds edge cases, compares LLM vs. human preference, and iterates until your judge matches how you would evaluate
- Automatic synthetic data: Build robust datasets for evals and training as you go — ready when you save
- No prompt engineering required: Describe your concerns in plain English; the copilot turns them into accurate, judge-able evals
- Guided start-to-finish: Subject matter experts can build rigorous evals without managing each step or looping in data scientists
- 5x faster: Create a complete Spec in ~5 minutes vs. 30+ minutes for traditional evals
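To make "aligned to your judgement" concrete, here is a minimal sketch of the kind of comparison an alignment loop relies on: how often a judge's verdict matches a human's, and which examples they disagree on. This is an illustration only, not Kiln's implementation; the `RatedExample` shape, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RatedExample:
    """One eval example with both verdicts (hypothetical shape, not a Kiln type)."""
    text: str
    human_pass: bool  # verdict from the human reviewer
    judge_pass: bool  # verdict from the LLM judge


def agreement_rate(examples: list[RatedExample]) -> float:
    """Fraction of examples where the judge verdict matches the human verdict."""
    if not examples:
        raise ValueError("need at least one rated example")
    matches = sum(1 for ex in examples if ex.human_pass == ex.judge_pass)
    return matches / len(examples)


def disagreements(examples: list[RatedExample]) -> list[RatedExample]:
    """Examples where judge and human differ: candidates for refining the judge."""
    return [ex for ex in examples if ex.human_pass != ex.judge_pass]


# Made-up sample data for illustration.
sample = [
    RatedExample("Polite refusal of an unsafe request", True, True),
    RatedExample("Answer with an invented citation", False, False),
    RatedExample("Correct answer in a curt tone", True, False),
    RatedExample("Accurate, well-sourced summary", True, True),
]

rate = agreement_rate(sample)
edge_cases = disagreements(sample)
print(f"agreement: {rate:.0%}, disagreements: {len(edge_cases)}")
```

In a loop like the one described above, the disagreeing examples would be the ones worth reviewing before revising the judge and measuring again.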
Bonus Feature: Fine-tuning for Tool Calling
Our latest release also includes the ability to fine-tune a model for calling a specific set of tools. Compared with the base model, a fine-tuned model can:
- Learn when to call each tool (and when not to)
- Choose the right tool for each situation
- Format tool calls correctly — reducing errors and making smaller, cheaper models viable
Together, these gains let you improve agent performance and lower costs.
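As a rough illustration of what "format tool calls correctly" means in practice, the sketch below checks a model's raw tool-call output against a small tool registry. It assumes tool calls arrive as JSON objects with `name` and `arguments` fields; that shape, the `TOOLS` registry, and `check_tool_call` are hypothetical and not part of Kiln's API.

```python
import json
from typing import Optional

# Hypothetical tool registry: tool name -> required argument names.
TOOLS = {
    "get_weather": {"city"},
    "search_docs": {"query"},
}


def check_tool_call(raw: str) -> Optional[str]:
    """Return None if the call is well formed, else a short error description."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "not valid JSON"
    if not isinstance(call, dict):
        return "not a JSON object"

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return f"unknown tool: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return "arguments must be an object"

    missing = TOOLS[name] - set(args)
    if missing:
        return f"missing arguments: {sorted(missing)}"
    return None


# Made-up model outputs for illustration.
outputs = [
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}',
    '{"name": "get_weather", "arguments": {}}',
    '{"name": "delete_all", "arguments": {}}',
    "get_weather(Oslo)",
]
errors = [check_tool_call(o) for o in outputs]
error_rate = sum(e is not None for e in errors) / len(outputs)
```

Running the same check over outputs from the base model and the fine-tuned model is one simple way to compare their formatting error rates.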
Want Our Help Building, Evaluating, and Optimizing Your AI System?
If you're part of an enterprise and are looking for help building better AI systems, please reach out! We help companies improve their AI systems and collaboration workflows, build evals, and generate synthetic datasets.
