BLOG

Git-Backed SaaS: SaaS Without the Service

Kiln uses git for everything you'd normally need a database for, with the instant-syncing user experience of SaaS. Every API call is a commit. Every read is a local file.

Kiln is a local app. Your datasets, evals, and runs live in your git repo, on your machine, under your control. But hit save, and your teammate's UI updates in seconds. No cloud service, no terminal, no git commit, no pull, no merge conflicts, and no git knowledge required.

Why we did this

I love git for data. History, branches, blame, PRs, portability. For a data team, it's hard to beat. You get version control, rollback, and an audit trail without building any of it.

But Kiln's bigger users hit a wall. Less technical folks like PMs, QA, and subject-matter experts needed to be part of the workflow. And "just use git" doesn't work for people who don't know what git is.

We had a choice. We could build a hosted service with a database and standard SaaS auth. That's the normal move. But it throws away everything good about git: data ownership, the audit trail, portability, the access-management infrastructure your org already runs. Your data goes from "files you control" to "rows in our database, accessible through our API, exportable if we feel like building that."

I decided to challenge us: could we get SaaS-grade sync UX without giving up the benefits of git? The answer is yes, git can be your database. You inherit versioning, access control, audit trails, CI, and portability instead of building them. The hard part is making git invisible for less technical users, and that's what the rest of this post covers.

How it works

Four pieces. Everything below uses pygit2 (libgit2 with Python bindings). No shelling out to git. Users don't need git installed, just our app.

Instant local reads, in sync with remote

Each project lives as a full clone on disk, in a hidden directory Kiln manages. A read hits a JSON file on local disk. To keep in sync, a background loop polls the remote every 10 seconds and fast-forwards the local clone when there's anything new. The read path itself doesn't touch the network and feels like an instant local app.

The poll loop doesn't fight with writes. It runs in two phases. First, a lock-free fetch that only updates remote-tracking refs and never touches the working tree. Second, only if phase one found new commits, a fast-forward that briefly takes the write lock to move HEAD. In the common case (nothing changed remotely), the loop adds zero contention.
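The two-phase loop can be sketched as follows. This is an illustrative sketch, not Kiln's actual code: `RepoSync`, `fetch_remote`, and `fast_forward` are hypothetical names standing in for the real pygit2 operations.

```python
import threading

class RepoSync:
    """Sketch of the two-phase poll loop (hypothetical names)."""

    def __init__(self, fetch_remote, fast_forward):
        self.write_lock = threading.Lock()
        self._fetch_remote = fetch_remote    # phase 1: updates remote-tracking refs only
        self._fast_forward = fast_forward    # phase 2: moves HEAD to the fetched commit

    def poll_once(self):
        # Phase 1: lock-free fetch. It never touches the working tree,
        # so it can run while a write request holds the lock.
        new_commits = self._fetch_remote()
        if not new_commits:
            return False                     # common case: zero lock contention
        # Phase 2: only now take the write lock, briefly, to move HEAD.
        with self.write_lock:
            self._fast_forward()
        return True
```

The key property is that the lock is only touched when phase one actually found something new.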

Reads can be instant because all they need to do is check a single timestamp: if the last successful sync was within 15 seconds, serve from local; otherwise, fall back to a synchronous fetch (rare). The system polls actively while users are working and pauses when they are idle for over 5 minutes.
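The read-path decision is just a timestamp comparison. A minimal sketch, with `read_local` and `sync_fetch` as stand-ins for the real file read and network fetch:

```python
import time

SYNC_FRESH_WINDOW_S = 15  # freshness window from the post; constant name is assumed

def read_path(last_sync_ts, read_local, sync_fetch, now=time.time):
    """Serve from local disk if the last successful sync is recent,
    otherwise fall back to a synchronous fetch (rare)."""
    if now() - last_sync_ts <= SYNC_FRESH_WINDOW_S:
        return read_local()   # instant: no network on the hot path
    sync_fetch()              # stale: catch up before serving
    return read_local()
```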

How do you avoid conflicts?

The best way to handle conflicts is not to have them, so that's the first line of attack.

Data model design. Kiln stores data as folders of small JSON files. Each entity gets its own file, named with a unique random ID. Most files are immutable. Most operations are append-only. Random IDs ensure we don't conflict on write and paths don't change when users rename things. Two users rarely write the same file at the same time because there are many small entities, not a few large ones.

Isolated clone. Kiln keeps its own clone in a hidden directory, separate from any checkout an engineer might have. Any local work you do won't conflict with Kiln's auto-sync, and the two stay in sync through the remote.

Write lock. Every mutating request gets a per-repo write lock. Endpoints don't know it exists. More on this in the middleware section.

Background sync narrows the window. With a 10-second poll cadence, two clients would have to make conflicting edits to the same file within a few seconds to collide. The window is small, and conflicts are exceedingly unlikely.

Uncommon isn't zero, though. Sometimes the race happens.

What if you get a conflict anyway?

When conflicts happen, the push fails because the remote moved since our last fetch. Here's the recovery.

Cherry-pick the one commit. Each API call produces one commit, so recovery is a one-commit rebase. We do it as a cherry-pick: fetch the new remote HEAD, hard-reset local to match it, then replay our commit on top.

fetch remote
local_commit = HEAD
reset --hard to remote HEAD
cherry-pick local_commit onto new HEAD
if conflicts:
    abort cherry-pick
    reset --hard to pre-request HEAD
    return 409 to client
push

One retry, then stop. If the cherry-pick is clean, we push once more. If that push also fails, we stop. No retry loop. This is deliberate.

Under contention, retries make things worse: each attempt races against other writers, and the losing retry creates another orphaned commit to clean up. One attempt gives you a clear signal. Either the rebase landed or it didn't. The client retries with fresh state, which is almost always enough.
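The one-retry policy reduces to a small decision function. A sketch, with `push` and `rebase_onto_remote` as stand-ins for the git operations described above:

```python
def attempt_save(push, rebase_onto_remote):
    """One push, one recovery attempt, then stop. Returns an HTTP-style status."""
    if push():
        return 200
    # Remote moved since our last fetch: replay our single commit on top.
    if not rebase_onto_remote():
        return 409        # cherry-pick conflicted: remote wins, client retries
    if push():
        return 200
    return 409            # no retry loop: a clear signal to the client
```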

If cherry-pick conflicts or the retry fails: rollback. Rolling back to remote HEAD ensures we always get back in sync with remote, never requiring manual merging from the user. Remote wins conflicts, and your local will automatically align.

While this sounds scary, it isn't too bad in practice because:

  • Likelihood of conflicts is very low: we're within 10s of remote, and the data model is designed to avoid conflicts.
  • In the UI the user sees "There was a problem saving, please try again", just like they would for a network issue. The user can retry saving in a click (the UI still has the data to retry the request), and typically the second attempt succeeds since their local repo is now up to date.
  • We always keep a backup of the data: we roll back using git stash, so every change is preserved locally, even if it didn't make it into the branch. We haven't built a UX around it, but it's there for emergencies.

All in all, it's an extra level of robustness over the "server blip dropped my save" errors you get in a SaaS.

Crash recovery uses the same recipe. If the process dies mid-write (OOM, power loss, kill -9), the repo is left dirty. The next write request detects this automatically. It aborts any in-progress rebase first (because git stash fails on a conflicted index), stashes all dirty files, then resets any unpushed local commits back to the remote HEAD.

Same stash-and-reset recipe as conflict rollback, same safety properties. Nothing is lost: every dirty file lands in the stash list, every orphaned commit stays in the reflog. The system self-heals. Non-technical users never see "fix your repo state."
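The recovery order matters, and can be sketched as below. `repo` is a hypothetical adapter over the git operations, not a real library object; the point is the abort-before-stash sequencing.

```python
def ensure_clean(repo):
    """Crash-recovery sketch: abort any rebase before stashing,
    because git stash fails on a conflicted index."""
    if not repo.is_dirty():
        return []
    ops = []
    if repo.rebase_in_progress():
        repo.abort_rebase()            # clear the conflicted index first
        ops.append("abort_rebase")
    repo.stash_all()                   # every dirty file is preserved in the stash
    ops.append("stash")
    repo.reset_hard_to_remote()        # drop unpushed commits (still in the reflog)
    ops.append("reset")
    return ops
```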

API middleware: zero changes to existing code

My favourite part of this design is that we didn't need to rewrite our app to adopt it. I added an HTTP middleware around every request that takes care of acquiring the write lock and committing the changes.

  • The API code doesn't need to change: it just reads and writes local files as it always did.
  • Commits are atomic at the API level: an endpoint might create nine files that depend on each other. Either they all land, or they all roll back with an error.
  • HTTP method automatically determines locking. POST, PUT, PATCH, DELETE auto-acquire the write lock. GET passes straight through.

The API endpoint code runs, writes whatever files it wants, and exits. On the way out, the middleware checks git status. If anything is dirty, it commits and pushes. If nothing changed, it releases the lock and moves on.

git status is the source of truth.

This works because of the isolated clone. The only process writing to this repo is Kiln. No one's editor is leaving .DS_Store files or auto-saving drafts in there. So anything git status reports as dirty was written by the current request. Period. It's also simple: no internal file-tracking dictionary that could drift out of sync with reality.
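Stripped to its core, the middleware is a few lines. This is a simplified sketch, not Kiln's implementation: `repo` is a hypothetical adapter over the isolated clone.

```python
import threading

MUTATING = {"POST", "PUT", "PATCH", "DELETE"}

def handle_request(method, handler, repo, lock):
    """Middleware sketch: endpoints never see the lock or the commit."""
    if method not in MUTATING:
        return handler()               # GETs pass straight through
    with lock:
        response = handler()           # unchanged API code writes files
        if repo.status_dirty():        # git status is the source of truth
            repo.commit_and_push()     # git add -A, commit, push
        return response                # body released only after the commit
```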

Here's the full request lifecycle the middleware runs for a mutating call like POST /api/projects/{id}/tasks. The write lock is held for the whole sequence:

1. Pre-flight middleware
  • Acquire the write lock (30s timeout).
  • ensure_clean(): if the repo is dirty, self-heal with crash recovery (stash + reset).
  • ensure_fresh(): if stale, catch up with a fetch and fast-forward from remote.
  • Snapshot HEAD as the rollback point.
2. API handler: unchanged app code writes files like always, and knows nothing about git.
3. Commit-and-sync middleware
  • Buffer the response body until the commit is confirmed.
  • Check git status. If clean, take the fast path: nothing to commit, release the lock, return 200.
  • If dirty: git add -A, commit, push.
  • If the push fails, one retry: cherry-pick rebase, then push again.
  • If it still fails, remote wins: roll back to remote and return 409, and the client retries.
4. Release the lock and return the response.

The middleware request lifecycle. Everything inside the lock is git plumbing the middleware adds automatically; the API handler in step 2 is your unchanged API code.

The response body is buffered until the middleware confirms the commit. The client never sees a 200 for a save that didn't persist.

Decorators for exceptions. A @write_lock decorator handles the rare GET that mutates. A @no_write_lock decorator handles long-running or expensive endpoints that need to opt out.

@app.get(".../run_evals")
@no_write_lock
async def run_evals(request: Request, ...):
    for task in tasks:
        result = await expensive_model_call()   # no lock held
        run = EvalRun(result=result, ...)
        async with save_context():              # lock, commit and push
            run.save_to_file()

In the example above we run longer compute tasks outside the lock, then acquire the lock to save. Each eval result becomes its own commit. Between save scopes the lock is free, so the project isn't frozen while a long task runs.
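One way such decorators could be implemented is by tagging the endpoint function and letting the middleware consult the tag. A sketch under assumed names (the attribute and `needs_lock` helper are illustrative, not Kiln's API):

```python
MUTATING = {"POST", "PUT", "PATCH", "DELETE"}

def write_lock(fn):
    """Opt a GET into the write lock (the rare GET that mutates)."""
    fn._wants_write_lock = True
    return fn

def no_write_lock(fn):
    """Opt a mutating endpoint out, e.g. long-running evals."""
    fn._wants_write_lock = False
    return fn

def needs_lock(method, fn):
    """Middleware-side check: decorator tag wins, else the HTTP method decides."""
    default = method in MUTATING
    return getattr(fn, "_wants_write_lock", default)
```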

Dev-mode safety nets. In dev, we detect mistakes like a GET that secretly writes a file and return a loud 500 so it can't be missed.

What you get for free

By putting application data in git, you inherit infrastructure you'd otherwise have to build.

Access management. Read-only users, branch protection rules, signed commits, required reviews. All standard git provider features, all applicable to your application data without writing a line of access-control code.

CI on data. Run validators or eval suites on every pull request. Catch regressions before they merge. Your existing CI system will work, and you can extend as much or as little as you need.

Branches as environments. Auto-sync users onto a dedicated branch. Validate changes there. Merge to main with a PR when you're confident. You get a review workflow without building one.

Audit log and blame. Every change is attributable, dated, reversible. Not because we built an audit system, but because that's what git does.

Portability. Your data isn't trapped behind our API. The repo is the export. You already own and control it.

These aren't features Kiln built. They're features Kiln users inherit because the data is in git.

We made git invisible without making it less powerful.

Bringing your team in

Onboarding looks like any modern app's. The main difference is that you see your git provider's OAuth screen instead of the app's own login. Even folks who have never touched a terminal can use it without any training.

Not GitHub-only. The GitHub OAuth flow is the most polished path, but a personal access token (PAT) flow works with any git host: GitLab, Bitbucket, self-hosted. We have custom deeplinks to the token-generation pages of major git providers to make it easy. And if you already have SSH keys set up, it just works.

The honest tradeoff. Non-technical users need a git provider account they might not have. But most enterprises already provision these centrally. The admins are already in that loop. And it's infrastructure your org already trusts, already runs SSO on, already has security review for. Compare that to vetting yet another vendor's proprietary login system.

Tradeoffs

Nothing is free. Here are the pros and cons of each approach.

| | Traditional SaaS | Self-hosted SaaS | Git-backed |
|---|---|---|---|
| Data ownership | Vendor's database | Your servers, vendor's format | Your git repo, open format |
| Access control | Vendor's auth system | Vendor's auth, your server | Your git provider, already configured |
| Audit log | Build it, or buy it | Build it, or buy it | Included, it's git history |
| Read speed | Network round-trip | Network round-trip | Instant, local file |
| Setup | Sign up | Deploy and maintain a server | Create a repo, set permissions |
| Offline read-only | No | No | Possible (but not yet implemented) |
| Provider can access data | Yes | No | No |

What works well:

  • Ownership. Data lives on your infrastructure, in a format you control.
  • Enterprise-ready. Your git provider is already approved, SSO is already configured, permissions are already fine-grained.
  • Instant local reads. No round-trip on the read path.
  • Mix and match. PMs use the UI. Engineers drop into the terminal. Both write to the same repo.
  • Audit and reversibility. Built-in, free, no configuration.

What costs you:

  • More setup than typical SaaS. Someone needs to create the initial repo and configure permissions. Not hard, but not zero.
  • Non-technical users need a git account. Mitigated by the OAuth flow and enterprise provisioning, but it's still a step.
  • May still need a service. Features like our realtime AI agent can't be magically provided over git. Git only works for your product's data (for Kiln: datasets, evals, runs, feedback). In Kiln, these are optional add-on features you get by connecting a Kiln account.

Neutral:

  • No offline mode (yet). When auto-sync is enabled, we block offline usage. We chose this deliberately: blocking is better than silently accumulating conflicts that can't auto-merge. But a SaaS doesn't work offline either! We could allow offline read-only mode, but haven't implemented that yet.

Why not X?

Why not self-hosted SaaS? For users: setup, management, security review of another vendor's server running inside your infrastructure. For us: it's a pain to build, package, maintain, and update. Plus, it's less powerful than git for the workflows we get for free: branch protection, signed commits, CI on data changes, fine-grained permissions, and enterprises already trust their git provider.

Why not CRDTs? CRDTs could help eliminate conflicts where two people edit the same document simultaneously, but that's rare for us. A CRDT operation log would break most git workflows, which isn't worth it for a rare conflict case we already mitigate.

Why not offline-first? We are offline-first! Git-sync is an additional opt-in feature for teams that want sync.

What's next

Will this be enough for everyone, or will we end up shipping a hosted SaaS too? Honestly: we don't know yet. It's possible that some teams will want us to manage the infrastructure. We'll see what users tell us.

What we do know: the pattern works. Git can replace a SaaS database for most operations. You get versioning, access control, audit trails, CI integration, and portability without building any of it. The hard part is making it invisible.

Try Git-Backed Sync in Kiln

Git-backed sync is built into Kiln today. Create a repo, invite your team, and your datasets, evals, and runs stay in sync, with full version history, access control, and an audit trail you didn't have to build. Download the latest release to try it, and check out the git-auto-sync PR if you want to see how it's wired up.
