Introducing OwlPack: SLMs That Scan And Fix Your Codebase While You Sleep

Designed for a vibecoding world

May 21, 2026

We have over 150 fine tuned models in our SLM Marketplace, and over the last 6 weeks we noticed that 2 of our code testing SLMs were the most popular downloads. We took that as a sign and decided to bundle a few coding SLMs together in a product we are announcing today - Owlpack — a GitHub App that runs five specialist agents against your repos while you sleep.

Built on 5 Small Models

When we say Small Language Model, we mean models under 20B parameters. Some are under 3B. They’re small enough to run on a single GPU, fast enough to return structured output in seconds, and cheap enough that running five of them in parallel against thousands of files is a rounding error, not a line item.

The catch — and it’s a real one — is that they don’t know everything. A 3B model is not going to write your novel, debate Kant, or replace Opus. But it doesn’t need to. It needs to find SQL injections. Or flag a deprecated dependency. Or notice that a function has crept past 400 lines.

Narrow the task, and small wins.

The SLMs That Compose Owlpack

Owlpack runs five agents every night, each focused on a different domain:

Hunter scans for security vulnerabilities — SQLi, SSRF, auth bypasses, leaked secrets, cross-referenced against live CVE feeds.
Tracker hunts bugs — null references, race conditions, off-by-one errors, suspicious test coverage gaps.
Keeper audits dependencies — outdated packages, breaking changes, deprecation notices, abandoned libraries.
Mason identifies refactoring opportunities — duplication, coupling hotspots, long functions, outdated patterns.
Scribe analyzes the codebase itself — churn, review latency, complexity drift, module ownership.

Each one is a specialist. Each one runs in parallel. Each one returns structured findings that get deduplicated, diffed against history, and ranked by severity before they land in your inbox by morning.

Try doing that with a single frontier model. The math falls apart. A nightly full-repo scan, across thousands of users, calling GPT-class inference five times per repo, would cost more than most teams pay for their entire dev tooling stack. That’s why no one was offering this service. The unit economics didn’t work.

With SLMs, they do.

The case for specialization

There’s a deeper reason we built it this way, and it goes beyond cost.

When you fine-tune a small model on a specific task — say, identifying CVE patterns in JavaScript — it gets better at that task than a frontier model trained to do everything. Specialists beat generalists when the task has a defined shape. Code review is a defined shape. Dependency auditing is a defined shape. Detecting a leaked AWS key is a very defined shape.

A frontier model brings a trillion-plus parameters of knowledge about Roman history and SQL injection patterns. Hunter brings the SQL injection patterns. For this job, that’s the better tool.

It also means the agents don’t drift. They don’t get creative. They don’t hallucinate a function name that doesn’t exist in your repo because they read about it in a blog post once. Narrowness is a feature.

Privacy as a byproduct

There’s a third benefit that falls out of using SLMs: a smaller blast radius. Owlpack clones your repository at scan time, runs the agents, and deletes the clone. We don’t need a 1.5T-parameter foundation model to read your code. We need five tight, task-specific models that do their job and forget. That architecture is easier to audit, easier to contain, and easier to deploy in environments where data exfiltration risk actually matters.

What this looks like in practice

You install the GitHub App. You pick which repos to scan. We clone them at your chosen time, run all five agents in parallel, delete the clone, and deliver a structured briefing by morning. If there’s nothing new, we don’t email you. If there’s a CVE in a transitive dependency, you’ll know before standup.

The whole pipeline costs a fraction of what a single frontier call would. That’s what passes through to pricing. That’s what makes the service viable.

The big-model era taught the industry to reach for the biggest hammer in the room. The next era is about picking the right one. Five small, sharp tools, running every night — that’s what Owlpack is. SLMs are why it works.

Try Owlpack Free For 7 Days. Or email us sales@neurometric.ai if you want to try a team plan.

neurometric’s Substack

Discussion about this post

Ready for more?