Project Notes · Part 01

How I built a CMMC-compliant SOC analyst assistant without sending data to an LLM


A common pitch in SOC tooling right now: bolt an LLM onto your alerts and let it summarize, recommend, and triage. Fast to demo. Cheap to build. Wrong, in my experience, for the specific environment I work in — multi-tenant MSSP, CMMC-regulated workloads, evidence-grade investigation notes.

CARL — the Contextual Alert Response Library — is the assistant I built instead. It runs entirely in the browser on the analyst workstation, makes no external calls, contains no model weights, and produces the same output for the same query every time. About 25,000 source lines, 500+ knowledge entries, eight routing engines. This post is the architecture walkthrough. The “why deterministic over LLM” argument is in an earlier post — here I want to explain how it’s actually built.

The problem with LLM-first in a CMMC environment

Three things make the default LLM-first pattern hard in this context:

  1. Data egress. Alert payloads contain account names, IPs, file hashes, and free-text user descriptions. In a CMMC enclave or for a CUI-handling tenant, sending that to a hosted model creates a data-handling story I’d rather not have to defend. Local LLMs are possible but require infrastructure that isn’t always available on a per-analyst basis.
  2. Audit trail. When an analyst closes an alert, the rationale needs to be reviewable. “The model said it was probably benign” is not a sentence that survives auditor scrutiny.
  3. Reproducibility. Two analysts working the same alert type a week apart should get the same guidance. Model temperature and version drift work against that.

If any one of these were the only constraint, you could engineer around it. All three together push the architecture toward “no external calls, all logic auditable, knowledge base versioned in git.”

The constraint, and what falls out of it

State the constraint as plainly as possible: the tool runs on the analyst’s machine, never makes an outbound network request, and produces the same output every time you give it the same input.

A few things follow from that:

  • No model weights. The dist is 1.67MB — small enough to ship over Teams, drop into a sandboxed VDI, or load from a USB stick if a tenant requires air-gap.
  • No background sync, no telemetry. Nothing leaves the machine unless the analyst explicitly copies it out.
  • All knowledge is a file you can read. No vector database, no embedding store. The knowledge layer is a hand-curated set of structured entries (JSON-shaped) that get loaded into memory at startup.
  • All routing logic is JavaScript. Not opaque. The same input → same path through the resolver, every time.

There’s an optional enrichment proxy — a 768-line FastAPI service running locally on port 8088 — that the analyst can opt into for things like a WHOIS lookup or a public IP reputation check. The core tool doesn’t depend on it, and the proxy is a separate process that the analyst starts deliberately. The default mode is fully offline.
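
For what the opt-in looks like from the client side, here is a minimal sketch; the endpoint path, query parameter, and settings flag are my own illustrative names, not CARL's. The point is that the default path makes no network call at all.

    // Hypothetical client-side hook for the optional enrichment proxy.
    // Endpoint path, query parameter, and settings flag are illustrative names.
    async function tryEnrich(indicator, settings) {
      if (!settings.enrichmentOptIn) return null;            // default: fully offline
      try {
        const url = 'http://127.0.0.1:8088/enrich?q=' + encodeURIComponent(indicator);
        const res = await fetch(url, { signal: AbortSignal.timeout(1500) });
        return res.ok ? await res.json() : null;
      } catch {
        return null;                                         // proxy not started: degrade silently
      }
    }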

Architecture: eight engines, three phases

The query flow is three phases:

analyst pastes input → Extract → Classify Intent (P1–P7) → Dispatch to engine

Extract parses what the analyst pasted. Free-text question? Alert headers from Sentinel? A KQL fragment? A PowerShell command? The shape determines what comes next.
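
A simplified sketch of that shape detection is below; the heuristics are stand-ins I wrote for illustration, not the actual extraction rules.

    // Simplified sketch of the Extract phase: decide what kind of thing was pasted.
    // These regexes are illustrative stand-ins, not the real extraction rules.
    function extractShape(raw) {
      const text = raw.trim();
      if (/-enc(odedcommand)?\s+[A-Za-z0-9+\/=]{16,}/i.test(text)) return 'powershell';
      if (text.includes('|') && /\b(where|project|summarize|join|extend)\b/i.test(text)) return 'kql';
      if (/^(AlertName|Severity|Tactics?|Entities)\s*:/im.test(text)) return 'alert-headers';
      return 'free-text';
    }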

Classify intent routes to one of the priority paths, P1 through P7 plus a P1.5 half-step for knowledge questions; a resolver sketch follows the list:

  • P1 — Direct lookup (alert type → playbook)
  • P1.5 — Knowledge question (“what does Atypical travel actually mean?”)
  • P2 — Attack pattern match (71 triggers, things like “powershell with -EncodedCommand”)
  • P3 — KQL pattern match (93 compound patterns: cross-table joins, time-window patterns, common false-positive filters)
  • P4 — LOLBins lookup (34 entries with flag databases — certutil, mshta, regsvr32, etc.)
  • P5 — File-path masquerading (40 known Windows path entries — when you see svchost.exe running out of C:\Users\..., that’s a hit)
  • P6 — Investigation primitive (parameterized atomic step — “pull sign-in logs for entity X over window Y”)
  • P7 — General FAQ
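
Here is the resolver sketch: priority order is just array order, and the first matcher that accepts the input wins. The matcher and engine names are illustrative stubs, not CARL's real ones.

    // Minimal sketch of the priority resolver. Matchers and engines are stubs;
    // the real paths each have their own knowledge slice and response format.
    const playbookEngine = { render: (hit) => `Playbook steps for ${hit.alertType}` };
    const faqEngine      = { render: ()    => 'General guidance entry' };

    const PATHS = [
      { id: 'P1', match: (q) => (q.alertType ? { ...q } : null), engine: playbookEngine },
      // ...P1.5 through P6 omitted here; same shape, one matcher per path
      { id: 'P7', match: (q) => ({ ...q }),                      engine: faqEngine },
    ];

    function resolve(query) {
      for (const path of PATHS) {
        const hit = path.match(query);            // deterministic: same input, same path
        if (hit) return { path: path.id, answer: path.engine.render(hit) };
      }
      return null;                                // no match: CARL stays quiet
    }

    // resolve({ alertType: 'Phishing - Credential Harvest' })
    //   -> { path: 'P1', answer: 'Playbook steps for Phishing - Credential Harvest' }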

The eight engines are the dispatch targets. Each engine owns its own knowledge slice and its own response format. The PowerShell engine, for example, does flag parsing and Base64 UTF-16LE decoding on encoded commands — the kind of thing you can write a 200-line parser for and never need a model.
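
The decode step itself is small enough to show; a minimal browser-flavored sketch:

    // Decode a PowerShell -EncodedCommand payload: Base64 over UTF-16LE.
    // Two fixed transforms, no model required.
    function decodeEncodedCommand(b64) {
      const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
      return new TextDecoder('utf-16le').decode(bytes);
    }

    // decodeEncodedCommand('dwBoAG8AYQBtAGkA')  ->  'whoami'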

The knowledge base is composed of named packs:

  • KR_PACK_MITRE — 100 ATT&CK techniques
  • KR_PACK_PLAYBOOKS — 11 alert playbooks, 5 investigation playbooks
  • KR_PACK_FAQ — 25 entries
  • KR_PACK_LOLBINS — 34 entries
  • KR_PACK_MASQ — 40 file-path entries

A pack is a versioned JSON object with a schema. Adding a new playbook is an edit to one file, a test, a commit. Reviewing what changed last week is a git log. There’s no “the model learned a new thing” mystery.
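
To make "a file you can read" concrete, here is an illustrative pack fragment. The field names are my assumption about the schema, not a copy of it; the KQL placeholders tie into the phishing example below.

    // Illustrative pack fragment, written as a JS literal. The field names are
    // an assumed schema; the point holds either way: plain data, reviewable in a diff.
    const KR_PACK_PLAYBOOKS = {
      pack: 'KR_PACK_PLAYBOOKS',
      entries: [
        {
          id: 'playbook.phishing-cred-harvest.v3',
          alertTypes: ['Phishing', 'Credential Harvest'],
          steps: [
            {
              name: 'Sign-in review',
              explain: 'Sign-in logs for the window around the click',
              kql: "SigninLogs | where UserPrincipalName == '{{upn}}' " +
                   "| where TimeGenerated between (datetime({{start}}) .. datetime({{end}}))",
            },
            // ...inbox rules, MFA registration, lateral mail steps follow the same shape
          ],
        },
      ],
    };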

Before and after: one alert at contoso.com

Concrete example. An analyst pulls a phishing alert: a user on the contoso.com tenant clicked a link in an email flagged by Defender as suspicious. The URL is a known credential-harvest staging domain. Time of click: 14:22 UTC.

Before CARL:

  • Open the alert in Sentinel.
  • Pivot to the user’s sign-in logs in the portal — manually filter by UPN and time range.
  • Open the playbook PDF in SharePoint, scroll to the phishing section.
  • Copy the recommended KQL into a new query window. Substitute the user's UPN for the placeholder. Run.
  • Repeat for inbox-rule check, MFA registration check, lateral mail search.
  • Twelve to fifteen minutes before the first concrete answer, mostly burned on context-switching.

After CARL:

  • Paste the alert headers into CARL.
  • Routing extracts the alert type, identifies it as Phishing → Credential Harvest, and dispatches to the playbook engine.
  • CARL renders four investigation steps with KQL pre-substituted with the user's UPN and the 6-hour window around 14:22 (the substitution step is sketched after this list):
    1. Sign-in logs for the surrounding window
    2. Inbox rules created after the first suspicious event
    3. MFA registration events from the Entra audit log
    4. Lateral mail scan — same sender pattern across all contoso.com mailboxes
  • Each KQL block has a copy button and a one-line explanation of what it’s checking.
  • Roughly ninety seconds to the first concrete answer.
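
The substitution step from that walkthrough is plain string templating; a sketch, assuming the {{placeholder}} syntax from the pack fragment above and a made-up UPN and date (only the 14:22 UTC click time comes from the example):

    // Fill a playbook KQL template from entities extracted out of the alert.
    // The UPN, the date, and the {{...}} syntax are illustrative assumptions.
    function fillTemplate(tpl, entities) {
      return tpl.replace(/\{\{(\w+)\}\}/g, (m, key) => entities[key] ?? m);
    }

    const clickTime = new Date('2025-01-06T14:22:00Z');      // example date; only 14:22 UTC is real
    const halfWindowMs = 3 * 60 * 60 * 1000;                 // +/- 3 hours = the 6-hour window
    const entities = {
      upn: 'user@contoso.com',                               // placeholder UPN
      start: new Date(clickTime.getTime() - halfWindowMs).toISOString(),
      end: new Date(clickTime.getTime() + halfWindowMs).toISOString(),
    };

    const signInQuery = fillTemplate(
      "SigninLogs | where UserPrincipalName == '{{upn}}' " +
      "| where TimeGenerated between (datetime({{start}}) .. datetime({{end}}))",
      entities
    );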

The time savings are real, but the bigger win is consistency. The playbook is the same on Monday at 14:00 as it is on Wednesday at 03:00 with a tired analyst on shift. The recommendation is auditable: it came from playbook.phishing-cred-harvest.v3, which is a file in the repo, last edited on a specific commit by a specific person. If a client asks why we suggested the steps we suggested, the answer fits in a screenshot.

What you give up

Honest list:

  • Open-ended reasoning. If an alert doesn’t match any pattern in the knowledge base, CARL has nothing useful to say. An LLM would attempt a synthesis. Sometimes that synthesis would be useful, sometimes wrong. CARL just stays quiet.
  • Input flexibility. The PowerShell analyzer can handle a malformed -encodedcommand string, but only because I wrote a parser for that specific case. Variations the parser doesn’t recognize fall through.
  • Maintenance overhead. When attack TTPs shift or new alert types appear, someone has to update the knowledge base. That’s manual work. With an LLM, the model “knows” about new techniques as soon as its training data reflects them. With a curated knowledge base, “knowing” requires a commit.

These are real costs. For a hobbyist tool, they’d be disqualifying. For a SOC running CMMC workloads where the cost of an unauditable AI recommendation is higher than the cost of a slower update cycle, the trade looks different. The maintenance burden is the price of the audit trail. The narrowness of what CARL responds to is the price of “everything it says is defensible.”

Three things the architecture taught me

1. The knowledge base is the product. The engine is plumbing. I spent a lot of early effort on the routing logic. Eventually I realized the routing was easy and the knowledge curation was hard. The interesting question isn’t “how do I match this query” — it’s “what should the answer say, and is that the answer that holds up under audit?” An LLM-first design hides this work inside model weights, where it can’t be reviewed. Pulling it out into structured files made the maintenance burden visible — but also made it tractable.

2. Determinism forces clarity about coverage. With a deterministic engine, you can answer the question “what alerts is this tool useless for?” by running the corpus through it and listing the misses. That list is the roadmap. With an LLM, the equivalent is a vibes-based read of what the model gets wrong, which is a much harder thing to act on. The first time I generated a coverage gap list, I felt the difference immediately — I had a finite, prioritizable backlog instead of an open-ended quality concern.
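
A sketch of what generating that list can look like, assuming a corpus of historical alert inputs and the resolve() shape from the earlier sketch:

    // Run a corpus of past alerts through the resolver; everything that falls
    // through every path is a coverage gap. The corpus shape is an assumption.
    function coverageGaps(corpus, resolve) {
      const missesByType = new Map();
      for (const alert of corpus) {
        if (resolve(alert) !== null) continue;               // handled: not a gap
        const type = alert.alertType ?? 'unclassified';
        missesByType.set(type, (missesByType.get(type) ?? 0) + 1);
      }
      // Sorted list of (alert type, miss count) = the prioritizable backlog
      return [...missesByType.entries()].sort((a, b) => b[1] - a[1]);
    }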

3. The constraint is the feature. “No external calls” started as a compliance hedge. It turned out to be the thing that makes the tool deployable at all in tenants where any other architecture would have stalled in a security review. The constraint that looked most limiting up front ended up being the thing that opens the most doors.

I’d build it the same way again. If the prompt were ever “build something like this for a tenant where data egress isn’t a concern,” I’d still start with the deterministic engine and only reach for a model where it earned its place — which, in my experience, is a much smaller surface than the default discourse suggests.