Architecture

How Klassify works.

Sovereign by design. Connectors push events, not content. Three classification tiers gate the LLM. Every action is HMAC-chained.
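HMAC chaining means each audit record's MAC covers both the action and the previous record's MAC, so deleting or editing any entry breaks every MAC after it. A minimal sketch of that idea (the chaining layout and genesis value here are assumptions, not Klassify's actual format):

```python
import hmac
import hashlib

def chain_record(key: bytes, prev_mac: bytes, action: bytes) -> bytes:
    """MAC one audit record, binding it to the previous record's MAC."""
    return hmac.new(key, prev_mac + action, hashlib.sha256).digest()

def verify_chain(key: bytes, actions: list[bytes], macs: list[bytes]) -> bool:
    """Recompute the chain from the start; any tampered record breaks the tail."""
    prev = b"\x00" * 32  # assumed genesis value for the first record
    for action, mac in zip(actions, macs):
        expected = chain_record(key, prev, action)
        if not hmac.compare_digest(expected, mac):
            return False
        prev = mac
    return True
```

Tampering with any earlier action invalidates verification of the whole chain, which is what makes the log append-only in practice.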

Connectors (M365, NetApp, S3, SharePoint, Windows agent, email gateway) push file-change events to the Klassify backend (FastAPI). The backend classifies in three tiers: Tier 1 (filename, path, fingerprint), Tier 2 (PII regex, lexicon), Tier 3 (4-agent LLM panel + embeddings), with a cache-hit fast path straight to a verdict. Verdicts persist to storage (Postgres, Qdrant, ClickHouse, object storage) and feed the outputs: review queue, compliance reports, webhooks, SIEM.

1. Connectors push events.

Klassify never bulk-pulls files. Connectors emit file-change events; the backend fetches only what it needs.
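A push-based connector only needs to describe the change; the backend decides whether to fetch. A sketch of what such an event and the fetch decision could look like (field names are illustrative, not Klassify's wire format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileChangeEvent:
    # Hypothetical event shape for illustration only.
    connector: str   # e.g. "sharepoint", "s3"
    path: str        # path or object key inside the source system
    change: str      # "created" | "modified" | "deleted"
    sha256: str      # content hash, so unchanged files can be skipped

def needs_fetch(event: FileChangeEvent, seen_hashes: set[str]) -> bool:
    """Fetch content only for new or changed files not already hashed."""
    return event.change != "deleted" and event.sha256 not in seen_hashes
```

The key property: content never moves unless the backend asks for it, and it never asks twice for the same bytes.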

2. Cache fast-path.

Content-addressable hashing means re-seen files return a verdict in 5 ms without re-classification.
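Content-addressable means the cache key is derived from the bytes themselves, so two copies of the same file anywhere in the estate share one verdict. A minimal sketch (the in-memory dict stands in for the real cache; `run_tiers` is a stub for the pipeline below):

```python
import hashlib

verdict_cache: dict[str, str] = {}  # content hash -> cached verdict

def content_key(data: bytes) -> str:
    """Content-addressable key: identical bytes always map to the same key."""
    return hashlib.sha256(data).hexdigest()

def run_tiers(data: bytes) -> str:
    # Stub standing in for the Tier 1-3 pipeline.
    return "public"

def classify(data: bytes) -> str:
    key = content_key(data)
    if key in verdict_cache:       # fast path: file seen before
        return verdict_cache[key]
    verdict = run_tiers(data)      # slow path: full tiered classification
    verdict_cache[key] = verdict
    return verdict
```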

3. Tier 1 — fingerprint.

Filename, path, and known-fingerprint matches resolve 80%+ of files instantly.

4. Tier 2 — regex + lexicon.

PII patterns and policy lexicons catch the next slice without invoking the LLM.

5. Tier 3 — 4-agent panel.

The remaining ambiguous files go to four agents: a PII Officer, a National Security Reviewer, a Legal Advisor, and a Final Adjudicator. The panel deliberates and returns its reasoning in English and Arabic.
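Structurally, a panel like this is three independent reviews plus one adjudication over their combined output. A minimal orchestration sketch, with plain callables standing in for the LLM-backed agents (the interface and return shape are assumptions):

```python
from typing import Callable

# Each "agent" is a callable returning (verdict, reasoning). These are stand-ins;
# in the real system each role would be an LLM prompt with its own instructions.
Agent = Callable[[str], tuple[str, str]]

def panel(text: str, pii: Agent, natsec: Agent,
          legal: Agent, adjudicator: Agent) -> tuple[str, list[str]]:
    """Collect the three reviewers' opinions, then let the adjudicator decide."""
    opinions = [agent(text) for agent in (pii, natsec, legal)]
    reasoning = [r for _, r in opinions]
    summary = " | ".join(f"{v}: {r}" for v, r in opinions)
    final_verdict, final_reason = adjudicator(summary)
    return final_verdict, reasoning + [final_reason]
```

The adjudicator sees only the reviewers' summarized opinions, which keeps the final decision grounded in the panel's stated reasoning.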

6. Storage + outputs.

Verdicts persist to Postgres, embeddings to Qdrant, audit events to ClickHouse, large bodies to object storage. Webhooks and SIEM feeds fire downstream.
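The fan-out above is one verdict written to four sinks, each store getting only its slice. A toy sketch of that routing, with plain callables in place of real Postgres/Qdrant/ClickHouse/object-store clients (the verdict keys are invented for illustration):

```python
def persist(verdict: dict, sinks: dict) -> None:
    """Route each slice of a verdict to its store. Sink callables are stand-ins."""
    sinks["postgres"](verdict["record"])          # relational verdict row
    sinks["qdrant"](verdict["embedding"])         # vector for similarity search
    sinks["clickhouse"](verdict["audit"])         # append-only audit event
    if verdict.get("body"):                        # large payloads only
        sinks["object_storage"](verdict["body"])
```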

Deployment

On-prem (default), private cloud, air-gapped, or hybrid. Same code, same model, all runtimes.

Data residency

All data stays inside the customer perimeter. Only telemetry (opt-in) leaves.

High availability

Stateless workers, Postgres replication, multi-region read replicas. Blue/green upgrades.

Want the full architecture brief?