How Scanning Works

A scan in HarborGuard is one row in the scans table that fans out to one or more scanner processes, then reconverges into a single normalized result. This page walks the pipeline end to end.

Pipeline

POST /api/scans          Trigger (UI, API, schedule, or webhook)
        |
        v
PENDING                  Row created, queued
        |
        v
IN_PROGRESS              Trivy / Grype / Syft / Dockle / OSV / Dive run in parallel
        |
        v
Ingestion + dedup        Per-engine output normalized into one envelope,
                         CVEs from multiple engines collapsed by (cve, package, version)
        |
        v
COMPLETED                Findings, packages, layers, logs persisted
        |
        v
Post-processing          Grade computed, SLA timers started, notifications dispatched
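
The fan-out and reconvergence in the diagram above can be sketched roughly as follows. This is a minimal illustration, not HarborGuard's implementation: `run_engine`, `run_scan`, and the shape of the scan record are hypothetical names, and the engine call is a stub rather than a real scanner process.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class ScanStatus(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


def run_engine(engine: str, image: str) -> dict:
    # Hypothetical stand-in for launching one scanner process.
    return {"engine": engine, "image": image, "findings": []}


def run_scan(image: str, engines: list[str]) -> dict:
    # One scan record is created in PENDING, as in the diagram.
    scan = {"image": image, "status": ScanStatus.PENDING, "results": {}}
    scan["status"] = ScanStatus.IN_PROGRESS
    # Fan out: one worker per requested engine, all running concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        outputs = list(pool.map(lambda e: run_engine(e, image), engines))
    # Reconverge: per-engine output is gathered under the single scan record.
    scan["results"] = {out["engine"]: out for out in outputs}
    scan["status"] = ScanStatus.COMPLETED
    return scan
```

Ingestion, dedup, and post-processing are left out of the sketch; they would sit between gathering the outputs and marking the scan COMPLETED.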

Scanner suite

Each scan runs any subset of these six engines. They are not redundant: each covers its own category, and where two categories overlap (Trivy and Grype) the engines use different matchers. See Scanner Reference for the breakdown.

Engine	Category	Output
Trivy	OS + language vulns	CVEs, misconfigs
Grype	OS + language vulns	CVEs (different matcher than Trivy)
Syft	SBOM	Package inventory
Dockle	Image config	CIS-aligned misconfigs
OSV-Scanner	Language vulns	OSV / GHSA advisories
Dive	Layer analysis	Layer breakdown, wasted bytes

Dedup model

When multiple engines report the same CVE on the same package version, the ingestion layer collapses them. The result is one finding row whose sources field is the union of reporting engines:

{
  "cve": "CVE-2024-1234",
  "package": "openssl",
  "version": "3.0.2",
  "severity": "HIGH",
  "sources": ["trivy", "grype", "osv"]
}

A CVE with sources.length >= 2 is therefore corroborated by more than one engine. A CVE flagged by only one engine is still surfaced, but the missing corroboration is a useful triage signal: look at it more skeptically.
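
To make the collapse concrete, here is a minimal sketch of deduplication keyed on (cve, package, version). It assumes each raw per-engine finding carries an `engine` field, and it keeps the first-seen severity when engines disagree. Both are assumptions made for illustration; this page does not describe the input shape or how severity conflicts are resolved.

```python
def dedup_findings(raw_findings: list[dict]) -> list[dict]:
    """Collapse per-engine findings that share (cve, package, version)."""
    merged: dict[tuple, dict] = {}
    for f in raw_findings:
        key = (f["cve"], f["package"], f["version"])
        if key not in merged:
            merged[key] = {
                "cve": f["cve"],
                "package": f["package"],
                "version": f["version"],
                # Assumption: the first engine to report sets the severity.
                "severity": f["severity"],
                "sources": [],
            }
        # sources is the union of reporting engines, without duplicates.
        if f["engine"] not in merged[key]["sources"]:
            merged[key]["sources"].append(f["engine"])
    return list(merged.values())


def is_corroborated(finding: dict) -> bool:
    # Two or more reporting engines count as corroboration.
    return len(finding["sources"]) >= 2
```

Filtering the merged list with `is_corroborated` separates findings that several engines agree on from single-engine findings that deserve a closer look.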

Origins

A scan's origin is one of:

cloud - HarborGuard executes the scanners on managed infrastructure.
sensor - One of your own sensors claims the job and runs it locally. The image bytes never leave your network.

Origin is determined by the registry's scanning policy. Cloud scans are queued for immediate execution. Sensor scans stay PENDING until a sensor bound to that registry polls GET /api/agent/jobs and atomically claims the job, so multiple sensor replicas can run safely in parallel.
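
One common way to make a claim atomic is a conditional update that succeeds only while the row is still PENDING. The sketch below shows that technique with an in-memory SQLite table. The table layout, the `claimed_by` column, and the `claim_job` helper are hypothetical; the mechanism behind GET /api/agent/jobs is not described on this page.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical, simplified stand-in for the scans table.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE scans (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)"
    )
    return db


def claim_job(db: sqlite3.Connection, scan_id: int, sensor_id: str) -> bool:
    # Compare-and-set: the row changes only if it is still PENDING,
    # so at most one caller ever sees rowcount == 1 for a given job.
    cur = db.execute(
        "UPDATE scans SET status = 'IN_PROGRESS', claimed_by = ? "
        "WHERE id = ? AND status = 'PENDING'",
        (sensor_id, scan_id),
    )
    db.commit()
    return cur.rowcount == 1
```

With this pattern, a second sensor replica that polls the same job gets `False` back and simply moves on to the next one.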

Triggers

Trigger	Mechanism
Manual	UI button or `POST /api/scans`
Scheduled	Per-registry policy: `daily`, `weekly`, `on_push`, or `manual`
Catalog push	New tag detected during sync
Re-scan	Triggered when a triaged finding's evidence changes

See Scheduling and CI/CD for trigger details.
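
As a rough illustration of how a per-registry policy value might translate into a scan decision, consider the sketch below. The semantics are assumptions: `daily` and `weekly` are treated as minimum intervals since the last scan, `on_push` as "scan only when a new tag was detected", and `manual` as "never scan automatically". The `scan_due` helper and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed interval for each time-based policy value.
INTERVALS = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def scan_due(
    policy: str,
    last_scan: Optional[datetime],
    now: datetime,
    new_tag: bool = False,
) -> bool:
    if policy == "manual":
        return False  # only an explicit UI or API request starts a scan
    if policy == "on_push":
        return new_tag  # scan only when a sync detected a new tag
    if policy in INTERVALS:
        # Never scanned, or the assumed interval has elapsed.
        return last_scan is None or now - last_scan >= INTERVALS[policy]
    raise ValueError(f"unknown policy: {policy}")
```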

What you read after the scan completes

Scanner Reference

Results

Grades

Scheduling

CI/CD
