
How Scanning Works

A scan in HarborGuard is one row in the scans table that fans out to one or more scanner processes, then reconverges into a single normalized result. This page walks the pipeline end to end.

Pipeline

POST /api/scans          Trigger (UI, API, schedule, or webhook)
        |
        v
PENDING                  Row created, queued
        |
        v
IN_PROGRESS              Trivy / Grype / Syft / Dockle / OSV / Dive run in parallel
        |
        v
Ingestion + dedup        Per-engine output normalized into one envelope,
                         CVEs from multiple engines collapsed by (cve, package, version)
        |
        v
COMPLETED                Findings, packages, layers, logs persisted
        |
        v
Post-processing          Grade computed, SLA timers started, notifications dispatched
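
The status progression in the diagram can be read as a small state machine. The following is a minimal sketch, not HarborGuard's implementation: the `ScanStatus` enum and the `advance` helper are hypothetical names, and only the three statuses shown above are modeled, since this page does not describe failure states.

```python
from enum import Enum


class ScanStatus(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


# Allowed forward transitions, taken from the pipeline diagram.
# Failure or cancellation states are not described on this page,
# so none are modeled here (assumption).
ALLOWED = {
    ScanStatus.PENDING: {ScanStatus.IN_PROGRESS},
    ScanStatus.IN_PROGRESS: {ScanStatus.COMPLETED},
    ScanStatus.COMPLETED: set(),
}


def advance(current: ScanStatus, target: ScanStatus) -> ScanStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


status = ScanStatus.PENDING
status = advance(status, ScanStatus.IN_PROGRESS)
status = advance(status, ScanStatus.COMPLETED)
```

Ingestion and post-processing are steps rather than statuses in the diagram, so they do not appear in the transition map.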

Scanner suite

Each scan runs any subset of these six engines. They are not redundant - each has a category it owns. See Scanner Reference for the breakdown.

Engine        Category              Output
Trivy         OS + language vulns   CVEs, misconfigs
Grype         OS + language vulns   CVEs (different matcher than Trivy)
Syft          SBOM                  Package inventory
Dockle        Image config          CIS-aligned misconfigs
OSV-Scanner   Language vulns        OSV / GHSA advisories
Dive          Layer analysis        Layer breakdown, wasted bytes

Dedup model

When multiple engines report the same CVE on the same package version, the ingestion layer collapses them. The result is one finding row whose sources field is the union of reporting engines:

{
  "cve": "CVE-2024-1234",
  "package": "openssl",
  "version": "3.0.2",
  "severity": "HIGH",
  "sources": ["trivy", "grype", "osv"]
}

This means a CVE with sources.length >= 2 has corroboration from more than one engine. A CVE flagged by only one engine is still surfaced, but the single source is a useful triage signal - look at it more skeptically.
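
The collapse step can be sketched as a group-by on the (cve, package, version) key. This is an illustrative sketch under stated assumptions, not the actual ingestion code: the `collapse_findings` function and the per-engine `engine` input field are hypothetical, and keeping the first-seen severity when engines disagree is an assumption, since this page does not say how conflicts are resolved.

```python
from typing import Iterable


def collapse_findings(raw: Iterable[dict]) -> list[dict]:
    """Merge per-engine findings that share (cve, package, version)."""
    merged: dict[tuple[str, str, str], dict] = {}
    for finding in raw:
        key = (finding["cve"], finding["package"], finding["version"])
        if key not in merged:
            merged[key] = {
                "cve": finding["cve"],
                "package": finding["package"],
                "version": finding["version"],
                # Assumption: the first-seen severity wins on conflict.
                "severity": finding["severity"],
                "sources": [],
            }
        # "engine" is a hypothetical field naming the reporting scanner.
        if finding["engine"] not in merged[key]["sources"]:
            merged[key]["sources"].append(finding["engine"])
    return list(merged.values())


# Placeholder data: the first three rows describe the same finding.
raw = [
    {"engine": "trivy", "cve": "CVE-2024-1234", "package": "openssl",
     "version": "3.0.2", "severity": "HIGH"},
    {"engine": "grype", "cve": "CVE-2024-1234", "package": "openssl",
     "version": "3.0.2", "severity": "HIGH"},
    {"engine": "osv", "cve": "CVE-2024-1234", "package": "openssl",
     "version": "3.0.2", "severity": "HIGH"},
    {"engine": "grype", "cve": "CVE-2024-1234", "package": "openssl",
     "version": "3.0.7", "severity": "HIGH"},
]

findings = collapse_findings(raw)
corroborated = [f for f in findings if len(f["sources"]) >= 2]
single_source = [f for f in findings if len(f["sources"]) == 1]
```

Because the version is part of the key, the same CVE on a different package version stays a separate finding, here with a single source.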

Origins

A scan's origin is one of:

  • cloud - HarborGuard executes the scanners on managed infrastructure.
  • sensor - One of your own sensors claims the job and runs it locally. The image bytes never leave your network.

Origin is determined by the registry's scanning policy. Cloud scans are queued for immediate execution. Sensor scans stay PENDING until a sensor bound to that registry polls GET /api/agent/jobs and atomically claims the job, so multiple sensor replicas can run safely in parallel.
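
One common way to make a claim atomic is a conditional update that succeeds only while the row is still PENDING. The sketch below uses an in-memory SQLite database to show the idea. It is not HarborGuard's implementation: the column names (`registry_id`, `claimed_by`), the `claim_next_job` function, and the sensor IDs are all hypothetical.

```python
import sqlite3


def claim_next_job(conn: sqlite3.Connection, registry_id: str, sensor_id: str):
    """Try to claim one PENDING scan for a registry; return its id or None."""
    row = conn.execute(
        "SELECT id FROM scans WHERE registry_id = ? AND status = 'PENDING' "
        "ORDER BY id LIMIT 1",
        (registry_id,),
    ).fetchone()
    if row is None:
        return None
    # The WHERE clause re-checks the status, so if another replica claimed
    # the row after our SELECT, this UPDATE matches zero rows.
    cur = conn.execute(
        "UPDATE scans SET status = 'IN_PROGRESS', claimed_by = ? "
        "WHERE id = ? AND status = 'PENDING'",
        (sensor_id, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (id INTEGER PRIMARY KEY, registry_id TEXT, "
    "status TEXT, claimed_by TEXT)"
)
conn.execute("INSERT INTO scans (registry_id, status) VALUES ('reg-1', 'PENDING')")
conn.commit()

first = claim_next_job(conn, "reg-1", "sensor-a")   # claims the job
second = claim_next_job(conn, "reg-1", "sensor-b")  # nothing left to claim
```

Whichever replica's update lands first wins the job, and the others see no matching row and poll again later.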

Triggers

Trigger        Mechanism
Manual         UI button or POST /api/scans
Scheduled      Per-registry policy: daily, weekly, on_push, or manual
Catalog push   New tag detected during sync
Re-scan        Triggered when a triaged finding's evidence changes

See Scheduling and CI/CD for trigger details.

What you read after the scan completes
