Sensor Health

HarborGuard tracks sensor liveness through heartbeats. This page documents the actual timings and what each status means in user-facing terms.

Heartbeat cadence

A sensor sends POST /api/agent/heartbeat on a regular interval. Every successful heartbeat (and every job-poll, which also touches the heartbeat field) updates the sensor's lastHeartbeatAt.

Signal         | Default interval
Heartbeat post | every ~30 seconds
Job poll       | every ~30 seconds, or immediately after a job completes

You do not need to send both - any request authenticated as a sensor refreshes liveness.
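
As a rough illustration of the cadence above, the following Python sketch posts a heartbeat on a fixed interval. Only the POST /api/agent/heartbeat path and the ~30-second interval come from this page. The base URL, the Bearer authorization header, and the function names are assumptions made for the example, not HarborGuard's documented client API.

```python
import time
import urllib.request

HEARTBEAT_PATH = "/api/agent/heartbeat"  # path documented on this page
HEARTBEAT_INTERVAL_SECONDS = 30          # default cadence documented on this page


def build_heartbeat_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a heartbeat request.

    ASSUMPTION: the Bearer authorization scheme is illustrative only;
    use whatever credential format your sensor was issued.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + HEARTBEAT_PATH,
        data=b"",          # empty body
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def run_heartbeat_loop(send, iterations,
                       interval=HEARTBEAT_INTERVAL_SECONDS, sleep=time.sleep):
    """Call send() `iterations` times, sleeping `interval` seconds in between.

    Network failures (OSError, which includes urllib's URLError) are swallowed
    so one failed beat does not stop the loop. Returns the number of
    successful calls.
    """
    succeeded = 0
    for i in range(iterations):
        try:
            send()
            succeeded += 1
        except OSError:
            pass
        if i < iterations - 1:
            sleep(interval)
    return succeeded
```

In a real process, send could be something like lambda: urllib.request.urlopen(build_heartbeat_request(base_url, api_key), timeout=10). Injecting send and sleep keeps the loop testable without a network.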

Disconnect threshold

A sensor is considered offline when its lastHeartbeatAt is older than 2 minutes. The check runs continuously. The moment a sensor crosses the threshold, it transitions to offline and its bound registry is moved to ERROR, so scheduled scans for that registry fail fast instead of piling up.

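Worst case, a sensor that dies silently is detected about 2 minutes after its last heartbeat. With a 30-second heartbeat, that corresponds to roughly four consecutive missed heartbeats (the ones due at 30, 60, 90, and 120 seconds after the last successful one).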
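
The threshold arithmetic can be sketched in a few lines of Python. The 2-minute threshold and 30-second interval are the values documented here. The function names are hypothetical, and this is a model of the rule, not HarborGuard's server-side implementation.

```python
from datetime import datetime, timedelta, timezone

OFFLINE_THRESHOLD = timedelta(minutes=2)     # documented disconnect threshold
HEARTBEAT_INTERVAL = timedelta(seconds=30)   # documented default cadence


def is_offline(last_heartbeat_at: datetime, now: datetime,
               threshold: timedelta = OFFLINE_THRESHOLD) -> bool:
    """True when the last heartbeat is strictly older than the threshold."""
    return now - last_heartbeat_at > threshold


def heartbeats_due_since(last_heartbeat_at: datetime, now: datetime,
                         interval: timedelta = HEARTBEAT_INTERVAL) -> int:
    """How many scheduled heartbeats have come due since the last one arrived."""
    return max(0, (now - last_heartbeat_at) // interval)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    for seconds in (30, 90, 120, 121):
        now = last + timedelta(seconds=seconds)
        print(seconds, is_offline(last, now), heartbeats_due_since(last, now))
```

Passing now explicitly, rather than reading the clock inside the function, makes the boundary behavior easy to check.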

Status states

Status   | Meaning                                                              | When you see it
online   | Heartbeat received within the last 2 minutes; sensor idle and ready. | Healthy idle sensor.
scanning | Same liveness as online, but currently executing a job.              | A scan is in progress on this sensor.
offline  | No heartbeat for more than 2 minutes.                                | Sensor crashed, host died, network broke, or process was stopped.

Transitions:

online <----> scanning
   |
   v
offline   (after >2 minutes without a heartbeat)

A sensor that comes back will go straight to online on its next heartbeat - there is no manual unstick step.
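
One way to read the table and transitions above is as a pure function of the last heartbeat time and whether a job is running. The Python sketch below models that reading. The enum, the function name, and the executing_job flag are illustrative assumptions rather than HarborGuard internals.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class SensorStatus(str, Enum):
    ONLINE = "online"
    SCANNING = "scanning"
    OFFLINE = "offline"


def derive_status(last_heartbeat_at: datetime, now: datetime, executing_job: bool,
                  threshold: timedelta = timedelta(minutes=2)) -> SensorStatus:
    """Derive the displayed status from liveness and job activity.

    Liveness is checked first: a stale heartbeat means offline regardless of
    any job flag. Otherwise the job flag picks between scanning and online.
    """
    if now - last_heartbeat_at > threshold:
        return SensorStatus.OFFLINE
    return SensorStatus.SCANNING if executing_job else SensorStatus.ONLINE
```

Because the status is recomputed from lastHeartbeatAt, a fresh heartbeat from a previously offline sensor yields online again with no extra step, which matches the recovery behavior described above.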

Notifications

When a sensor transitions to offline, HarborGuard fires a Sensor lost connection notification. Subscribed channels (email, Slack, webhooks, PagerDuty) receive it with the sensor name, last-heartbeat timestamp, and bound registry. Configure routing in Settings -> Notifications.

In API and webhook payloads, the event is identified by its human-readable label, "Sensor lost connection". Matching events also fire on registry status changes when the sensor's bound registry transitions to ERROR.
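
A webhook receiver can filter on that label. The sketch below is a minimal Python example under one important assumption: the payload is a JSON object and the label appears under a key named "event". That key name is hypothetical and not confirmed by this page, so check a real payload and adjust event_field accordingly.

```python
import json

SENSOR_LOST_LABEL = "Sensor lost connection"  # label documented on this page


def is_sensor_lost(raw_body: bytes, event_field: str = "event") -> bool:
    """Return True if a webhook body carries the sensor-lost label.

    ASSUMPTION: `event_field` ("event" by default) is a placeholder key name.
    Malformed or non-object bodies are treated as non-matching.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(payload, dict) and payload.get(event_field) == SENSOR_LOST_LABEL
```

Matching on a human-readable label is sensitive to exact spelling and case, so compare against the constant rather than retyping the string in each handler.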

Dashboard view

The Sensors entry in the left nav lists every registered sensor with:

  • Name and host
  • Status (online / scanning / offline)
  • Last heartbeat (relative time)
  • Version
  • Bound registry
  • Capabilities (scan, patch)
  • Recent jobs

Clicking through to a sensor shows its job history and registration audit trail.

Debugging a sensor that has gone offline

  1. Container running? docker ps | grep sensor or kubectl get pod -l app.kubernetes.io/name=harborguard-sensor. If not, see Docker or Kubernetes for restart guidance.
  2. Logs? Look for heartbeat ok lines. If they stop, look at what was logged immediately before. Common stoppers:
    • HTTP 401 - API key was rotated or revoked.
    • HTTP 5xx repeatedly - platform-side incident.
    • Connection refused / TLS - network egress blocked.
  3. Clock skew? A sensor whose system clock is more than a few minutes off may produce heartbeats that look stale on arrival. Keep the host clock synchronized with NTP.
  4. Registry binding lost? If the sensor's bound registry was deleted, it will keep heartbeating but never receive jobs. Re-register or rebind via the Sensors panel.
  5. Multiple replicas on one node? Two pods on the same node sharing /var/run/docker.sock can deadlock under load. Use a Recreate strategy or pin one replica per node.
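
The "common stoppers" from step 2 can be expressed as a small triage helper. The Python sketch below maps exceptions raised by the standard library's urllib to the causes listed above. It assumes a sensor-like client built on urllib, which is an illustrative choice and not a statement about how the HarborGuard sensor is implemented.

```python
import ssl
import urllib.error


def diagnose_heartbeat_failure(exc: BaseException) -> str:
    """Map a failed heartbeat request to the likely cause from the checklist."""
    # HTTPError is a subclass of URLError, so it has to be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "API key was rotated or revoked"
        if 500 <= exc.code <= 599:
            return "platform-side incident (if it repeats)"
        return f"unexpected HTTP {exc.code}"
    # URLError wraps the underlying socket or TLS error in .reason.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, (ConnectionRefusedError, ssl.SSLError)):
        return "network egress blocked"
    return "unclassified; check the log lines just before heartbeats stopped"
```

A single 5xx is not conclusive, which is why the helper only hints at an incident. Counting consecutive failures before alerting is left to the caller.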

Forcing a re-register

If you change the API key or move the sensor to a new org, you can either:

  • Restart the sensor with the new credentials (it will re-register automatically), or
  • Delete the sensor from the Sensors panel and let the next heartbeat create a new row.

There is no manual force-re-register endpoint - restarting the process is the canonical answer.
