Last updated July 2, 2026

Alerts & Incidents

Alerts is the default landing page for Operations and the favourite pin in the sidebar. It is where you define what “unhealthy” means for your fleet, watch things fire in real time, acknowledge and resolve incidents, and confirm that notifications actually made it out the door.

Open Alerts in the sidebar. The page is tabbed:

| Tab | What's in it |
| --- | --- |
| Rules | Every alert rule you have defined: source, target, threshold, channel, enabled toggle |
| Events | The live stream of firing events, grouped by rule + target so a flapping host does not drown the list |
| Incidents | Correlated incidents (same server, same severity, same title collapsed into one row) with status and last update |
| Notifications | Every attempted delivery (Slack, Discord, Email, Webhook, PagerDuty) with success / failure and the response body from the receiver |

Each tab has filters at the top for server, severity, source type, and time window.

A rule watches one source and points at one target:

| Source | What it watches |
| --- | --- |
| Server metric | CPU, RAM, disk (per mount point), load, network, process counts, custom metrics |
| Anomaly | The anomaly detector's output for a specific server |
| Uptime check | HTTP / TCP / ICMP monitor result |
| Heartbeat | A named heartbeat missed its cadence |
| Backup monitor | A watched backup path is stale, missing, or shrinking |
| Certificate | SSL certificate approaching expiry or invalid |
| Log pattern | A regex hit in tailed log lines |

You can start from a blank rule or pick a template (Disk 90 %, RAM 95 %, backup missed, SSL 14 days, and so on). Every rule has: name, severity (info, warn, crit), evaluation window, dedupe window, and one or more notification channels.

Disk alerts support per-mount-point targeting — you can alert on /var at 85 % without noise from a healthy /.
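
To make the shape of a rule concrete, here is a minimal Python sketch of the attributes listed above, applied to the /var example. The class, its field names, and the `breached` helper are hypothetical illustrations, not the HostAtlas schema or API. The window lengths and channel name are placeholder values.

```python
from dataclasses import dataclass
from typing import List, Optional

SEVERITIES = ("info", "warn", "crit")  # severity levels named on this page


@dataclass
class AlertRule:
    """Hypothetical model of a rule; field names are assumptions."""
    name: str
    source: str                  # e.g. "server_metric"
    metric: str                  # e.g. "disk"
    threshold_percent: float
    severity: str
    evaluation_window_s: int     # placeholder unit: seconds
    dedupe_window_s: int         # placeholder unit: seconds
    channels: List[str]
    mount_point: Optional[str] = None  # None means "any mount point"

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not self.channels:
            raise ValueError("a rule needs at least one notification channel")

    def breached(self, mount_point: str, used_percent: float) -> bool:
        """True if this sample is for the targeted mount and over threshold."""
        if self.mount_point is not None and mount_point != self.mount_point:
            return False
        return used_percent >= self.threshold_percent


# Alert on /var at 85 % without noise from a healthy /.
var_disk = AlertRule(
    name="Disk /var 85 %",
    source="server_metric",
    metric="disk",
    threshold_percent=85.0,
    severity="warn",
    evaluation_window_s=300,
    dedupe_window_s=3600,
    channels=["ops-email"],
    mount_point="/var",
)
```

With this sketch, a sample for `/` never matches the rule regardless of how full it is, because the rule is scoped to `/var` only.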

An event is a single firing (crossed threshold, missed heartbeat, failed check). Events are grouped in the UI so you see one row per problem, not one row per evaluation cycle.
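
The grouping idea can be sketched as follows. This is an illustration of collapsing firings by rule + target into one row per problem, under the assumption that an event carries a rule name, a target, and a timestamp. The `Event` type and `group_events` function are hypothetical and do not reflect how the UI is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    """Hypothetical firing record; field names are assumptions."""
    rule: str
    target: str
    timestamp: float  # seconds since some epoch


def group_events(events: List[Event]) -> List[dict]:
    """Collapse firings into one row per (rule, target), newest row first."""
    groups: Dict[Tuple[str, str], dict] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.rule, ev.target)
        row = groups.get(key)
        if row is None:
            groups[key] = {
                "rule": ev.rule,
                "target": ev.target,
                "count": 1,
                "first_seen": ev.timestamp,
                "last_seen": ev.timestamp,
            }
        else:
            row["count"] += 1
            row["last_seen"] = ev.timestamp
    return sorted(groups.values(), key=lambda r: r["last_seen"], reverse=True)


# A flapping host produces many firings but only one row.
rows = group_events([
    Event("Disk 90 %", "web-1", 0.0),
    Event("Disk 90 %", "web-1", 15.0),
    Event("Disk 90 %", "web-1", 30.0),
    Event("RAM 95 %", "db-1", 20.0),
])
```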

An incident collects related events for a server into a single record you can:

  • Acknowledge — takes ownership and silences reminders while you investigate.
  • Resolve — closes the incident and clears the badge on the server row.
  • Comment — inline notes appear on the incident timeline.
  • Attach a postmortem — once resolved, an incident can carry a structured writeup for retros.

Bulk actions on the Incidents tab let you resolve or delete a filtered selection at once.

Channels are managed under Settings → Notification Channels and picked per rule. Supported types:

  • Email — one or many recipients, HTML template.
  • Slack — incoming webhook, includes severity colour + Ack link.
  • Discord — webhook with embed and severity colour.
  • Telegram — bot token + chat ID.
  • Generic Webhook — raw JSON POST, HMAC-signed with the channel’s signing secret. Format presets are available for Slack, Discord, Microsoft Teams MessageCards, and PagerDuty Events API v2 (routing key goes in the signing-secret field).
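
On the receiving side of a Generic Webhook, the signature should be checked before the payload is trusted. The sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded digest. This page does not state the digest algorithm, the encoding, or the header that carries the signature, so confirm those details before relying on it. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Hex HMAC of the raw body. SHA-256 is an assumption, not confirmed."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(
        expected.encode("utf-8"),
        received_signature.strip().encode("utf-8"),
    )
```

Verify against the exact bytes received, not a re-serialised copy of the JSON, since any change in whitespace or key order changes the digest.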

Every delivery attempt lands on the Notifications tab with the receiver’s response — the first thing to check when someone says “I didn’t get the alert.”

  • Metric rules are evaluated on each agent metrics push (typically every 15 seconds). A rule fires when the threshold has been continuously breached for its evaluation window.
  • Uptime, heartbeat, backup, and certificate rules are evaluated by the HostAtlas backend on their own cadence.
  • Firings are deduplicated — the same event won’t spam you every cycle; you get one notification, then reminders per the rule’s cadence until acknowledged or resolved.
  • Maintenance windows suppress notifications for their scope while active. See Maintenance.
  • Recovery rules can react to a firing by restarting a service or running a recipe before the operator is paged. See Recovery rules.

Related pages:

  • Maintenance — silence alerts during planned work.
  • Recovery rules — auto-remediate before paging a human.
  • Recipes — the scripts recovery rules and automations run.
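
The metric-rule behaviour described on this page (fire only once the threshold has been continuously breached for the whole evaluation window) can be sketched as a small state machine fed by each metrics push. This is an illustrative model, not the HostAtlas evaluator. The class name, the `>=` comparison, and the reset-on-recovery behaviour are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowEvaluator:
    """Hypothetical evaluator for one rule on one target."""
    threshold: float
    window_s: float
    breach_started_at: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        """Record one metrics push; return True when the rule should fire."""
        if value < self.threshold:
            # Any healthy sample breaks the streak and restarts the window.
            self.breach_started_at = None
            return False
        if self.breach_started_at is None:
            self.breach_started_at = now
        return now - self.breach_started_at >= self.window_s


# Pushes every 15 seconds against a 60-second evaluation window.
evaluator = WindowEvaluator(threshold=90.0, window_s=60.0)
results = [evaluator.observe(95.0, t) for t in (0, 15, 30, 45, 60)]
```

In this model a single dip below the threshold resets the streak, so a brief spike shorter than the window never fires.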