Last updated July 2, 2026

Alerts & Incidents

Alerts is the default landing page for Operations and the favourite pin in the sidebar. It is where you define what “unhealthy” means for your fleet, watch things fire in real time, acknowledge and resolve incidents, and confirm that notifications actually made it out the door.

What you see

Open Alerts in the sidebar. The page is tabbed:

Tab	What’s in it
Rules	Every alert rule you have defined — source, target, threshold, channel, enabled toggle
Events	The live stream of firing events, grouped by rule + target so a flapping host does not drown the list
Incidents	Correlated incidents (same server, same severity, same title collapsed into one row) with status and last update
Notifications	Every attempted delivery — Slack, Discord, Email, Webhook, PagerDuty — with success / failure and the response body from the receiver

Each tab has filters at the top for server, severity, source type, and time window.

Alert rules

A rule watches one source and points at one target:

Source	What it watches
Server metric	CPU, RAM, disk (per mount point), load, network, process counts, custom metrics
Anomaly	The anomaly detector’s output for a specific server
Uptime check	HTTP / TCP / ICMP monitor result
Heartbeat	A named heartbeat missed its cadence
Backup monitor	A watched backup path is stale, missing, or shrinking
Certificate	SSL certificate approaching expiry or invalid
Log pattern	A regex hit in tailed log lines

You can start from a blank rule or pick a template (Disk 90 %, RAM 95 %, backup missed, SSL 14 days, and so on). Every rule has: name, severity (info, warn, crit), evaluation window, dedupe window, and one or more notification channels.

Disk alerts support per-mount-point targeting — you can alert on /var at 85 % without noise from a healthy /.
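To make the rule fields concrete, here is a minimal Python sketch of a disk rule scoped to one mount point. It is an illustration only, not the HostAtlas schema or API: the class `AlertRule`, its field names, the `breaches` helper, and the sample dictionary keys (`server`, `metric`, `mount`, `value`) are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

SEVERITIES = ("info", "warn", "crit")  # severity levels named in the text


@dataclass
class AlertRule:
    """Hypothetical in-memory model of a rule; field names are assumptions."""
    name: str
    source: str                  # e.g. "server_metric"
    target: str                  # the server the rule points at
    metric: str                  # e.g. "disk_used_pct"
    threshold: float
    severity: str
    evaluation_window_s: int
    dedupe_window_s: int
    channels: List[str]
    mount_point: Optional[str] = None  # only meaningful for disk rules
    enabled: bool = True

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if not self.channels:
            raise ValueError("a rule needs at least one notification channel")

    def breaches(self, sample: dict) -> bool:
        """True if the sample belongs to this rule and is at or over the threshold."""
        if not self.enabled:
            return False
        if sample.get("server") != self.target or sample.get("metric") != self.metric:
            return False
        if self.mount_point is not None and sample.get("mount") != self.mount_point:
            return False
        return sample["value"] >= self.threshold


var_disk = AlertRule(
    name="Disk /var 85 %", source="server_metric", target="web-01",
    metric="disk_used_pct", threshold=85.0, severity="warn",
    evaluation_window_s=300, dedupe_window_s=900,
    channels=["ops-slack"], mount_point="/var",
)
```

With this shape, a sample for `/` on the same server never matches the `/var` rule, which is the noise reduction that per-mount-point targeting is meant to provide.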

Events and incidents

An event is a single firing (crossed threshold, missed heartbeat, failed check). Events are grouped in the UI so you see one row per problem, not one row per evaluation cycle.
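The grouping described above can be sketched as a fold over raw events keyed by rule and target. This is a simplified illustration, not the product's implementation: the function `group_events` and the event keys (`rule`, `target`, `ts`) are assumed names.

```python
from typing import Dict, List, Tuple


def group_events(events: List[dict]) -> List[dict]:
    """Collapse raw firings into one row per (rule, target) pair."""
    groups: Dict[Tuple[str, str], dict] = {}
    for ev in events:
        key = (ev["rule"], ev["target"])
        row = groups.get(key)
        if row is None:
            # First firing for this pair starts a new row.
            groups[key] = {
                "rule": ev["rule"], "target": ev["target"],
                "count": 1, "first_seen": ev["ts"], "last_seen": ev["ts"],
            }
        else:
            # Later firings only update the counters on the existing row.
            row["count"] += 1
            row["first_seen"] = min(row["first_seen"], ev["ts"])
            row["last_seen"] = max(row["last_seen"], ev["ts"])
    return list(groups.values())


rows = group_events([
    {"rule": "Disk 90 %", "target": "web-01", "ts": 100},
    {"rule": "Disk 90 %", "target": "web-01", "ts": 115},
    {"rule": "RAM 95 %", "target": "db-01", "ts": 120},
])
```

A flapping host then contributes a growing `count` on a single row instead of a new row per evaluation cycle.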

An incident collects related events for a server into a single record you can:

Acknowledge — takes ownership and silences reminders while you investigate.
Resolve — closes the incident and clears the badge on the server row.
Comment — inline notes appear on the incident timeline.
Attach a postmortem — once resolved, an incident can carry a structured writeup for retros.

Bulk actions on the Incidents tab let you resolve or delete a filtered selection at once.
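As a rough model of the incident lifecycle, the sketch below correlates events by server, severity, and title (the key named in the tab overview) and enforces that a postmortem can only be attached after resolution. The `Incident` class, its status strings, and the `correlate` function are hypothetical names for illustration; the real data model may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Incident:
    server: str
    severity: str
    title: str
    status: str = "open"  # assumed states: open, acknowledged, resolved
    events: List[dict] = field(default_factory=list)
    timeline: List[str] = field(default_factory=list)
    postmortem: Optional[str] = None

    def acknowledge(self, who: str) -> None:
        if self.status == "resolved":
            raise ValueError("incident is already resolved")
        self.status = "acknowledged"
        self.timeline.append(f"acknowledged by {who}")

    def resolve(self, who: str) -> None:
        self.status = "resolved"
        self.timeline.append(f"resolved by {who}")

    def comment(self, who: str, text: str) -> None:
        self.timeline.append(f"{who}: {text}")

    def attach_postmortem(self, text: str) -> None:
        # The text says a postmortem is attached once the incident is resolved.
        if self.status != "resolved":
            raise ValueError("resolve the incident before attaching a postmortem")
        self.postmortem = text


def correlate(events: List[dict]) -> Dict[Tuple[str, str, str], Incident]:
    """Collect events that share server, severity, and title into one incident."""
    incidents: Dict[Tuple[str, str, str], Incident] = {}
    for ev in events:
        key = (ev["server"], ev["severity"], ev["title"])
        inc = incidents.setdefault(key, Incident(*key))
        inc.events.append(ev)
    return incidents


incidents = correlate([
    {"server": "web-01", "severity": "crit", "title": "Disk /var full"},
    {"server": "web-01", "severity": "crit", "title": "Disk /var full"},
    {"server": "db-01", "severity": "warn", "title": "High load"},
])
```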

Notification channels

Channels are managed under Settings → Notification Channels and picked per rule. Supported types:

Email — one or many recipients, HTML template.
Slack — incoming webhook, includes severity colour + Ack link.
Discord — webhook with embed and severity colour.
Telegram — bot token + chat ID.
Generic Webhook — raw JSON POST, HMAC-signed with the channel’s signing secret. Format presets are available for Slack, Discord, Microsoft Teams MessageCards, and PagerDuty Events API v2 (for the PagerDuty preset, the routing key goes in the signing-secret field).

Every delivery attempt lands on the Notifications tab with the receiver’s response — the first thing to check when someone says “I didn’t get the alert.”
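If you receive Generic Webhook deliveries in your own service, you will usually want to verify the HMAC signature before trusting the payload. The sketch below shows the general technique in Python. The page does not specify the hash algorithm, the signature encoding, or the header that carries the signature, so HMAC-SHA256 with a hex digest over the raw request body is an assumption here, and `sign` and `verify` are illustrative helper names; check the channel settings for the actual scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (algorithm and encoding are assumptions)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = b"channel-signing-secret"          # placeholder value
body = b'{"severity": "crit", "title": "Disk /var full"}'
signature = sign(secret, body)              # stands in for the received header value
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, since reformatting the body changes the digest.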

How it works

Metric rules are evaluated on each agent metrics push (typically every 15 seconds). A rule fires when the threshold has been continuously breached for its evaluation window.
Uptime, heartbeat, backup, and certificate rules are evaluated by the HostAtlas backend on their own cadence.
Firings are deduplicated — the same event won’t spam you every cycle; you get one notification, then reminders per the rule’s cadence until acknowledged or resolved.
Maintenance windows suppress notifications for their scope while active. See Maintenance.
Recovery rules can react to a firing by restarting a service or running a recipe before the operator is paged. See Recovery rules.
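The interaction between the evaluation window and deduplication can be sketched as a small state machine fed by each metrics push. This is a simplified model under stated assumptions, not the backend's actual logic: the class `ThresholdEvaluator`, the `reminder_s` parameter, and the return values `"notify"` and `"reminder"` are invented for illustration.

```python
from typing import Optional


class ThresholdEvaluator:
    """Fires once after a continuous breach, then reminds until acknowledged."""

    def __init__(self, threshold: float, window_s: int, reminder_s: int) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.reminder_s = reminder_s          # hypothetical reminder cadence
        self.breach_started: Optional[int] = None
        self.last_notified: Optional[int] = None
        self.acknowledged = False

    def acknowledge(self) -> None:
        self.acknowledged = True

    def push(self, ts: int, value: float) -> Optional[str]:
        """Process one metrics push; return 'notify', 'reminder', or None."""
        if value < self.threshold:
            # Any healthy sample breaks the streak and resets all state.
            self.breach_started = None
            self.last_notified = None
            self.acknowledged = False
            return None
        if self.breach_started is None:
            self.breach_started = ts
        if ts - self.breach_started < self.window_s:
            return None                       # breached, but not for long enough yet
        if self.last_notified is None:
            self.last_notified = ts
            return "notify"                   # first notification for this firing
        if not self.acknowledged and ts - self.last_notified >= self.reminder_s:
            self.last_notified = ts
            return "reminder"
        return None                           # deduplicated


evaluator = ThresholdEvaluator(threshold=90.0, window_s=60, reminder_s=300)
# Pushes every 15 seconds, all above the threshold.
results = [evaluator.push(ts, 95.0) for ts in range(0, 376, 15)]
```

In this run the first notification comes at the 60-second push and a single reminder follows 300 seconds later; every other push is suppressed.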

Related

Maintenance — silence alerts during planned work.
Recovery rules — auto-remediate before paging a human.
Recipes — the scripts recovery rules and automations run.
