Alert rules

Alert rules define the conditions under which the monitoring system sends notifications. Each rule is evaluated against a single metric every 30 seconds.

A threshold rule fires when a metric value crosses a fixed boundary: for example, heap usage above 85%, queue depth above 100, or error rate above 5%. You choose the comparison operator (>, <, >=, <=, ==) and the value.

Use this for steady-state limits: things that should never exceed a known safe level.
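The fixed-boundary comparison described above can be sketched in a few lines. This is only an illustration, not the product's implementation: the function name `threshold_fires` and the `OPERATORS` table are hypothetical, and only the five operator symbols come from the text.

```python
import operator

# Hypothetical lookup from the operator symbols a rule can use
# to the matching Python comparison functions.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def threshold_fires(value: float, op: str, limit: float) -> bool:
    """Return True when `value <op> limit` holds, i.e. the rule should fire."""
    return OPERATORS[op](value, limit)


# Heap at 91% against a "> 85" rule fires; exactly 85% does not.
heap_alert = threshold_fires(91.0, ">", 85.0)
heap_at_limit = threshold_fires(85.0, ">", 85.0)
```

Note that the choice between `>` and `>=` decides whether a value sitting exactly on the boundary fires.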

A rate-of-change rule fires when a metric is changing faster than expected: for example, an error count that is rising three times faster than its recent baseline. You define the evaluation window over which the rate is computed.

Use this to catch sudden spikes early, before a threshold rule would trigger.
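One way to picture the rate comparison is shown below. The text does not say how the "recent baseline" is computed, so this sketch assumes it is the window immediately before the current one. The function name, the per-cycle sample format, and the zero-baseline behaviour are likewise assumptions.

```python
from typing import Sequence


def rate_of_change_fires(counts: Sequence[float], window: int,
                         factor: float = 3.0) -> bool:
    """Compare the latest window with the window just before it.

    `counts` holds one error count per evaluation cycle, oldest first.
    Assumption: the baseline is the mean of the preceding `window` samples.
    """
    if len(counts) < 2 * window:
        return False  # not enough history to form a baseline
    recent = counts[-window:]
    baseline = counts[-2 * window:-window]
    recent_rate = sum(recent) / window
    baseline_rate = sum(baseline) / window
    if baseline_rate == 0:
        # Assumption: a zero baseline is treated as "not comparable" here.
        return False
    return recent_rate >= factor * baseline_rate


# Baseline of 2 errors per cycle, then a jump to roughly 7 per cycle.
spike = rate_of_change_fires([2, 2, 2, 2, 6, 7, 6, 8], window=4)
```

A shorter window reacts faster but is noisier; a longer one smooths out brief bursts.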

An absence rule fires when no data has arrived for a metric within the evaluation window: for example, a channel that has received zero messages in 30 minutes when it normally processes messages continuously.

Use this to detect stalled channels or broken upstream feeds.
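The absence check amounts to comparing the age of the last data point with the evaluation window. The sketch below is a rough illustration. The function name and the use of epoch seconds are assumptions, as is firing for a metric that has never reported.

```python
from typing import Optional


def absence_fires(last_seen: Optional[float], now: float,
                  window_seconds: float) -> bool:
    """Return True when no data point has arrived within the window.

    `last_seen` and `now` are timestamps in seconds.
    Assumption: a metric that has never reported (None) counts as absent.
    """
    if last_seen is None:
        return True
    return now - last_seen >= window_seconds


THIRTY_MINUTES = 30 * 60

# Last message arrived 45 minutes ago, so a 30-minute rule fires.
stalled = absence_fires(last_seen=0.0, now=45 * 60, window_seconds=THIRTY_MINUTES)
```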

A state-change rule fires when a channel transitions between operational states: for example, a channel moving from Started to Stopped. It does not require a numeric threshold; the state transition itself is the condition.

Use this to get immediate notification when a channel unexpectedly stops or fails to deploy.
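Because the condition is the transition itself, the check only needs the previous and current state. The following is a minimal sketch with a hypothetical function name. It assumes the rule watches for entry into one target state, such as Stopped.

```python
def state_change_fires(previous: str, current: str,
                       target: str = "Stopped") -> bool:
    """Return True only on the cycle where the channel enters `target`.

    A channel that was already in the target state does not fire again,
    because no transition happened.
    """
    return previous != current and current == target


just_stopped = state_change_fires("Started", "Stopped")
still_stopped = state_change_fires("Stopped", "Stopped")
```

Here `just_stopped` is `True` and `still_stopped` is `False`: the rule reacts to the change, not to the state persisting.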

When you create a rule, you can leave the channel scope empty (applies to all channels) or target specific channels or channel groups. Group membership resolves dynamically — add a channel to a group and it is covered by any group-scoped rule on the next evaluation cycle, without re-saving the rule.

Server-level metrics (heap, disk, load, threads) cannot be scoped to individual channels; a rule that tries to do so is rejected.
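The scoping behaviour can be modelled roughly as follows. Every name here (`Rule`, `validate`, `resolve_scope`, the metric keys) is hypothetical. The sketch also assumes that a server-level metric is rejected for any channel or group scope, and that group membership is looked up at evaluation time rather than stored on the rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumed identifiers for the server-level metrics named in the text.
SERVER_METRICS = {"heap", "disk", "load", "threads"}


@dataclass
class Rule:
    metric: str
    channels: Set[str] = field(default_factory=set)  # explicit channel targets
    groups: Set[str] = field(default_factory=set)    # channel-group targets


def validate(rule: Rule) -> None:
    """Reject a server-level metric that carries a channel scope."""
    if rule.metric in SERVER_METRICS and (rule.channels or rule.groups):
        raise ValueError(f"{rule.metric!r} is server-level and cannot be scoped")


def resolve_scope(rule: Rule, all_channels: Set[str],
                  group_members: Dict[str, Set[str]]) -> Set[str]:
    """Return the channels the rule covers on this evaluation cycle."""
    if not rule.channels and not rule.groups:
        return set(all_channels)  # empty scope applies to every channel
    covered = set(rule.channels)
    for group in rule.groups:
        # Membership is read now, so later additions are picked up next cycle.
        covered |= group_members.get(group, set())
    return covered & set(all_channels)


all_channels = {"ch-a", "ch-b", "ch-c"}
group_members = {"billing": {"ch-a"}}
rule = Rule(metric="error_rate", groups={"billing"})

before = resolve_scope(rule, all_channels, group_members)
group_members["billing"].add("ch-b")  # rule itself is not re-saved
after = resolve_scope(rule, all_channels, group_members)
```

In this example `before` contains only `ch-a`, while `after` also contains `ch-b`, mirroring how a group-scoped rule picks up new members without being edited.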

Every rule has a severity (Info, Warning, or Critical) that determines how it is displayed and which destinations it routes to. The cooldown setting (default 15 minutes) suppresses repeat firings of the same rule so you are not flooded during a sustained condition.

You can also silence a rule for a fixed duration — the rule continues to evaluate, but notifications are suppressed — useful during planned maintenance.
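The interaction between cooldown and silencing can be sketched as a small gate that sits between rule evaluation and notification. The class and field names are hypothetical, and timestamps are plain seconds. The sketch assumes the cooldown is measured from the last notification actually sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationGate:
    cooldown_seconds: float = 15 * 60       # default cooldown from the text
    last_notified: Optional[float] = None   # time of the last notification sent
    silenced_until: Optional[float] = None  # end of a manual silence, if any

    def should_notify(self, condition_met: bool, now: float) -> bool:
        """Decide whether a firing rule results in a notification."""
        if not condition_met:
            return False
        # Silenced: the rule was still evaluated, but nothing is sent.
        if self.silenced_until is not None and now < self.silenced_until:
            return False
        # Cooldown: suppress repeats of a sustained condition.
        if (self.last_notified is not None
                and now - self.last_notified < self.cooldown_seconds):
            return False
        self.last_notified = now
        return True


gate = NotificationGate()
first = gate.should_notify(True, now=0)      # sent
repeat = gate.should_notify(True, now=30)    # suppressed by cooldown
later = gate.should_notify(True, now=900)    # cooldown elapsed, sent again
```

Keeping the silence check separate from the cooldown check means a silence window does not reset or extend the cooldown.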

The rule drawer includes pre-built templates as starting points:

| Template | Type | Condition |
| --- | --- | --- |
| High Error Rate | Threshold | Error rate > 5% |
| Channel Stopped | State change | State → Stopped |
| Queue Depth Warning | Threshold | Queued > 100 |
| No Messages (30 min) | Absence | No data in 30 min |
| JVM Heap High | Threshold | Heap > 85% |
| Disk Usage High | Threshold | Disk > 85% |
| Error Spike | Rate of change | Errors rising 3× |

Select a template, adjust the values, assign destinations, and save.
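Conceptually, a template is a set of default values that you copy and adjust. The sketch below models that with an immutable dataclass. The `RuleTemplate` type, its field names, the metric key, and the destination name `ops-email` are all hypothetical; only the template name, type, and 5% default come from the table.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class RuleTemplate:
    name: str
    rule_type: str
    metric: str
    value: float
    destinations: Tuple[str, ...] = ()


# Defaults taken from the "High Error Rate" row of the template table.
HIGH_ERROR_RATE = RuleTemplate(
    name="High Error Rate",
    rule_type="Threshold",
    metric="error_rate",
    value=5.0,
)

# Adjust the value and assign a destination; the template stays unchanged.
custom_rule = replace(HIGH_ERROR_RATE, value=2.0, destinations=("ops-email",))
```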