Service-Level Objectives

Track reliability against a target. An SLO pairs a service-level indicator, the fraction of events that succeed, with an objective and a window. Oriel reports the error budget remaining and pages when that budget burns too fast.

Open SLOs, choose New SLO, and set:

| Field | Purpose |
| --- | --- |
| Name | Display name for the objective. |
| SLI | An OQL query that produces good and total counts. |
| Objective | The target success ratio, entered as a percentage such as 99.9. Stored as a ratio strictly between 0 and 1 (99.9 is stored as 0.999). |
| Window | The rolling budget window, such as 30d. |
| Burn alerts | Optional multi-window burn-rate alerts. |

An SLO is scoped to the project and environment it is created in, and the SLI runs within that scope.

A service-level indicator is an OQL query that ends in one stats stage producing exactly two aggregations, good and total, with no grouping or binning:

spans | stats count_if(is_error == false) as good, count() as total

total counts the events in scope; good counts the ones that succeeded. Oriel bins the query by minute on your behalf, so the query must not add its own by clause or bin. Oriel stores the counts rather than a ratio because counts from any set of minutes can be summed into a window total, whereas per-minute ratios cannot be combined without the counts behind them.

Filter and shape the population with the rest of the pipeline:

spans | where http.route == "/checkout" | stats count_if(is_error == false) as good, count() as total
spans | stats count_if(http.status < 500) as good, count() as total

A window with no events reads as fully healthy, since no failures were observed. See Explore and OQL for the query language.
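The reason for storing counts can be shown with a small Python sketch. It is an illustration only, not Oriel's implementation: the Bucket type and the sli function are hypothetical names, and the numbers are made up. It sums per-minute good and total counts into a window SLI, treats an empty window as healthy, and contrasts the result with averaging per-minute ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """One minute of SLI output (hypothetical type): good and total counts."""
    good: int
    total: int


def sli(buckets):
    """Success ratio over any set of buckets, computed from summed counts."""
    good = sum(b.good for b in buckets)
    total = sum(b.total for b in buckets)
    if total == 0:
        # No events means no observed failures, so the window reads as healthy.
        return 1.0
    return good / total


# A quiet minute and a busy minute with very different traffic volumes.
quiet = Bucket(good=9, total=10)            # 90% success
busy = Bucket(good=9_990, total=10_000)     # 99.9% success

# Summing counts weights each minute by its traffic: 9999 / 10010.
combined = sli([quiet, busy])

# Averaging the two ratios ignores volume and overstates the quiet minute.
naive = (sli([quiet]) + sli([busy])) / 2
```

Here combined stays close to 99.9% because almost all events fell in the busy minute, while naive drops to roughly 95%. Only the count-based figure reflects the share of events that actually succeeded.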

The error budget is 1 - objective: the share of events allowed to fail over the window. A 99.9% objective permits 0.1% failures.

Burn rate is the observed error ratio divided by the budget. A burn rate of 1 spends the whole budget in exactly one window; a burn rate of 14.4 spends it 14.4 times faster. The detail page shows the remaining budget, the current SLI, the burn rate, and the event count over the window, with the SLI plotted over time.
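The arithmetic above can be sketched in Python. The function names are hypothetical, and budget_remaining rests on an assumption: that the remaining budget is one minus the burn rate measured over the full SLO window. This page does not state how Oriel derives that figure.

```python
def error_budget(objective):
    """Share of events allowed to fail: 1 - objective."""
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be strictly between 0 and 1")
    return 1.0 - objective


def burn_rate(good, total, objective):
    """Observed error ratio divided by the error budget."""
    # An empty window has no observed failures, so its error ratio is 0.
    error_ratio = 0.0 if total == 0 else (total - good) / total
    return error_ratio / error_budget(objective)


def budget_remaining(good, total, objective):
    """Fraction of budget left over the SLO window.

    Assumption: over the full window, the burn rate equals the fraction of
    budget consumed, so the remainder is 1 - burn rate. Negative values
    mean the budget is overspent.
    """
    return 1.0 - burn_rate(good, total, objective)


# 500 failures in 1,000,000 events against a 99.9% objective:
# error ratio 0.05%, budget 0.1%, so about half the budget is spent.
half_spent = burn_rate(good=999_500, total=1_000_000, objective=0.999)

# 1,440 failures in 100,000 events is an error ratio of 1.44%,
# a burn rate of about 14.4 against the same objective.
fast_burn = burn_rate(good=98_560, total=100_000, objective=0.999)
```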

Each burn alert watches two windows and fires only when both exceed its multiplier, which catches a fast burn while ignoring a brief spike:

| Field | Default | Purpose |
| --- | --- | --- |
| Long window | 1h | The slower confirmation window. |
| Short window | 5m | The faster reaction window; it cannot exceed the long window. |
| Multiplier | 14.4 | Burn-rate threshold both windows must clear before the alert fires. |
| Severity | page | One of page, warn, or info. |
| Channels | none | Where a firing alert delivers. |

The 1h/5m pair at 14.4 is the classic fast-burn page: at that rate a 30-day budget is gone in about two days. Add a slower pair, such as 6h/30m at 6, for an earlier warning. Firing alerts deliver through the same channels as alert rules, with retries and backoff. See Alerts, Silences, and Channels for channel setup.
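The two-window rule and the "about two days" figure can be checked with a short Python sketch. The function names are hypothetical, and the comparison is written as strictly greater than, on the assumption that "exceed" excludes a burn rate exactly equal to the multiplier.

```python
def burn_alert_fires(long_burn, short_burn, multiplier=14.4):
    """Fire only when both windows exceed the multiplier."""
    return long_burn > multiplier and short_burn > multiplier


def days_to_exhaustion(window_days, rate):
    """Days until the whole budget is spent at a constant burn rate."""
    return window_days / rate


# Brief spike: the 5m window is hot but the 1h window has not confirmed it.
spike = burn_alert_fires(long_burn=3.0, short_burn=20.0)

# Sustained fast burn: both windows are above 14.4, so the alert fires.
sustained = burn_alert_fires(long_burn=16.0, short_burn=20.0)

# Recovery: the 1h window is still elevated but the 5m window has cooled.
recovered = burn_alert_fires(long_burn=16.0, short_burn=2.0)

# At 14.4x, a 30-day budget lasts 30 / 14.4 days, a little over two days.
fast_days = days_to_exhaustion(30, 14.4)

# At 6x, the slower pair's threshold, the same budget lasts 5 days.
slow_days = days_to_exhaustion(30, 6)
```

Requiring the short window as well as the long one also means the alert stops firing soon after the burn stops, rather than staying active until the long window drains.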

The list view summarizes each SLO as healthy, at risk, or breached, alongside its objective and window. The detail page shows the live error budget, current SLI, burn rate, and each burn alert’s long and short burn, plus an SLI timeseries. The status view refreshes periodically.

The worker rolls each enabled SLO’s SLI into one-minute buckets continuously. A newly created SLO backfills recent history, up to a day and bounded by its window, so its budget is meaningful soon after creation. Burn alerts evaluate every worker tick. Recording and evaluation run in the worker role, which is included in oriel serve --role=all.
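The backfill bound described above is the smaller of one day and the SLO window. A minimal Python sketch, assuming window strings consist of an integer followed by m, h, or d (only values such as 30d, 6h, 1h, 30m, and 5m appear on this page; the full grammar Oriel accepts is not stated). The names parse_window, backfill_span, and minute_buckets are hypothetical.

```python
import re
from datetime import timedelta

# Assumed unit suffixes, inferred from the examples on this page.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_window(text):
    """Parse a window such as '30d' or '5m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text)
    if match is None:
        raise ValueError(f"unrecognized window: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def backfill_span(window):
    """History to backfill: up to a day, bounded by the SLO window."""
    return min(parse_window(window), timedelta(days=1))


def minute_buckets(span):
    """Number of one-minute buckets covering the span."""
    return int(span.total_seconds() // 60)


# A 30d SLO backfills one day, which is 1440 one-minute buckets.
month_buckets = minute_buckets(backfill_span("30d"))

# A 6h SLO backfills only its own window, which is 360 buckets.
short_buckets = minute_buckets(backfill_span("6h"))
```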

| Permission | Allows |
| --- | --- |
| slos:read | List SLOs and read definitions, status, and history. |
| slos:manage | Create, edit, and delete SLOs and their burn alerts. |

A malformed definition is rejected with ORL-7001; an evaluation failure surfaces as ORL-7002.