8. Alert framework
| Field | Value |
|---|---|
| Status | Stable |
| Version | 2.0.0 |
| Last updated | 2026-01-31 |
| Authors | OpenALBA Working Group |
8.1 Alert tiers
Alerts are categorized into tiers based on risk score, with corresponding response SLAs:
| Tier | Risk Score | Response SLA | Notification | Auto-Escalate |
|---|---|---|---|---|
| Critical | 90-100 | 15 min | PagerDuty + Slack | 30 min |
| High | 70-89 | 1 hour | Slack + Email | 2 hours |
| Medium | 40-69 | 8 hours | Slack + Digest | 24 hours |
| Low | 20-39 | 24 hours | Email digest | None |
| Info | 0-19 | Weekly | Dashboard | None |
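The tier boundaries above can be expressed as a simple threshold lookup. The sketch below is illustrative (the function name is an assumption; the tier names and score ranges come from the table):

```python
def alert_tier(risk_score: int) -> str:
    """Map a 0-100 risk score to an alert tier per the table above."""
    if risk_score >= 90:
        return "critical"
    if risk_score >= 70:
        return "high"
    if risk_score >= 40:
        return "medium"
    if risk_score >= 20:
        return "low"
    return "info"
```

Note that the ranges are contiguous, so checking upper tiers first lets each branch imply the lower bound.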
8.2 Alert content structure
Alerts SHOULD include sufficient context for investigation without requiring additional queries:
Alert content structure:

```yaml
header:
  alert_id: "ALBA-2024-02-15-001234"
  severity: "critical"
  title: "Unusual data access for user john.doe@company.com"
  timestamp: "2024-02-15T14:32:00Z"
scores:
  anomaly_score: 78
  risk_scores:
    security: 85
    ops: 35
    engineering: 40
  confidence: 0.92
entity:
  type: "user"
  id: "john.doe@company.com"
  attributes:
    role: "engineer"
    department: "Product"
    criticality: 1.0
anomaly:
  type: "user_data_access_volume_anomaly"
  description: "User accessed 15,000 customer records in the past hour"
  signals:
    - metric: "data_records_accessed"
      current: 15000
      baseline_mean: 120
      zscore: 330.67
      baseline_comparison: "125x normal volume"
context:
  timeline:
    - "14:00 - Normal login"
    - "14:05 - First bulk export endpoint access"
    - "14:10-14:30 - 150 export requests"
  related_entities:
    - "Endpoint: /api/customers/export"
  suppression_status: "None"
  change_windows: "None active"
investigation:
  suggested_queries:
    - "All requests by user in last 24h"
    - "All users who accessed bulk export today"
    - "User's HR status"
  dashboards: ["User Activity", "Data Access"]
  runbook: "https://wiki/runbooks/data-exfiltration"
actions: ["acknowledge", "suppress", "escalate", "resolve"]
```

8.3 Alert routing
Alerts are routed based on anomaly type and risk score thresholds:
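One way to evaluate routing conditions of this shape is a small matcher that checks the anomaly type against a team's list and compares the relevant per-audience risk score to the threshold. The sketch below is illustrative, not normative: the rule dictionary mirrors only a subset of the rules that follow, and the function and field names are assumptions.

```python
# Subset of the routing rules below, expressed as data (illustrative).
ROUTING_RULES = {
    "security_team": {
        "anomaly_types": {"geographic", "impossible_travel", "data_exfil"},
        "score_field": "security",
        "min_score": 50,
    },
    "sre_team": {
        "anomaly_types": {"error_rate", "latency", "capacity"},
        "score_field": "ops",
        "min_score": 50,
    },
}

def route_alert(anomaly_type: str, risk_scores: dict) -> list:
    """Return the teams whose conditions this alert satisfies."""
    return [
        team for team, rule in ROUTING_RULES.items()
        if anomaly_type in rule["anomaly_types"]
        and risk_scores.get(rule["score_field"], 0) >= rule["min_score"]
    ]
```

Because each team keys on its own risk dimension, one alert can legitimately route to multiple teams when several dimensions exceed their thresholds.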
Alert routing rules:

```yaml
security_team:
  conditions:
    - anomaly_type_in: [geographic, impossible_travel, credential_stuffing,
                        account_takeover, privilege_escalation, data_exfil,
                        new_external_connection]
    - risk_score.security >= 50
  channels:
    critical: ["pagerduty:security", "slack:#security-critical"]
    high: ["slack:#security-alerts"]
    medium: ["email:security@company.com"]
sre_team:
  conditions:
    - anomaly_type_in: [error_rate, latency, dependency_failure,
                        traffic_anomaly, capacity, certificate]
    - risk_score.ops >= 50
  channels:
    critical: ["pagerduty:sre", "slack:#incidents"]
    high: ["slack:#sre-alerts"]
    medium: ["email:sre@company.com"]
engineering:
  conditions:
    - anomaly_type_in: [new_exception, response_anomaly, api_violation]
    - risk_score.engineering >= 50
  channels:
    critical: ["slack:#eng-oncall"]
    high: ["slack:#eng-alerts"]
    medium: ["email:eng-leads@company.com"]
```

8.4 Aggregation and deduplication
Aggregation rules:

```yaml
aggregation:
  same_entity_type:
    group_by: [entity_type, entity_id, anomaly_type]
    window: "15 minutes"
    action: "Merge into single alert"
  service_incident:
    group_by: [service.name]
    window: "30 minutes"
    conditions:
      anomaly_types: [error_rate, latency, dependency_failure]
    action: "Create incident linking all anomalies"
  attack_campaign:
    group_by: [anomaly_type]
    window: "1 hour"
    conditions:
      affected_entities: "> 10"
    action: "Create campaign alert"
deduplication:
  ongoing_anomaly:
    conditions: [same_entity, same_type, previous_alert_open]
    action: "Update existing, don't create new"
  flapping_prevention:
    conditions:
      - same_entity
      - same_type
      - "alert_count > 3 in 1 hour"
    action: "Suppress, create meta-alert"
```

8.5 Feedback loop
Alert feedback improves detection accuracy over time:
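The threshold adjustments in the rules that follow are multiplicative: after three false positives a threshold is widened by 10% (so a threshold of 100 becomes 110), and after five the rule is flagged for human review. A minimal sketch, with the counts and percentage taken from those rules and the function name an illustrative assumption:

```python
def adjust_threshold(threshold: float, fp_count: int) -> tuple:
    """Apply feedback-loop threshold adjustment.

    Returns (new_threshold, needs_review): widen the threshold 10% once a
    rule accumulates 3 false positives, and flag it for review at 5.
    """
    new_threshold = threshold * 1.10 if fp_count >= 3 else threshold
    needs_review = fp_count >= 5
    return new_threshold, needs_review
```

Widening should normally be bounded (e.g. a maximum multiplier), otherwise repeated false positives can silently disable a detection rule.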
Feedback loop:

```yaml
false_positive:
  actions:
    - Log with context
    - Suggest suppression rule if pattern
    - Adjust baseline if appropriate
  threshold_adjustment:
    after_3_fps: "Widen threshold 10%"
    after_5_fps: "Flag for rule review"
true_positive:
  actions:
    - Link to incident ticket
    - Update risk multipliers (+5%)
    - Add to ML training data
metrics_targets:
  false_positive_rate: "< 5%"
  true_positive_rate: "> 80%"
  mean_time_to_acknowledge: "< SLA"
```

8.6 Conformance
Implementations:
- MUST support at least three alert severity tiers
- SHOULD include anomaly scores and context in alerts
- SHOULD implement alert deduplication
- MAY implement feedback loops for continuous improvement
Note: Continue to Section 9, Implementation Guidance, for deployment architecture and operational guidance.