8. Alert framework

Status	Stable
Version	2.0.0
Last updated	2026-01-31
Authors	OpenALBA Working Group

8.1 Alert tiers

Alerts are categorized into tiers based on risk score, with corresponding response SLAs:

Tier	Risk Score	Response SLA	Notification	Auto-Escalate After
Critical	90-100	15 min	PagerDuty + Slack	30 min
High	70-89	1 hour	Slack + Email	2 hours
Medium	40-69	8 hours	Slack + Digest	24 hours
Low	20-39	24 hours	Email digest	None
Info	0-19	Weekly	Dashboard	None
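The tier table can be expressed as a lookup from a 0-100 risk score to its tier, SLA, and escalation delay. The sketch below is illustrative only. It assumes that "Weekly" means 7 days, that an alert auto-escalates once it has stayed unacknowledged for the listed duration, and that `Tier`, `classify`, and `should_auto_escalate` are hypothetical names rather than part of any OpenALBA API.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    min_score: int                       # inclusive lower bound of the band
    response_sla: timedelta
    auto_escalate: Optional[timedelta]   # None means "never auto-escalate"


# Bands from the tier table, highest first. "Weekly" is assumed to be 7 days.
TIERS = [
    Tier("critical", 90, timedelta(minutes=15), timedelta(minutes=30)),
    Tier("high", 70, timedelta(hours=1), timedelta(hours=2)),
    Tier("medium", 40, timedelta(hours=8), timedelta(hours=24)),
    Tier("low", 20, timedelta(hours=24), None),
    Tier("info", 0, timedelta(days=7), None),
]


def classify(risk_score: int) -> Tier:
    """Return the tier whose risk-score band contains risk_score (0-100)."""
    if not 0 <= risk_score <= 100:
        raise ValueError(f"risk score out of range: {risk_score}")
    for tier in TIERS:
        if risk_score >= tier.min_score:
            return tier
    raise AssertionError("unreachable: the info tier starts at 0")


def should_auto_escalate(tier: Tier, unacknowledged_for: timedelta) -> bool:
    """Assumed semantics: escalate once an unacknowledged alert passes the tier's delay."""
    return tier.auto_escalate is not None and unacknowledged_for >= tier.auto_escalate
```

For example, `classify(85)` falls in the High band, so under these assumptions it escalates after two unacknowledged hours, while Low and Info alerts never escalate automatically.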

8.2 Alert content structure

Alerts SHOULD include sufficient context for investigation without requiring additional queries:

Alert Content Structure (YAML)

header:
  alert_id: "ALBA-2024-02-15-001234"
  severity: "high"
  title: "Unusual data access for user john.doe@company.com"
  timestamp: "2024-02-15T14:32:00Z"

scores:
  anomaly_score: 78
  risk_scores:
    security: 85
    ops: 35
    engineering: 40
  confidence: 0.92

entity:
  type: "user"
  id: "john.doe@company.com"
  attributes:
    role: "engineer"
    department: "Product"
  criticality: 1.0

anomaly:
  type: "user_data_access_volume_anomaly"
  description: "User accessed 15,000 customer records in the past hour"
  signals:
    - metric: "data_records_accessed"
      current: 15000
      baseline_mean: 120
      zscore: 330.67
  baseline_comparison: "125x normal volume"

context:
  timeline:
    - "14:00 - Normal login"
    - "14:05 - First bulk export endpoint access"
    - "14:10-14:30 - 150 export requests"
  related_entities:
    - "Endpoint: /api/customers/export"
  suppression_status: "None"
  change_windows: "None active"

investigation:
  suggested_queries:
    - "All requests by user in last 24h"
    - "All users who accessed bulk export today"
    - "User's HR status"
  dashboards: ["User Activity", "Data Access"]
  runbook: "https://wiki/runbooks/data-exfiltration"

actions: ["acknowledge", "suppress", "escalate", "resolve"]
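The `signals` and `baseline_comparison` fields in the example follow from simple arithmetic: the z-score is (current − baseline_mean) / baseline_stddev, and the comparison is current / baseline_mean (15000 / 120 = 125). The example alert does not carry a standard deviation; a value of 45 is what its z-score implies ((15000 − 120) / 45 ≈ 330.67). The sketch below treats `baseline_stddev`, `Signal`, and `baseline_comparison` as hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    metric: str
    current: float
    baseline_mean: float
    baseline_stddev: float  # hypothetical field; not present in the example alert

    def zscore(self) -> float:
        """Standard score of the current value against the learned baseline."""
        if self.baseline_stddev <= 0:
            raise ValueError("baseline_stddev must be positive")
        return (self.current - self.baseline_mean) / self.baseline_stddev

    def to_alert_dict(self) -> dict:
        """Render the signal in the shape used under anomaly.signals."""
        return {
            "metric": self.metric,
            "current": self.current,
            "baseline_mean": self.baseline_mean,
            "zscore": round(self.zscore(), 2),
        }


def baseline_comparison(current: float, baseline_mean: float) -> str:
    """Human-readable ratio such as '125x normal volume'."""
    return f"{current / baseline_mean:g}x normal volume"


signal = Signal("data_records_accessed", 15000, 120, 45)
print(signal.to_alert_dict())
print(baseline_comparison(15000, 120))
```

With the assumed standard deviation, the sketch reproduces the example's `zscore: 330.67` and `"125x normal volume"`.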

8.3 Alert routing

Alerts are routed based on anomaly type and risk score thresholds:

Alert Routing Rules (YAML)

security_team:
  conditions:
    - anomaly_type_in: [geographic, impossible_travel, credential_stuffing,
                        account_takeover, privilege_escalation, data_exfil,
                        new_external_connection]
    - risk_score.security >= 50
  channels:
    critical: [pagerduty:security, slack:#security-critical]
    high: [slack:#security-alerts]
    medium: [email:security@company.com]

sre_team:
  conditions:
    - anomaly_type_in: [error_rate, latency, dependency_failure,
                        traffic_anomaly, capacity, certificate]
    - risk_score.ops >= 50
  channels:
    critical: [pagerduty:sre, slack:#incidents]
    high: [slack:#sre-alerts]
    medium: [email:sre@company.com]

engineering:
  conditions:
    - anomaly_type_in: [new_exception, response_anomaly, api_violation]
    - risk_score.engineering >= 50
  channels:
    critical: [slack:#eng-oncall]
    high: [slack:#eng-alerts]
    medium: [email:eng-leads@company.com]
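One way to evaluate these rules is to check each team's anomaly-type list and domain risk score, then look up the channels for the alert's severity. This is a sketch under two assumptions the rules above do not state explicitly: a team's conditions are combined with AND, and severities with no configured channel (Low, Info) are not pushed to team channels. `Route`, `ROUTES`, and `route` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    team: str
    anomaly_types: frozenset
    risk_domain: str        # key into the alert's risk_scores mapping
    min_risk: int
    channels: dict = field(default_factory=dict)  # severity -> list of channels


ROUTES = [
    Route("security_team",
          frozenset({"geographic", "impossible_travel", "credential_stuffing",
                     "account_takeover", "privilege_escalation", "data_exfil",
                     "new_external_connection"}),
          "security", 50,
          {"critical": ["pagerduty:security", "slack:#security-critical"],
           "high": ["slack:#security-alerts"],
           "medium": ["email:security@company.com"]}),
    Route("sre_team",
          frozenset({"error_rate", "latency", "dependency_failure",
                     "traffic_anomaly", "capacity", "certificate"}),
          "ops", 50,
          {"critical": ["pagerduty:sre", "slack:#incidents"],
           "high": ["slack:#sre-alerts"],
           "medium": ["email:sre@company.com"]}),
    Route("engineering",
          frozenset({"new_exception", "response_anomaly", "api_violation"}),
          "engineering", 50,
          {"critical": ["slack:#eng-oncall"],
           "high": ["slack:#eng-alerts"],
           "medium": ["email:eng-leads@company.com"]}),
]


def route(anomaly_type: str, risk_scores: dict, severity: str) -> list:
    """Return every channel that should receive the alert (assumes AND-ed conditions)."""
    targets = []
    for r in ROUTES:
        type_matches = anomaly_type in r.anomaly_types
        score_matches = risk_scores.get(r.risk_domain, 0) >= r.min_risk
        if type_matches and score_matches:
            targets.extend(r.channels.get(severity, []))
    return targets
```

Because the loop visits every team, an anomaly type listed under more than one team would fan out to all matching teams; the rules as written keep the type lists disjoint.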

8.4 Aggregation and deduplication

Aggregation Rules (YAML)

aggregation:
  same_entity_type:
    group_by: [entity_type, entity_id, anomaly_type]
    window: 15 minutes
    action: "Merge into single alert"

  service_incident:
    group_by: [service.name]
    window: 30 minutes
    conditions:
      anomaly_types: [error_rate, latency, dependency_failure]
    action: "Create incident linking all anomalies"

  attack_campaign:
    group_by: [anomaly_type]
    window: 1 hour
    conditions:
      affected_entities: "> 10"
    action: "Create campaign alert"

deduplication:
  ongoing_anomaly:
    conditions: [same_entity, same_type, previous_alert_open]
    action: "Update existing, don't create new"

  flapping_prevention:
    conditions:
      - same_entity
      - same_type
      - alert_count > 3 in 1 hour
    action: "Suppress, create meta-alert"
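The two deduplication rules can be sketched as an in-memory tracker keyed by (entity, anomaly type). It updates an open alert instead of creating a new one, and suppresses creation when more than three alerts for the same key would be created within one hour. This is an illustration only. `Deduplicator`, `ingest`, and `resolve` are hypothetical names, timestamps are assumed to arrive in order, and emitting the meta-alert is left to the caller.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FLAP_WINDOW = timedelta(hours=1)  # flapping_prevention window
FLAP_LIMIT = 3                    # suppress when alert_count > 3 in the window


class Deduplicator:
    """Sketch of the ongoing_anomaly and flapping_prevention rules."""

    def __init__(self) -> None:
        self.open_alerts: dict = {}              # key -> occurrences merged so far
        self.created: dict = defaultdict(deque)  # key -> alert creation times

    def ingest(self, entity_id: str, anomaly_type: str, ts: datetime) -> str:
        key = (entity_id, anomaly_type)          # same_entity + same_type
        if key in self.open_alerts:              # ongoing_anomaly: update existing
            self.open_alerts[key] += 1
            return "updated"
        times = self.created[key]
        times.append(ts)
        while ts - times[0] > FLAP_WINDOW:       # forget creations older than 1 hour
            times.popleft()
        if len(times) > FLAP_LIMIT:              # flapping_prevention
            return "suppressed"                  # caller creates a meta-alert
        self.open_alerts[key] = 1
        return "created"

    def resolve(self, entity_id: str, anomaly_type: str) -> None:
        """Close the open alert so the next occurrence counts as a new alert."""
        self.open_alerts.pop((entity_id, anomaly_type), None)
```

An alert that resolves and reopens three times within the hour is created each time; the fourth reopening in that hour is suppressed.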

8.5 Feedback loop

Alert feedback improves detection accuracy over time:

Feedback Loop (YAML)

false_positive:
  actions:
    - Log with context
    - Suggest a suppression rule if the false positives follow a pattern
    - Adjust baseline if appropriate
  threshold_adjustment:
    after_3_fps: "Widen threshold by 10%"
    after_5_fps: "Flag for rule review"

true_positive:
  actions:
    - Link to incident ticket
    - Update risk multipliers (+5%)
    - Add to ML training data

metrics_targets:
  false_positive_rate: "< 5%"
  true_positive_rate: "> 80%"
  mean_time_to_acknowledge: "< tier response SLA"
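The per-rule feedback counters can be sketched as follows. The sketch assumes the 10% widening is applied once, when the third false positive arrives; that "widening" means raising a detection cutoff such as a z-score threshold; and that the +5% risk-multiplier update is applied on every true positive. `RuleFeedback`, `threshold`, and `risk_multiplier` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RuleFeedback:
    threshold: float              # hypothetical detection cutoff, e.g. a z-score
    risk_multiplier: float = 1.0  # hypothetical per-rule risk multiplier
    false_positives: int = 0
    true_positives: int = 0
    widened: bool = False
    needs_review: bool = False

    def record_false_positive(self) -> None:
        self.false_positives += 1
        if self.false_positives >= 3 and not self.widened:
            self.threshold *= 1.10    # after_3_fps: widen threshold by 10% (once)
            self.widened = True
        if self.false_positives >= 5:
            self.needs_review = True  # after_5_fps: flag for rule review

    def record_true_positive(self) -> None:
        self.true_positives += 1
        self.risk_multiplier *= 1.05  # update risk multipliers (+5%)
```

Applying the widening only once keeps repeated false positives from loosening a rule indefinitely; further noise is routed to human review instead.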

8.6 Conformance

Implementations:

MUST support at least three alert severity tiers
SHOULD include anomaly scores and context in alerts
SHOULD implement alert deduplication
MAY implement feedback loops for continuous improvement

Note

Continue to Section 9: Implementation Guidance for deployment architecture and operational guidance.