2. Core architecture

Status: Stable
Version: 2.0.0
Last updated: 2026-01-31
Authors: OpenALBA Working Group

2.1 System components

The OpenALBA processing pipeline consists of the following components:

Processing Pipeline
DATA SOURCES → AGGREGATION → BASELINES → ANOMALY DETECTION → RISK SCORING → ALERTS
     │              │            │              │                 │            │
   OTel         Pre-agg      Per-entity    Statistical +       Multipliers   Routing
   Traces       Metrics      + Peer group     ML methods       + Decay       + Delivery

Each component has distinct responsibilities:

  • Data Sources: Applications instrumented with OpenTelemetry SDK emit traces, metrics, and logs
  • Aggregation: Raw observability data is pre-aggregated into metrics per entity per time window
  • Baselines: Statistical and ML models establish “normal” behavior per entity and peer group
  • Anomaly Detection: Current behavior is compared to baselines to calculate objective anomaly scores
  • Risk Scoring: Anomaly scores are adjusted by entity criticality, sensitivity, and consumer-specific weights
  • Alerts: Risk scores exceeding thresholds are routed to appropriate teams
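The stage responsibilities above can be sketched as a chain of small functions. This is an illustrative sketch only, not the OpenALBA implementation; the function names, the z-score anomaly method, and the criticality multiplier values are assumptions chosen to mirror the Aggregation → Anomaly Detection → Risk Scoring steps.

```python
def aggregate(spans):
    """Pre-aggregate raw spans into per-entity request counts.
    Hypothetical: keys on user.id only, for brevity."""
    counts = {}
    for span in spans:
        key = span["user.id"]
        counts[key] = counts.get(key, 0) + 1
    return counts

def anomaly_score(current, baseline_mean, baseline_std):
    """Objective anomaly score: deviation from the baseline in
    standard deviations (one of many possible statistical methods)."""
    if baseline_std == 0:
        return 0.0
    return abs(current - baseline_mean) / baseline_std

def risk_score(anomaly, criticality=1.0):
    """Adjust the objective anomaly score by an entity-criticality
    multiplier, as the Risk Scoring stage describes."""
    return anomaly * criticality

# 120 requests in the detection window vs. a baseline of 40 ± 10:
spans = [{"user.id": "u1"}] * 120
counts = aggregate(spans)
a = anomaly_score(counts["u1"], baseline_mean=40, baseline_std=10)
r = risk_score(a, criticality=2.0)
```

The split between `anomaly_score` and `risk_score` reflects the document's separation of objective detection from consumer-specific weighting.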

2.2 Entity model

OpenALBA profiles behavior for multiple entity types:

| Entity Type | Primary Key                  | Baseline Scope        | Typical Metrics                                |
|-------------|------------------------------|-----------------------|------------------------------------------------|
| User        | user.id                      | Per-user + peer group | Request volume, endpoints, data volume, timing |
| Session     | session.id                   | Per-user historical   | Duration, actions, navigation pattern          |
| Service     | service.name                 | Per-service + type    | Request rate, error rate, latency, dependencies |
| Endpoint    | service.name + http.route    | Per-endpoint          | Volume, response size, error rate              |
| Dependency  | service.name + peer.service  | Per-pair              | Call volume, error rate, latency               |
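The primary keys in the table above can be expressed as a lookup from entity type to attribute names. This is a hypothetical helper, not part of the specification; the attribute names follow OpenTelemetry semantic conventions as listed in the table.

```python
# Entity type → attribute names forming the primary key (from the table).
ENTITY_KEYS = {
    "user": ("user.id",),
    "session": ("session.id",),
    "service": ("service.name",),
    "endpoint": ("service.name", "http.route"),
    "dependency": ("service.name", "peer.service"),
}

def entity_key(entity_type, attributes):
    """Build a composite key for one entity from span attributes.
    Illustrative helper; raises KeyError if a key attribute is missing."""
    fields = ENTITY_KEYS[entity_type]
    return tuple(attributes[f] for f in fields)

span_attrs = {"service.name": "checkout", "http.route": "/api/pay"}
key = entity_key("endpoint", span_attrs)
```

Composite keys like `service.name + http.route` keep endpoint baselines distinct even when several services expose the same route.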

2.3 Time windows

OpenALBA uses multiple time windows to balance responsiveness with accuracy:

| Window            | Purpose                       | Typical Duration |
|-------------------|-------------------------------|------------------|
| Detection Window  | Current behavior measurement  | 5-15 minutes     |
| Short Baseline    | Recent “normal”               | 24-72 hours      |
| Standard Baseline | Primary behavioral baseline   | 7-30 days        |
| Long Baseline     | Seasonal patterns             | 90-365 days      |

Note

The detection window SHOULD be short enough to catch attacks but long enough to avoid noise from brief fluctuations. Implementations SHOULD allow this to be configured per detection pattern.
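Per-pattern window configuration, as the note recommends, might look like the following sketch. The field names and defaults are assumptions; the defaults sit inside the typical ranges from the table above.

```python
from dataclasses import dataclass

@dataclass
class WindowConfig:
    """Hypothetical per-detection-pattern window settings."""
    detection_minutes: int = 10        # typical range: 5-15 minutes
    short_baseline_hours: int = 48     # typical range: 24-72 hours
    standard_baseline_days: int = 14   # typical range: 7-30 days
    long_baseline_days: int = 180      # typical range: 90-365 days

# A fast-moving attack pattern might override only the detection window:
exfil_windows = WindowConfig(detection_minutes=5)
```

Keeping the baselines fixed while tuning the detection window per pattern preserves comparability of anomaly scores across patterns.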

2.4 Data flow

Data flows through the system in the following sequence:

  1. Applications emit spans and metrics via OpenTelemetry SDK to an OTel Collector
  2. Collector exports data to storage (e.g., ClickHouse) via appropriate exporters
  3. Aggregation jobs run periodically to compute per-entity metrics from raw spans
  4. Baseline jobs update statistical models and ML models on configured schedules
  5. Detection jobs compare current metrics to baselines and calculate anomaly scores
  6. Risk scoring applies multipliers and decay to produce final risk scores
  7. Alert evaluation checks thresholds and routes notifications to configured channels
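Steps 3-7 above run as scheduled jobs, each consuming upstream results. A minimal sketch of that ordering, with illustrative job names and trivial handlers standing in for real implementations:

```python
# Ordered job stages from the data-flow sequence (steps 3-7);
# names are illustrative, not normative.
PIPELINE_JOBS = [
    "aggregation",   # per-entity metrics from raw spans
    "baselines",     # update statistical and ML models
    "detection",     # compare current metrics to baselines
    "risk_scoring",  # apply multipliers and decay
    "alerting",      # threshold checks and notification routing
]

def run_cycle(jobs, handlers):
    """Run each stage in order; each handler sees upstream results."""
    state = {}
    for job in jobs:
        state[job] = handlers[job](state)
    return state

# Trivial stand-in handlers, just to show the wiring:
handlers = {j: (lambda state, j=j: f"{j} done") for j in PIPELINE_JOBS}
result = run_cycle(PIPELINE_JOBS, handlers)
```

In practice each stage would run on its own schedule (e.g. baseline jobs far less often than detection jobs), but the data dependency order is as listed.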

2.5 Conformance

Implementations claiming conformance to this architecture:

  • MUST support the User and Service entity types
  • SHOULD support Session, Endpoint, and Dependency entity types
  • MUST implement at least detection and short baseline windows
  • SHOULD implement standard and long baseline windows for improved accuracy

Tip

Continue to Section 3: Baseline Methodology for details on how baselines are established.