2. Core architecture
| Field | Value |
|---|---|
| Status | Stable |
| Version | 2.0.0 |
| Last updated | 2026-01-31 |
| Authors | OpenALBA Working Group |
2.1 System components
The OpenALBA processing pipeline consists of the following components:
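Processing Pipeline
DATA SOURCES → AGGREGATION → BASELINES → ANOMALY DETECTION → RISK SCORING → ALERTS
| Stage | Key elements |
|---|---|
| Data Sources | OTel traces |
| Aggregation | Pre-aggregated metrics |
| Baselines | Per-entity + peer group |
| Anomaly Detection | Statistical + ML methods |
| Risk Scoring | Multipliers + decay |
| Alerts | Routing + delivery |
Each component has distinct responsibilities: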
- Data Sources: Applications instrumented with the OpenTelemetry SDK emit traces, metrics, and logs
- Aggregation: Raw observability data is pre-aggregated into metrics per entity per time window
- Baselines: Statistical and ML models establish “normal” behavior per entity and peer group
- Anomaly Detection: Current behavior is compared to baselines to calculate objective anomaly scores
- Risk Scoring: Anomaly scores are adjusted by entity criticality, sensitivity, and consumer-specific weights
- Alerts: Risk scores exceeding thresholds are routed to appropriate teams
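The detection, risk scoring, and alert stages can be illustrated with a small Python sketch. This section does not fix the scoring formulas, so everything here is an assumption for illustration: the anomaly score is an absolute z-score against a per-entity baseline, the criticality, sensitivity, and consumer weights are plain multipliers, decay is exponential with a hypothetical 24-hour half-life, and `should_alert` uses a single hypothetical threshold of 3.0.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class RiskMultipliers:
    # Hypothetical weights for the factors named above; 1.0 means "no adjustment".
    criticality: float = 1.0
    sensitivity: float = 1.0
    consumer_weight: float = 1.0


def anomaly_score(current: float, baseline: list[float]) -> float:
    """Assumption: objective anomaly score = |z-score| against the entity's baseline."""
    mean = statistics.fmean(baseline)
    # Assumption: fall back to 1.0 on a perfectly flat baseline to avoid dividing by zero.
    stdev = statistics.pstdev(baseline) or 1.0
    return abs(current - mean) / stdev


def risk_score(anomaly: float, m: RiskMultipliers, prior_risk: float = 0.0,
               hours_since_prior: float = 0.0, half_life_hours: float = 24.0) -> float:
    """Apply multipliers, then compare with the exponentially decayed prior risk."""
    current = anomaly * m.criticality * m.sensitivity * m.consumer_weight
    decayed_prior = prior_risk * 0.5 ** (hours_since_prior / half_life_hours)
    # Assumption: keep the larger value so a recent spike fades gradually.
    return max(current, decayed_prior)


def should_alert(risk: float, threshold: float = 3.0) -> bool:
    # Hypothetical single threshold; routing to teams is out of scope here.
    return risk >= threshold


# Usage: 130 requests in the current window versus a five-window baseline.
score = anomaly_score(130, [100, 110, 90, 105, 95])
risk = risk_score(score, RiskMultipliers(criticality=1.5))
print(round(score, 2), round(risk, 2), should_alert(risk))
```

Here the baseline has mean 100 and population standard deviation √50 ≈ 7.07, so a current value of 130 scores about 4.24; with `criticality=1.5` the risk is about 6.36, which clears the example threshold. Keeping the larger of the new and decayed prior risk is one design choice among several; it lets a spike fade over time instead of vanishing in the next window.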
2.2 Entity model
OpenALBA profiles behavior for multiple entity types:
| Entity Type | Primary Key | Baseline Scope | Typical Metrics |
|---|---|---|---|
| User | user.id | Per-user + peer group | Request volume, endpoints, data volume, timing |
| Session | session.id | Per-user historical | Duration, actions, navigation pattern |
| Service | service.name | Per-service + type | Request rate, error rate, latency, dependencies |
| Endpoint | service.name + http.route | Per-endpoint | Volume, response size, error rate |
| Dependency | service.name + peer.service | Per-pair | Call volume, error rate, latency |
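Because each entity's primary key is built from OpenTelemetry attributes, a single span can feed several entity baselines at once. The Python sketch below maps a flattened dictionary of attributes to the keys listed in the table. The helper name `entity_keys` and the tuple-shaped keys are hypothetical, and it assumes span and resource attributes have already been merged (in OpenTelemetry, `service.name` is normally a resource attribute).

```python
from collections.abc import Mapping

# Entity type -> attributes forming its primary key, taken from the table above.
ENTITY_KEY_ATTRIBUTES: dict[str, tuple[str, ...]] = {
    "User": ("user.id",),
    "Session": ("session.id",),
    "Service": ("service.name",),
    "Endpoint": ("service.name", "http.route"),
    "Dependency": ("service.name", "peer.service"),
}


def entity_keys(attributes: Mapping[str, str]) -> dict[str, tuple[str, ...]]:
    """Hypothetical helper: return each entity a span contributes to, with its key.

    Assumes span and resource attributes are merged into one flat mapping;
    entities whose key attributes are absent are skipped.
    """
    keys: dict[str, tuple[str, ...]] = {}
    for entity_type, names in ENTITY_KEY_ATTRIBUTES.items():
        if all(name in attributes for name in names):
            keys[entity_type] = tuple(attributes[name] for name in names)
    return keys


span_attributes = {
    "service.name": "checkout",
    "http.route": "/api/orders/{id}",
    "user.id": "u-42",
}
print(entity_keys(span_attributes))
```

A span from the `checkout` service that carries `http.route` and `user.id` but no `session.id` or `peer.service` contributes to the User, Service, and Endpoint baselines only.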
2.3 Time windows
OpenALBA uses multiple time windows to balance responsiveness with accuracy:
| Window | Purpose | Typical Duration |
|---|---|---|
| Detection Window | Current behavior measurement | 5-15 minutes |
| Short Baseline | Recent “normal” | 24-72 hours |
| Standard Baseline | Primary behavioral baseline | 7-30 days |
| Long Baseline | Seasonal patterns | 90-365 days |
Note
The detection window SHOULD be short enough to catch attacks but long enough to avoid noise from brief fluctuations. Implementations SHOULD allow this to be configured per detection pattern.
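One way to represent these windows is a configuration object that enforces their ordering and supports the per-pattern override the note recommends. In this Python sketch the default durations are values picked from the typical ranges in the table; the class name, field names, and the pattern name used in the example (`credential_stuffing`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class WindowConfig:
    # Defaults chosen from the "Typical Duration" ranges (an assumption, not normative).
    detection: timedelta = timedelta(minutes=10)
    short_baseline: timedelta = timedelta(hours=48)
    standard_baseline: timedelta = timedelta(days=14)
    long_baseline: timedelta = timedelta(days=180)
    # Hypothetical per-detection-pattern overrides of the detection window.
    detection_overrides: dict[str, timedelta] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Each window must be strictly longer than the one before it.
        ordered = [self.detection, self.short_baseline,
                   self.standard_baseline, self.long_baseline]
        if any(a >= b for a, b in zip(ordered, ordered[1:])):
            raise ValueError("windows must grow: detection < short < standard < long")

    def detection_for(self, pattern: str) -> timedelta:
        """Detection window for a pattern, falling back to the default."""
        return self.detection_overrides.get(pattern, self.detection)


cfg = WindowConfig(detection_overrides={"credential_stuffing": timedelta(minutes=5)})
print(cfg.detection_for("credential_stuffing"), cfg.detection_for("bulk_export"))
```

With this configuration, the `credential_stuffing` pattern is evaluated over 5 minutes while every other pattern uses the 10-minute default.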
2.4 Data flow
Data flows through the system in the following sequence:
- Applications emit spans and metrics through the OpenTelemetry SDK to an OTel Collector
- The Collector exports data to storage (e.g., ClickHouse) through the appropriate exporters
- Aggregation jobs run periodically to compute per-entity metrics from raw spans
- Baseline jobs update statistical models and ML models on configured schedules
- Detection jobs compare current metrics to baselines and calculate anomaly scores
- Risk scoring applies multipliers and decay to produce final risk scores
- Alert evaluation checks thresholds and routes notifications to configured channels
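The aggregation step can be illustrated with an in-memory Python sketch that buckets spans into tumbling windows per user and accumulates request volume, data volume, and error count. In a deployment this would more likely run as a scheduled query against the store (for example ClickHouse); the `Span` shape, its field names, and the 10-minute default window are assumptions for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical flattened view of a span; real spans carry these as attributes.
    user_id: str
    start_s: float        # span start time, seconds since the epoch
    response_bytes: int
    is_error: bool


@dataclass
class WindowMetrics:
    requests: int = 0
    data_bytes: int = 0
    errors: int = 0


def aggregate(spans: Iterable[Span], window_s: int = 600) -> dict[tuple[str, int], WindowMetrics]:
    """Group spans into tumbling windows keyed by (user_id, window_start)."""
    out: defaultdict[tuple[str, int], WindowMetrics] = defaultdict(WindowMetrics)
    for s in spans:
        # Align each span to the start of its fixed-size window.
        window_start = int(s.start_s // window_s) * window_s
        m = out[(s.user_id, window_start)]
        m.requests += 1
        m.data_bytes += s.response_bytes
        m.errors += int(s.is_error)
    return dict(out)


spans = [
    Span("u1", 0, 100, False),
    Span("u1", 599, 50, True),
    Span("u1", 600, 10, False),
    Span("u2", 30, 5, False),
]
for key, metrics in sorted(aggregate(spans).items()):
    print(key, metrics)
```

Tumbling (non-overlapping) windows keep each span in exactly one bucket, which makes the per-window metrics straightforward to compare against baselines.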
2.5 Conformance
Implementations claiming conformance to the architecture:
- MUST support the User and Service entity types
- SHOULD support Session, Endpoint, and Dependency entity types
- MUST implement at least the detection window and the short baseline window
- SHOULD implement the standard and long baseline windows for improved accuracy
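Implementations may find it useful to check their declared capabilities against these requirements. The Python sketch below separates MUST violations, which block a conformance claim, from SHOULD gaps, which do not; the capability strings and the return shape are hypothetical.

```python
# Requirement sets from the list above; the string identifiers are hypothetical.
MUST_ENTITIES = {"User", "Service"}
SHOULD_ENTITIES = {"Session", "Endpoint", "Dependency"}
MUST_WINDOWS = {"detection", "short_baseline"}
SHOULD_WINDOWS = {"standard_baseline", "long_baseline"}


def check_conformance(entities: set[str], windows: set[str]) -> tuple[list[str], list[str]]:
    """Return (MUST violations, SHOULD gaps); a conformance claim needs zero violations."""
    violations = [f"missing required entity type: {e}" for e in sorted(MUST_ENTITIES - entities)]
    violations += [f"missing required window: {w}" for w in sorted(MUST_WINDOWS - windows)]
    gaps = [f"recommended entity type not supported: {e}" for e in sorted(SHOULD_ENTITIES - entities)]
    gaps += [f"recommended window not implemented: {w}" for w in sorted(SHOULD_WINDOWS - windows)]
    return violations, gaps


violations, gaps = check_conformance({"User", "Service", "Session"}, {"detection", "short_baseline"})
print("conformant" if not violations else violations)
print(gaps)
```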
Tip
Continue to Section 3: Baseline Methodology for details on how baselines are established.