# Getting started
This guide covers the prerequisites, initial setup, and basic configuration for implementing OpenALBA.
## Prerequisites
Before implementing OpenALBA, ensure you have:
- OpenTelemetry instrumentation deployed in your applications, with traces exported to a collector
- Time-series storage for observability data (ClickHouse recommended, but Prometheus, TimescaleDB, or other options work)
- Compute resources for running ALBA processing jobs (Kubernetes CronJobs recommended)
- Alerting infrastructure for routing notifications (PagerDuty, Slack, email)
## 1. Verify instrumentation
OpenALBA requires specific OpenTelemetry attributes to function. Verify that your applications emit the following:
```yaml
# Resource attributes
service.name: "my-service"
service.version: "1.2.3"
deployment.environment.name: "production"

# Span attributes
http.route: "/api/users/:id"
http.request.method: "GET"
http.response.status_code: 200
client.address: "203.0.113.42"
```

> **Tip:** For enhanced detection, add identity attributes like `user.id` and `session.id` to your spans. See Signal Definitions for the complete list.
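A quick way to sanity-check an export is to diff a span's attributes against the list above. The helper below is an illustrative sketch (not part of OpenALBA); the attribute names come from this guide:

```python
# Hypothetical helper: report required attributes missing from a span export.
# The attribute sets below mirror the example in this guide.
REQUIRED_RESOURCE_ATTRS = {"service.name", "service.version", "deployment.environment.name"}
REQUIRED_SPAN_ATTRS = {"http.route", "http.request.method", "http.response.status_code", "client.address"}

def missing_attributes(resource: dict, span: dict) -> list[str]:
    """Return required attribute keys absent from a resource/span pair."""
    missing = [k for k in sorted(REQUIRED_RESOURCE_ATTRS) if k not in resource]
    missing += [k for k in sorted(REQUIRED_SPAN_ATTRS) if k not in span]
    return missing

# Example: a span that omits client.address
resource = {"service.name": "my-service", "service.version": "1.2.3",
            "deployment.environment.name": "production"}
span = {"http.route": "/api/users/:id", "http.request.method": "GET",
        "http.response.status_code": 200}
print(missing_attributes(resource, span))  # → ['client.address']
```

Running a check like this against a sample of exported spans before deployment avoids silent gaps in detection coverage later.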
## 2. Configure storage
Set up tables for raw data, aggregated metrics, and ALBA-specific data. Example ClickHouse schema:
```sql
-- Raw traces (7-day retention)
CREATE TABLE traces_raw (
    timestamp DateTime64(9),
    trace_id String,
    span_id String,
    service_name LowCardinality(String),
    http_route LowCardinality(String),
    http_status_code UInt16,
    user_id String,
    client_address String,
    duration_ms Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (service_name, timestamp)
TTL timestamp + INTERVAL 7 DAY;
```
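The `user_metrics_hourly` table defined next is a rollup of `traces_raw`. How it gets populated is up to your pipeline; as one hypothetical sketch (illustrative, not part of OpenALBA), a scheduled ClickHouse insert could aggregate the previous hour:

```sql
-- Hypothetical rollup: aggregate raw traces into hourly user metrics.
-- Columns not derivable from traces_raw (e.g. response_bytes_total)
-- are omitted and left to their defaults.
INSERT INTO user_metrics_hourly (hour, user_id, request_count, unique_endpoints, error_count)
SELECT
    toStartOfHour(timestamp) AS hour,
    user_id,
    count() AS request_count,
    uniqExact(http_route) AS unique_endpoints,
    countIf(http_status_code >= 500) AS error_count
FROM traces_raw
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY hour, user_id;
```

A materialized view is the more idiomatic ClickHouse choice for continuous rollups; the explicit `INSERT ... SELECT` above just makes the aggregation visible.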
```sql
-- User metrics (hourly aggregation, 90-day retention)
CREATE TABLE user_metrics_hourly (
    hour DateTime,
    user_id String,
    request_count UInt64,
    unique_endpoints UInt32,
    error_count UInt64,
    response_bytes_total UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (user_id, hour)
TTL hour + INTERVAL 90 DAY;
```

## 3. Establish baselines
Before detecting anomalies, ALBA needs baseline data. Run the baseline job with historical data:
```bash
# Run baseline initialization with 14 days of historical data
alba baseline init --window 14d

# Or run incrementally on new data
alba baseline update --incremental
```

> **Warning:** Baselines require sufficient sample data to be meaningful. Entities with fewer than 100 samples will use fallback baselines (peer group or population). Allow 2-4 weeks of data collection for optimal accuracy.
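To make the fallback behavior concrete, here is a minimal sketch of per-entity z-score baselining with the sub-100-sample fallback described above. The function and names are hypothetical illustrations, not OpenALBA's API:

```python
# Illustrative baseline scoring: use an entity's own mean/stddev when it has
# enough history, otherwise fall back to a peer-group or population baseline.
from statistics import mean, stdev

MIN_SAMPLES = 100  # below this, the entity baseline is considered unreliable

def zscore(value: float, samples: list[float], fallback: tuple[float, float]) -> float:
    """Score an observation against the entity baseline, or against a
    fallback (mean, stddev) pair when the entity has too few samples."""
    if len(samples) >= MIN_SAMPLES:
        mu, sigma = mean(samples), stdev(samples)
    else:
        mu, sigma = fallback  # peer-group or population statistics
    if sigma == 0:
        return 0.0  # constant history: no meaningful deviation
    return (value - mu) / sigma

# Example: an entity with only 10 samples is scored against the fallback
history = [100.0] * 10
print(round(zscore(400.0, history, fallback=(120.0, 40.0)), 1))  # → 7.0
```

Against a threshold like the `zscore_threshold: 2.5` used in step 4, a score of 7.0 would flag this observation as anomalous.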
## 4. Enable detection
Configure which detection patterns to enable and their thresholds:
```yaml
detection:
  patterns:
    - name: user_request_volume_anomaly
      enabled: true
      zscore_threshold: 2.5
    - name: geographic_anomaly
      enabled: true
      new_country_score: 60
      impossible_travel_score: 90
    - name: service_error_rate_anomaly
      enabled: true
      zscore_threshold: 2.5
      min_volume: 100
  schedule:
    interval: 5m
    timeout: 2m
```

## 5. Configure alerting
Route alerts to appropriate teams based on anomaly type and risk:
```yaml
alerting:
  routes:
    - name: security
      conditions:
        anomaly_types: [geographic, credential_stuffing, data_exfil]
        risk_score_min: 50
      channels:
        critical: [pagerduty:security, slack:#security-critical]
        high: [slack:#security-alerts]
    - name: sre
      conditions:
        anomaly_types: [error_rate, latency, dependency_failure]
        risk_score_min: 50
      channels:
        critical: [pagerduty:sre, slack:#incidents]
        high: [slack:#sre-alerts]
```

## Next steps
- Read the full specification for detailed methodology
- Explore detection patterns to enable specific detections
- Contribute to OpenALBA if you find issues or have improvements
Last updated: 2026-01-31