4. Anomaly score calculation

Status: Stable
Version: 2.0.0
Last updated: 2026-01-31
Authors: OpenALBA Working Group

4.1 Overview

The Anomaly Score quantifies how unusual an observed behavior is compared to established baselines. This section defines the calculation methodology.

The anomaly score is an objective, mathematical measure in the range [0, 100] that does not consider business context or risk implications. Those factors are applied in the Risk Score calculation.

4.2 Conformance

Implementations MUST:

  • Calculate anomaly scores in the range [0, 100]
  • Implement at least one statistical deviation method from Section 4.3
  • Apply component weights as defined in Section 4.7

Implementations SHOULD:

  • Implement confidence adjustment for entities with limited baseline data
  • Support configurable component weights

4.3 Deviation component (0-100)

Measures statistical distance from the baseline.

4.3.1 Z-Score method

For normally distributed metrics:

Z-Score Calculation (Python)
z = (observed - baseline.mean) / baseline.stddev
deviation_score = min(100, abs(z) * 20)

Mapping:
    z = 0 (at mean) → score = 0
    z = 2.5 → score = 50
    z = 5 → score = 100
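A minimal runnable sketch of this method; the `Baseline` dataclass and the sample mean/stddev are illustrative, not part of the specification:

```python
from dataclasses import dataclass

# Illustrative baseline container; the spec does not mandate a structure.
@dataclass
class Baseline:
    mean: float
    stddev: float

def zscore_deviation(observed: float, baseline: Baseline) -> float:
    """Map |z| onto [0, 100], saturating at |z| = 5."""
    z = (observed - baseline.mean) / baseline.stddev
    return min(100.0, abs(z) * 20)

b = Baseline(mean=200.0, stddev=40.0)
print(zscore_deviation(200.0, b))  # at the mean -> 0.0
print(zscore_deviation(300.0, b))  # z = 2.5 -> 50.0
print(zscore_deviation(500.0, b))  # z = 7.5, capped -> 100.0
```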

4.3.2 Modified Z-Score method (RECOMMENDED)

For metrics with outliers in the baseline:

Modified Z-Score Calculation (Python)
modified_z = 0.6745 * (observed - baseline.median) / baseline.MAD
deviation_score = min(100, abs(modified_z) * 18)

Where MAD is the Median Absolute Deviation.

Tip

The modified Z-score is RECOMMENDED for most use cases as it is more resistant to outliers in the baseline data.
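The method can be sketched end-to-end from raw baseline samples. The `modified_z_deviation` helper, the zero-MAD guard, and the sample data below are illustrative assumptions, not spec requirements:

```python
import statistics

def modified_z_deviation(observed: float, samples: list[float]) -> float:
    """Modified z-score deviation; median and MAD computed from raw samples."""
    median = statistics.median(samples)
    mad = statistics.median(abs(x - median) for x in samples)
    if mad == 0:  # degenerate baseline with no spread (guard is an assumption)
        return 0.0 if observed == median else 100.0
    modified_z = 0.6745 * (observed - median) / mad
    return min(100.0, abs(modified_z) * 18)

# One large outlier (100) barely moves the median/MAD, unlike mean/stddev.
samples = [10, 12, 11, 13, 12, 11, 100]
print(modified_z_deviation(12, samples))  # at the median -> 0.0
```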

4.3.3 IQR method

For non-normal distributions:

IQR Method (Python)
IQR = baseline.q3 - baseline.q1
lower_fence = baseline.q1 - 1.5 * IQR
upper_fence = baseline.q3 + 1.5 * IQR

if observed < lower_fence:
    distance = (lower_fence - observed) / IQR
elif observed > upper_fence:
    distance = (observed - upper_fence) / IQR
else:
    distance = 0

deviation_score = min(100, distance * 30)
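As a self-contained sketch (the quartile values passed in below are illustrative):

```python
def iqr_deviation(observed: float, q1: float, q3: float) -> float:
    """Distance past the Tukey fences, in IQR units, scaled by 30 and capped."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    if observed < lower_fence:
        distance = (lower_fence - observed) / iqr
    elif observed > upper_fence:
        distance = (observed - upper_fence) / iqr
    else:
        distance = 0.0
    return min(100.0, distance * 30)

# q1=100, q3=200 -> IQR=100, fences at -50 and 350
print(iqr_deviation(150, 100, 200))  # inside the fences -> 0.0
print(iqr_deviation(450, 100, 200))  # 1 IQR past the upper fence -> 30.0
```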

4.4 Rarity component (0-100)

Measures how uncommon the observed value is historically.

4.4.1 Percentile-based rarity

Percentile Rarity (Python)
percentile = percentile_rank(observed, baseline.distribution)

if percentile <= 50:
    rarity_score = (1 - percentile / 50) * 100  # Lower is rarer
else:
    rarity_score = ((percentile - 50) / 50) * 100  # Higher is rarer

Example:
    Observed at 5th percentile → rarity = 90
    Observed at 50th percentile → rarity = 0
    Observed at 95th percentile → rarity = 90
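A runnable sketch, including one possible `percentile_rank` (a mid-rank definition over a sorted baseline sample; both the helper and the toy distribution are assumptions, since the spec does not fix a percentile definition):

```python
from bisect import bisect_left, bisect_right

def percentile_rank(value: float, sorted_dist: list[float]) -> float:
    """Mid-rank percentile of value within a sorted baseline sample."""
    lo = bisect_left(sorted_dist, value)
    hi = bisect_right(sorted_dist, value)
    return 100.0 * (lo + hi) / (2 * len(sorted_dist))

def percentile_rarity(value: float, sorted_dist: list[float]) -> float:
    p = percentile_rank(value, sorted_dist)
    if p <= 50:
        return (1 - p / 50) * 100   # lower tail is rarer
    return ((p - 50) / 50) * 100    # upper tail is rarer

dist = sorted(float(v) for v in range(1, 101))  # toy baseline: 1..100
print(percentile_rarity(5.0, dist))   # deep in the lower tail -> high rarity
print(percentile_rarity(50.0, dist))  # near the middle -> near 0
```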

4.4.2 Frequency-based rarity

For categorical values:

Frequency Rarity (Python)
frequency = baseline.value_frequencies.get(observed, 0)
total = sum(baseline.value_frequencies.values())
rarity_score = (1 - frequency / total) * 100

Example:
    Value seen 1000 times out of 10000 → rarity = 90
    Value never seen before → rarity = 100
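A sketch that builds the frequency table from raw history with `collections.Counter` (the country-code history is illustrative):

```python
from collections import Counter

def frequency_rarity(observed: str, history: list[str]) -> float:
    """Rarity of a categorical value based on its historical frequency."""
    counts = Counter(history)
    total = sum(counts.values())
    return (1 - counts.get(observed, 0) / total) * 100

history = ["US"] * 900 + ["DE"] * 90 + ["BR"] * 10
print(frequency_rarity("US", history))  # common value -> low rarity (~10)
print(frequency_rarity("JP", history))  # never seen -> 100.0
```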

4.5 Velocity component (0-100)

Measures rate of change from the previous period.

4.5.1 Simple rate of change

Simple Velocity (Python)
if previous == 0:
    rate_of_change = float("inf") if current > 0 else 0
else:
    rate_of_change = (current - previous) / previous

velocity_score = min(100, abs(rate_of_change) * 50)

Mapping:
    0% change → score = 0
    50% change → score = 25
    100% change (doubled) → score = 50
    200% change → score = 100
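As a runnable sketch (the sample counts are illustrative; note that `min` caps the infinite rate for a jump from zero):

```python
import math

def simple_velocity(current: float, previous: float) -> float:
    """Score the relative change between two periods, capped at 100."""
    if previous == 0:
        rate_of_change = math.inf if current > 0 else 0.0
    else:
        rate_of_change = (current - previous) / previous
    return min(100.0, abs(rate_of_change) * 50)

print(simple_velocity(100, 100))  # no change -> 0.0
print(simple_velocity(200, 100))  # doubled -> 50.0
print(simple_velocity(500, 0))    # from zero -> capped at 100.0
```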

4.5.2 Baseline-normalized velocity

Baseline-Normalized Velocity (Python)
change = abs(current - previous)
normalized_change = change / baseline.stddev
velocity_score = min(100, normalized_change * 25)

Mapping:
    0 stddev change → score = 0
    2 stddev change → score = 50
    4+ stddev change → score = 100
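A one-function sketch, passing the baseline stddev directly (the sample values are illustrative):

```python
def normalized_velocity(current: float, previous: float, stddev: float) -> float:
    """Change between periods measured in baseline standard deviations."""
    return min(100.0, abs(current - previous) / stddev * 25)

print(normalized_velocity(260, 200, 30))  # 2 stddev change -> 50.0
print(normalized_velocity(200, 200, 30))  # no change -> 0.0
```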

4.6 Persistence component (0-100)

Measures duration of anomalous state.

4.6.1 Consecutive periods

Consecutive Persistence (Python)
# consecutive = count of consecutive periods where score > threshold
persistence_score = min(100, consecutive * 10)

Example:
    Threshold: 40
    Period 1: score=55 → consecutive=1 → persistence=10
    Period 2: score=52 → consecutive=2 → persistence=20
    Period 3: score=48 → consecutive=3 → persistence=30
    Period 4: score=30 → consecutive=0 → persistence=0 (reset)
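The counting step can be made concrete by scanning the score history backwards from the most recent period; the helper below is an illustrative sketch:

```python
def consecutive_persistence(scores: list[int], threshold: int = 40) -> int:
    """Count trailing periods above threshold; 10 points each, capped at 100."""
    consecutive = 0
    for s in reversed(scores):  # walk back from the most recent period
        if s > threshold:
            consecutive += 1
        else:
            break  # a sub-threshold period resets the streak
    return min(100, consecutive * 10)

print(consecutive_persistence([55, 52, 48]))      # 3 trailing periods -> 30
print(consecutive_persistence([55, 52, 48, 30]))  # reset by period 4 -> 0
```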

4.6.2 Weighted persistence

Weighted Persistence (Python)
N = len(recent_scores)  # number of recent periods considered
weighted_sum = sum(recent_scores)
max_possible = N * 100
persistence_score = weighted_sum / max_possible * 100

Example (last 5 periods):
    Scores: [45, 52, 48, 55, 50]
    Sum: 250, Max: 500
    Persistence: 250/500 × 100 = 50
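A minimal sketch of this calculation:

```python
def weighted_persistence(recent_scores: list[int]) -> float:
    """Average of the last N per-period scores, expressed on a 0-100 scale."""
    return sum(recent_scores) / (len(recent_scores) * 100) * 100

print(weighted_persistence([45, 52, 48, 55, 50]))  # 250/500 -> 50.0
```

Note that as written the formula weighs all N periods equally; an implementation that wants to emphasize recent periods would need to introduce per-period weights.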

4.7 Composite score calculation

4.7.1 Standard formula

Standard Composite Formula (Python)
AnomalyScore = 0.40 * Deviation + 0.25 * Rarity + 0.20 * Velocity + 0.15 * Persistence

Example:
    Deviation: 65 (z-score of 3.25)
    Rarity: 80 (frequency-based: value seen in 20% of baseline observations)
    Velocity: 40 (1.6 stddev change)
    Persistence: 30 (3 consecutive periods)

    Score = 0.40×65 + 0.25×80 + 0.20×40 + 0.15×30
          = 26 + 20 + 8 + 4.5
          = 58.5
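The worked example above can be reproduced with a small helper; the `WEIGHTS` dict layout is an illustrative choice, not a mandated structure:

```python
# Standard weights from Section 4.7.1 (the dict layout is an assumption)
WEIGHTS = {"deviation": 0.40, "rarity": 0.25, "velocity": 0.20, "persistence": 0.15}

def anomaly_score(components: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the four component scores."""
    return sum(weights[name] * components[name] for name in weights)

score = anomaly_score({"deviation": 65, "rarity": 80, "velocity": 40, "persistence": 30})
print(score)  # ~58.5, matching the worked example
```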

4.7.2 Detection-specific weights

Detection-Specific Weights (YAML)
volumetric_anomaly:  # DDoS/abuse - sudden spikes matter
  deviation: 0.50
  rarity: 0.15
  velocity: 0.30
  persistence: 0.05

access_pattern:  # Scraping - pattern matters most
  deviation: 0.25
  rarity: 0.45
  velocity: 0.15
  persistence: 0.15

data_exfiltration:  # Slow leaks - persistence matters
  deviation: 0.30
  rarity: 0.20
  velocity: 0.10
  persistence: 0.40

geographic:  # Rare events matter most
  deviation: 0.20
  rarity: 0.50
  velocity: 0.20
  persistence: 0.10

4.7.3 Multi-signal aggregation

When an entity has multiple anomalous metrics:

Multi-Signal Aggregation (Python)
# Method: maximum + breadth bonus (RECOMMENDED)
base_score = max(individual_scores)
breadth_bonus = min(20, sum(1 for s in individual_scores if s > 40) * 5)
entity_score = min(100, base_score + breadth_bonus)

Example:
    Request volume: 55
    Unique endpoints: 45
    Geographic: 75

    Base: 75
    Breadth: 3 signals > 40 → bonus = 15
    Entity score: 75 + 15 = 90
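A self-contained sketch reproducing the example (the function name is illustrative):

```python
def aggregate_signals(scores: list[int]) -> int:
    """Strongest signal plus a capped bonus for breadth across signals."""
    base_score = max(scores)
    breadth_bonus = min(20, sum(1 for s in scores if s > 40) * 5)
    return min(100, base_score + breadth_bonus)

print(aggregate_signals([55, 45, 75]))  # base 75 + bonus 15 -> 90
```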

4.7.4 Confidence adjustment

Confidence Adjustment (Python)
adjusted_score = raw_score * sqrt(confidence)

Example:
    Raw score: 70
    Confidence: 0.25 (new entity)

    Adjusted: 70 × sqrt(0.25) = 70 × 0.5 = 35

Rationale:
    sqrt() provides gradual adjustment
    At confidence 0.25 → 50% of score
    At confidence 0.64 → 80% of score
    At confidence 1.0 → 100% of score
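As a runnable sketch (the function name is an illustrative choice):

```python
import math

def adjust_for_confidence(raw_score: float, confidence: float) -> float:
    """Scale score by sqrt(confidence) so sparse baselines damp it gradually."""
    return raw_score * math.sqrt(confidence)

print(adjust_for_confidence(70, 0.25))  # new entity -> 35.0
print(adjust_for_confidence(70, 1.0))   # full baseline -> 70.0
```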

4.8 Security considerations

Anomaly scores may be manipulated by adversaries who gradually shift behavior to poison baselines. Implementations SHOULD:

  • Use robust statistics (median, MAD) resistant to outliers
  • Implement baseline poisoning detection
  • Maintain historical baselines for comparison
  • Use slow adaptation rates (low alpha in EWMA) for user behavior

Note

Continue to Section 5: Risk Score Calculation for details on applying context and multipliers.