Multi-Armed Bandits

Understand how Dalton's dynamic traffic routing works.

What Are Multi-Armed Bandits?

Multi-armed bandits (MAB) are a class of optimization algorithms that dynamically allocate more traffic to better-performing variants while still collecting data from all options.

The name: Imagine a gambler at a casino facing multiple slot machines ("one-armed bandits"). The gambler must decide which machines to play to maximize winnings while learning which machines pay out best. That's the multi-armed bandit problem.

Traditional A/B Testing vs. Multi-Armed Bandits

Traditional A/B Testing	Multi-Armed Bandits (Dalton)
50/50 fixed split	Dynamic traffic adjustment
3+ months for significance	2-4 weeks for insights
Losing variants get 50% traffic	Winners get more traffic automatically
Manual stopping decisions	Algorithm optimizes in real-time

How It Works

Phase 1: Exploration (Week 1-2)

Traffic is split relatively evenly to gather initial data on all variants.

Example:

Control: 33%
Variant A: 33%
Variant B: 34%

The algorithm is "exploring" to learn which variants perform best.

Phase 2: Exploitation (Week 2+)

As data accumulates, the algorithm shifts traffic toward better performers while still monitoring all variants.

Example:

Control: 15%
Variant A: 65% (clear winner)
Variant B: 20%

The algorithm is "exploiting" knowledge while continuing to "explore" for confirmation.

Phase 3: Convergence (Week 4+)

The winning variant gets the majority of traffic. Other variants receive enough traffic to confirm they're not catching up.

Example:

Control: 10%
Variant A: 80% (confirmed winner)
Variant B: 10%
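
One way to picture how the three phases relate is a rule that turns each variant's current chance of being the best into a traffic share, while reserving a minimum share for every variant. The sketch below is illustrative only: the `allocate` function, the `floor` parameter, and its 10% value are assumptions chosen to reproduce the example numbers above, not a description of Dalton's actual allocation rule.

```python
def allocate(win_probability, floor=0.10):
    """Turn per-variant win probabilities into traffic shares.

    Every variant keeps at least `floor` of the traffic; the remainder is
    divided in proportion to each variant's win probability.
    """
    reserved = floor * len(win_probability)
    if reserved > 1.0:
        raise ValueError("floor is too large for this many variants")
    total = sum(win_probability.values())
    if total <= 0:
        raise ValueError("win probabilities must sum to a positive number")
    free = 1.0 - reserved
    return {
        name: floor + free * p / total
        for name, p in win_probability.items()
    }


# Phase 1: no evidence yet, so every variant is equally likely to be best.
exploration = allocate({"control": 1 / 3, "variant_a": 1 / 3, "variant_b": 1 / 3})

# Phase 3: the data says Variant A is almost certainly the best.
convergence = allocate({"control": 0.0, "variant_a": 1.0, "variant_b": 0.0})

for phase, shares in (("exploration", exploration), ("convergence", convergence)):
    print(phase, {name: f"{share:.0%}" for name, share in shares.items()})
```

With these inputs the exploration case splits traffic roughly evenly, and the convergence case comes out at 10% / 80% / 10%. The floor is what keeps the trailing variants receiving enough traffic to confirm they are not catching up.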

Why This Matters

Fewer Lost Conversions

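A traditional 50/50 test keeps showing the losing variant to half of all visitors for the full duration of the test. MAB reduces that exposure by shifting traffic away from underperformers as evidence accumulates.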

Example: If Variant A converts at 5% and Control converts at 3%, traditional A/B testing shows the inferior control to 50% of visitors for the entire test. MAB quickly shifts traffic to Variant A, capturing more conversions during the test period.
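
To put rough numbers on this example, the snippet below compares expected conversions under a fixed 50/50 split with a split that favors Variant A. The 10,000-visitor total and the 20/80 split are illustrative assumptions. A real bandit changes its split over time instead of holding one ratio, so treat this as a back-of-the-envelope comparison.

```python
def expected_conversions(visitors, split, rates):
    """Expected conversions given traffic shares and per-variant conversion rates."""
    return sum(visitors * split[name] * rates[name] for name in rates)


rates = {"control": 0.03, "variant_a": 0.05}  # rates from the example above
visitors = 10_000                             # hypothetical test-period traffic

fixed = expected_conversions(visitors, {"control": 0.5, "variant_a": 0.5}, rates)
shifted = expected_conversions(visitors, {"control": 0.2, "variant_a": 0.8}, rates)

print(f"50/50 split: {fixed:.0f} expected conversions")    # 400
print(f"20/80 split: {shifted:.0f} expected conversions")  # 460
print(f"Difference:  {shifted - fixed:.0f}")               # 60
```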

More Experiments

Faster results mean you can run more experiments per year.

Traditional: 4 experiments/year (3 months each)
MAB: 12+ experiments/year (2-4 weeks each)
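
The arithmetic behind these figures, assuming experiments run back to back with no gaps and treating 3 months as 13 weeks:

```python
WEEKS_PER_YEAR = 52


def experiments_per_year(weeks_per_experiment):
    """Number of complete back-to-back experiments that fit in one year."""
    return WEEKS_PER_YEAR // weeks_per_experiment


print(experiments_per_year(13))  # 3-month tests -> 4
print(experiments_per_year(4))   # 4-week tests  -> 13
print(experiments_per_year(2))   # 2-week tests  -> 26
```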

Traffic Requirements

Lower Traffic Threshold

Traditional A/B testing typically needs 100K+ sessions/month to reach significance in a reasonable time, which makes it impractical for most businesses.

Multi-armed bandits lower this to just 5K sessions/month minimum (optimal at 10K+).

With traffic near the lower end of that range:

Tests take 6-8 weeks instead of 2-4
Confidence builds more slowly

Understanding the Algorithm

Thompson Sampling

Dalton uses Thompson Sampling, a Bayesian algorithm that:

Maintains a probability distribution for each variant's true conversion rate
Samples from these distributions to decide traffic allocation
Updates distributions as new data arrives
Balances exploration (learning) with exploitation (optimizing)

You don't need to understand the math to use Dalton. The practical upshot is that traffic follows the evidence instead of staying at a fixed split.
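
For readers who do want to see the mechanics, here is a minimal Thompson Sampling sketch for conversion data. It assumes a Beta(1, 1) prior per variant and assigns one visitor at a time. The class name, the simulated conversion rates, and the per-visitor update schedule are illustrative, and Dalton's production implementation may differ.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a fixed set of variants."""

    def __init__(self, variants, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior over each variant's conversion rate.
        self.alpha = {v: 1 for v in variants}  # 1 + conversions
        self.beta = {v: 1 for v in variants}   # 1 + non-conversions

    def choose(self):
        # Draw one plausible conversion rate per variant, then pick the highest.
        draws = {
            v: self.rng.betavariate(self.alpha[v], self.beta[v])
            for v in self.alpha
        }
        return max(draws, key=draws.get)

    def update(self, variant, converted):
        # Fold the observed outcome into that variant's distribution.
        if converted:
            self.alpha[variant] += 1
        else:
            self.beta[variant] += 1


def simulate(true_rates, visitors, seed=0):
    """Route simulated visitors and return how many each variant received."""
    sampler = ThompsonSampler(list(true_rates), seed=seed)
    world = random.Random(seed + 1)  # separate stream for visitor behavior
    shown = {v: 0 for v in true_rates}
    for _ in range(visitors):
        variant = sampler.choose()
        shown[variant] += 1
        sampler.update(variant, world.random() < true_rates[variant])
    return shown


# Hypothetical true rates; in a real test these are unknown.
shown = simulate({"control": 0.03, "variant_a": 0.05, "variant_b": 0.035}, 20_000)
for variant, count in shown.items():
    print(f"{variant}: {count / 20_000:.0%} of traffic")
```

Early on, the distributions are wide, so every variant wins some draws (exploration). As data accumulates, the distribution of the best variant narrows around a higher rate and it wins most draws (exploitation), without any hand-tuned schedule.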

Common Questions

Is MAB less statistically rigorous than A/B testing? No. MAB reaches valid statistical conclusions—just faster. Traditional significance thresholds (95%) can still be applied, but directional decisions at 85-90% confidence are often sufficient for business purposes.

What if traffic is uneven—does that bias results? No. The algorithm accounts for uneven traffic when calculating significance. Uneven traffic is the point—it's how MAB optimizes.
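
To see why uneven traffic need not bias the comparison, consider a Bayesian estimate of each variant's chance of being the best. Each variant's distribution is built from its own conversions and visitors, so a variant with less traffic simply gets a wider distribution. The sketch below uses a Monte Carlo estimate with Beta(1, 1) priors. The counts are made up to match a 15/65/20 split of 10,000 visitors, and reading "confidence" as this probability is an assumption, since this page does not specify how Dalton computes significance.

```python
import random


def probability_best(results, draws=20_000, seed=0):
    """Estimate P(variant has the highest true conversion rate).

    `results` maps variant name -> (conversions, visitors).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        sampled = {
            name: rng.betavariate(1 + conversions, 1 + visitors - conversions)
            for name, (conversions, visitors) in results.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: wins[name] / draws for name in results}


# Hypothetical counts with very different sample sizes per variant.
results = {
    "control": (45, 1_500),     # 3.0% observed on 1,500 visitors
    "variant_a": (325, 6_500),  # 5.0% observed on 6,500 visitors
    "variant_b": (70, 2_000),   # 3.5% observed on 2,000 visitors
}

probs = probability_best(results)
for name, p in probs.items():
    print(f"{name}: {p:.1%} chance of being best")

# Thresholds from the answer above: 95% strict, 85-90% directional.
leader = max(probs, key=probs.get)
if probs[leader] >= 0.95:
    print(f"{leader} clears the 95% threshold")
elif probs[leader] >= 0.85:
    print(f"{leader} is a directional winner")
```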

What if I have 5+ variants? MAB handles multiple variants well. The algorithm will find the best performer(s) and allocate traffic accordingly.

When NOT to Use MAB

Not Right for Everyone

Consider traditional A/B testing when:

Your goal is pure learning or academic research
You value very high certainty (95%+) more than ROI or the number of experiments you can run
You have extremely low traffic (<5K sessions/month)

For most business use cases focused on ROI and continuous optimization, MAB is superior.

Best Practices

Let it run: Don't stop tests too early—give it 2-4 weeks minimum
Trust the algorithm: Don't manually override traffic allocation
Monitor trends, not daily changes: Traffic shifts are normal
Combine with good hypotheses: MAB optimizes traffic, but you still need good variant ideas
