Multi-Armed Bandits

Understand how Dalton's dynamic traffic routing works.

What Are Multi-Armed Bandits?

Multi-armed bandit (MAB) algorithms are a class of optimization methods that dynamically allocate more traffic to better-performing variants while still collecting data from all options.

The name: Imagine a gambler at a casino facing multiple slot machines ("one-armed bandits"). The gambler must decide which machines to play to maximize winnings while learning which machines pay out best. That's the multi-armed bandit problem.

Traditional A/B Testing vs. Multi-Armed Bandits

Traditional A/B Testing            | Multi-Armed Bandits (Dalton)
50/50 fixed split                  | Dynamic traffic adjustment
3+ months for significance         | 2-4 weeks for insights
Losing variants get 50% traffic    | Winners get more traffic automatically
Manual stopping decisions          | Algorithm optimizes in real-time

How It Works

Phase 1: Exploration (Weeks 1-2)

Traffic is split relatively evenly to gather initial data on all variants.

Example:

  • Control: 33%
  • Variant A: 33%
  • Variant B: 34%

The algorithm is "exploring" to learn which variants perform best.

Phase 2: Exploitation (Week 2+)

As data accumulates, the algorithm shifts traffic toward better performers while still monitoring all variants.

Example:

  • Control: 15%
  • Variant A: 65% (clear winner)
  • Variant B: 20%

The algorithm is "exploiting" knowledge while continuing to "explore" for confirmation.

Phase 3: Convergence (Week 4+)

The winning variant gets the majority of traffic. Other variants receive enough traffic to confirm they're not catching up.

Example:

  • Control: 10%
  • Variant A: 80% (confirmed winner)
  • Variant B: 10%
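The three phases can be reproduced in a small simulation. The sketch below is illustrative only and is not Dalton's implementation: the function names, the weekly re-allocation schedule, the 5% traffic floor, and the true conversion rates are all assumptions. Each week it estimates how often each variant "wins" a draw from its Beta posterior and uses those win rates as traffic shares.

```python
import random


def allocation_shares(stats, draws=2000, floor=0.05, rng=random):
    """Turn per-variant results into traffic shares.

    stats maps variant name -> (conversions, visitors).
    A variant's raw share is the fraction of posterior draws in which it
    had the highest sampled conversion rate. Raw shares are raised to a
    hypothetical floor before normalizing, so every variant keeps
    receiving some traffic (the final share can sit slightly below the
    floor after normalization).
    """
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        sampled = {
            name: rng.betavariate(1 + conv, 1 + visits - conv)
            for name, (conv, visits) in stats.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    raw = {name: max(wins[name] / draws, floor) for name in stats}
    total = sum(raw.values())
    return {name: share / total for name, share in raw.items()}


def simulate(true_rates, weeks=4, visitors_per_week=2500, seed=0):
    """Re-allocate traffic once per week and return the weekly shares."""
    rng = random.Random(seed)
    stats = {name: (0, 0) for name in true_rates}
    history = []
    for _ in range(weeks):
        shares = allocation_shares(stats, rng=rng)
        history.append(shares)
        names = list(shares)
        weights = [shares[name] for name in names]
        for _ in range(visitors_per_week):
            name = rng.choices(names, weights=weights)[0]
            conv, visits = stats[name]
            converted = rng.random() < true_rates[name]
            stats[name] = (conv + converted, visits + 1)
    return history


# Hypothetical true conversion rates, unknown to the algorithm.
rates = {"Control": 0.03, "Variant A": 0.05, "Variant B": 0.035}
for week, shares in enumerate(simulate(rates), start=1):
    print(week, {name: f"{share:.0%}" for name, share in shares.items()})
```

With no data in week 1, every variant wins about a third of the draws, which gives the near-even split of the exploration phase. As results accumulate, the stronger variant wins more draws and its share grows, while the floor keeps the others in play.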

Why This Matters

Fewer Lost Conversions

Traditional tests show losing variants to 50% of visitors for months. MAB reduces that exposure by moving traffic away from losing variants as evidence accumulates.

Example: If Variant A converts at 5% and Control converts at 3%, traditional A/B testing shows the inferior control to 50% of visitors for the entire test. MAB quickly shifts traffic to Variant A, capturing more conversions during the test period.
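To put numbers on the example above, the sketch below compares expected conversions under a fixed 50/50 split with an allocation that favors Variant A. The 10,000-visitor total and the 80/20 split are assumptions chosen for illustration. A real bandit starts near an even split and shifts gradually, so its average allocation over a test would fall somewhere between the two cases.

```python
def expected_conversions(visitors, shares, rates):
    """Expected conversions when `visitors` are divided according to `shares`."""
    return sum(visitors * shares[name] * rates[name] for name in shares)


rates = {"Control": 0.03, "Variant A": 0.05}  # rates from the example
visitors = 10_000  # hypothetical test size

fixed = expected_conversions(visitors, {"Control": 0.5, "Variant A": 0.5}, rates)
shifted = expected_conversions(visitors, {"Control": 0.2, "Variant A": 0.8}, rates)

print(round(fixed))    # 400 (150 from Control + 250 from Variant A)
print(round(shifted))  # 460 (60 from Control + 400 from Variant A)
```

Under these assumptions, shifting traffic yields about 60 more conversions from the same 10,000 visitors.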

More Experiments

Faster results mean you can run more experiments per year.

  • Traditional: 4 experiments/year (3 months each)
  • MAB: 12+ experiments/year (2-4 weeks each)

Traffic Requirements

Lower Traffic Threshold

Traditional A/B testing needs 100K+ sessions/month, making it impractical for most businesses.

Multi-armed bandits lower the minimum to 5K sessions/month, with 10K+ being optimal.

At the lower end of that range:

  • Tests take 6-8 weeks instead of 2-4
  • Confidence builds more slowly

Understanding the Algorithm

Thompson Sampling

Dalton uses Thompson Sampling, a Bayesian algorithm that:

  1. Maintains a probability distribution for each variant's true conversion rate
  2. Samples from these distributions to decide traffic allocation
  3. Updates distributions as new data arrives
  4. Balances exploration (learning) with exploitation (optimizing)
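The four steps above map onto a few lines of code. This is a minimal sketch of textbook Beta-Bernoulli Thompson Sampling, assuming a binary outcome (converted or not) and a uniform prior. The class and method names are hypothetical, and Dalton's actual implementation may differ.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over named variants (illustrative)."""

    def __init__(self, variants, seed=None):
        self.rng = random.Random(seed)
        # Step 1: one Beta(alpha, beta) distribution per variant, starting
        # from a uniform Beta(1, 1) prior (an assumption of this sketch).
        self.alpha = {v: 1 for v in variants}
        self.beta = {v: 1 for v in variants}

    def choose(self):
        # Step 2: draw a plausible conversion rate for each variant and
        # serve the variant with the highest draw.
        draws = {
            v: self.rng.betavariate(self.alpha[v], self.beta[v])
            for v in self.alpha
        }
        return max(draws, key=draws.get)

    def update(self, variant, converted):
        # Step 3: a conversion increments alpha, a non-conversion beta.
        if converted:
            self.alpha[variant] += 1
        else:
            self.beta[variant] += 1


# Usage with hypothetical true rates that the sampler does not know.
true_rates = {"Control": 0.03, "Variant A": 0.05, "Variant B": 0.035}
sampler = ThompsonSampler(true_rates, seed=42)
world = random.Random(7)
served = {v: 0 for v in true_rates}
for _ in range(5000):
    variant = sampler.choose()
    served[variant] += 1
    sampler.update(variant, world.random() < true_rates[variant])
print(served)
```

Step 4 is not a separate piece of code. A variant with little data has a wide distribution and occasionally produces the highest draw, so it keeps getting explored. A variant with strong results produces the highest draw most of the time, so it gets exploited.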

You don't need to follow the math to use Dalton. The practical effect is that traffic moves toward better-performing variants instead of staying locked in a fixed split.

Common Questions

Is MAB less statistically rigorous than A/B testing?

No. MAB reaches valid statistical conclusions, just faster. Traditional significance thresholds (95%) can still be applied, but directional decisions at 85-90% confidence are often sufficient for business purposes.
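One common Bayesian way to express this kind of confidence is the posterior probability that a variant's true conversion rate exceeds the control's. The sketch below estimates it by Monte Carlo sampling from Beta posteriors. The function name, the uniform prior, and the sample counts are assumptions, and this is not necessarily how Dalton computes or reports confidence. It is also not identical to a frequentist significance level.

```python
import random


def prob_beats_control(variant, control, draws=20000, seed=0):
    """Estimate P(variant rate > control rate) from observed results.

    `variant` and `control` are (conversions, visitors) tuples. Each is
    modeled with a Beta posterior under a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        v = rng.betavariate(1 + variant[0], 1 + variant[1] - variant[0])
        c = rng.betavariate(1 + control[0], 1 + control[1] - control[0])
        wins += v > c
    return wins / draws


# Hypothetical results: 50/1000 conversions for the variant, 30/1000 for control.
p = prob_beats_control((50, 1000), (30, 1000))
print(f"P(variant beats control) = {p:.3f}")
print("meets 90% directional threshold:", p >= 0.90)
print("meets 95% threshold:", p >= 0.95)
```

A team comfortable with directional decisions could act once the estimate passes 85-90%, while a stricter team could wait for 95%.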

What if traffic is uneven? Does that bias results?

No. The algorithm accounts for uneven traffic when calculating significance. Uneven traffic is the point: it's how MAB optimizes.

What if I have 5+ variants?

MAB handles multiple variants well. The algorithm will find the best performer(s) and allocate traffic accordingly.

When NOT to Use MAB

Not Right for Everyone

Consider traditional A/B testing when:

  • Your goal is pure learning or academic research
  • You value very high certainty (95%+) more than ROI and the number of experiments you can run
  • You have extremely low traffic (<5K sessions/month)

For most business use cases focused on ROI and continuous optimization, MAB is superior.

Best Practices

  1. Let it run: Don't stop tests too early. Give each test at least 2-4 weeks
  2. Trust the algorithm: Don't manually override traffic allocation
  3. Monitor trends, not daily changes: Traffic shifts are normal
  4. Combine with good hypotheses: MAB optimizes traffic, but you still need good variant ideas