UCB Algorithm

What is the multi-armed bandit problem?

The multi-armed bandit problem asks how to find the best option (arm) when you have a limited budget of trials. In AdBandit, each arm represents a different creative angle. UCB1 automatically balances exploration (trying under-tested arms) and exploitation (favoring arms that have performed well), as sketched below.
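
To make the moving parts concrete, here is a minimal sketch of a bandit loop in Python. The arm structure and the random reward are hypothetical placeholders, not AdBandit's actual data model:

```python
import random

# Hypothetical arms: each tracks its pull count and accumulated reward.
arms = [{"name": f"creative_{i}", "pulls": 0, "reward": 0.0} for i in range(4)]

for round_number in range(100):
    arm = random.choice(arms)   # stand-in; UCB1 replaces this choice (next section)
    observed = random.random()  # placeholder for the real reward signal
    arm["pulls"] += 1
    arm["reward"] += observed
```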

Understanding the UCB1 score

The UCB1 score is computed as "average reward + exploration bonus". Arms with fewer trials receive a larger bonus, so untested arms are tried first.
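
As a minimal sketch, assuming the standard UCB1 formulation (average reward plus sqrt(2 · ln N / n), where N is the total number of pulls across all arms and n is the arm's own pull count; AdBandit's exact constants may differ):

```python
import math

def ucb1_score(total_reward: float, arm_pulls: int, total_pulls: int) -> float:
    """UCB1 score = average reward + exploration bonus."""
    if arm_pulls == 0:
        return float("inf")  # untested arms always win the comparison
    avg_reward = total_reward / arm_pulls
    bonus = math.sqrt(2 * math.log(total_pulls) / arm_pulls)
    return avg_reward + bonus

# On each round, pull the arm with the highest score:
# best = max(arms, key=lambda a: ucb1_score(a["reward"], a["pulls"], total_pulls))
```

Because the bonus shrinks as an arm accumulates pulls, a rarely tried arm eventually outscores a well-known one, forcing the algorithm to revisit it.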

Reward weight adjustment

You can adjust the reward weights (α, β, γ) applied to impressions, clicks, and conversions. The defaults are α=0.1, β=0.3, γ=0.6.
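
A sketch of one way the weighted reward could be combined. The per-impression normalization here is an assumption, not AdBandit's documented formula:

```python
ALPHA, BETA, GAMMA = 0.1, 0.3, 0.6  # default weights for impressions, clicks, conversions

def weighted_reward(impressions: int, clicks: int, conversions: int) -> float:
    """Assumed combination: weighted event counts, normalized per impression."""
    if impressions == 0:
        return 0.0
    return (ALPHA * impressions + BETA * clicks + GAMMA * conversions) / impressions
```

The heavy γ default means a single conversion outweighs many clicks, which suits campaigns optimized for conversions; raise β if early click signal matters more to you.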

Pruning mechanism

Among sufficiently tested arms, those whose average reward falls below 50% of the overall average are pruned. Mutant arms are then added to keep the exploration space populated; see the sketch below.
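
A sketch of the pruning step under stated assumptions: MIN_PULLS stands in for "sufficiently tested", the overall average is taken over tested arms, and the mutation is a placeholder (the real mutant generation is not specified here):

```python
import random

MIN_PULLS = 30  # assumption: threshold for "sufficiently tested"

def prune_and_mutate(arms):
    tested = [a for a in arms if a["pulls"] >= MIN_PULLS]
    if not tested:
        return arms
    overall_avg = sum(a["reward"] / a["pulls"] for a in tested) / len(tested)
    # Keep untested arms, and tested arms at or above 50% of the overall average.
    survivors = [a for a in arms
                 if a["pulls"] < MIN_PULLS
                 or a["reward"] / a["pulls"] >= 0.5 * overall_avg]
    # Replace each pruned arm with a mutant of a survivor (placeholder mutation).
    while len(survivors) < len(arms):
        parent = random.choice(survivors)
        survivors.append({"name": parent["name"] + "_mut",
                          "pulls": 0, "reward": 0.0})
    return survivors
```

Mutants start with zero pulls, so the UCB1 exploration bonus guarantees they get tried promptly after a pruning pass.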