UCB Algorithm
What is a multi-armed bandit?
The multi-armed bandit problem is about finding the best option (arm) within a limited number of trials. In AdBandit, each arm represents a different creative angle, and UCB1 automatically balances exploration (trying uncertain arms) against exploitation (favoring arms that have performed well).
Understanding UCB1 score
The UCB1 score is computed as "average reward + exploration bonus". The bonus grows with the total number of trials and shrinks with an arm's own trial count, so arms with fewer trials receive a larger bonus and untested arms are tried first.
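As a sketch, the standard UCB1 score looks like the following. The function name and signature are illustrative, not AdBandit's actual API; the sqrt(2·ln N / n) bonus is the classic UCB1 form.

```python
import math

def ucb1_score(avg_reward: float, arm_trials: int, total_trials: int) -> float:
    """Average reward plus an exploration bonus that shrinks as the arm
    accumulates trials (classic UCB1; names here are illustrative)."""
    if arm_trials == 0:
        return float("inf")  # untested arms win the selection immediately
    bonus = math.sqrt(2 * math.log(total_trials) / arm_trials)
    return avg_reward + bonus
```

At each round the arm with the highest score is served; an arm with few trials can outrank a better-performing but heavily tested arm, which is what drives exploration.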
Reward weight adjustment
You can adjust the reward weights (α, β, γ) applied to impressions, clicks, and conversions. The defaults are α=0.1, β=0.3, γ=0.6, so conversions dominate the reward signal.
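One plausible reading of these weights is a per-event weighted sum, sketched below under that assumption (the source does not specify the exact formula, so the function and its signature are hypothetical; the defaults match the documented values).

```python
def weighted_reward(impression: int, click: int, conversion: int,
                    alpha: float = 0.1, beta: float = 0.3,
                    gamma: float = 0.6) -> float:
    """Combine the three binary event signals for one ad serving into a
    single scalar reward. Defaults mirror the documented weights;
    the per-event formulation itself is an assumption."""
    return alpha * impression + beta * click + gamma * conversion
```

With the defaults, an impression alone yields 0.1, while a full impression-click-conversion funnel yields 1.0, so the bandit is steered mostly by conversions.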
Pruning mechanism
Among arms that have been tested sufficiently, those whose average reward falls below 50% of the overall average are pruned. Mutant arms (variations of surviving arms) are then added so the exploration space does not shrink over time.
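The pruning rule can be sketched as follows. The dict shape, the `min_trials` cutoff for "sufficiently tested", and the function name are assumptions for illustration; the 50%-of-overall-average threshold comes from the text, and the mutation step is omitted.

```python
def prune_arms(arms: list[dict], min_trials: int = 100,
               threshold: float = 0.5) -> list[dict]:
    """Drop sufficiently tested arms whose average reward is below
    `threshold` of the overall average among tested arms.
    Under-tested arms are always kept so they get a fair trial."""
    tested = [a for a in arms if a["trials"] >= min_trials]
    if not tested:
        return arms
    overall_avg = sum(a["avg_reward"] for a in tested) / len(tested)
    return [a for a in arms
            if a["trials"] < min_trials
            or a["avg_reward"] >= threshold * overall_avg]
```

After pruning, new mutant arms would be generated from the survivors so the bandit keeps exploring fresh creative angles rather than converging permanently on a fixed set.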