I'm currently following this course on Udemy:
It's a great intermediate course, with enough maths and a lot of work you have to implement yourself. I've decided to share the very first few implementations: a simple Linear Regression built step by step without fancy libraries, and a few approaches to the Explore/Exploit tradeoff (epsilon-greedy, epsilon-greedy with decaying epsilon, optimistic initial values and UCB-1).
I hadn't come across UCB-1 before (Upper Confidence Bound; the 1 stands for the choice of function), but in practice it's pretty simple: you act greedily, but instead of basing your choice only on each bandit's average reward, you also add a bonus for the bandits that have been chosen less often. This bonus comes from a concentration inequality, specifically Hoeffding's Inequality, and in the standard form of UCB-1 it ends up being $\sqrt{2 \ln N / n_j}$, where $N$ is the total number of plays so far and $n_j$ is the number of times bandit $j$ has been played.
It's harder to explain than to compute; it's the last example in the Colab below :)
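To give a feel for how little code it takes, here is a minimal sketch of UCB-1 on Bernoulli bandits. This is not the Colab code: the function names (`ucb1_scores`, `run_ucb1`), the win probabilities and the seed are made up for illustration, and it assumes the standard UCB-1 bonus of sqrt(2 ln N / n_j), with N the total number of plays so far and n_j the number of plays of bandit j.

```python
import math
import random


def ucb1_scores(means, counts, total_plays):
    """Estimated mean plus the bonus sqrt(2 ln N / n_j) for each bandit.

    Assumes the standard UCB-1 bonus; every count must be at least 1.
    """
    return [
        m + math.sqrt(2.0 * math.log(total_plays) / n)
        for m, n in zip(means, counts)
    ]


def run_ucb1(true_probs, n_rounds, seed=0):
    """Play Bernoulli bandits with UCB-1; return (estimated means, play counts).

    `true_probs` are hypothetical win probabilities, used only to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    means = [0.0] * k
    counts = [0] * k

    def pull(j):
        reward = 1.0 if rng.random() < true_probs[j] else 0.0
        counts[j] += 1
        # Incremental update of the sample mean for bandit j.
        means[j] += (reward - means[j]) / counts[j]

    # Play every bandit once so that no count is zero.
    for j in range(k):
        pull(j)

    # From here on, act greedily with respect to mean + bonus.
    for total_plays in range(k, n_rounds):
        scores = ucb1_scores(means, counts, total_plays)
        pull(max(range(k), key=scores.__getitem__))

    return means, counts


if __name__ == "__main__":
    means, counts = run_ucb1([0.2, 0.5, 0.75], n_rounds=10_000)
    print("estimated means:", [round(m, 3) for m in means])
    print("play counts:", counts)
```

The bonus shrinks as a bandit gets played more and grows slowly (logarithmically) with the total number of plays, so neglected bandits eventually get another try while the best one ends up with most of the plays.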