Implementing a Linear Regression & Multi-Armed Bandit

Fuma
Apr 6, 2021

I'm currently following this course on Udemy:

https://www.udemy.com/course/artificial-intelligence-reinforcement-learning-in-python/

It's a great intermediate course, with enough maths and a lot of work you have to implement yourself. I've decided to share my first few implementations: a simple Linear Regression built step by step without fancy libraries, and four approaches to the Explore/Exploit tradeoff (epsilon-greedy, epsilon-greedy with decaying epsilon, optimistic initial values and UCB-1).

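I hadn't seen UCB-1 before (Upper Confidence Bound; the 1 stands for the choice of function), but in practice it's pretty simple: you act greedily, but instead of basing your choice only on the average value of each bandit, you also add a bonus for the bandits that have been chosen less often. This bonus comes from a concentration inequality, specifically Hoeffding's Inequality, and in its usual form the score for bandit j ends up being:

mean_j + sqrt(2 * ln(N) / n_j)

where mean_j is the average reward seen so far from bandit j, N is the total number of plays across all bandits, and n_j is the number of times bandit j has been played.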

Harder to explain than to compute, it's the last example in the Colab below :)
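To make the idea concrete, here is a minimal sketch of UCB-1 in plain Python. It is not the code from the course or from my notebook: the function names (`ucb1_scores`, `run_ucb1`), the Bernoulli win probabilities and the number of plays are all assumptions picked for illustration, and it uses the common sqrt(2 ln N / n_j) form of the bonus.

```python
import math
import random


def ucb1_scores(means, counts, total_plays):
    """Score each bandit as its mean plus the bonus sqrt(2 ln N / n_j)."""
    return [
        m + math.sqrt(2 * math.log(total_plays) / n)
        for m, n in zip(means, counts)
    ]


def run_ucb1(win_probs, num_plays, seed=0):
    """Play hypothetical Bernoulli bandits with UCB-1; return (means, counts)."""
    rng = random.Random(seed)
    k = len(win_probs)
    counts = [0] * k      # n_j: how many times each bandit was played
    means = [0.0] * k     # running average reward of each bandit

    def pull(j):
        reward = 1.0 if rng.random() < win_probs[j] else 0.0
        counts[j] += 1
        # Incremental update of the sample mean
        means[j] += (reward - means[j]) / counts[j]

    # Play every bandit once so that no count is zero in the bonus term
    for j in range(k):
        pull(j)

    # From here on, act greedily with respect to mean + bonus
    for t in range(k, num_plays):
        scores = ucb1_scores(means, counts, t)  # t plays made so far
        best = max(range(k), key=lambda j: scores[j])
        pull(best)

    return means, counts


# Assumed win probabilities, chosen only for the demo
means, counts = run_ucb1([0.2, 0.5, 0.75], num_plays=10_000)
print("plays per bandit:", counts)
print("estimated means:", [round(m, 3) for m in means])
```

The bonus shrinks as a bandit gets played more and grows slowly with the total number of plays, so rarely chosen bandits keep getting a second look while most plays still go to the one with the best average.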

Implementing a Linear Regression & Multi-Armed Bandit
