
Implementing a Linear Regression & Multi-Armed Bandit


I'm currently following this course on Udemy:


It's a great intermediate course, with enough maths and plenty of work you have to implement yourself. I've decided to share the very first few implementations: a simple Linear Regression built step by step without fancy libraries, and a couple of approaches to the explore/exploit tradeoff (epsilon-greedy, epsilon-greedy with decaying epsilon, optimistic initial values, and UCB-1). A sketch of the linear regression part follows below.
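
Here's a minimal sketch of what the "no fancy libraries" linear regression can look like: a one-variable least-squares fit using only NumPy. The function names and the synthetic data are just illustrative, not taken from the course notebook.

import numpy as np

def fit_line(x, y):
    """Return slope a and intercept b minimizing sum((y - (a*x + b))**2)."""
    x_mean, y_mean = x.mean(), y.mean()
    a = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    b = y_mean - a * x_mean
    return a, b

def r_squared(x, y, a, b):
    """Coefficient of determination for the fitted line."""
    residuals = y - (a * x + b)
    total = y - y.mean()
    return 1 - residuals.dot(residuals) / total.dot(total)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 100)
    y = 2.0 * x + 1.0 + rng.normal(0, 1, size=x.shape)  # noisy line, made-up data
    a, b = fit_line(x, y)
    print(f"slope={a:.3f}, intercept={b:.3f}, R^2={r_squared(x, y, a, b):.3f}")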


I hadn't yet seen UCB-1 (Upper Confidence Bound; the 1 stands for the choice of bonus function), but in practice it's pretty simple: you act greedily, but instead of basing your choice only on the average value of each bandit, you also add a bonus for the bandits that have been chosen less often. The bonus comes from a concentration inequality, specifically Hoeffding's inequality, and it ends up being:

UCB_j = \bar{X}_j + \sqrt{\frac{2 \ln N}{n_j}}

where \bar{X}_j is the sample mean of bandit j, N is the total number of plays so far, and n_j is the number of times bandit j has been played.
Harder to explain than to compute, it's the last example in the Colab below :)
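
For reference, here's a rough sketch of what a UCB-1 loop can look like, assuming NumPy and Bernoulli bandits. The win probabilities and trial count are made up for illustration; the version in the Colab may be structured differently.

import numpy as np

class Bandit:
    def __init__(self, p):
        self.p = p          # true win probability (unknown to the agent)
        self.mean = 0.0     # running sample mean of observed rewards
        self.n = 0          # number of times this bandit has been pulled

    def pull(self):
        return float(np.random.random() < self.p)

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n

def ucb(bandit, total_plays):
    # sample mean plus the exploration bonus sqrt(2 ln N / n_j)
    if bandit.n == 0:
        return float("inf")  # force at least one pull of every bandit
    return bandit.mean + np.sqrt(2 * np.log(total_plays) / bandit.n)

def run(true_means=(0.2, 0.5, 0.75), num_trials=10_000):
    bandits = [Bandit(p) for p in true_means]
    total_reward = 0.0
    for t in range(1, num_trials + 1):
        j = int(np.argmax([ucb(b, t) for b in bandits]))  # act greedily on mean + bonus
        x = bandits[j].pull()
        bandits[j].update(x)
        total_reward += x
    print("estimated means:", [round(b.mean, 3) for b in bandits])
    print("average reward:", total_reward / num_trials)

if __name__ == "__main__":
    run()

The inf trick just guarantees every bandit gets pulled once before the bonus formula takes over; after that, rarely-pulled bandits keep a large bonus and so still get explored occasionally.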


