
dissecting-reinforcement-learning - Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog

  •    Python

This repository contains the code and PDFs for a series of blog posts called "dissecting reinforcement learning", which I published on my blog mpatacchiola.io/blog. It also links to resources that can be useful for a reinforcement learning practitioner. If you have good references that may be of interest, please send me a pull request and I will integrate them into the README. The source code is in src, with subfolder names following the post numbers. The pdf folder holds the A3 documents of each post for offline reading. The images folder holds the raw SVG files of the images used in each post.

MAB - R package for Multi-Armed Bandit Simulation Study

  •    R

The R package MAB implements strategies for stationary and non-stationary multi-armed bandit problems. It includes various widely used strategies and their ensembles. The package is designed to compare strategies on multi-armed bandit problems and to help users choose suitable strategies and tuning parameters for different scenarios. This is not an official Google product.

BanditDungeon - Demo project using multi-armed bandit algorithm

  •    CSharp

Simple Unity project demonstrating the multi-armed bandit algorithm. In the simplest scenario, there is a single room that contains two chests. Opening a chest yields either a diamond (a good thing) or a ghost (a bad thing). Opening the same chest multiple times yields a sequence of diamonds and ghosts governed by that chest's underlying probability of yielding a diamond. For example, a chest with a probability of 0.5 yields a roughly 50-50 mix of diamonds and ghosts, while a chest with a probability of 0.9 yields a diamond approximately nine times out of ten. Each chest has its own true probability, which the agent (in this case, the entity deciding which chest to open) does not know. Each time the agent selects a chest, it receives a positive reward for finding a diamond or a negative reward for finding a ghost. The goal of the agent is to maximize its total reward over a number of trials; in each trial the agent may select any chest.
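
The chest scenario can be sketched in a few lines of Python. This is not code from BanditDungeon (which is a Unity/C# project); the class name `Chest`, the helper names, and the reward values of +1 for a diamond and -1 for a ghost are assumptions made for illustration. Under those assumed rewards, a chest with diamond probability p has an expected reward of 2p - 1 per opening.

```python
import random


class Chest:
    """A chest that yields a diamond with a fixed probability hidden from the agent."""

    def __init__(self, diamond_probability, rng):
        self.diamond_probability = diamond_probability
        self._rng = rng

    def open(self):
        """Return +1 for a diamond or -1 for a ghost (assumed reward values)."""
        return 1 if self._rng.random() < self.diamond_probability else -1


def expected_reward(diamond_probability):
    """Mean reward per opening: (+1) * p + (-1) * (1 - p) = 2p - 1."""
    return 2 * diamond_probability - 1


def average_reward(chest, trials):
    """Open the same chest `trials` times and return the mean observed reward."""
    return sum(chest.open() for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.5, 0.9):
        chest = Chest(p, rng)
        # The observed average should settle near the expected value 2p - 1.
        print(p, expected_reward(p), average_reward(chest, 10_000))
```

The agent never sees `diamond_probability`; it can only estimate it from the rewards returned by `open()`, which is what makes the choice between chests a bandit problem.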

myna-js - Javascript client for Myna

  •    CoffeeScript

Released under the BSD 3-clause license. See LICENSE.md for the full text. Myna for Javascript ("Myna JS") is a Javascript client library for the Myna A/B testing platform. It allows developers to quickly create A/B tests for rich, dynamic web applications.

slots - A multi-armed bandit library for Python

  •    Python

slots is intended to be a basic, very easy-to-use Python library that lets the user explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with n choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest result over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best, and "exploitation", using the best known choice. There are many variations of this problem.
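
The balance between exploration and exploitation can be illustrated with an epsilon-greedy strategy, one common MAB strategy. The sketch below is plain Python and does not use or reproduce the slots API; the function name `epsilon_greedy`, its parameters, and the 0/1 reward values are hypothetical choices for this example.

```python
import random


def epsilon_greedy(win_probabilities, trials, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit against simulated arms.

    win_probabilities holds the true (normally unknown) win rate of each arm.
    Returns the number of pulls per arm and the estimated win rate per arm.
    """
    rng = random.Random(seed)
    n_arms = len(win_probabilities)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(trials):
        if rng.random() < epsilon:
            # Exploration: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: use the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated pull: reward 1.0 for a win, 0.0 otherwise.
        reward = 1.0 if rng.random() < win_probabilities[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return pulls, estimates


if __name__ == "__main__":
    pulls, estimates = epsilon_greedy([0.2, 0.5, 0.8], trials=5000)
    print("pulls per arm:", pulls)
    print("estimated win rates:", [round(e, 2) for e in estimates])
```

A larger `epsilon` spends more pulls on exploration and finds the best arm sooner, at the cost of pulling known-worse arms more often; a smaller value does the opposite.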

softmax-js - A softmax multi-armed bandit algorithm

  •    Javascript

This implementation is based on Bandit Algorithms for Website Optimization and related empirical research in "Algorithms for the multi-armed bandit problem". In addition, this module conforms to the BanditLab/2.0 specification. This implementation often encounters extended floating point numbers. Arm selection is therefore subject to JavaScript's floating point precision limitations. For general information about floating point issues see the floating point guide.
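
As a rough illustration of softmax arm selection and of why floating point precision matters, here is a minimal Python sketch. It is not the softmax-js code and makes no attempt to follow the BanditLab/2.0 specification; the function names, the `temperature` parameter, and its default value are assumptions. Subtracting the largest scaled value before exponentiating is a standard way to avoid overflow and leaves the resulting probabilities mathematically unchanged.

```python
import math
import random


def softmax_probabilities(estimates, temperature=0.1):
    """Convert per-arm reward estimates into selection probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in estimates]
    # Shift by the maximum so the largest exponent is exp(0) = 1,
    # which prevents overflow for large scaled values.
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_arm(estimates, temperature=0.1, rng=random):
    """Sample an arm index with probability proportional to its softmax weight."""
    probabilities = softmax_probabilities(estimates, temperature)
    return rng.choices(range(len(estimates)), weights=probabilities, k=1)[0]


if __name__ == "__main__":
    estimates = [0.2, 0.5, 0.8]
    print(softmax_probabilities(estimates))
    print(select_arm(estimates, rng=random.Random(0)))
```

A lower temperature concentrates probability on the arm with the highest estimate (more exploitation), while a higher temperature flattens the distribution toward uniform (more exploration).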