# gym-adserver

gym-adserver is an OpenAI Gym environment for reinforcement learning-based online advertising algorithms. gym-adserver is one of the official OpenAI environments.

The AdServer environment implements a typical multi-armed bandit scenario in which an ad server agent must select the best advertisement (ad) to display on a web page.

Each time an ad is selected, it counts as one impression. A displayed ad is either clicked (reward = 1) or not (reward = 0), depending on the user's interest. The agent's goal is to maximize the overall click-through rate (CTR).

## OpenAI Environment Attributes

| Attribute | Value | Notes |
| --- | --- | --- |
| Action Space | `Discrete(n)` | *n* is the number of ads to choose from |
| Observation Space | `Box(0, +inf, (2, n))` | Number of impressions and clicks for each ad |
| Actions | `[0...n-1]` | Index of the selected ad |
| Rewards | `0, 1` | 1 = clicked, 0 = not clicked |
| Render Modes | `'human'` | Displays the agent's performance graphically |
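
You can also drive the environment programmatically with the standard Gym loop. The sketch below runs a random agent for 1,000 impressions and reports the overall CTR; it assumes the environment is registered under the id `'AdServer-v0'` and that `gym.make` forwards a `num_ads` keyword to the constructor (check gym_adserver's registration code if your version differs).

```python
# A minimal random-agent loop (a sketch, not the repository's agent code).
# Assumptions: the environment id is 'AdServer-v0' and gym.make forwards
# the num_ads keyword to the environment constructor.
import gym
import gym_adserver  # noqa: F401  # importing registers the environment

env = gym.make('AdServer-v0', num_ads=10)
observation = env.reset()

clicks, impressions = 0, 0
for _ in range(1000):
    action = env.action_space.sample()  # pick a random ad index
    observation, reward, done, info = env.step(action)
    impressions += 1
    clicks += reward  # reward is 1 for a click, 0 otherwise
    if done:
        break
env.close()

print('Overall CTR:', clicks / impressions)
```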

## Installation

You can download the source code and install the dependencies with:

```
git clone https://github.com/falox/gym-adserver
cd gym-adserver
pip install -e .
```

Alternatively, you can install gym-adserver as a pip package:

```
pip install gym-adserver
```

## Basic Usage

You can test the environment by running one of the built-in agents:

```
python gym_adserver/agents/ucb1_agent.py --num_ads 10 --impressions 10000
```

Or compare multiple agents (defined in compare_agents.py):

```
python gym_adserver/wrappers/compare_agents.py --num_ads 10 --impressions 10000
```

The environment will generate 10 ads (num_ads) with different click probabilities, and the agent, starting with no prior knowledge, will learn to select the best-performing ones. The simulation will run for 10,000 iterations (impressions).

A window will open and show the agent's performance and the environment's state:

[Screenshot: performance dashboard]

The overall CTR increases over time as the agent learns what the best actions are.

During initialization, the environment assigns each ad a "Probability" of being clicked. This probability is known only to the environment and is used to draw the rewards during the simulation. The "Actual CTR" is the CTR actually observed during the simulation: over time, it approximates the underlying probability.
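
Conceptually, each impression is an independent Bernoulli draw against the ad's hidden click probability. The snippet below is an illustrative model of that mechanism, not the environment's actual source code; all names and values are hypothetical.

```python
# Illustrative model of the reward mechanism (hypothetical names/values).
import numpy as np

rng = np.random.default_rng()

# Hidden per-ad click probabilities assigned at initialization;
# the agent never observes these directly.
click_probabilities = rng.uniform(0.0, 0.2, size=10)

def draw_reward(action):
    """Return 1 if the displayed ad is clicked, 0 otherwise."""
    return int(rng.random() < click_probabilities[action])
```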

An effective agent will allocate most of the impressions to the best-performing ads.

## Built-in Agents

The gym_adserver/agents directory contains a collection of agents implementing classic multi-armed bandit strategies, such as UCB1 (used in the examples above).

Each agent has different parameters to adjust and optimize its performance.

You can use the built-in agents as a starting point to implement your own algorithm.
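
For example, a minimal epsilon-greedy policy can be written directly against the observation layout described above (impressions and clicks per ad) and plugged into the Gym loop from the attributes section. This is a sketch under those assumptions, not the repository's actual agent interface; epsilon is its only tunable parameter here.

```python
import numpy as np

def epsilon_greedy(observation, epsilon=0.1):
    # Assumes observation[0] holds per-ad impressions and observation[1]
    # per-ad clicks, matching the Box(0, +inf, (2, n)) space above.
    impressions = np.asarray(observation[0], dtype=float)
    clicks = np.asarray(observation[1], dtype=float)
    if np.random.random() < epsilon:
        return int(np.random.randint(len(impressions)))  # explore
    # Exploit: pick the ad with the best observed CTR (0 if never shown).
    ctr = np.divide(clicks, impressions,
                    out=np.zeros_like(clicks), where=impressions > 0)
    return int(np.argmax(ctr))
```

With a small epsilon, such a policy mostly exploits the best observed CTR while still occasionally sampling under-explored ads.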

## Unit Tests

You can run the unit tests for the environment with:

```
pytest -v
```

## Next Steps

- Extend AdServer with the concepts of budget and bid
- Extend AdServer to change ad performance over time (currently each ad's click probability is constant)
- Implement Q-learning agents
- Implement a meta-agent that exploits multiple sub-agents with different algorithms
- Implement epsilon-Greedy variants