Abstractions, algorithms, and utilities for reinforcement learning in Julia



Reinforce.jl is an interface for Reinforcement Learning. It is intended to connect modular environments, policies, and solvers with a simple interface.

Packages which build on Reinforce:

New environments are created by subtyping AbstractEnvironment and implementing a few methods:

  • reset!(env)
  • actions(env, s) --> A
  • step!(env, s, a) --> r, s′
  • finished(env, s′)

and optional overrides:

  • state(env) --> s
  • reward(env) --> r

which default to env.state and env.reward, respectively, when not overridden.

  • ismdp(env) --> bool

An environment may be fully observable (MDP) or partially observable (POMDP). In a partially observable environment, the state s is really an observation o. For consistency, we call everything a state and assume that an environment is free to maintain additional (unobserved) internal state. The ismdp query returns true when the environment is an MDP, and false otherwise.
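As a rough illustration of the methods listed above, here is a minimal sketch of a toy environment. CorridorEnv and its fields are hypothetical names, and the local AbstractEnvironment is only a stand-in so the snippet runs without the package installed. With Reinforce loaded, you would presumably subtype its AbstractEnvironment and extend its functions instead of defining new ones. Storing the latest state and reward in fields named state and reward is an assumption, chosen to line up with the default state(env) and reward(env) behavior described above.

```julia
# Stand-in for Reinforce's abstract type so this sketch runs on its own (assumption).
abstract type AbstractEnvironment end

# Hypothetical environment: walk along cells 1..n; the episode ends at cell n.
mutable struct CorridorEnv <: AbstractEnvironment
    n::Int           # number of cells
    state::Int       # current cell
    reward::Float64  # most recent reward
end
CorridorEnv(n::Int) = CorridorEnv(n, 1, 0.0)

# Put the agent back at the first cell and clear the last reward.
function reset!(env::CorridorEnv)
    env.state = 1
    env.reward = 0.0
    return env
end

# Action set A available in state s: step left (-1) or right (+1).
actions(env::CorridorEnv, s) = [-1, 1]

# Apply action a in state s and return (r, s′).
function step!(env::CorridorEnv, s, a)
    s′ = clamp(s + a, 1, env.n)        # walls at both ends
    r = s′ == env.n ? 1.0 : -0.1       # small cost per move, bonus at the goal
    env.state = s′
    env.reward = r
    return r, s′
end

# The episode is over once the last cell is reached.
finished(env::CorridorEnv, s′) = s′ == env.n

# The agent sees the full state, so this toy problem is an MDP.
ismdp(env::CorridorEnv) = true
```

The environment keeps no hidden state here; a POMDP variant would hold extra fields that step! updates but never returns as s′.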

TODO: more details and examples

Agents/policies are created by subtyping AbstractPolicy and implementing action. The built-in random policy is a short example:

struct RandomPolicy <: AbstractPolicy end
action(policy::RandomPolicy, r, s′, A′) = rand(A′)

The action method maps the last reward and current state to the next chosen action: (r, s′) --> a′.
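A policy can also carry parameters. The sketch below is a hypothetical example, not one of the package's built-in policies: MostlyMaxPolicy and its field ε are invented for illustration, and the local AbstractPolicy is a stand-in so the snippet runs without the package. It assumes the action set A′ is a collection of comparable values, such as a vector of numbers.

```julia
# Stand-in for Reinforce's abstract type so this sketch runs on its own (assumption).
abstract type AbstractPolicy end

# Hypothetical policy: with probability ε pick uniformly at random,
# otherwise pick the largest action in A′.
struct MostlyMaxPolicy <: AbstractPolicy
    ε::Float64
end

# Same argument order as RandomPolicy: last reward r, current state s′, action set A′.
function action(policy::MostlyMaxPolicy, r, s′, A′)
    return rand() < policy.ε ? rand(A′) : maximum(A′)
end

# Example calls with a two-element action set.
greedy = MostlyMaxPolicy(0.0)            # never explores
a_greedy = action(greedy, 0.0, 1, [-1, 1])
explorer = MostlyMaxPolicy(1.0)          # always explores
a_random = action(explorer, 0.0, 1, [-1, 1])
```

This policy ignores r and s′; a learning policy would use them to update its internal estimates before choosing a′.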

Iterate through episodes using the Episode iterator. A 4-tuple (s,a,r,s′) is returned from each step of the episode:

ep = Episode(env, policy)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter

There is also a convenience method, run_episode. The following is equivalent to the previous example:

R = run_episode(env, policy) do
    # anything you want... this section is called after each step
end
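To make the control flow of an episode concrete, the following is a hand-written loop over the environment and policy methods described earlier. It is a sketch under stated assumptions, not the package's Episode or run_episode implementation: manual_episode, its maxsteps keyword, the four-argument callback, Countdown, and FirstActionPolicy are all hypothetical, and the abstract types are local stand-ins so the snippet runs without the package.

```julia
# Stand-ins for Reinforce's abstract types so this sketch runs on its own (assumption).
abstract type AbstractEnvironment end
abstract type AbstractPolicy end

# Hypothetical environment: the state counts down from n to 0, reward 1.0 per step.
mutable struct Countdown <: AbstractEnvironment
    n::Int
    state::Int
end
Countdown(n::Int) = Countdown(n, n)

reset!(env::Countdown) = (env.state = env.n; env)
actions(env::Countdown, s) = [1]             # single action: decrement by one
function step!(env::Countdown, s, a)
    env.state = s - a
    return 1.0, env.state                    # (r, s′)
end
finished(env::Countdown, s′) = s′ <= 0

# Hypothetical policy: always take the first available action.
struct FirstActionPolicy <: AbstractPolicy end
action(::FirstActionPolicy, r, s′, A′) = first(A′)

# Hypothetical helper: reset, then alternate action and step! until finished.
# Calls f(s, a, r, s′) after every step and returns (total reward, step count).
function manual_episode(f, env, policy; maxsteps = 1_000)
    reset!(env)
    s = env.state
    r = 0.0      # no reward has been received before the first action
    R = 0.0      # total reward so far
    T = 0        # number of steps taken
    while !finished(env, s) && T < maxsteps
        A = actions(env, s)
        a = action(policy, r, s, A)
        r, s′ = step!(env, s, a)
        f(s, a, r, s′)
        R += r
        T += 1
        s = s′
    end
    return R, T
end

# Usage: record every successor state seen during one episode.
visited = Int[]
R, T = manual_episode(Countdown(3), FirstActionPolicy()) do s, a, r, s′
    push!(visited, s′)
end
```

The maxsteps cap is a design choice of this sketch that guards against environments that never report finished.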

Author: Tom Breloff