Reinforce.jl is an interface for Reinforcement Learning. It is intended to connect modular environments, policies, and solvers with a simple interface.
Packages which build on Reinforce:
New environments are created by subtyping AbstractEnvironment and implementing a few methods:
actions(env, s) --> A
step!(env, s, a) --> r, s′
and optional overrides:
state(env) --> s
reward(env) --> r
which map to env.state and env.reward respectively when unset.
ismdp(env) --> bool
An environment may be fully observable (MDP) or partially observable (POMDP). In the case of a partially observable environment, the state s is really an observation o. To maintain consistency, we call everything a state, and assume that an environment is free to maintain additional (unobserved) internal state. The ismdp query returns true when the environment is an MDP, and false otherwise.
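As a rough illustration of the interface described above, the sketch below defines a tiny corridor environment. So that it runs without the package installed, it declares its own stand-in AbstractEnvironment along with fallback state, reward, and ismdp definitions. Those stand-ins, the CorridorEnv type, and its fields are assumptions made for illustration, not Reinforce.jl's source; with the package loaded you would extend its generic functions instead.

```julia
# Stand-ins so the sketch runs on its own (assumed, not the package's code).
abstract type AbstractEnvironment end
state(env::AbstractEnvironment)  = env.state    # fallback: read the `state` field
reward(env::AbstractEnvironment) = env.reward   # fallback: read the `reward` field
ismdp(env::AbstractEnvironment)  = true         # default assumed for this sketch

# Hypothetical environment: walk along cells 1..n; reaching cell n pays 1.0.
mutable struct CorridorEnv <: AbstractEnvironment
    n::Int
    state::Int
    reward::Float64
end
CorridorEnv(n::Int) = CorridorEnv(n, 1, 0.0)    # start in cell 1 with no reward yet

# actions(env, s) --> A: only "right" is offered in the first cell
actions(env::CorridorEnv, s) = s == 1 ? [+1] : [-1, +1]

# step!(env, s, a) --> r, s′: apply the move, record and return reward and next state
function step!(env::CorridorEnv, s, a)
    s′ = clamp(s + a, 1, env.n)
    r = s′ == env.n ? 1.0 : 0.0
    env.state = s′
    env.reward = r
    return r, s′
end

env = CorridorEnv(3)
r, s′ = step!(env, state(env), +1)   # one move to the right from cell 1
```

Because CorridorEnv stores state and reward fields, the two fallbacks work without any override. A partially observable variant could keep extra hidden fields, return only the observation from state, and define ismdp to return false.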
TODO: more details and examples
Agents/policies are created by subtyping AbstractPolicy and implementing action. The built-in random policy is a short example:
struct RandomPolicy <: AbstractPolicy end
action(policy::RandomPolicy, r, s′, A′) = rand(A′)
The action method maps the last reward and current state to the next chosen action: (r, s′) --> a′.
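For a policy that is not random, the same action signature can be implemented with a scoring rule. The sketch below is a minimal example under stated assumptions: AbstractPolicy is re-declared as a stand-in so the code runs on its own, and GreedyPolicy and its score field are hypothetical names, not part of Reinforce.jl.

```julia
# Stand-in so the sketch runs without the package (assumed, not the package's code).
abstract type AbstractPolicy end

# Hypothetical policy: pick the available action with the highest score(s′, a).
struct GreedyPolicy{F} <: AbstractPolicy
    score::F
end

# (r, s′) --> a′, restricted to the available actions A′.
# The last reward r is accepted to match the signature but unused here.
function action(policy::GreedyPolicy, r, s′, A′)
    best, best_score = nothing, -Inf
    for a in A′
        sc = policy.score(s′, a)
        if sc > best_score
            best, best_score = a, sc
        end
    end
    return best   # `nothing` if A′ is empty
end

# Prefer moves that land closer to position 5.
policy = GreedyPolicy((s, a) -> -abs(s + a - 5))
a′ = action(policy, 0.0, 2, [-1, +1])
```

A learning policy would typically use r as well, for example to update its value estimates before choosing a′.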
Iterate through episodes using the Episode iterator. A 4-tuple (s,a,r,s′) is returned from each step of the episode:
ep = Episode(env, policy)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter
There is also a convenience method run_episode. The following is equivalent to the last example:
R = run_episode(env, policy) do
    # anything you want... this section is called after each step
end
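To make the data flow of an episode concrete, here is a hand-written loop that produces the same kind of (s, a, r, s′) tuples and accumulates a total reward and step count. It is a sketch under stated assumptions, not how Episode or run_episode are implemented. The abstract types are stand-ins, and CountdownEnv, BigStepPolicy, isterminal, and rollout are hypothetical names. Unlike the run_episode example above, the callback here receives the tuple as arguments.

```julia
# Stand-in definitions (assumed for illustration; not the package's code).
abstract type AbstractEnvironment end
abstract type AbstractPolicy end

mutable struct CountdownEnv <: AbstractEnvironment    # hypothetical: count down to 0
    state::Int
end
actions(env::CountdownEnv, s) = [1, 2]                 # step sizes on offer
function step!(env::CountdownEnv, s, a)
    s′ = max(s - a, 0)
    env.state = s′
    return -1.0, s′                                    # every step costs 1
end
isterminal(env::CountdownEnv, s) = s == 0              # hypothetical termination test

struct BigStepPolicy <: AbstractPolicy end             # hypothetical: always take the largest step
action(::BigStepPolicy, r, s′, A′) = maximum(A′)

# Run one episode, calling f(s, a, r, s′) after each step.
function rollout(f, env, policy; maxsteps = 100)
    s, r, R, T = env.state, 0.0, 0.0, 0
    while !isterminal(env, s) && T < maxsteps
        a = action(policy, r, s, actions(env, s))      # (r, s) --> a
        r, s′ = step!(env, s, a)
        f(s, a, r, s′)
        R += r
        T += 1
        s = s′
    end
    return R, T                                        # total reward, number of steps
end

history = Tuple{Int,Int,Float64,Int}[]
R, T = rollout(CountdownEnv(5), BigStepPolicy()) do s, a, r, s′
    push!(history, (s, a, r, s′))
end
```

The maxsteps cap is a guard added for this sketch so that an environment that never terminates cannot loop forever.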