Walk1D is a discrete-time MDP with continuous states and actions that models a one-dimensional random walk on the real line. The state starts at `s=1`. At each step, the agent chooses an action `a::Float64` and the state transitions deterministically to the next state `s'=s+a`. The agent samples actions from an action proposal distribution `Normal(0.0,1.0)`. The step reward is `log(pdf(Normal(0.0,1.0), a))`. The episode ends either when the agent steps out of bounds, given by a symmetric threshold (default: +10 and -10), i.e., `abs(x) > 10.0` where `x` is the state, or after a fixed number of steps (default: 20). If the episode ends and the agent did not escape, a reward of `-d_miss` is given, where `d_miss` is given by `thresh_x - abs(x)`. Because the initial state is slightly positive, it is better to head towards the positive boundary and to end the episode as soon as possible, since each step incurs additional cost. However, larger steps are also more costly, because the step reward decreases as `abs(a)` grows, so there is an optimal step size that balances the distance traveled against the cost of the step. The optimal path involves the agent taking `a ~= 1.5` for 6 consecutive steps.
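The dynamics described above can be sketched in a few lines of plain Julia. This is only an illustration: the names `THRESH_X`, `MAX_STEPS`, `log_std_normal`, `step_walk`, and `rollout_constant` are hypothetical and not the package's API. The sketch also assumes that the miss penalty is added to the reward of the final step and that the step limit is reached when the step counter equals 20. The log-density is written out by hand so the block needs no packages; it equals `log(pdf(Normal(0.0,1.0), a))`.

```julia
# Hypothetical names throughout; a sketch, not the package's actual implementation.
const THRESH_X = 10.0   # symmetric out-of-bounds threshold (default from the text)
const MAX_STEPS = 20    # step limit (default from the text)

# Log-density of Normal(0, 1) at `a`, i.e. log(pdf(Normal(0.0,1.0), a)).
log_std_normal(a::Float64) = -0.5 * log(2π) - 0.5 * a^2

# One transition for step number `t` (1-based): returns (next state, reward, done).
function step_walk(s::Float64, a::Float64, t::Int)
    sp = s + a                       # deterministic transition s' = s + a
    r = log_std_normal(a)            # step reward
    escaped = abs(sp) > THRESH_X     # out-of-bounds check
    timed_out = t >= MAX_STEPS       # assumption: limit hit on the 20th step
    if timed_out && !escaped
        r -= THRESH_X - abs(sp)      # assumption: -d_miss is added to the last reward
    end
    return sp, r, escaped || timed_out
end

# Apply the same action every step from s0 and return (steps taken, total reward).
function rollout_constant(a::Float64; s0::Float64 = 1.0)
    s, total, t = s0, 0.0, 0
    done = false
    while !done
        t += 1
        s, r, done = step_walk(s, a, t)
        total += r
    end
    return t, total
end
```

For example, `rollout_constant(1.6)` escapes past the positive boundary on the sixth step, while `rollout_constant(0.0)` runs for all 20 steps and then pays the miss penalty of 9. Note that under the strict check `abs(x) > 10.0`, a constant action of exactly `1.5` lands on `10.0` after six steps and does not yet count as escaped in this sketch.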

02/27/2018
