Walk1D is a discrete-time MDP with continuous states and actions that models a one-dimensional random walk on the real line. The state starts at s = 1. At each step, the agent chooses an action a::Float64, and the state transitions deterministically to s' = s + a. Actions are sampled from the proposal distribution Normal(0.0, 1.0), and the step reward is the log-likelihood of the action under that distribution, log(pdf(Normal(0.0, 1.0), a)).

The episode ends either when the agent steps out of bounds, given by a symmetric threshold thresh_x (default: 10.0, i.e., the episode ends when abs(s) > 10.0), or after a fixed number of steps (default: 20). If the episode ends without the agent escaping, it receives an additional reward of -d_miss, where d_miss = thresh_x - abs(s) is the remaining distance to the boundary.

Because the initial state is slightly positive, it is better to head toward the positive boundary and to end the episode as soon as possible, since each step incurs additional cost. However, the log-density penalizes large steps quadratically, so there is an optimal step size that balances the distance covered against the cost of each step. The optimal path involves the agent taking a ≈ 1.5 for 6 consecutive steps.
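To make the dynamics and reward concrete, here is a minimal Python sketch of the MDP as described above, together with a check of constant-step escape policies. It assumes that the episode terminates on the step that crosses the threshold and that the -d_miss penalty is added to the final step's reward. The class `Walk1D` here and the helpers `normal_logpdf` and `rollout_constant` are hypothetical illustrations, not the package's own implementation or API.

```python
import math
from dataclasses import dataclass


def normal_logpdf(a: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Log-density of Normal(mu, sigma) evaluated at a."""
    z = (a - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


@dataclass
class Walk1D:
    s0: float = 1.0         # initial state
    thresh_x: float = 10.0  # symmetric threshold: the agent escapes when |s| > thresh_x
    max_steps: int = 20     # horizon

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> float:
        self.s = self.s0
        self.t = 0
        return self.s

    def step(self, a: float) -> tuple[float, float, bool]:
        """Apply action a and return (next_state, reward, done)."""
        self.s += a                    # deterministic transition s' = s + a
        self.t += 1
        reward = normal_logpdf(a)      # log-likelihood of a under Normal(0, 1)
        escaped = abs(self.s) > self.thresh_x
        done = escaped or self.t >= self.max_steps
        if done and not escaped:
            reward -= self.thresh_x - abs(self.s)  # terminal penalty -d_miss (assumed added here)
        return self.s, reward, done


def rollout_constant(env: Walk1D, a: float) -> tuple[int, float]:
    """Repeat the same action a until the episode ends; return (steps, total reward)."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, r, done = env.step(a)
        total += r
    return env.t, total


if __name__ == "__main__":
    env = Walk1D()
    for n in (5, 6, 7, 8):
        # Just enough to cross +10 in exactly n equal steps (escape needs |s| > 10).
        a = (env.thresh_x - env.s0) / n + 1e-9
        steps, ret = rollout_constant(env, a)
        print(f"n={n} a={a:.3f} steps={steps} return={ret:.3f}")
```

With the defaults, escaping in n equal steps needs a step of about 9/n, so the return is -n·log(2π)/2 - 81/(2n). For n = 5, 6, 7, 8 this gives about -12.69, -12.26, -12.22 and -12.41. Six steps of about 1.5 and seven steps of about 1.29 are nearly tied under this reward, so the optimum is shallow. Standing still for all 20 steps scores far worse (about -27.38).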

Last touched: over 2 years ago (10 commits)