**RobustLeastSquares** attempts to make robust estimation using least-squares
problems painless and simple. It provide these useful features:

- A set of linear equation solvers, to perform linear least-squares optimization steps.
- A set of
*M*-estimators for robust estimation. An estimator is used to remove outliers by reducing the cost (loss) function for abnormally large residuals, and gets its name from the statistical method of estimating parameters from certain statistical distributions (e.g. Gaussian statistics correspond to the standard*L2*-estimator, etc). - A high-level optimization routine that performs iteratively reweighted least-squares, solving the non-linear optimization of the cost/loss function by a series of quadratic minimization steps.

Suppose we want to find the "best estimate" of `x`

to the equations `Ax = b`

,
where we know our data (`A`

and `b`

) suffer from incomplete information,
statistical errors and wild outliers. *RobustLeastSquares* lets us solve this
robustly and easily:

```
x_estimated = reweighted_lsqr(A, b, HuberEstimator(1.0); refit = true, n_iter = 20)
```

The package has three in-built solvers - *QR* factorization of the ordinary
least-squares problem, the normal-form of the least-squares problem, and conjugate
gradient on the former.

We are trying to find *x* such that *||r||* is minimized where the residual *r = Ax - b* (i.e. find the *x* so
that *Ax* is as close as possible to *b* - if *A* is invertible, the solution is
*inv(A).b* and *r = 0*).

The solution in the general case may be found directly
using the "normal equations" *A'A x = A'b* where *A'A* is now a square,
Hermitian matrix, and even if *A'A* is singular, there always exists an exact
solution to this equation (since *A'b* is guaranteed to have zero component in
the kernel of *A'A*). This is more poorly numerically conditioned than
minimizing *r*, but can be faster depending on the sparsity and shape of *A*.

A *weighted* least-squares problem introduces a weight vector *w* which is
multiplies *r* element-wise to scale some residual elements to be larger or smaller,
increasing or decreasing their importance in the optimization procedure (which
is what lets us reduce the impact of outliers on the solution). Writing the
diagonal matrix *W = diag(w)*, we are trying to minimize *||Wr|| = ||W(Ax-b)||*, or in the
normal form, *A'WA = A'Wb*.

The solver is called using the `solve()`

function

```
x = solve(A, b, weights, method, x0)
# A: M x N matrix (dense, sparse, etc)
# b: M-vector
# weights: scaling factors to be applied to each element of *r*
# method: a symbol, one of :qr, :normal or :cg
# x0: initial starting vector, for use with :cg method.
```

But how does one obtain the optimal weight vector for reducing the impact of
outliers? An M-estimator (typically) dampens the impact of the largest elements
of *r*, making them impact the solution less than in the typical *L2*-estimator
where the cost function is `sum(r.^2)`

. See this excellent resource for an
overview.

Included *M*-estimators are represented by Julia instances of subtypes of `MEstimator`

`L2Estimator()`

,*cost*=`sum(r.^2)`

`L1Estimator()`

,*cost*=`sum(abs(r))`

`L1L2Estimator()`

, behaves*L2*for*r << 1*, and*L1*for*r >> 1*`FairEstimator(c)`

, behaves*L2*for*r << c*, and*L1*for*r >> c*`HuberEstimator(c)`

, behaves*L2*for*r << c*, and*L1*for*r >> c*`CauchyEstimator(c)`

, behaves*L2*for*r << c*, and logarithmic for*r >> c*(non-convex)`GemanEstimator()`

, behaves*L2*for*r << 1*, and Lorentzian for*r >> 1*(non-convex)`WelschEstimator(c)`

, behaves*L2*for*r << c*, and Gaussian for*r >> c*(non-convex)`TukeyEstimator(c)`

, behaves*L2*for*r < c*, and complete cutoff for*r > c*(non-convex)

As you descend the list, the outliers are more strongly cut-out of the optimization. The non-convex estimators may not have a unique minimum and care must be taken to reach a good-quality solution.

An *M*-estimator `est::MEstimator`

provides an interface for obtaining the cost function and weight
vectors through the methods `estimator_rho(r,est)`

, `estimator_psi(r,est)`

, `estimator_weight(r,est)`

and
`estimator_sqrtweight(r,est)`

. These are called automatically from the optimizer
below or may be integrated into your own code.

There is one more special estimator called a `MultiEstimator`

which allows you
to apply different estimators to different parts of your residual vector. This
may be useful to, e.g., apply the Huber-estimator to residuals based on your
measurements and the *L2*-estimator to the residuals corresponding to regularization.

Finally, one can optimize an entire problem with the `reweighted_lsqr()`

function.
Briefly, this takes the following syntax:

```
(sol, res) = reweighted_lsqr(A::AbstractMatrix, b::AbstractVector, estimator::MEstimator = L2Estimator(), x0 = nothing; method::Symbol=:qr, n_iter::Integer=10, refit::Bool = false, quiet::Bool = false, kwargs...)
# A: M x N matrix
# b: M-Vector
# estimator: An instance of an MEstimator
# x0: initial guess at solution
# method: symbols :qr, :normal or :cg
# n_iter: number of times to repeat the reweighting
# refit: should the constant c for the MEstimator be refitted using the median-absolute-deviation method?
#
# sol: the final solution, a N-Vector
# res: the final residuals, an M-Vector
```

02/18/2016

11 months ago

18 commits