Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation for Julia.

The package is registered on METADATA, so to install

```
Pkg.add("CovarianceMatrices")
```

This package provides types and methods useful to obtain consistent estimates of the long run covariance matrix of a random process.

Three classes of estimators are considered:

**HAC**- heteroskedasticity and autocorrelation consistent (Andrews, 1996; Newey and West, 1994)**HC**- hetheroskedasticity (White, 1982)**CRVE**- cluster robust (Arellano, 1986)

The typical application of these estimators is to conduct robust inference about parameters of a model. This is accomplished by extending methods defined in StatsBase.jl and GLM.jl.

Available kernel types are:

`TruncatedKernel`

`BartlettKernel`

`ParzenKernel`

`TukeyHanningKernel`

`QuadraticSpectralKernel`

For example, `ParzenKernel(NeweyWest)`

return an instance of `TruncatedKernel`

parametrized by `NeweyWest`

, the type that corresponds to the optimal bandwidth calculated following Newey and West (1994). Similarly, `ParzenKernel(Andrews)`

corresponds to the optimal bandwidth obtained in Andrews (1991). If the bandwidth is known, it can be directly passed, i.e. `TruncatedKernel(2)`

.

The examples below clarify the API, that is however relatively easy to use.

In the regression context, the function `vcov`

does all the work:

```
vcov(::DataFrameRegressionModel, ::HAC; prewhite = true)
```

Consider the following artificial data (a regression with autoregressive error component):

```
using CovarianceMatrices
using DataFrames
srand(1)
n = 500
x = randn(n,5)
u = Array{Float64}(2*n)
u[1] = rand()
for j in 2:2*n
u[j] = 0.78*u[j-1] + randn()
end
u = u[n+1:2*n]
y = 0.1 + x*[0.2, 0.3, 0.0, 0.0, 0.5] + u
df = DataFrame()
df[:y] = y
for j in enumerate([:x1, :x2, :x3, :x4, :x5])
df[j[2]] = x[:,j[1]]
end
```

Using the data in `df`

, the coefficient of the regression can be estimated using `GLM`

```
lm1 = glm(y~x1+x2+x3+x4+x5, df, Normal(), IdentityLink())
```

To get a consistent estimate of the long run variance of the estimated coefficients using a Quadratic Spectral kernel with automatic bandwidth selection *à la* Andrews

```
vcov(lm1, QuadraticSpectralKernel(Andrews), prewhite = false)
```

If one wants to estimate the long-time variance using the same kernel, but with a bandwidth selected *à la* Newey-West

```
vcov(lm1, QuadraticSpectralKernel(NeweyWest), prewhite = false)
```

The standard errors can be obtained by the `stderr`

method

```
stderr(::DataFrameRegressionModel, ::HAC; prewhite::Bool)
```

Sometime is useful to access the bandwidth selected by the automatic procedures. This can be done using the `optimalbw`

method

```
optimalbw(NeweyWest, QuadraticSpectralKernel, lm1; prewhite = false)
optimalbw(Andrews, QuadraticSpectralKernel, lm1; prewhite = false)
```

Sometime interest lies in estimating the long-run variance of the average of the process. At the moment this can be done by carrying out a regression on a constant (the sample mean of the realization of the process) and using `vcov`

or `stderr`

to obtain a consistent variance.

```
lm2 = glm(u~1, df, Normal(), IdentityLink())
vcov(lm1, ParzenKernel(NeweyWest), prewhite = false)
stderr(lm1, ParzenKernel(NeweyWest), prewhite = false)
```

As in the HAC case, `vcov`

and `stderr`

are the main functions. They know get as argument the type of robust variance being sought

```
vcov(::DataFrameRegressionModel, ::HC)
```

Where HC is an abstract type with the following concrete types:

`HC0`

`HC1`

(this is`HC0`

with the degree of freedom adjustment)`HC2`

`HC3`

`HC4`

`HC4m`

`HC5`

`html using CovarianceMatrices using DataFrames using GLM # A Gamma example, from McCullagh & Nelder (1989, pp. 300-2) # The weights are added just to test the interface and are not part # of the original data clotting = DataFrame( u = log([5,10,15,20,30,40,60,80,100]), lot1 = [118,58,42,35,27,25,21,19,18], lot2 = [69,35,26,21,18,16,13,12,12], w = 9*[1/8, 1/9, 1/25, 1/6, 1/14, 1/25, 1/15, 1/13, 0.3022039] ) wOLS = fit(GeneralizedLinearModel, lot1~u, clotting, Normal(), wts = array(clotting[:w])) vcov(wOLS, HC0 vcov(wOLS, HC1) vcov(wOLS, HC2) vcov(wOLS, HC3) vcov(wOLS, HC4) vcov(wOLS, HC4m) vcov(wOLS, HC5)`

The API of this class of variance estimators is subject to change, so please use with care. The difficulty is that `CRHC`

type needs to have access to the variable along which dimension the clustering mast take place. For the moment, the following approach works --- as long as no missing values are present in the original dataframe.

```
using RDatasets
df = dataset("plm", "Grunfeld")
lm = glm(Inv~Value+Capital, df, Normal(), IdentityLink())
vcov(lm, CRHC1(convert(Array, df[:Firm])))
stderr(lm, CRHC1(convert(Array, df[:Firm])))
```

10/29/2014

6 months ago

196 commits