To import a database stored as an `xyz`

file, use

```
data = IPFitting.Data.read_xyz(fname)
```

See `?read_xyz`

for further options. This will return a `Vector{Dat}`

where
each `Dat`

is a container storing the atomistic configurion (`JuLIP.Atoms`

),
the `configtype`

as well as the "DFT observations".

A basis is defined by

- choice of bond-length or bond-angle PIPs.
- space transform
- choice of cut-off
- body-order
- polynomial degree

For example, using bond-angle PIPs with Morse coordinates and a cosine cut-off to model Si, we first define a descriptor

```
r0 = rnn(:Si)
rcut = 2.5 * r0
desc = BondAngleDesc("exp(- (r/$r0 - 1.0))", CosCut(rcut-1, rcut))
```

We can then generate basis functions using `nbpolys`

, e.g.,

```
# body-order, descriptor, degree
B4 = nbpolys(4, desc, 8)
```

In practise, one would normally specify different cut-offs and space transforms
for different body-orders. Suppose these give descriptors `D2, D3, D4`

, then
a 4-body basis can be constructed via

```
B = [ nbpolys(2, D2, 14); nbpolys(3, D3, 11); nbpolys(4, D4, 8) ]
```

For more details and more complex basis sets, see below.

Once the dft dataset and the basis functions have been specified, the least-squares system matrix can be assembled. This can be very time-consuming for high body-orders, large basis sets and large data sets. Therefore this matrix is stored in a block format that allows us to later re-use it in a variety of different ways. This is done via

```
db = LsqDB(fname, configs, basis)
```

- The db is stored in two files:
`fname_info.jld2`

and`fname_kron.h5`

. In particular,`fname`

is the path + name of the db, but without ending. E.g,`"~/scratch/nbodyips/W5Bdeg12env"`

. `configs`

is a`Vector{Dat}`

`basis`

is a`Vector{<: AbstractCalculator}`

- The command
`db = LsqDB(fname, configs, basis)`

evaluates the basis functions, e.g.,`energy(basis[i], configs[j].at)`

for all`i, j`

, and stores these values which make up the lsq matrix.

To reload a pre-computed lsq system, use `LsqDB(fname)`

. To compute a lsq
system without storing it on disk, use `LsqDB("", configs, basis)`

, i.e.,
pass an empty string as the filename.

The main function to call is
`lsqfit(db; kwargs...) -> IP, fitinfo`

The system is solved via (variants of) the QR factorisation. See `?lsqfit`

for details.

The output `IP`

of `lsqfit`

is a `JuLIP.AbstractCalculator`

which supports
`energy, forces, virial, site_energies`

. (todo: write more here, in
particular mention `fast`

)

there are two functions `filter_basis`

and `filter_configs`

that can be
used to choose a subset of the data and a subset of the basis. For example,
to take only 2B:

```
Ib2 = filter_basis(db, b -> (bodyorder(b) < 2))
```

See inline documentation for more details.

Suppose `configs::Vector{Dat}`

is a list of configurations, then we can
add fitting error information by calling

```
add_fits!(myIP, configs, fitkey="myIP")
```

This will evaluate all observations stored in configs with the new IP and store
them in `configs[n].info[fitkey]["okey"]`

. These observation values can then
be used to compute RMSE, produce scatter plots, etc.

This calculation can take a while. If `myIP`

has just been fitted using `lsqfit`

then there is a quicker way to generate the fitting errors, but this is not
yet implemented. TODO: implement this!

07/24/2018

about 1 month ago

354 commits