Over the next 2 months I'm planning to update StatisticalRethinking.jl to reflect the changes in the 2nd edition of the book. At the same time (but this will likely take longer) I'll also expand coverage of chapters 5 and beyond.
Towards the end of this year I also plan to update TuringModels.jl.
This package contains Julia versions of selected code snippets and mcmc models contained in the R package "rethinking" associated with the book Statistical Rethinking by Richard McElreath.
This package is part of the broader StatisticalRethinkingJulia GitHub organization.
In the book and the associated R package rethinking, statistical models are defined as illustrated below:
flist <- alist(
  height ~ dnorm( mu , sigma ) ,
  mu <- a + b*weight ,
  a ~ dnorm( 156 , 100 ) ,
  b ~ dnorm( 0 , 10 ) ,
  sigma ~ dunif( 0 , 50 )
)
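To make the notation concrete, the sketch below writes out in plain Julia the unnormalized log posterior that flist describes: a Normal likelihood for height with mean a + b*weight, plus the three priors. It is only an illustration in Base Julia (no mcmc package); the function names are hypothetical, and the data values in the last line are made up rather than taken from the Howell data used in the book.

```julia
# Log density of Normal(μ, σ), written out to avoid extra packages.
normal_logpdf(x, μ, σ) = -log(σ) - 0.5 * log(2π) - 0.5 * ((x - μ) / σ)^2

# Log density of Uniform(lo, hi).
uniform_logpdf(x, lo, hi) = lo <= x <= hi ? -log(hi - lo) : -Inf

# Unnormalized log posterior of the model in `flist` (hypothetical helper):
#   height ~ Normal(a + b*weight, sigma)
#   a ~ Normal(156, 100), b ~ Normal(0, 10), sigma ~ Uniform(0, 50)
function log_posterior(a, b, sigma, height, weight)
    # Outside the support of the sigma prior the posterior density is zero.
    0 < sigma <= 50 || return -Inf
    lp = normal_logpdf(a, 156, 100) + normal_logpdf(b, 0, 10) + uniform_logpdf(sigma, 0, 50)
    for (h, w) in zip(height, weight)
        mu = a + b * w                      # mu <- a + b*weight
        lp += normal_logpdf(h, mu, sigma)   # height ~ dnorm(mu, sigma)
    end
    return lp
end

# Made-up illustration values, not the book's data.
log_posterior(155.0, 0.9, 5.0, [150.0, 160.0], [45.0, 55.0])
```

An mcmc package explores exactly this kind of function; the approximation and sampling options below differ only in how they summarize it.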
Posterior values can be approximated by
# Quadratic approximation (for simpler models)
m4.31 <- quap(flist, data=d2)
or generated using Stan by:
# Generate a Stan model and run a simulation
m4.32 <- ulam(flist, data=d2)
The author of the book states: "If that (the statistical model) doesn't make much sense, good. ... you're holding the right textbook, since this book teaches you how to read and write these mathematical descriptions" (page 77).
StatisticalRethinkingJulia is intended to allow experimenting with this learning process using four available mcmc options in Julia: Turing.jl, CmdStan.jl, Mamba.jl and DynamicHMC.jl.
A secondary objective of StatisticalRethinkingJulia is to compare the definition and execution of a variety of models in these four mcmc packages.
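The four packages differ in how they draw samples, but the basic idea of mcmc can be shown with a much simpler algorithm. The sketch below is a random-walk Metropolis sampler, not the HMC-based samplers that Stan and DynamicHMC.jl use, applied to the one-parameter globe-tossing example from chapter 2 of the book (6 water in 9 tosses, flat prior). The function name and the tuning value `step = 0.1` are assumptions chosen for illustration.

```julia
using Random, Statistics

# Globe-tossing log posterior (flat prior, 6 water in 9 tosses), up to a constant.
logpost(p) = 0 < p < 1 ? 6 * log(p) + 3 * log(1 - p) : -Inf

# Random-walk Metropolis (hypothetical helper): propose q = p + step * N(0, 1)
# and accept it with probability min(1, exp(logpost(q) - logpost(p))).
function metropolis(logpost, p0, n; step = 0.1, rng = MersenneTwister(1))
    draws = Vector{Float64}(undef, n)
    p, lp = p0, logpost(p0)
    for i in 1:n
        q = p + step * randn(rng)
        lq = logpost(q)
        if log(rand(rng)) < lq - lp   # proposals outside (0, 1) are never accepted
            p, lp = q, lq
        end
        draws[i] = p                  # on rejection the current value is repeated
    end
    return draws
end

draws = metropolis(logpost, 0.5, 20_000)
(mean(draws), std(draws))
```

With a flat prior the exact posterior is Beta(7, 4), so the draws should have a mean close to 7/11 ≈ 0.636 and a standard deviation close to 0.139.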
As stated many times by the author in his online lectures, this package is not intended to take away the hands-on component of the course. The clips are just meant to get you going, but learning means experimenting, in this case using Julia.
At least one other package (Klara) is available for mcmc in Julia. Time constraints prevented this option from being included in StatisticalRethinkingJulia. For similar reasons, the number of models implemented in MambaModels is very limited.
Instead of having all snippets in a single file, the snippets are organized by chapter and grouped into clips of related snippets. For example, chapter 0 of the R package has snippets 0.1 to 0.5, which have been combined into 2 clips:
clip-01-03.jl - contains snippets 0.1 through 0.3
clip-04-05.jl - contains snippets 0.4 and 0.5
These 2 files are in scripts/00 and are later processed by Literate.jl to create 3 derived versions, e.g. from clip-01-03.jl in scripts/00:
clip-01-03.md - included in the documentation
clip-01-03.ipynb - stored in the notebooks/chapter directory
clip-01-03.jl - stored in the chapters/chapter directory
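The naming scheme can be expressed with two small helpers. Both are hypothetical and not part of the package; the directory names follow the description above, with the chapter passed as a zero-padded string such as "00". The location of the generated .md file is not spelled out here, so it is left out.

```julia
# Hypothetical helper: base name of a clip grouping snippets `from` to `to`
# of a chapter, e.g. (1, 3) -> "clip-01-03".
clip_base(from::Int, to::Int) = "clip-$(lpad(from, 2, '0'))-$(lpad(to, 2, '0'))"

# Hypothetical helper: the source script and the Literate-generated notebook
# and chapter script for a clip, following the layout described above.
function derived_paths(chapter::AbstractString, base::AbstractString)
    return (
        source   = joinpath("scripts", chapter, base * ".jl"),
        notebook = joinpath("notebooks", chapter, base * ".ipynb"),
        script   = joinpath("chapters", chapter, base * ".jl"),
    )
end

paths = derived_paths("00", clip_base(1, 3))
```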
Occasionally lines in scripts are suppressed when Literate processes the input source files. For example, in Turing scripts the statement #nb Turing.turnprogress(false); is only inserted in the generated notebook, not in the corresponding chapter .jl script. Similarly, lines starting with #src are removed from all generated outputs and only remain in the source scripts.
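As documented for Literate.jl, lines starting with #md, #nb or #jl only appear in the markdown, notebook or script output respectively, while #src lines are dropped from every output. The sketch below is a simplified, hypothetical re-implementation of that rule for tags at the start of a line only; it is not Literate's actual code, which per its documentation also recognizes tags at the end of a line.

```julia
# Output-specific tags; each is kept only for the matching output.
const TAGS = (:md => "#md", :nb => "#nb", :jl => "#jl")

# Hypothetical, simplified filter: `target` is :md, :nb or :jl.
function filter_lines(lines::Vector{String}, target::Symbol)
    out = String[]
    for line in lines
        s = lstrip(line)
        startswith(s, "#src") && continue          # never in any output
        tagged = false
        for (t, tag) in TAGS
            if startswith(s, tag)
                tagged = true
                # Keep the line, minus its tag, only for the matching output.
                t == target && push!(out, lstrip(s[length(tag)+1:end]))
                break
            end
        end
        tagged || push!(out, line)                  # untagged lines go everywhere
    end
    return out
end

src = ["using Turing", "#nb Turing.turnprogress(false);", "#src @show 1 + 1"]
filter_lines(src, :nb)
```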
A single snippet clip will be referred to as
Models with names such as 04/m4.5d.jl generate mcmc samples using Turing.jl, CmdStan.jl, Mamba.jl or DynamicHMC.jl respectively. In some cases the results of the mcmc chains have been stored and retrieved (or regenerated if missing) in other clips, e.g.
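The store-or-regenerate pattern for chains can be sketched with Julia's Serialization standard library. The helper name, the file name and the use of `serialize` are assumptions for illustration; the package may store its chains differently.

```julia
using Serialization

# Hypothetical helper: reuse a stored result when `path` exists, otherwise
# compute it with `f()`, store it at `path` and return it.
function load_or_generate(f, path::AbstractString)
    isfile(path) && return deserialize(path)
    result = f()
    dir = dirname(path)
    isempty(dir) || mkpath(dir)
    serialize(path, result)
    return result
end

# Stand-in for an expensive sampling run (here just random numbers).
draws = load_or_generate(() -> randn(1_000), joinpath(mktempdir(), "chains", "m4.5d.jls"))
```

Serialization is convenient, but its file format is not guaranteed to be readable by other Julia versions, so a text format such as CSV may be a more durable choice for results kept in a repository.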
Richard Torkar has taken the lead in developing the Turing versions of the models in chapter 8 and subsequent chapters.
Tamas Papp has also been very helpful during the development of the DynamicHMC versions of the models.
The TuringLang team and #turing contributors on Slack have been extremely helpful! The Turing examples by Cameron Pfiffer are followed closely in several example scripts.
The documentation has been generated using Literate.jl and Documenter.jl based on several ideas demonstrated by Tamas Papp in DynamicHMCExamples.jl.
Questions and contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or have a question.
Developing rethinking must have been an ongoing process over several years, and StatisticalRethinking.jl will likely follow a similar path.
The initial version (v1) of StatisticalRethinking is really just a first attempt to capture the models and to show ways of setting up those models, executing them and post-processing the results using Julia.
As mentioned above, a second objective of v1 is to experiment with and compare the four selected mcmc options in Julia in terms of results, performance, ease of expressing models, etc.
The R package rethinking, in the experimental branch on GitHub, contains 2 functions, quap and ulam (previously called map2stan), which are not in v1 of StatisticalRethinking.jl. It is my intention to study those and possibly include something similar to quap or ulam (or both) in a future version. In clip-02-05.jl an initial example of using the maximum_a_posteriori estimate and the associated quadratic (Normal) approximation is illustrated.
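The idea behind a maximum a posteriori estimate with a quadratic (Normal) approximation can be sketched for the one-parameter globe-tossing model of chapter 2 (6 water in 9 tosses, flat prior). This hand-rolled Newton iteration with analytic derivatives is only an illustration and is not how clip-02-05.jl computes it; the function names are hypothetical.

```julia
# Globe-tossing data from chapter 2 of the book: 6 "water" in 9 tosses.
const W = 6
const L = 3

# With a flat prior on p the log posterior is, up to a constant,
# W*log(p) + L*log(1 - p). Its first and second derivatives:
dlogpost(p)  = W / p - L / (1 - p)
d2logpost(p) = -W / p^2 - L / (1 - p)^2

# Newton's method for the posterior mode (the MAP estimate), then a Normal
# approximation whose standard deviation is 1/sqrt(-second derivative) at the mode.
function quadratic_approx(p = 0.5; tol = 1e-10, maxiter = 50)
    for _ in 1:maxiter
        delta = dlogpost(p) / d2logpost(p)
        p -= delta
        abs(delta) < tol && break
    end
    return (mode = p, sd = 1 / sqrt(-d2logpost(p)))
end

qa = quadratic_approx()
```

Analytically the mode is 6/9 ≈ 0.667 and the standard deviation is sqrt(2)/9 ≈ 0.157, matching the roughly 0.67 and 0.16 reported in the book for this example.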
Several other interesting approaches that could become a good basis for such an endeavour are being explored in Julia, e.g. Soss.jl and Omega.jl.
Many other R functions such as precis(), link(), shade(), etc. are not in v1, although some very early versions are being tested. Expect significant refactoring of those in future versions and at the same time better integration with MCMCChains.Chains objects.
The Mamba examples should really use @everywhere using Mamba instead of using Mamba. The latter was done to work around a limitation in Literate.jl when testing the notebooks in distributed mode.
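The difference matters as soon as worker processes run model code. In the sketch below, `Statistics` (a standard library) stands in for Mamba so that it runs without extra packages; it assumes local worker processes can be started with `addprocs`.

```julia
using Distributed

# Start two local worker processes if none are running yet.
nprocs() == 1 && addprocs(2)

# `@everywhere` runs `using Statistics` on the master and on every worker.
# A plain `using Statistics` on the master would load the package on the
# existing workers but bring `mean` into scope only on the master, so code
# running on a worker could not call `mean` by name.
@everywhere using Statistics

# Each task runs on a worker and calls `mean` there.
worker_means = pmap(_ -> mean(rand(1_000)), 1:4)
```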
In the src directory of all packages is a file scriptentry.jl, which defines an object script_dict that controls the generation of documentation, notebooks and .jl scripts in chapters, as well as the testing of the notebooks. See ?ScriptEntry or enter e.g. script_dict["02"] in the REPL. In the model packages this file is suffixed by an indication of the mcmc option used, e.g. script_dict_d in DynamicHMCModels.
A utility function, generate(), is part of each package to regenerate notebooks and chapter scripts; please see ?generate. Again, the model packages use a suffixed variant, e.g. generate_t in TuringModels generates all model notebooks and chapter scripts for Turing models.
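The role of script_dict and generate() can be sketched as follows. `ScriptEntrySketch`, its fields and the entries are hypothetical stand-ins (see ?ScriptEntry for the real definition), and `planned_outputs` only lists the work a generate-style function would do instead of calling Literate.jl.

```julia
# Hypothetical stand-in for `ScriptEntry`; the real field names may differ.
struct ScriptEntrySketch
    scriptfile::String
    dodoc::Bool      # produce a documentation (.md) page
    donb::Bool       # produce a notebook
    doscript::Bool   # produce a chapter .jl script
end

# Chapter keys are zero-padded strings, as in `script_dict["02"]`.
# The entries are made up for illustration.
script_dict_sketch = Dict(
    "00" => [ScriptEntrySketch("clip-01-03.jl", true, true, true),
             ScriptEntrySketch("clip-04-05.jl", true, true, true)],
    "02" => [ScriptEntrySketch("clip-01-02.jl", true, false, true)],
)

# List (chapter, file, output) tasks for every enabled flag, in chapter order.
function planned_outputs(dict)
    tasks = Tuple{String,String,Symbol}[]
    for chapter in sort(collect(keys(dict)))
        for e in dict[chapter]
            e.dodoc    && push!(tasks, (chapter, e.scriptfile, :md))
            e.donb     && push!(tasks, (chapter, e.scriptfile, :notebook))
            e.doscript && push!(tasks, (chapter, e.scriptfile, :script))
        end
    end
    return tasks
end

planned_outputs(script_dict_sketch)
```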
In a similar fashion, borrowing from DynamicHMCExamples, I define several variations on rel_path(). By itself, rel_path() points at the src directory of StatisticalRethinking.jl and e.g. rel_path_s() points to the src directory of StanModels. The rel_path() version is typically used to read in data files. All others are used to locate directories to read from or store generated files into.
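A minimal sketch of the rel_path pattern, assuming each variant simply joins its arguments onto a fixed src directory and normalizes the result. `make_rel_path` is hypothetical; inside a package the directory would typically come from `@__DIR__`, and the directory names and the data/Howell1.csv location below are only examples.

```julia
# Hypothetical factory: returns a function that resolves paths relative
# to `src_dir` and normalizes away any "..".
make_rel_path(src_dir::AbstractString) = (parts...) -> normpath(joinpath(src_dir, parts...))

# Example directories instead of `@__DIR__`.
rel_path   = make_rel_path(joinpath("pkg", "src"))
rel_path_s = make_rel_path(joinpath("StanModels", "src"))

rel_path("..", "data", "Howell1.csv")
```

Keeping one small function per package means scripts never hard-code absolute paths, so they work wherever the packages are installed.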