This package contains many statistical recipes for concepts and types introduced in the JuliaStats organization, intended to be used with Plots.jl:
#Pkg.clone("firstname.lastname@example.org:JuliaPlots/StatPlots.jl.git") using StatPlots gr(size=(400,300))
DataFrames support allows passing
DataFrame columns as symbols. Operations on DataFrame column can be specified using quoted expressions, e.g.
using DataFrames df = DataFrame(a = 1:10, b = 10*rand(10), c = 10 * rand(10)) plot(df, :a, [:b :c]) scatter(df, :a, :b, markersize = :(4 * log(:c + 0.1)))
If you find an operation not supported by DataFrames, please open an issue. An alternative approach to the
StatPlots syntax is to use the DataFramesMeta macro
@with. Symbols not referring to DataFrame columns must be escaped by
using DataFramesMeta @with(df, plot(:a, [:b :c], colour = ^([:red :blue])))
using RDatasets iris = dataset("datasets","iris") marginalhist(iris, :PetalLength, :PetalWidth)
M = randn(1000,4) M[:,2] += 0.8sqrt.(abs.(M[:,1])) - 0.5M[:,3] + 5 M[:,3] -= 0.7M[:,1].^2 + 2 corrplot(M, label = ["x$i" for i=1:4])
import RDatasets singers = RDatasets.dataset("lattice","singer") violin(singers,:VoicePart,:Height,marker=(0.2,:blue,stroke(0))) boxplot!(singers,:VoicePart,:Height,marker=(0.3,:orange,stroke(2)))
Asymmetric violin plots can be created using the
side keyword (
:both - default,
singers_moscow = deepcopy(singers) singers_moscow[:Height] = singers_moscow[:Height]+5 myPlot = violin(singers,:VoicePart,:Height, side=:right, marker=(0.2,:blue,stroke(0)), label="Scala") violin!(singers_moscow,:VoicePart,:Height, side=:left, marker=(0.2,:red,stroke(0)), label="Moscow")
using Distributions plot(Normal(3,5), fill=(0, .5,:orange))
dist = Gamma(2) scatter(dist, leg=false) bar!(dist, func=cdf, alpha=0.3)
groupedbar(rand(10,3), bar_position = :stack, bar_width=0.7)
This is the default:
groupedbar(rand(10,3), bar_position = :dodge, bar_width=0.7)
There is a groupapply function that splits the data across a keyword argument "group", then applies "summarize" to get average and variability of a given analysis (density, cumulative, hazard rate and local regression are supported so far, but one can also add their own function). To get average and variability there are 3 ways:
compute_error = (:across, col_name), where the data is split according to column
col_name before being summarized.
compute_error = :across splits across all observations. Default summary is
(mean, sem) but it can be changed with keyword
summarize to any pair of functions.
compute_error = (:bootstrap, n_samples), where
n_samples fake datasets distributed like the real dataset are generated and then summarized (nonparametric
compute_error = :bootstrap defaults to
compute_error = (:bootstrap, 1000). Default summary is
(mean, std). This method will work with any analysis but is computationally very expensive.
compute_error = :none, where no error is computed or displayed and the analysis is carried out normally.
The local regression uses Loess.jl and the density plot uses KernelDensity.jl. In case of categorical x variable, these function are computed by splitting the data across the x variable and then computing the density/average per bin. The choice of continuous or discrete axis can be forced via
axis_type = :continuous or
axis_type = :discrete.
axis_type = :binned will bin the x axis in equally spaced bins (number given by the
nbins keyword, defaulting to
30), and continue the analysis with the binned data, treating it as discrete.
using DataFrames import RDatasets using StatPlots gr() school = RDatasets.dataset("mlmRev","Hsb82"); grp_error = groupapply(:cumulative, school, :MAch; compute_error = (:across,:School), group = :Sx) plot(grp_error, line = :path, legend = :topleft)
Keywords for loess or kerneldensity can be given to groupapply:
grp_error = groupapply(:density, school, :CSES; bandwidth = 0.2, compute_error = (:bootstrap,500), group = :Minrty) plot(grp_error, line = :path)
The bar plot
pool!(school, :Sx) grp_error = groupapply(school, :Sx, :MAch; compute_error = :across, group = :Minrty) plot(grp_error, line = :bar)
Density bar plot of binned data versus continuous estimation:
grp_error = groupapply(:density, school, :MAch; axis_type = :binned, nbins = 40, group = :Minrty) plot(grp_error, line = :bar, color = ["orange" "turquoise"], legend = :topleft) grp_error = groupapply(:density, school, :MAch; axis_type = :continuous, group = :Minrty) plot!(grp_error, line = :path, color = ["orange" "turquoise"], label = "")
about 1 month ago