5

0

1

2

# Discreet

Discreet is a small opinionated toolbox to estimate entropy and mutual information from discrete samples. It contains methods to adjust results and correct over- or under-estimations.

## Estimating entropy

Discreet uses StatsBase's FrequencyWeights and ProbabilityWeights types.

``````using StatsBase: FrequencyWeights, ProbabilityWeights
using Discreet

dist = FrequencyWeights([1, 1, 1, 1, 1, 1])
entropy(dist)  # Naive method: log(6) ≈ 1.792

entropy(dist; method=:CS)  # Chao-Shen correction: ≈ 3.840

entropy(dist; method=:Shrink)  # Shrinkage correction: ≈ 1.792

dist = ProbabilityWeights([.5, .5])
entropy(dist)  # log(2) ≈ 0.693
``````

Discreet can also estimate the entropy of a sample:

``````using Discreet

data = ["tomato", "apple", "apple", "banana", "tomato"]
estimate_entropy(data)  # == entropy(FrequencyWeights([2, 2, 1]))
``````

## Estimate mutual information

Discrete provides similar routines to estimate mutual information.

``````using Discreet

labels_a = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
labels_b = [1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 3, 1, 3, 3, 3, 2, 2]
mutual_information(labels_a, labels_b)  # Naive method: ≈ 0.410

mutual_information(labels_a, labels_b; method=:CS)  # Chao-Shen correction: ≈ 0.148

mutual_information(labels_a, labels_b; normalize=true)  # Normalized score (between 0 and 1): ≈ 0.382
``````

05/22/2018

1 day ago

24 commits