dummy-link

10

1

7

6

# WaveletMatrices

An implementation of "The Wavelet Matrix" (Claude and Navarro) http://www.dcc.uchile.cl/~gnavarro/ps/spire12.4.pdf.

The wavelet matrix is a data structure to represent an immutable sequence of unsigned integers that supports some queries efficiently.

The `WaveletMatrix` type is defined as follows:

``````struct WaveletMatrix{w,T<:Unsigned,B<:AbstractBitVector} <: AbstractVector{T}
``````

where

• `w`: the number of bits required to encode the unsigned integers (elements)
• `T`: the type of elements
• `B`: the type of bit vectors used to store elements.

To efficiently pack a sequence of unsigned integers, `w` should be as small as possible but enough to encode those integers. For example, if you want to store integers between 0x00 and 0x03 (i.e. four distinct integers), setting `w = 2 (= ceil(log2(4)))` is the best choice.

The basic operations available on the wavelet matrix are:

• `getindex(wm::WaveletMatrix, i::Integer)`: Return `i`-th element of `wm`.
• `rank(a::Unsigned, wm::WaveletMatrix, i::Integer)`: Count the number of `a`'s occurrences in `wm` between `1:i`.
• `select(a::Unsigned, wm::WaveletMatrix, j::Integer)`: Return the position of the `j`-th occurrence of `a` in `wm`.

## Example

``````data = rand(0x00:0x03, 100_000)
wm = WaveletMatrix{2}(data)

wm
rank(0x02, wm, 5555)
partialsort(0x01, wm, 9876)
``````

05/30/2015

about 1 year ago

69 commits