*Arrays with acceleration indices.*

**AcceleratedArrays** provides (secondary) acceleration indexes for Julia `AbstractArray`

s. Such
acceleration indexes can be used to speed up certain operations, particularly those
involving searching through the values - for example, an `AcceleratedArray`

may have more
efficient implementations of functions such as `findall`

, `filter`

, and `unique`

.

As a general rule, this package has been implemented for the purposes of accelerating analytics workloads and is designed to support functional, non-mutating workflows. It is currently not supported to add an index to data you expect to mutate afterwards.

To download this package, from Julia v1.0 press `]`

to enter package mode and type:

```
pkg> dev https://github.com/andyferris/AcceleratedArrays.jl
```

An `AcceleratedArray`

is generally created by using the `accelerate`

and `accelerate!`

functions.

```
# Construct a hash mapping to unique names
a = accelerate(["Alice", "Bob", "Charlie"], UniqueHashIndex)
# Rearrange an array of random numbers into ascending order
b = accelerate!(rand(1:100, 100), SortIndex)
```

The resulting arrays can be used just like regular Julia arrays, except some operations become faster. For example, the hash map will let us find a certain element without exhaustively searching the array, or we can easily find all the elements within a given interval with a sorted array.

```
# Find the index of "Bob" in `a`
findall(isequal("Bob"), a)
# Return all the numbers in `b` between 40 and 60
filter(in(40..60), b)
```

Accelerations are fully implemented for the following functions, where `a`

is an
`AcceleratedArray`

:

`x ∈ a`

`count(pred, a)`

`findall(pred, a)`

`filter(pred, a)`

There is some work-in-progress on a variety of other functions, including some from SplitApplyCombine:

`findfirst(pred, a)`

and`findlast(pred, a)`

`unique(a)`

`group`

,`groupinds`

,`groupview`

and`groupreduce`

`innerjoin`

Accelerations are only available for some predicates `pred`

, which naturally depend on the
acceleration index used (see below for a full set).

The package intruduces the `AbstractIndex`

supertype and the following concrete implemetations.
Generally, an index is created when the user calls `accelerate`

or `accelerate!`

.

`HashIndex`

This index constructs a hashmap between values in the array, and the corresponding array
indices. For example, invoking `findall`

to search for the locations of certain values
will be reduced to a simple dictionary lookup. Primarily accelerates commands using the
`isequal`

predicate.

`UniqueHashIndex`

Like `HashIndex`

, except each value in the array can only appear once. Apart from
guaranteeing uniqueness, certain operations may be faster with a `UniqueHashIndex`

than
with a `HashIndex`

.

`SortIndex`

This index determines the order of the elements (with respect to `isless`

). This index
can accelerate not only the `isequal`

predicate, but a variety of other order-based
predicates as well (see below).

The `accelerate!`

function will rearrange the input array, like `sort!`

. This can speed
up operations due to simplified algorithms and cache locality.

`UniqueSortIndex`

Like `SortIndex`

, except each value in the array can only appear once. Apart from
guaranteeing uniqueness, certain operations may be faster with a `UniqueSortIndex`

than
with a `SortIndex`

.

It is simple for a user or another package to implement an `AbstractIndex`

- for instance
a third-party package may provide a spatial acceleration index, or an index for fast
textual search. Simply overload `accelerate`

(and optionally `accelerate!`

) as well as the
operations you would like to accelerate, such as `filter`

, `findall`

, etc. Indices for
unique sets of values may inherit from `AbstractUniqueIndex <: AbstractIndex`

.

In Julia, sorting is (typically) achieved using the `isless`

and `isequal`

predicates,
which are designed to provide a canonical total order for values. Currently, the
acceleration indices rely on these rather than the comparison operators `==`

, `<`

, `<=`

,
`>`

, `>=`

and `!=`

.

To make life easier, this package introduces a number of new convenience functions:

`islessequal(a, b) = isless(a, b) || isequal(a, b)`

`isgreater(a, b) = isless(b, a)`

`isgreaterequal(a, b) = isless(b, a) || isequal(a, b)`

Any of these support "currying", which is a simple syntax for creating a closure such as
`isequal(a) = (b -> isequal(a, b))`

. Such curried predicates are picked up by multiple
dispatch to accelerate operations like `findall(isequal(3.0), accelerated_array)`

.

It is common to want to search for all values in a range. This package introduces an
`Interval`

type to represent the set of of values between two endpoints (with respect to
`isless`

and `isequal`

).

An interval is easily created with the `..`

operator via the syntax `a .. b`

. To find if
a value is in this range, use the `in`

function/operator (alternatively spelled `∈`

, which
can be inserted at the REPL via `\in <TAB>`

). For example, `3 ∈ 0 .. 10`

is `true`

but
`13 ∈ 0 .. 10`

is `false`

.

By default, an interval is inclusive of its endpoints, such that `10 ∈ 0 .. 10`

. An endpoint
can be excluded via the `lessthan`

or `greaterthan`

function, which returns a value almost equal
to but slightly less/greater than its input. An interval exclusive of both its endpoints can be
expressed as `greaterthan(a) .. lessthan(b)`

. For example `10 ∉ 0 .. lessthan(10)`

.

This package is still young, and could support some more features, such as:

- Accelerate more functions, including those in
`SplitApplyCombine`

. - Figure out how to support
`missing`

,`==`

,`<`

with either a hash- or sort-based index. - Move
`Interval`

s into their own package, potentially reconcile with*IntervalSets.jl*(which currently uses`<=`

and`>=`

for comparisons).

07/15/2018

about 1 month ago

9 commits