Experiments with a new sorting & ordering API for Julia.

Exports functor-like objects `By`

, `Rev`

and `Op`

and the predefined ordering

```
const Forward = Op(isless)
const Backward = Rev(Forward)
```

These objects compose well:

```
method_in_some_package!(xs::AbstractVector, ord) = my_sort!(xs, Rev(By(abs, ord)))
xs = randn(7)
perm = collect(1:7)
ord = By(i -> xs[i])
method_in_some_package!(perm, ord)
@show xs[perm]
# xs[perm] = [-1.20628, 1.03054, -0.929885, -0.620184, 0.391168, -0.29274, 0.172728]
```

Currently we have an advanced sorting algorithm to sort `Float64`

and `Float32`

by the order
induced by `isless`

, but the downside is that we *almost never* dispatch on it in practice
because of the restrictive signature:

```
sort!(v::AbstractVector{<:Union{Float32,Float64}}, ...)
```

For instance `sort([1.0, 2.0, -3.0], by = abs)`

will not match the signature.

**The proposed fix** in this package is to canonicalize the order instances such that we can
dispatch on (a) the inferred type that goes into the comparison function, (b) the comparison
function itself and (c) the direction (forward or backward).

Internally it works by summarizing composed ordering types in a linearized fashion via a
function called `flatten`

. This function takes an ordering and element type of the vector
and returns a `TrivialOrder{T,F,R<:Bool,B} <: Ord`

instance (and sometimes the original
order if it thinks the order is nontrivial). Here `T`

is the type that goes into the
comparison function, `F`

is the type of the comparison function itself, `R`

is true when
sorting in reverse and `B`

is a tuple of types of all the gathered `by`

functions /
transformations. An explicit example:

```
> xs = [6, 5, 2, 7]
> ord = By(inv, Rev(By(abs, Backward))) # 2 by functions
> effective_ord = SortingSortingOut.flatten(ord, eltype(xs))
TrivialOrder{Float64,typeof(isless),false,Tuple{typeof(inv),typeof(abs)}}(isless, (inv, abs))
# Effectively this order comes down to sorting `Float64` with `isless`, so sorting will
# automatically dispatch on the efficient algorithm for this order.
> my_sort!(xs, effective_ord)
```

The above is completely equivalent to:

```
> my_sort!(xs, ord)
```

n | `sort!` |
`my_sort!` |
speedup |
---|---|---|---|

10_000 | 1.113 ms | 560.6 μs | 2.0x |

1_000 | 66.00 μs | 11.66 μs | 5.7x |

100 | 2.021 μs | 615.6 ns | 3.3x |

```
using SortingSortingOut, BenchmarkTools
struct Product
price::Int
weight::Float64
end
weight(p::Product) = p.weight
function sort_products(n = 100)
products = [Product(rand(1:100), 100rand()) for i = 1 : n]
fst = @benchmark sort!(ps, by = $weight) setup = (ps = copy($products))
snd = @benchmark my_sort!(ps, $(By(weight))) setup = (ps = copy($products))
fst, snd
end
```

n | `sort!` |
`my_sort!` |
speedup |
---|---|---|---|

10_000 | 1.128 ms | 722.3 μs | 1.6x |

1_000 | 49.62 μs | 28.16 μs | 1.8x |

100 | 2.296 μs | 787.5 ns | 2.9x |

```
using SortingSortingOut, BenchmarkTools
function sort_by_magnitude(n = 1_000)
xs = rand(ComplexF64, n)
fst = @benchmark sort!(ys, by = $abs2) setup = (ys = copy($xs))
snd = @benchmark my_sort!(ys, $(By(abs2))) setup = (ys = copy($xs))
fst, snd
end
```

Via the exact same logic as the fast floating point sort code, we could potentially also
efficiently sort `Vector{Union{T,Nothing}}`

by partitioning the vector in `[T..., Nothing...]`

first, and subsequently calling a fast sort method on the first bit with just `T`

's.

And this package would then allow to also sort `sort!(v, by = maybe_something)`

where the
function `maybe_something`

returns for instance `Union{T,Missing}`

values.

However, AFAIK this cannot yet work because we cannot convince the compiler (without
overhead) that a value of type `Union{T,Nothing}`

is actually a `T`

-- even when we're
100% sure it is.

`maximum`

and `minimum`

accept `Ord`

instancesTwo changes to these methods:

- They return
`nothing`

when an empty collection is passed - They accept
`Ord`

instances and satisfy`my_maximum(xs, ord) == my_sort(Vector(xs), ord)[end]`

and`my_minimum(xs, ord) === my_sort(Vector(xs), ord)[1]`

. ``` using SortingSortingOut

my_maximum([1, -2, 3, -5], By(abs)) # -5 my_maximum((i for i = 1 : 10), Forward) # 10 my_maximum([]) # nothing

```
### Make search convenient -- search by (transformed) value, not by specific vector element
Currently there is an [unresolved issue](https://github.com/JuliaLang/julia/issues/9429)
in Julia Base where one has to construct a "fake" vector element in order to search. For
instance:
```

julia> struct Product name::String price::Int end;

julia> products = [Product("Apple", 75), Product("Book", 1200), Product("Car", 50000)];

julia> searchsortedfirst(products, 100, by = p -> p.price) ERROR: type Int64 has no field price

julia> searchsortedfirst(products, Product("Fake product", 100), by = p -> p.price) 2

```
A solution to this problem is to implement a new comparison function:
```

julia> is_price_less(product::Product, price::Int) = isless(product.price, price); julia> is_price_less(price::Int, product::Product) = isless(price, product.price); julia> searchsortedfirst(products, 100, lt = is_price_less) 2

```
but this seems more verbose than necessary and will not work whenever the type of the
transformed value is equal to the original element type of the vector.
For instance, suppose we have a vector of integers sorted by absolute magnitude, and we
naively search for the index of the first value greater than or equal to `-2`. This should
obvioulsy be the first index. The following does *not* work as (maybe?) expected
```

julia> searchsortedfirst([1, 2, 2, 3, 3, 4], -2, by = abs) 2

```
since `-2` gets transformd to `2`. To solve this we have to provide a different comparison
operator and build a `Wrapper` struct to make multiple dispatch work:
```

julia> struct Wrapper value end; julia> abs_lt(a, b::Wrapper) = isless(abs(a), b.value) julia> abs_lt(a::Wrapper, b) = isless(a.value, abs(b)) julia> searchsortedfirst([1, 2, 2, 3, 3, 4], Wrapper(-2), lt = abs_lt) 1

```
But this is very verbose and hard to untangle. Also it does not seem to compose well.
To address this, this package provides a wrapper type called `Value` which allows you to
write simple one-liners:
```

julia> lowerbound(products, 100, By(p -> p.price)) ERROR: type Int64 has no field price

julia> lowerbound(products, Value(100), By(p -> p.price)) 2

julia> lowerbound([1, 2, -2, 3, 3, 4], -2, By(abs)) # transformation applies to -2 2

julia> lowerbound([1, 2, -2, 3, 3, 4], Value(-2), By(abs)) # no transformation of -2 1

```
Also note that the functions have been renamed a bit:
- `searchsortedfirst` -> `lowerbound`
- `searchsortedlast` -> `upperbound`
- `searchsorted` -> `equalrange`
```

09/14/2018

over 1 year ago

15 commits