dummy-link

SortingLab

Faster sorting algorithms (sort and sortperm) for Julia

Readme

SortingLab

An alternative implementation of sorting algorithms and APIs. The ultimate aim is to contribute back to Julia base or SortingAlgorithms.jl. However, there is commitment to keep this package's API stable and supported, so other developers can rely on the implementation and API here.

Faster Sort and Sortperm

The main function exported by SortingLab is fsort and fsortperm which generally implements faster algorithms than sort and sortperm for CategoricalArrays.CategoricalVector, Vector{T}, Vector{Union{T, Missing}} where T is

  • Int*, UInt*, Float*, String

Usage

using SortingLab;
using Test
N = 1_000_000;
K = 100;

svec = rand("id".*string.(1:N÷K, pad=10), N);

svec_sorted = fsort(svec);
@test issorted(svec_sorted)
@test issorted(svec) == false
Test Passed
# faster string sortperm
sorted_idx = fsortperm(svec)
issorted(svec[sorted_idx]) #true

# in place string sort
fsort!(svec);
issorted(svec) # true
true
# CategoricalArray sort
using CategoricalArrays
pools = "id".*string.(1:100,3);
byvec = CategoricalArray{String, 1}(rand(UInt32(1):UInt32(length(pools)), N), CategoricalPool(pools, false));
byvec = compress(byvec);

byvec_sorted = fsort(byvec);
@test issorted(byvec_sorted)
Test Passed

Sorting Vector{Union{T, Missing}}

For vectors that contain missing, the sort and sortperm performance is often sub-optimal in Base and is not supported in SortingAlgorithms.jl's radixsort implementation. This is solved by SortingLab.jl fsort, see Benchmarks Section

using Test
using Missings: allowmissing
x = allowmissing(rand(1:10_000, 1_000_000))
x[rand(1:length(x), 100_000)] .= missing

using SortingLab
@test isequal(fsort(x), sort(x))
Test Passed

Benchmarks

Base.sort vs SortingLab.radixsort

Base.sort vs SortingLab.radixsort

Base.sort vs SortingLab.fsort

Base.sortperm vs SortingLab.sortperm

Benchmarking code


using SortingLab;
using BenchmarkTools;
import Random: randstring

N = 1_000_000;
K = 100;

svec = rand("id".*string.(1:N÷K, pad=10), N);
sort_id_1m = @belapsed sort($svec);
radixsort_id_1m = @belapsed radixsort($svec);

sortperm_id_1m = @belapsed sortperm($svec);
fsortperm_id_1m = @belapsed fsortperm($svec);

rsvec = rand([randstring(rand(1:32)) for i = 1:N÷K], N);
sort_r_1m = @belapsed sort($rsvec);
radixsort_r_1m = @belapsed radixsort($rsvec);

sortperm_r_1m = @belapsed sortperm($rsvec);
fsortperm_r_1m = @belapsed fsortperm($rsvec);

using Plots
using StatsPlots
groupedbar(
    repeat(["IDs", "Random len 32"], inner=2),
    [sort_id_1m, radixsort_id_1m, sort_r_1m, radixsort_r_1m],
    group = repeat(["Base.sort","SortingLab.radixsort"], outer = 2),
    title = "Strings sort (1m rows): Base vs SortingLab")
savefig("benchmarks/sort_vs_radixsort.png")

groupedbar(
    repeat(["IDs", "Random len 32"], inner=2),
    [sortperm_id_1m, fsortperm_id_1m, sortperm_r_1m, fsortperm_r_1m],
    group = repeat(["Base.sortperm","SortingLab.fsortperm"], outer = 2),
    title = "Strings sortperm (1m rows): Base vs SortingLab")
savefig("benchmarks/sortperm_vs_fsortperm.png")

Build status

Build Status

Coverage Status

codecov.io

First Commit

01/15/2018

Last Touched

3 months ago

Commits

76 commits

Used By: