A few wrapped HIP and hipBLAS functions. https://github.com/ROCm-Developer-Tools/HIP






Compile hip_jl.cpp with:

hipcc -O3 -shared -fPIC hip_jl.cpp -o hip_jl.so

and then it should work. Currently sgemm! is the only function really worth using. On a Vega GPU, sgemm! on 5000x5000 matrices takes about 45 ms with CLBLAS on my computer. With hipBLAS's sgemm:

julia> @benchmark sync_func($sgemm!, $hipC, 1f0, $hipA, $hipB, 0f0)
  memory estimate:  288 bytes
  allocs estimate:  6
  minimum time:     22.702 ms (0.00% GC)
  median time:      27.016 ms (0.00% GC)
  mean time:        26.965 ms (0.00% GC)
  maximum time:     27.429 ms (0.00% GC)
  samples:          186
  evals/sample:     1

Because hipBLAS uses cuBLAS as its backend on NVIDIA cards, I expect the difference (vs. CLBLAS) to be even more dramatic there.

Precompilation isn't currently supported, because I wrapped the C++ functions simply by declaring them extern "C". Before adding this as a dependency, I'll switch to either the usual ccall syntax, ccall((function, lib), ...), or CxxWrap.
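As a rough illustration of the extern "C" approach mentioned above, here is a minimal sketch of a C-linkage shim that ccall could reach by name. It is not the contents of hip_jl.cpp: the function name jl_sgemm, its argument order, and the integer status code are all hypothetical, and a plain CPU loop stands in for the hipBLAS call so the file builds without ROCm installed.

```cpp
// Hypothetical sketch, NOT the actual hip_jl.cpp.
// A CPU reference loop stands in for the hipBLAS call so this compiles
// without ROCm; a real wrapper would forward to hipBLAS at that point.
#include <cassert>
#include <cstddef>

namespace {

// Column-major reference SGEMM without transposition:
//   C = alpha * A * B + beta * C
// where A is m x k, B is k x n, and C is m x n.
void reference_sgemm(int m, int n, int k, float alpha, const float* A,
                     const float* B, float beta, float* C) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    const std::size_t rows_a = static_cast<std::size_t>(m);
    const std::size_t rows_b = static_cast<std::size_t>(k);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * rows_a] * B[p + j * rows_b];
            }
            float& c = C[i + j * rows_a];
            // Follow the BLAS convention: when beta is zero, the old
            // contents of C are ignored rather than multiplied by zero.
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}

}  // namespace

// extern "C" turns off C++ name mangling, so the shared library exports
// the plain symbol "jl_sgemm", which a foreign caller can look up by name.
// Returns 0 on success and 1 on invalid arguments (hypothetical convention).
extern "C" int jl_sgemm(int m, int n, int k, float alpha, const float* A,
                        const float* B, float beta, float* C) {
    if (m < 0 || n < 0 || k < 0 || A == nullptr || B == nullptr ||
        C == nullptr) {
        return 1;
    }
    reference_sgemm(m, n, k, alpha, A, B, beta, C);
    return 0;
}
```

Built as a shared library, a symbol like this could then be called from Julia with the two-element tuple form, along the lines of ccall((:jl_sgemm, "libsketch.so"), Cint, (Cint, Cint, Cint, Cfloat, Ptr{Cfloat}, Ptr{Cfloat}, Cfloat, Ptr{Cfloat}), m, n, k, alpha, A, B, beta, C), where libsketch.so is again a placeholder name.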

This is only really useful if you want direct access to HIP functionality. Writing your own kernels, or even broadcast-type statements, currently means writing them in C++. (Forking Transpiler, or better yet making a sort of hipNative, is beyond me for the foreseeable future.) For that reason, CLArrays and CuArrays are almost certainly much better choices unless you're desperate for BLAS performance on an AMD card.
