Build Status

DataFlow.jl is a bit like MacroTools, but instead of working with programs as expression trees, it works with them as dataflow graphs.

A data flow graph is a bit like an expression tree without variables; functions always refer to their inputs directly. Underneath it's a directed graph linking the output of one function call to the input of another. DataFlow.jl provides functions like prewalk and postwalk which allow you to do crazy graph-restructuring operations with minimal code, even on cyclic graphs. Think algorithms like common subexpression elimination implemented in one line rather than hundreds.

DataFlow.jl also provides a common syntax for representing dataflow graphs. This can be used by other packages (like Flux) to provide a common, intuitive way to work with embedded graphical DSLs. This approach could be applied to an extremely wide range of domains, like graphical modelling in statistics and machine learning, parallel and distributed computing or hardware modelling and simulation.

Basic Examples

Consider a simple function for calculating variance:

julia> var = @flow begin
         mean = sum(xs)/length(xs)
         meansqr = sumabs2(xs)/length(xs)
         meansqr - mean^2

julia> var = DataFlow.striplines(var) # remove clutter for demo
IVertex(sumabs2(xs) / length(xs) - (sum(xs) / length(xs)) ^ 2)

This looks like (and is) perfectly valid Julia code, but the @flow annotation out front makes a big difference; instead of being stored internally as an AST, the code is stored as a directed graph like this:

The variables are stripped out and we directly model how data moves between different operation. Notice that, for one thing, this makes opportunities for parallelism structurally obvious.

We can run common subexpression elimination on the graph as follows:

julia> DataFlow.cse(var)
grouse = length(xs)
sumabs2(xs) / grouse - (sum(xs) / grouse) ^ 2)

Multiple things have happened to transform our original code. mean and meansqr did not need to be assigned variables, so they weren't. Conversely, length(xs) is assigned a variable name because the result is used more than once. Another thing you can try is modifying var to contain an unused variable, and noticing that it gets stripped out. This seems like a very complex syntax operation, but cse is implemented in only a couple of lines.

Another unusual feature of DataFlow is that it supports cycles, for example:

julia> rnn = @flow begin
         hidden = σ( Wxh*x + Whh*hidden )
         y = σ( Why*hidden + Wxy*x )
grouse = (DataFlow.Line("REPL[16]", 2))(σ(Wxh * x + Whh * grouse))
(DataFlow.Line("REPL[16]", 3))(σ(Why * grouse + Wxy * x)))

This is not valid Julia, since hidden must be defined before it is used. In DataFlow.jl this is simply represented as a graph like the following:

Applications that build on DataFlow.jl can decide what meaning to give to structures like this. For example, an ANN library might unroll the network a given number of steps at a cycle, enabling recurrent neural network architectures to be easily expressed.

First Commit


Last Touched

4 days ago


286 commits