Julia interface to Apache Spark.
See Roadmap for current status.
Spark.jl requires at least Java 7 and Maven to be installed and available in PATH. To install it, run:

    Pkg.clone("https://github.com/dfdx/Spark.jl")
    Pkg.build("Spark")
    # we also need the latest master of JavaCall.jl
    Pkg.checkout("JavaCall")

This will download and build all Julia and Java dependencies. To use Spark.jl, type:

    using Spark

All examples below can be run from the Julia REPL.

    # Count lines in a local text file
    sc = SparkContext(master="local")
    path = "file:///var/log/syslog"
    txt = text_file(sc, path)
    count(txt)
    close(sc)

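The following plain-Julia sketch needs no Spark installation and only illustrates what count(txt) returns in the example above. It assumes that text_file yields one RDD element per line of the file, so counting elements amounts to counting lines; the sample text is hypothetical.

```julia
# Spark-free illustration of the line-count example.
# Assumption: text_file produces one element per line, so count(txt)
# equals the number of lines. The sample text is hypothetical.
sample = "first line\nsecond line\nthird line\n"

# readlines splits on newlines and drops them (keep=false by default),
# giving one String per line; the trailing newline adds no empty element.
lines = readlines(IOBuffer(sample))

n_lines = length(lines)
println(n_lines)
```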
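    # Map/reduce on a Spark standalone master: total number of words in the file
    sc = SparkContext(master="spark://spark-standalone:7077", appname="Say 'Hello!'")
    path = "file:///var/log/syslog"
    txt = text_file(sc, path)
    rdd = map(txt, line -> length(split(line)))   # number of words in each line
    reduce(rdd, +)                                 # sum the counts over all lines
    close(sc)
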
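As a rough local analogue, the same two-stage computation can be written with Base Julia's map and reduce over an ordinary vector of strings. This is a sketch for understanding the pipeline, not Spark.jl code; the sample lines are hypothetical stand-ins for the log file.

```julia
# Local, Spark-free analogue of the map/reduce word count above.
# The sample lines are hypothetical stand-ins for /var/log/syslog.
lines = ["Jan 1 host kernel: started",
         "Jan 1 host sshd: accepted",
         "done"]

# "map" stage: split(line) splits on whitespace and drops empty pieces,
# so length gives the number of words in each line.
words_per_line = map(line -> length(split(line)), lines)

# "reduce" stage: sum the per-line counts.
# Note the argument order: Base.reduce takes the operator first,
# while the Spark.jl example passes the RDD first: reduce(rdd, +).
total_words = reduce(+, words_per_line)
```

The per-line counts here are [5, 5, 1], so total_words is 11.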
NOTE: named Julia functions currently cannot be fully serialized, so any function passed to executors must either already be defined there (e.g. in a preinstalled library) or be an anonymous function.
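
    # Map partitions on Mesos, reading from HDFS: keep lines that contain "a"
    sc = SparkContext(master="mesos://mesos-master:5050")
    path = "hdfs://namenode:8020/user/hdfs/test.log"
    txt = text_file(sc, path)
    # the function receives an iterator over the lines of one partition
    rdd = map_partitions(txt, it -> filter(line -> contains(line, "a"), it))
    collect(rdd)
    close(sc)
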
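To make the partition-level semantics concrete, here is a toy model in plain Julia. It assumes, for illustration only, that an RDD can be pictured as a vector of partitions, each holding a vector of lines; toy_map_partitions is a hypothetical helper, not part of Spark.jl. The sketch uses occursin, the Julia 1.x spelling of the substring test written as contains(line, "a") in the example.

```julia
# Toy model of map_partitions (no Spark involved).
# Assumption: an "RDD" is a vector of partitions, each a vector of lines.
# toy_map_partitions is a hypothetical helper: it applies f to a whole
# partition at once, the way map_partitions hands each task one partition.
toy_map_partitions(f, partitions) = [collect(f(p)) for p in partitions]

partitions = [["alpha", "beta", "xyz"],
              ["qrs", "gamma"]]

# Same function shape as the example: take all lines of one partition
# and keep only those containing "a".
filtered = toy_map_partitions(it -> filter(line -> occursin("a", line), it), partitions)

# Roughly what collect(rdd) does: gather every partition into one array.
result = reduce(vcat, filtered)
```

Because the function sees a whole partition rather than a single line, any per-partition setup it performs runs once per partition instead of once per element.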
For the full supported API see the list of exported functions.