Webinar 2016 Introduction to Apache Spark on SHARCNET

Apache Spark is a general-purpose cluster computing platform. It extends the MapReduce model to support more types of computation. Spark is accessible through Python, Scala, Java, R, or SQL. It can run on Hadoop clusters or in standalone mode, and it can access any Hadoop data source, including data stored in Cassandra, Hive, or HBase. It has also recently been paired with MongoDB. In this talk, I will discuss Spark’s main data structure and commands, and present some real examples where Spark can be used. I will also show how to load the Spark module on SHARCNET clusters, how to submit a Spark script to the SHARCNET scheduler, and how to use SHARCNET resources to develop a Spark program in Python.
