Webinar 2016 Introduction to Apache Spark on SHARCNET

Revision as of 10:46, 14 June 2016 by Syam

Apache Spark is a general-purpose, lightweight cluster computing platform. Spark extends the MapReduce model to support more types of computations. It is accessible through Python, Scala, Java, R, or SQL. Spark can run on Hadoop clusters or in standalone mode, and it can access any Hadoop data source, including Cassandra, Hive, and HBase. It has also recently been paired with MongoDB. In this talk, I will discuss Spark’s main data structure and commands, and present some real examples where Spark can be used. I will also show how to load the Spark module on SHARCNET clusters, how to submit a Spark script to the SHARCNET scheduler, and how to use SHARCNET resources to develop your Spark program in Python.
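As a rough illustration of the MapReduce model that Spark builds on, here is a minimal word-count sketch in plain Python. It is not Spark code and is not taken from the talk; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to make the three stages visible on a single machine.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the grouped values for each key by summing them."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(lines):
    """Run the three steps in sequence (hypothetical helper for illustration)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Illustrative input only; a cluster job would read from distributed storage.
    sample = ["spark extends mapreduce", "spark runs on hadoop"]
    print(word_count(sample))
```

In PySpark, the same pipeline is commonly expressed on an RDD with `flatMap`, `map`, and `reduceByKey`, with Spark distributing the work across the cluster; that version is not shown here because it depends on a working Spark installation.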