Webinar 2016 Introduction to Apache Spark on SHARCNET


Apache Spark is a general-purpose, lightweight cluster computing platform. Spark extends the MapReduce model to efficiently support a wider range of computations, and it is accessible through Python, Scala, Java, R, or SQL. Spark can run on Hadoop clusters or in standalone mode, and it can access any Hadoop data source, including Cassandra, Hive, and HBase databases; it has also recently been paired with MongoDB. In this talk, I will discuss Spark's main data structure, the resilient distributed dataset (RDD), and its core commands, and present some real examples where Spark can be used. I will also show how to load the Spark module on SHARCNET clusters, how to submit a Spark script to the SHARCNET scheduler, and how to use SHARCNET resources to develop Spark programs in Python.
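
As an illustration of the RDD and a few common commands, here is a minimal PySpark sketch; it assumes a working PySpark installation, and the application name and data are arbitrary:

<syntaxhighlight lang="python">
# Minimal PySpark sketch: create an RDD, apply lazy transformations,
# and trigger computation with an action.
from pyspark import SparkContext

sc = SparkContext(appName="RDDDemo")  # app name is arbitrary

# Build an RDD from a local Python collection.
numbers = sc.parallelize(range(1, 101))

# Transformations (map, filter) are lazy; no work happens yet.
evens = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

# Actions (reduce) trigger execution across the cluster.
total = evens.reduce(lambda a, b: a + b)
print("Sum of even squares:", total)

sc.stop()
</syntaxhighlight>

Transformations build up a lineage of operations, and only the final action causes Spark to distribute the work, which is the key difference from eager, in-memory Python collections.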
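A script submitted to the scheduler as a batch job must be self-contained. The word-count sketch below is one such example; the input and output paths are placeholders, and the exact module name and submission command are cluster-specific details covered in the webinar rather than shown here:

<syntaxhighlight lang="python">
# wordcount.py -- a self-contained PySpark script of the kind that can
# be submitted as a batch job (file paths below are placeholders).
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("WordCount")
sc = SparkContext(conf=conf)

# Read the input text file; "input.txt" is a placeholder path.
lines = sc.textFile("input.txt")

# Split lines into words, pair each word with 1, then sum per word.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Write the (word, count) pairs; "counts_out" is a placeholder directory.
counts.saveAsTextFile("counts_out")

sc.stop()
</syntaxhighlight>

On SHARCNET, a script like this would typically be run after loading the Spark module, with resources requested through the scheduler as described in the talk.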