19 | October | 2020 | SQL with Manoj

Apache Spark – main Components & Architecture (Part 2)

October 19, 2020

1. Spark Driver:

– The Driver program can run various operations in parallel on a Spark cluster.

– It is responsible for communicating with the Cluster Manager to request the resources needed to launch Spark Executors.

– In parallel, it instantiates the SparkSession for the Spark Application.

– The Driver program splits the Spark Application into one or more Spark Jobs, and each Job is transformed into a DAG (Directed Acyclic Graph, aka the Spark execution plan). Each DAG is broken into Stages according to the operations to be performed (a new Stage begins wherever data has to be shuffled between partitions), and finally each Stage is divided into multiple Tasks such that each Task maps to a single partition of data.

– Once the Cluster Manager allocates resources, the Driver program works directly with the Executors by assigning them Tasks.
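The Job, Stage and Task breakdown described above can be sketched as a toy model in plain Python. This is only an illustration under simplifying assumptions, not how Spark's scheduler is implemented: the names `WIDE_OPS`, `Task`, `split_into_stages` and `plan_tasks` are hypothetical, the list of shuffle operations is a small subset, the job is treated as a linear chain, and the partition count is assumed to stay the same across Stages.

```python
from dataclasses import dataclass

# Transformations that require a shuffle; a new Stage starts at each one.
# Illustrative subset only, not an exhaustive list.
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "repartition"}


@dataclass(frozen=True)
class Task:
    """One unit of work: a Stage applied to a single partition."""
    stage_id: int
    partition: int


def split_into_stages(ops):
    """Group a linear chain of operations into Stages, cutting before each wide op."""
    stages = [[]]
    for op in ops:
        # Start a new Stage at a shuffle boundary (unless the current one is empty).
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


def plan_tasks(ops, num_partitions):
    """Return one Task per (Stage, partition) pair."""
    stages = split_into_stages(ops)
    return [
        Task(stage_id, partition)
        for stage_id in range(len(stages))
        for partition in range(num_partitions)
    ]


job = ["map", "filter", "reduceByKey", "map"]
print(split_into_stages(job))                  # two Stages, split at reduceByKey
print(len(plan_tasks(job, num_partitions=4)))  # 2 Stages x 4 partitions = 8 Tasks
```

In this sketch the narrow operations (`map`, `filter`) stay together in one Stage, while `reduceByKey` forces a second Stage, so a 4-partition dataset yields 8 Tasks in total.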

2. Spark Session:

– A SparkSession provides a single entry point to interact with all Spark functionalities and the underlying core Spark APIs.

– For every Spark Application you have to create a SparkSession explicitly, but if you are working from an interactive shell the Spark Driver instantiates it for you implicitly.

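– The SparkSession is also the channel through which work is submitted: the operations you issue through it are what the Driver turns into Spark Tasks and sends to the Executors to run.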
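As a minimal sketch of creating a SparkSession explicitly in a standalone application, the PySpark snippet below may help. It assumes the `pyspark` package is installed; the function names `build_session` and `run_demo`, the application name `components-demo`, and the `local[*]` master are illustrative choices, not something taken from this post.

```python
def build_session(app_name="components-demo", master="local[*]"):
    """Create (or reuse) a SparkSession. Requires the pyspark package."""
    # Imported lazily so this module can be loaded even where pyspark is absent.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)   # name shown in the Spark UI
        .master(master)      # local[*] runs Spark locally with all available cores
        .getOrCreate()       # returns the existing session if one is already active
    )


def run_demo():
    """Build a session, run one small action, then release the resources."""
    spark = build_session()
    try:
        # range(5) creates a DataFrame with the ids 0..4; count() triggers a Job.
        return spark.range(5).count()
    finally:
        spark.stop()
```

In an interactive shell such as `pyspark` this step is unnecessary, because a session is already available there as the variable `spark`.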

3. Cluster Manager:

– Its role is to manage and allocate resources on the cluster nodes on which your Spark Application is running.

– It works on behalf of the Spark Driver, providing information about the available Executor nodes so that the Driver can schedule Spark Tasks on them.

– Currently Spark supports the built-in standalone cluster manager, Hadoop YARN, Apache Mesos and Kubernetes.
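Which Cluster Manager an application uses is selected by the master URL passed to Spark. The helper below is a small sketch that maps the commonly documented master URL forms to the Cluster Managers listed above. The function name `cluster_manager_for` is hypothetical, and the host names and ports in the usage lines are placeholders.

```python
def cluster_manager_for(master: str) -> str:
    """Classify a Spark master URL by the Cluster Manager it points at."""
    if master == "local" or master.startswith("local["):
        # Local mode: everything runs on one machine, no Cluster Manager involved.
        return "local (no cluster manager)"
    if master.startswith("spark://"):
        return "standalone"
    if master == "yarn":
        return "yarn"
    if master.startswith("mesos://"):
        return "mesos"
    if master.startswith("k8s://"):
        return "kubernetes"
    raise ValueError(f"Unrecognized master URL: {master!r}")


# Placeholder hosts and ports, for illustration only.
for url in ["local[*]", "spark://host:7077", "yarn", "mesos://host:5050", "k8s://https://host:6443"]:
    print(url, "->", cluster_manager_for(url))
```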

4. Spark Executor:

– By now you should have a good idea of what Executors are.

– They execute Tasks for a Spark Application on a Worker node and stay in communication with the Spark Driver.

– An Executor is actually a JVM running on a Worker node.
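Because each Executor is a JVM with a fixed amount of memory and a fixed number of cores, its size is usually set through configuration. The sketch below builds the three standard properties (`spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`) and estimates how many Tasks can run at the same time. The helper names `executor_conf` and `max_parallel_tasks` and the example values are assumptions for illustration, and the estimate assumes every requested Executor is actually granted by the Cluster Manager.

```python
def executor_conf(instances: int, cores: int, memory: str) -> dict:
    """Return Spark properties describing how many Executors to run and their size."""
    return {
        "spark.executor.instances": str(instances),  # number of Executor JVMs
        "spark.executor.cores": str(cores),          # cores per Executor
        "spark.executor.memory": memory,             # JVM heap per Executor, e.g. "4g"
    }


def max_parallel_tasks(conf: dict, task_cpus: int = 1) -> int:
    """Upper bound on concurrently running Tasks; task_cpus mirrors spark.task.cpus."""
    instances = int(conf["spark.executor.instances"])
    cores = int(conf["spark.executor.cores"])
    # Each Executor can run (cores // task_cpus) Tasks at once.
    return instances * (cores // task_cpus)


conf = executor_conf(instances=3, cores=4, memory="4g")
print(conf)
print(max_parallel_tasks(conf))  # 3 Executors x 4 cores = 12 Task slots
```

Since each Task maps to one partition, a Stage with more partitions than Task slots is processed in several waves.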

Categories: Apache Spark Tags: Apache Spark, Cluster Manager, Spark Driver, Spark Executor, Spark Session

