
Apache Spark and Scala Certification Training

Course Description

Advance your mastery of the Big Data Hadoop ecosystem with Simplilearn's Apache Spark and Scala certification training course. This course will help you attain crucial, in-demand Apache Spark skills and develop a competitive advantage for an exciting career as a Spark developer.

Spark & Scala Certification Course Overview

This Spark certification training helps you master the essential skills of the Apache Spark open-source framework and the Scala programming language, including Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting. You will also learn how Spark overcomes the limitations of MapReduce.


This Spark certification training is ideal for professionals aspiring to a career in real-time big data analytics, analytics professionals, research professionals, IT developers and testers, data scientists, BI and reporting professionals, and students who want to gain a thorough understanding of Apache Spark.


Those wishing to take the Apache Spark certification training should have fundamental knowledge of at least one programming language and a basic understanding of databases, SQL, and database query languages. Working knowledge of Linux- or Unix-based systems is also beneficial.

Course Syllabus

Lesson 00 – Course Overview

0.001 Introduction
0.002 Course Objectives
0.003 Course Overview
0.004 Target Audience
0.005 Course Prerequisites
0.006 Value to the Professionals
0.007 Value to the Professionals (contd.)
0.008 Value to the Professionals (contd.)
0.009 Lessons Covered
0.010 Conclusion

Lesson 01 – Introduction to Spark

1.001 Introduction
1.002 Objectives
1.003 Evolution of Distributed Systems
1.004 Need for New-Generation Distributed Systems
1.005 Limitations of MapReduce in Hadoop
1.006 Limitations of MapReduce in Hadoop (contd.)
1.007 Batch vs. Real-Time Processing
1.009 Application of In-Memory Processing
1.010 Introduction to Apache Spark
1.011 Components of a Spark Project
1.012 History of Spark
1.013 Language Flexibility in Spark
1.014 Spark Execution Architecture
1.015 Automatic Parallelization of Complex Flows
1.016 Automatic Parallelization of Complex Flows-Important Points
1.017 APIs That Match User Goals
1.018 Apache Spark-A Unified Platform of Big Data Apps
1.019 More Benefits of Apache Spark
1.020 Running Spark in Different Modes
1.021 Installing Spark as a Standalone Cluster-Configurations
1.022 Installing Spark as a Standalone Cluster-Configurations (contd.)
1.023 Demo-Install Apache Spark
1.024 Install Apache Spark
1.025 Overview of Spark on a Cluster
1.026 Tasks of Spark on a Cluster
1.027 Companies Using Spark-Use Cases
1.028 Hadoop Ecosystem vs. Apache Spark
1.029 Hadoop Ecosystem vs. Apache Spark (contd.)
1.030 Quiz
1.031 Summary
1.032 Summary (contd.)
1.033 Conclusion

Lesson 02 – Introduction to Programming in Scala

2.001 Introduction
2.002 Objectives
2.003 Introduction to Scala
2.004 Features of Scala
2.005 Basic Data Types
2.006 Basic Literals
2.007 Basic Literals (contd.)
2.008 Basic Literals (contd.)
2.009 Introduction to Operators
2.010 Types of Operators
2.011 Use Basic Literals and the Arithmetic Operator
2.012 Demo-Use Basic Literals and the Arithmetic Operator
2.013 Use the Logical Operator
2.014 Demo-Use the Logical Operator
2.015 Introduction to Type Inference
2.016 Type Inference for Recursive Methods
2.017 Type Inference for Polymorphic Methods and Generic Classes
2.018 Unreliability of the Type Inference Mechanism
2.019 Mutable Collection vs. Immutable Collection
2.020 Functions
2.021 Anonymous Functions
2.022 Objects
2.023 Classes
2.024 Use Type Inference, Functions, Anonymous Function, and Class
2.025 Demo-Use Type Inference, Functions, Anonymous Function, and Class
2.026 Traits as Interfaces
2.027 Traits-Example
2.028 Collections
2.029 Types of Collections
2.030 Types of Collections (contd.)
2.031 Lists
2.032 Perform Operations on Lists
2.033 Demo-Use Data Structures
2.034 Maps
2.035 Maps-Operations
2.036 Pattern Matching
2.037 Implicits
2.038 Implicits (contd.)
2.039 Streams
2.040 Use Data Structures
2.041 Demo-Perform Operations on Lists
2.042 Quiz
2.043 Summary
2.044 Summary (contd.)
2.045 Conclusion

Lesson 03 – Using RDD for Creating Applications in Spark

3.001 Introduction
3.002 Objectives
3.003 RDDs API
3.004 Features of RDDs
3.005 Creating RDDs
3.006 Creating RDDs-Referencing an External Dataset
3.007 Referencing an External Dataset-Text Files
3.008 Referencing an External Dataset-Text Files (contd.)
3.009 Referencing an External Dataset-Sequence Files
3.010 Referencing an External Dataset-Other Hadoop Input Formats
3.011 Creating RDDs-Important Points
3.012 RDD Operations
3.013 RDD Operations-Transformations
3.014 Features of RDD Persistence
3.015 Storage Levels of RDD Persistence
3.016 Choosing the Correct RDD Persistence Storage Level
3.017 Invoking the Spark Shell
3.018 Importing Spark Classes
3.019 Creating the SparkContext
3.020 Loading a File in Shell
3.021 Performing Some Basic Operations on Files in Spark Shell RDDs
3.022 Packaging a Spark Project with SBT
3.023 Running a Spark Project with SBT
3.024 Demo-Build a Scala Project
3.025 Build a Scala Project
3.026 Demo-Build a Spark Java Project
3.027 Build a Spark Java Project
3.028 Shared Variables-Broadcast
3.029 Shared Variables-Accumulators
3.030 Writing a Scala Application
3.031 Demo-Run a Scala Application
3.032 Run a Scala Application
3.033 Demo-Write a Scala Application Reading the Hadoop Data
3.034 Write a Scala Application Reading the Hadoop Data
3.035 Demo-Run a Scala Application Reading the Hadoop Data
3.036 Run a Scala Application Reading the Hadoop Data
3.037 Scala RDD Extensions
3.038 DoubleRDD Methods
3.039 PairRDD Methods-Join
3.040 PairRDD Methods-Others
3.041 Java PairRDD Methods
3.042 Java PairRDD Methods (contd.)
3.043 General RDD Methods
3.044 General RDD Methods (contd.)
3.045 Java RDD Methods
3.046 Java RDD Methods (contd.)
3.047 Common Java RDD Methods
3.048 Spark Java Function Classes
3.049 Method for Combining JavaPairRDD Functions
3.050 Transformations in RDD
3.051 Other Methods
3.052 Actions in RDD
3.053 Key-Value Pair RDD in Scala
3.054 Key-Value Pair RDD in Java
3.055 Using MapReduce and Pair RDD Operations
3.056 Reading Text File from HDFS
3.057 Reading Sequence File from HDFS
3.058 Writing Text Data to HDFS
3.059 Writing Sequence File to HDFS
3.060 Using GroupBy
3.061 Using GroupBy (contd.)
3.062 Demo-Run a Scala Application Performing GroupBy Operation
3.063 Run a Scala Application Performing GroupBy Operation
3.064 Demo-Run a Scala Application Using the Scala Shell
3.065 Run a Scala Application Using the Scala Shell
3.066 Demo-Write and Run a Java Application
3.067 Write and Run a Java Application
3.068 Quiz
3.069 Summary
3.070 Summary (contd.)
3.071 Conclusion

Lesson 04 – Running SQL Queries Using Spark SQL

4.001 Introduction
4.002 Objectives
4.003 Importance of Spark SQL
4.004 Benefits of Spark SQL
4.005 DataFrames
4.006 SQLContext
4.007 SQLContext (contd.)
4.008 Creating a DataFrame
4.009 Using DataFrame Operations
4.010 Using DataFrame Operations (contd.)
4.011 Demo-Run Spark SQL with a DataFrame
4.012 Run Spark SQL with a DataFrame
4.013 Interoperating with RDDs
4.014 Using the Reflection-Based Approach
4.015 Using the Reflection-Based Approach (contd.)
4.016 Using the Programmatic Approach
4.017 Using the Programmatic Approach (contd.)
4.018 Demo-Run Spark SQL Programmatically
4.019 Run Spark SQL Programmatically
4.020 Data Sources
4.021 Save Modes
4.022 Saving to Persistent Tables
4.023 Parquet Files
4.024 Partition Discovery
4.025 Schema Merging
4.026 JSON Data
4.027 Hive Table
4.028 DML Operation-Hive Queries
4.029 Demo-Run Hive Queries Using Spark SQL
4.030 Run Hive Queries Using Spark SQL
4.031 JDBC to Other Databases
4.032 Supported Hive Features
4.033 Supported Hive Features (contd.)
4.034 Supported Hive Data Types
4.035 Case Classes
4.036 Case Classes (contd.)
4.037 Quiz
4.038 Summary
4.039 Summary (contd.)
4.040 Conclusion

Lesson 05 – Spark Streaming

5.001 Introduction
5.002 Objectives
5.003 Introduction to Spark Streaming
5.004 Working of Spark Streaming
5.005 Features of Spark Streaming
5.006 Streaming Word Count
5.007 Micro Batch
5.008 DStreams
5.009 DStreams (contd.)
5.010 Input DStreams and Receivers
5.011 Input DStreams and Receivers (contd.)
5.012 Basic Sources
5.013 Advanced Sources
5.014 Advanced Sources-Twitter
5.015 Transformations on DStreams
5.016 Transformations on DStreams (contd.)
5.017 Output Operations on DStreams
5.018 Design Patterns for Using foreachRDD
5.019 DataFrame and SQL Operations
5.020 DataFrame and SQL Operations (contd.)
5.021 Checkpointing
5.022 Enabling Checkpointing
5.023 Socket Stream
5.024 File Stream
5.025 Stateful Operations
5.026 Window Operations
5.027 Types of Window Operations
5.028 Types of Window Operations (contd.)
5.029 Join Operations-Stream-Dataset Joins
5.030 Join Operations-Stream-Stream Joins
5.031 Monitoring Spark Streaming Application
5.032 Performance Tuning-High Level
5.033 Performance Tuning-Detail Level
5.034 Demo-Capture and Process the Netcat Data
5.035 Capture and Process the Netcat Data
5.036 Demo-Capture and Process the Flume Data
5.037 Capture and Process the Flume Data
5.038 Demo-Capture the Twitter Data
5.039 Capture the Twitter Data
5.040 Quiz
5.041 Summary
5.042 Summary (contd.)
5.043 Conclusion

Lesson 06 – Spark ML Programming

6.001 Introduction
6.002 Objectives
6.003 Introduction to Machine Learning
6.004 Common Terminologies in Machine Learning
6.005 Applications of Machine Learning
6.006 Machine Learning in Spark
6.007 Spark ML API
6.008 DataFrames
6.009 Transformers and Estimators
6.010 Pipeline
6.011 Working of a Pipeline
6.012 Working of a Pipeline (contd.)
6.013 DAG Pipelines
6.014 Runtime Checking
6.015 Parameter Passing
6.016 General Machine Learning Pipeline-Example
6.017 General Machine Learning Pipeline-Example (contd.)
6.018 Model Selection via Cross-Validation
6.019 Supported Types, Algorithms, and Utilities
6.020 Data Types
6.021 Feature Extraction and Basic Statistics
6.022 Clustering
6.023 K-Means
6.024 K-Means (contd.)
6.025 Demo-Perform Clustering Using K-Means
6.026 Perform Clustering Using K-Means
6.027 Gaussian Mixture
6.028 Power Iteration Clustering (PIC)
6.029 Latent Dirichlet Allocation (LDA)
6.030 Latent Dirichlet Allocation (LDA) (contd.)
6.031 Collaborative Filtering
6.032 Classification
6.033 Classification (contd.)
6.034 Regression
6.035 Example of Regression
6.036 Demo-Perform Classification Using Linear Regression
6.037 Perform Classification Using Linear Regression
6.038 Demo-Run Linear Regression
6.039 Run Linear Regression
6.040 Demo-Perform Recommendation Using Collaborative Filtering
6.041 Perform Recommendation Using Collaborative Filtering
6.042 Demo-Run Recommendation System
6.043 Run Recommendation System
6.044 Quiz
6.045 Summary
6.046 Summary (contd.)
6.047 Conclusion

Lesson 07 – Spark GraphX Programming

7.001 Introduction
7.002 Objectives
7.003 Introduction to Graph-Parallel System
7.004 Limitations of Graph-Parallel System
7.005 Introduction to GraphX
7.006 Introduction to GraphX (contd.)
7.007 Importing GraphX
7.008 The Property Graph
7.009 The Property Graph (contd.)
7.010 Features of the Property Graph
7.011 Creating a Graph
7.012 Demo-Create a Graph Using GraphX
7.013 Create a Graph Using GraphX
7.014 Triplet View
7.015 Graph Operators
7.016 List of Operators
7.017 List of Operators (contd.)
7.018 Property Operators
7.019 Structural Operators
7.020 Subgraphs
7.021 Join Operators
7.022 Demo-Perform Graph Operations Using GraphX
7.023 Perform Graph Operations Using GraphX
7.024 Demo-Perform Subgraph Operations
7.025 Perform Subgraph Operations
7.026 Neighborhood Aggregation
7.027 mapReduceTriplets
7.028 Demo-Perform MapReduce Operations
7.029 Perform MapReduce Operations
7.030 Counting Degree of Vertex
7.031 Collecting Neighbors
7.032 Caching and Uncaching
7.033 Graph Builders
7.034 Vertex and Edge RDDs
7.035 Graph System Optimizations
7.036 Built-in Algorithms
7.037 Quiz
7.038 Summary
7.039 Summary (contd.)
7.040 Conclusion


  • Access to a vast selection of courses and labs
  • Unlimited access from all devices
  • Learn from industry expert instructors
  • Assessment quizzes and progress monitoring
  • Blended Learning with Virtual Classes
  • Access to new courses every quarter
  • 100% satisfaction guarantee

You will receive a certificate upon completion of this course.

Instructor-Led Lectures
All IT Tutor Pro (formerly IT Nuggets) courses replicate a live class experience, with an instructor on screen delivering the course's theories and concepts. These lectures are pre-recorded and available to the user 24/7. They can be repeated, rewound, and fast-forwarded.
Visual Demonstrations, Educational Games & Flashcards
IT Tutor Pro (formerly IT Nuggets) recognizes that not all students learn alike and that different delivery media are needed to help a large student base succeed. With that in mind, we deliver our content in a variety of ways to keep students engaged and productive throughout their courses.
Mobile Optimization & Progress Tracking
Our courses are optimized for all mobile devices, allowing students to learn on the go whenever they have free time. Students can access their courses from anywhere, and their progress is fully tracked and recorded.
Practice Quizzes and Exams
IT Tutor Pro (formerly IT Nuggets) Online's custom practice exams prepare you for your exams differently, and more effectively, than traditional exam prep products on the market. Practice quizzes after each module help ensure you are confident in the topic you are learning.
World Class Learning Management System
IT Tutor Pro (formerly IT Nuggets) provides a next-generation learning management system (LMS) that combines the feature set of traditional learning management systems with advanced functionality designed to make learning management easy and online learning engaging from the user's perspective.

Frequently Asked Questions

How does online education work on a day-to-day basis?
Instructional methods, course requirements, and learning technologies can vary significantly from one online program to the next, but the vast majority of them use a learning management system (LMS) to deliver lectures and materials, monitor student progress, assess comprehension, and accept student work. LMS providers design these platforms to accommodate a wide range of instructor needs and preferences.
Is online education as effective as face-to-face instruction?
Online education may seem relatively new, but years of research suggest it can be just as effective as traditional coursework, and often more so. According to a U.S. Department of Education analysis of more than 1,000 learning studies, online students tend to outperform classroom-based students across most disciplines and demographics. Another major review published the same year found that online students had the advantage 70 percent of the time, a gap the authors projected would only widen as programs and technologies evolve.
Do employers accept online degrees?
All new learning innovations are met with some degree of scrutiny, but skepticism subsides as methods become more mainstream. Such is the case for online learning. Studies indicate employers who are familiar with online degrees tend to view them more favorably, and more employers are acquainted with them than ever before. The majority of colleges now offer online degrees, including most public, not-for-profit, and Ivy League universities. Online learning is also increasingly prevalent in the workplace as more companies invest in web-based employee training and development programs.
Is online education more conducive to cheating?
The concern that online students cheat more than traditional students is perhaps misplaced. When researchers at Marshall University conducted a study to measure the prevalence of cheating in online and classroom-based courses, they concluded, “Somewhat surprisingly, the results showed higher rates of academic dishonesty in live courses.” The authors suggest the social familiarity of students in a classroom setting may lessen their sense of moral obligation.
How do I know if online education is right for me?
Choosing the right course takes time and careful research, no matter how one intends to study. Learning styles, goals, and programs always vary, but students weighing online courses must also take into account their technical skills, ability to self-motivate, and other factors specific to the medium. Online course demos and trials can also be helpful.
What technical skills do online students need?
Our platform is designed to be as user-friendly as possible: intuitive controls, clear instructions, and tutorials guide students through new tasks. However, students still need basic computer skills to access and navigate these programs. These skills include using a keyboard and a mouse, running computer programs, using the Internet, sending and receiving email, using word processing programs, and using forums and other collaborative tools. Most online programs publish such requirements on their websites. If not, an admissions adviser can help.