
Big Data Engineer

Course Description

This Big Data Engineer Master’s Certification program, developed in collaboration with IBM, provides online training in the top big data tools and frameworks to impart the skills required for a successful career in data engineering. Master the Big Data and Hadoop frameworks, leverage the functionality of AWS services, and use MongoDB, a NoSQL database management tool, to store data.

About the Program

About the Big Data Engineer certification program developed in collaboration with IBM

IBM is the second-largest predictive analytics and Machine Learning solutions provider globally (The Forrester Wave report, September 2018). The joint partnership between Simplilearn and IBM introduces students to integrated blended learning, making them experts in Big Data and Data Engineering. The Big Data Engineer certification training developed in collaboration with IBM will make students industry-ready for Big Data and Data Engineer job roles.

IBM is a leading cognitive solution and cloud platform company, headquartered in Armonk, New York, offering a plethora of technology and consulting services. Each year, IBM invests $6 billion in research and development, and its researchers have earned five Nobel Prizes, nine US National Medals of Technology, five US National Medals of Science, six Turing Awards, and 10 inductions into the US Inventors Hall of Fame.

What can I expect from Simplilearn’s Big Data Engineer Master’s program developed in collaboration with IBM?

Upon completion of this Big Data Engineer Master’s Program, you will receive certificates from IBM and Simplilearn for the Big Data courses in the learning path*. These certificates will testify to your skills as an expert in Data Engineering. You will also receive the following:

  • USD 1200 worth of IBM cloud credits that you can leverage for hands-on exposure
  • Access to IBM cloud platforms featuring IBM Watson and other software for 24/7 practice
  • Industry-recognized Big Data Engineer Master’s Certificate from Simplilearn

What are the learning objectives?

Big Data has a major impact on businesses worldwide, with applications in a wide range of industries such as healthcare, insurance, transport, logistics, and customer service. A role in Data Engineering places you on the path to an exciting, evolving career that is predicted to grow sharply into 2025 and beyond.

This co-developed Simplilearn and IBM Big Data Engineering certification training is designed to give you in-depth knowledge of the flexible and versatile frameworks in the Hadoop ecosystem, along with data engineering topics such as data model creation, database interfaces, advanced architecture, Spark, Scala, RDDs, Spark SQL, Spark Streaming, Spark ML, GraphX, Sqoop, Flume, Pig, Hive, Impala, and Kafka architecture. This integrated program will also teach you to model, ingest, replicate, and shard data using the NoSQL database management system MongoDB.
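The sharding mentioned above can be illustrated in plain Python, with no MongoDB installation assumed: documents are routed to shards by hashing a shard key, the same principle behind MongoDB's hashed sharding. The shard count, key name, and documents below are illustrative only.

```python
import hashlib

NUM_SHARDS = 3  # illustrative; a real cluster sizes this to its hardware

def shard_for(user_id: str) -> int:
    """Route a document to a shard by hashing its shard key."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# distribute ten toy documents across the shards
shards = {i: [] for i in range(NUM_SHARDS)}
for doc in ({"_id": f"user{n}", "score": n} for n in range(10)):
    shards[shard_for(doc["_id"])].append(doc)

# every document lands on exactly one shard
assert sum(len(docs) for docs in shards.values()) == 10
```

Because the routing depends only on the key, reads for a given `_id` always go to the same shard, which is what makes horizontal scaling predictable.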

The Big Data Engineer course curriculum will give you hands-on experience connecting Kafka to Spark and working with Kafka Connect.

Why become a Big Data Engineer?

Big Data Engineers create and maintain analytics infrastructure and are responsible for the development, deployment, maintenance, and monitoring of architecture components, such as databases and large-scale processing systems. The global Big Data and data engineering services market is expected to grow at a CAGR of 31.3 percent through 2025, so this is the perfect time to pursue a career in this field.

The valuable skills you’ll acquire with this online training in Big Data courses will help you secure employment with companies as diverse as IBM, Coca-Cola, Ford Motors, Amazon, HCL, and Uber. Big Data Engineers are employable across a variety of industries such as transportation, healthcare, telecommunications, finance, manufacturing, and many more. According to Glassdoor, the average annual salary for a data engineer is $137,776, with more than 130K jobs in this field worldwide.

What skills will you learn in this Big Data Engineer training?

The Big Data Engineer learning path ensures that you master the various components of the Hadoop ecosystem, such as MapReduce, Pig, Hive, Impala, HBase, and Sqoop, and learn real-time processing in Spark and Spark SQL. By the end of this Big Data Engineer certification training, you will:

  • Gain insights on how to improve business productivity by processing Big Data on platforms that can handle its volume, velocity, variety, and veracity
  • Master the various components of the Hadoop ecosystem, such as Hadoop, Yarn, MapReduce, Pig, Hive, Impala, HBase, ZooKeeper, Oozie, Sqoop, and Flume
  • Become an expert in MongoDB by gaining an in-depth knowledge of NoSQL and mastering the skills of data modeling, ingestion, query, sharding, and data replication
  • Learn how Kafka is used in the real world, including its architecture and components, get hands-on experience connecting Kafka to Spark, and work with Kafka Connect
  • Get a solid understanding of the fundamentals of the Scala language, its tooling, and the development process
  • Identify AWS concepts, terminology, benefits, and deployment options to meet business requirements
  • Understand how to use Amazon EMR for processing the data using Hadoop ecosystem tools
  • Understand how to use Amazon Kinesis for big data processing in real-time
  • Analyze and transform big data using Kinesis Streams
  • Visualize data and perform queries using Amazon QuickSight
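To make the MapReduce item in the list above concrete, here is a minimal word count in plain Python that mirrors the map, shuffle, and reduce phases a Hadoop job passes through. No Hadoop installation is assumed; the input lines are made up.

```python
from collections import defaultdict

def map_phase(line):
    # mapper: emit a (word, 1) pair for every word in the input split
    return [(w.lower(), 1) for w in line.split()]

def shuffle(pairs):
    # shuffle: group values by key, as the framework does between phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # reducer: sum the counts collected for each word
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big insights", "data engineering"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
assert counts["big"] == 2 and counts["data"] == 2
```

In a real Hadoop job the same three steps run distributed across nodes, with the shuffle moving intermediate pairs over the network.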

What projects are included in this Big Data Engineer certification training?

This Big Data Engineer certification training includes more than 12 real-life, industry-based projects in different domains to help you master Data Engineering concepts such as clusters, scalability, and configuration. A few of the projects you will work on are described below:

Project 1: See how large MNCs like Microsoft, Nestle, and PepsiCo set up their Big Data clusters by gaining hands-on experience.
Project Title: Scalability-Deploying Multiple Clusters
Description: Your company wants to set up a new cluster and has procured new machines; however, setting up the cluster on the new machines will take time. In the meantime, your company wants you to set up a second cluster on the existing set of machines and begin testing its operation and applications.

Project 2: Understand how companies like Facebook, Amazon, and Flipkart leverage Big Data Clusters.
Project Title: Working with Clusters
Description: Demonstrate your understanding of the following tasks:

  • Enabling and disabling HA for name node and resource manager in CDH
  • Removing Hue service from your cluster, which has other services such as Hive, HBase, HDFS, and YARN setup
  • Adding a user and granting read access to your Cloudera cluster
  • Changing replication and block size of your cluster
  • Adding Hue as a service, logging in as user HUE, and downloading examples for Hive, Pig, job designer, and others

Project 3: See how banks like Citigroup, Bank of America, ICICI, and HDFC make use of Big Data to stay ahead of the competition.
Domain: Banking
Description: A Portuguese banking institution ran a marketing campaign to convince potential customers to invest in a bank term deposit. Their marketing campaigns were conducted through phone calls, and sometimes the same customer was contacted more than once. Your job is to analyze the data collected from the marketing campaign.
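A first step in the banking analysis above is collapsing repeat contacts and measuring the campaign's conversion rate. The sketch below does this in plain Python; the field names and sample records are hypothetical, not the actual course dataset.

```python
# toy call records: the same customer may appear more than once
calls = [
    {"customer_id": 1, "subscribed": "no"},
    {"customer_id": 1, "subscribed": "yes"},   # second contact, later outcome
    {"customer_id": 2, "subscribed": "no"},
    {"customer_id": 3, "subscribed": "yes"},
]

# keep only the latest outcome per customer (repeat contacts collapse to one)
latest = {}
for call in calls:
    latest[call["customer_id"]] = call["subscribed"]

# fraction of distinct customers who ended up subscribing
conversion_rate = sum(1 for v in latest.values() if v == "yes") / len(latest)
assert len(latest) == 3
```

In the course itself this aggregation would be expressed in Hive or Spark over the full campaign dataset, but the logic is the same.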

Project 4: Learn how Telecom giants like AT&T, Vodafone, and Airtel make use of Big Data by working on a real-life project based on telecommunication.
Domain: Telecommunication
Description: A mobile phone service provider has launched a new Open Network campaign. The company has invited users to raise complaints about the towers in their locality if they face issues with their mobile network, and has collected a dataset of users who raised a complaint. The fourth and fifth fields of the dataset contain the latitude and longitude of users, which is important information for the company. You must extract this latitude and longitude information from the available dataset and create three clusters of users with the k-means algorithm.
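The clustering step in this project can be sketched in plain Python: assign each point to its nearest centroid, recompute centroids, and repeat. The coordinates below are made-up stand-ins for the complaint dataset, and in the course you would run this at scale (e.g. with Spark MLlib) rather than by hand.

```python
import math
import random

def kmeans(points, k=3, iters=20, seed=0):
    """Plain-Python k-means on (lat, lon) pairs."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# hypothetical user (latitude, longitude) pairs, not the course dataset
points = [(28.6, 77.2), (28.7, 77.1), (19.0, 72.8),
          (19.1, 72.9), (13.0, 80.2), (13.1, 80.3)]
centroids, clusters = kmeans(points, k=3)
assert len(centroids) == 3
```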

Project 5: Understand how entertainment companies like Netflix and Amazon Prime leverage Big Data.
Domain: Movie Industry
Description: A US-based university has collected datasets representing movie reviews from multiple reviewers as part of a research project. To gain in-depth insights from the collected research data, you must perform a series of tasks in Spark on the dataset provided.
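A typical task on such review data is aggregating ratings per movie. The plain-Python sketch below mirrors what a Spark `reduceByKey`-style aggregation would compute; the reviewer/movie/rating triples are invented for illustration.

```python
from collections import defaultdict

# hypothetical (reviewer, movie, rating) triples standing in for the dataset
reviews = [
    ("r1", "Inception", 5), ("r2", "Inception", 4),
    ("r1", "Arrival", 4),   ("r3", "Arrival", 2),
    ("r2", "Gravity", 3),
]

totals = defaultdict(lambda: [0, 0])        # movie -> [rating sum, count]
for _, movie, rating in reviews:
    totals[movie][0] += rating
    totals[movie][1] += 1

# average rating per movie, and the highest-rated movie overall
averages = {movie: s / n for movie, (s, n) in totals.items()}
best = max(averages, key=averages.get)
assert best == "Inception"
```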

Project 6: Learn how E-Learning companies like Simplilearn, Lynda, and Pluralsight make use of NoSQL and Big Data technology.
Domain: E-Learning Industry
Description: Design a web application for a leading E-learning organization using MongoDB to support read and write scalability. You can use web technologies such as HTML, JavaScript, JSP, Servlets, and Java. Using this web application, a user should be able to add, retrieve, edit, and delete course information, with MongoDB as the backend database.
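The four operations this project asks for can be sketched as a tiny in-memory CRUD layer. The function names and fields below are hypothetical; in the actual project the dict would be replaced by a MongoDB collection accessed through a driver.

```python
courses = {}  # course_id -> course document (stand-in for a MongoDB collection)

def add_course(course_id, title, duration):
    # create: insert a new course document
    courses[course_id] = {"title": title, "duration": duration}

def get_course(course_id):
    # read: fetch a course document, or None if it does not exist
    return courses.get(course_id)

def edit_course(course_id, **fields):
    # update: overwrite the given fields on an existing document
    courses[course_id].update(fields)

def delete_course(course_id):
    # delete: remove the document if present
    courses.pop(course_id, None)

add_course("c1", "Big Data Hadoop", "40h")
edit_course("c1", duration="45h")
assert get_course("c1")["duration"] == "45h"
delete_course("c1")
assert get_course("c1") is None
```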

What are the prerequisites for this Big Data Engineer training?

The Big Data Engineer certification training is ideal for anyone who wishes to pursue a career in data engineering. There are no prerequisites to take this course, but prior knowledge of the following skills and technologies is beneficial:

  • Algorithms and data structures
  • SQL
  • Programming knowledge of Python and Java
  • Cloud platforms and distributed systems
  • Data pipelines


Course Syllabus

Course 1

Big Data for Data Engineering
This introductory course from IBM will teach you the basic concepts and terminology of Big Data and its real-life applications across industries. You will gain insight into how to improve business productivity by processing large volumes of data and extracting valuable information from them.

Big Data for Data Engineering

Lesson 1 Welcome03:49
1.1 Introduction to Big Data02:40
1.2 Welcome01:09
1.3 Learning Objectives
Lesson 2 What is Big Data09:56
Learning Objectives
What is Big Data05:49
Big Data in Business04:07
Big Data and Business Analytics Comes of Age (Oct 2011)
Lesson 3 Beyond the Hype05:57
Learning Objectives
Beyond the Hype05:57
Facebook Joins Google in HPC Computing Architectures for Big Data (Apr 2011)
Lesson 4 Big Data and Data Science05:57
Learning Objectives
Big Data and Data Science05:57
Climate Change and Big Data (Dec 2012)
Lesson 5 Big Data Use Cases05:36
Learning Objectives
Big Data Use Cases05:36
Big Data and Sensors (Jan 2013)
Lesson 6 Processing Big Data05:55
Learning Objectives
Processing Big Data05:55
Hadoop and Lustre – Some Thoughts (Jun 2011)
Lesson 7 Course Summary02:53
Big Data 101 Conclusion02:53
Unlocking IBM Certificate

Course 2 Online Classroom Flexi Pass

Big Data Hadoop and Spark Developer
Our Big Data Hadoop certification training course lets you master the concepts of the Hadoop framework, Big Data tools, and methodologies to prepare you for success in your role as a Big Data Developer. Learn how various components of the Hadoop ecosystem fit into the Big Data processing lifecycle.

Big Data Hadoop and Spark Developer
Lesson 1 Course Introduction08:51
1.1 Course Introduction05:52
1.2 Accessing Practice Lab02:59
Lesson 2 Introduction to Big Data and Hadoop43:59
1.1 Introduction to Big Data and Hadoop00:31
1.2 Introduction to Big Data01:02
1.3 Big Data Analytics04:24
1.4 What is Big Data02:54
1.5 Four Vs Of Big Data02:13
1.6 Case Study Royal Bank of Scotland01:31
1.7 Challenges of Traditional System03:38
1.8 Distributed Systems01:55
1.9 Introduction to Hadoop05:28
1.10 Components of Hadoop Ecosystem Part One02:17
1.11 Components of Hadoop Ecosystem Part Two02:53
1.12 Components of Hadoop Ecosystem Part Three03:48
1.13 Commercial Hadoop Distributions04:19
1.14 Demo: Walkthrough of Simplilearn Cloudlab06:51
1.15 Key Takeaways00:15
Knowledge Check
Lesson 3 Hadoop Architecture, Distributed Storage (HDFS) and YARN57:50
2.1 Hadoop Architecture Distributed Storage (HDFS) and YARN00:50
2.2 What Is HDFS00:54
2.3 Need for HDFS01:52
2.4 Regular File System vs HDFS01:27
2.5 Characteristics of HDFS03:24
2.6 HDFS Architecture and Components02:30
2.7 High Availability Cluster Implementations04:47
2.8 HDFS Component File System Namespace02:40
2.9 Data Block Split02:32
2.10 Data Replication Topology01:16
2.11 HDFS Command Line02:14
2.12 Demo: Common HDFS Commands04:39
HDFS Command Line
2.13 YARN Introduction01:32
2.14 YARN Use Case02:21
2.15 YARN and Its Architecture02:09
2.16 Resource Manager02:14
2.17 How Resource Manager Operates02:28
2.18 Application Master03:29
2.19 How YARN Runs an Application04:39
2.20 Tools for YARN Developers01:38
2.21 Demo: Walkthrough of Cluster Part One03:06
2.22 Demo: Walkthrough of Cluster Part Two04:35
2.23 Key Takeaways00:34
Knowledge Check
Hadoop Architecture, Distributed Storage (HDFS) and YARN
Lesson 4 Data Ingestion into Big Data Systems and ETL01:05:21
3.1 Data Ingestion into Big Data Systems and ETL00:42
3.2 Data Ingestion Overview Part One01:51
3.3 Data Ingestion Overview Part Two01:41
3.4 Apache Sqoop02:04
3.5 Sqoop and Its Uses03:02
3.6 Sqoop Processing02:11
3.7 Sqoop Import Process02:24
3.8 Sqoop Connectors04:22
3.9 Demo: Importing and Exporting Data from MySQL to HDFS05:07
Apache Sqoop
3.10 Apache Flume02:42
3.11 Flume Model01:56
3.12 Scalability in Flume01:33
3.13 Components in Flume’s Architecture02:40
3.14 Configuring Flume Components01:58
3.15 Demo: Ingest Twitter Data04:43
3.16 Apache Kafka01:54
3.17 Aggregating User Activity Using Kafka01:34
3.18 Kafka Data Model02:56
3.19 Partitions02:04
3.20 Apache Kafka Architecture03:02
3.21 Demo: Setup Kafka Cluster03:52
3.22 Producer Side API Example02:30
3.23 Consumer Side API00:43
3.24 Consumer Side API Example02:36
3.25 Kafka Connect01:14
3.26 Demo: Creating Sample Kafka Data Pipeline using Producer and Consumer03:35
3.27 Key Takeaways00:25
Knowledge Check
Data Ingestion into Big Data Systems and ETL
Lesson 5 Distributed Processing – MapReduce Framework and Pig01:01:09
4.1 Distributed Processing MapReduce Framework and Pig00:44
4.2 Distributed Processing in MapReduce03:01
4.3 Word Count Example02:09
4.4 Map Execution Phases01:48
4.5 Map Execution Distributed Two Node Environment02:10
4.6 MapReduce Jobs01:55
4.7 Hadoop MapReduce Job Work Interaction02:24
4.8 Setting Up the Environment for MapReduce Development02:57
4.9 Set of Classes02:09
4.10 Creating a New Project02:25
4.11 Advanced MapReduce01:30
4.12 Data Types in Hadoop02:22
4.13 OutputFormats in MapReduce02:25
4.14 Using Distributed Cache01:51
4.15 Joins in MapReduce03:07
4.16 Replicated Join02:37
4.17 Introduction to Pig02:03
4.18 Components of Pig02:08
4.19 Pig Data Model02:23
4.20 Pig Interactive Modes03:18
4.21 Pig Operations01:19
4.22 Various Relations Performed by Developers03:06
4.23 Demo: Analyzing Web Log Data Using MapReduce05:43
4.24 Demo: Analyzing Sales Data and Solving KPIs using PIG02:46
Apache Pig
4.25 Demo: Wordcount02:21
4.26 Key Takeaways00:28
Knowledge Check
Distributed Processing – MapReduce Framework and Pig
Lesson 6 Apache Hive59:47
5.1 Apache Hive00:37
5.2 Hive SQL over Hadoop MapReduce01:38
5.3 Hive Architecture02:41
5.4 Interfaces to Run Hive Queries01:47
5.5 Running Beeline from Command Line01:51
5.6 Hive Metastore02:58
5.7 Hive DDL and DML02:00
5.8 Creating New Table03:15
5.9 Data Types01:37
5.10 Validation of Data02:41
5.11 File Format Types02:40
5.12 Data Serialization02:35
5.13 Hive Table and Avro Schema02:38
5.14 Hive Optimization Partitioning Bucketing and Sampling01:28
5.15 Non Partitioned Table01:58
5.16 Data Insertion02:22
5.17 Dynamic Partitioning in Hive02:43
5.18 Bucketing01:44
5.19 What Do Buckets Do02:04
5.20 Hive Analytics UDF and UDAF03:11
5.21 Other Functions of Hive03:17
5.22 Demo: Real-Time Analysis and Data Filteration03:18
5.23 Demo: Real-World Problem04:30
5.24 Demo: Data Representation and Import using Hive03:52
5.25 Key Takeaways00:22
Knowledge Check
Apache Hive
Lesson 7 NoSQL Databases – HBase21:41
6.1 NoSQL Databases HBase00:33
6.2 NoSQL Introduction04:42
Demo: Yarn Tuning03:28
6.3 HBase Overview02:53
6.4 HBase Architecture04:43
6.5 Data Model03:11
6.6 Connecting to HBase01:56
HBase Shell
6.7 Key Takeaways00:15
Knowledge Check
NoSQL Databases – HBase
Lesson 8 Basics of Functional Programming and Scala48:00
7.1 Basics of Functional Programming and Scala00:39
7.2 Introduction to Scala02:59
Demo: Scala Installation02:54
7.3 Functional Programming03:08
7.4 Programming with Scala04:01
Demo: Basic Literals and Arithmetic Operators02:57
Demo: Logical Operators01:21
7.5 Type Inference Classes Objects and Functions in Scala04:45
Demo: Type Inference Functions Anonymous Function and Class05:04
7.6 Collections01:33
7.7 Types of Collections05:37
Demo: Five Types of Collections03:42
Demo: Operations on List03:16
7.8 Scala REPL02:27
Demo: Features of Scala REPL03:17
7.9 Key Takeaways00:20
Knowledge Check
Basics of Functional Programming and Scala
Lesson 9 Apache Spark Next Generation Big Data Framework36:54
8.1 Apache Spark Next Generation Big Data Framework00:43
8.2 History of Spark01:58
8.3 Limitations of MapReduce in Hadoop02:48
8.4 Introduction to Apache Spark01:11
8.5 Components of Spark03:10
8.6 Application of In-Memory Processing02:54
8.7 Hadoop Ecosystem vs Spark01:30
8.8 Advantages of Spark03:22
8.9 Spark Architecture03:42
8.10 Spark Cluster in Real World02:52
8.11 Demo: Running a Scala Program in Spark Shell03:45
8.12 Demo: Setting Up Execution Environment in IDE04:18
8.13 Demo: Spark Web UI04:14
8.14 Key Takeaways00:27
Knowledge Check
Apache Spark Next Generation Big Data Framework
Lesson 10 Spark Core Processing RDD01:16:31
9.1 Processing RDD00:37
9.2 Introduction to Spark RDD02:35
9.3 RDD in Spark02:18
9.4 Creating Spark RDD05:48
9.5 Pair RDD01:53
9.6 RDD Operations03:20
9.7 Demo: Spark Transformation Detailed Exploration Using Scala Examples03:13
9.8 Demo: Spark Action Detailed Exploration Using Scala03:32
9.9 Caching and Persistence02:41
9.10 Storage Levels03:31
9.11 Lineage and DAG02:11
9.12 Need for DAG02:51
9.13 Debugging in Spark01:11
9.14 Partitioning in Spark04:05
9.15 Scheduling in Spark03:28
9.16 Shuffling in Spark02:41
9.17 Sort Shuffle03:18
9.18 Aggregating Data with Pair RDD01:33
9.19 Demo: Spark Application with Data Written Back to HDFS and Spark UI09:08
9.20 Demo: Changing Spark Application Parameters06:27
9.21 Demo: Handling Different File Formats02:51
9.22 Demo: Spark RDD with Real-World Application04:03
9.23 Demo: Optimizing Spark Jobs02:56
9.24 Key Takeaways00:20
Knowledge Check
Spark Core Processing RDD
Lesson 11 Spark SQL – Processing DataFrames29:08
10.1 Spark SQL Processing DataFrames00:32
10.2 Spark SQL Introduction02:13
10.3 Spark SQL Architecture01:25
10.4 DataFrames05:21
10.5 Demo: Handling Various Data Formats03:21
10.6 Demo: Implement Various DataFrame Operations03:20
10.7 Demo: UDF and UDAF02:50
10.8 Interoperating with RDDs04:45
10.9 Demo: Process DataFrame Using SQL Query02:30
10.10 RDD vs DataFrame vs Dataset02:34
Processing DataFrames
10.11 Key Takeaways00:17
Knowledge Check
Spark SQL – Processing DataFrames
Lesson 12 Spark MLlib – Modeling Big Data with Spark34:04
11.1 Spark MLlib Modeling Big Data with Spark00:38
11.2 Role of Data Scientist and Data Analyst in Big Data02:12
11.3 Analytics in Spark03:37
11.4 Machine Learning03:27
11.5 Supervised Learning02:19
11.6 Demo: Classification of Linear SVM03:47
11.7 Demo: Linear Regression with Real World Case Studies03:41
11.8 Unsupervised Learning01:16
11.9 Demo: Unsupervised Clustering K-Means02:45
11.10 Reinforcement Learning02:02
11.11 Semi-Supervised Learning01:17
11.12 Overview of MLlib02:59
11.13 MLlib Pipelines03:42
11.14 Key Takeaways00:22
Knowledge Check
Spark MLlib – Modeling Big Data with Spark
Lesson 13 Stream Processing Frameworks and Spark Streaming01:13:16
12.1 Stream Processing Frameworks and Spark Streaming00:34
12.2 Streaming Overview01:41
12.3 Real-Time Processing of Big Data02:45
12.4 Data Processing Architectures04:12
12.5 Demo: Real-Time Data Processing02:28
12.6 Spark Streaming04:21
12.7 Demo: Writing Spark Streaming Application03:15
12.8 Introduction to DStreams01:52
12.9 Transformations on DStreams03:44
12.10 Design Patterns for Using ForeachRDD03:25
12.11 State Operations00:46
12.12 Windowing Operations03:16
12.13 Join Operations Stream-Dataset Join02:13
12.14 Demo: Windowing of Real-Time Data Processing02:32
12.15 Streaming Sources01:56
12.16 Demo: Processing Twitter Streaming Data03:56
12.17 Structured Spark Streaming03:54
12.18 Use Case Banking Transactions02:29
12.19 Structured Streaming Architecture Model and Its Components04:01
12.20 Output Sinks00:49
12.21 Structured Streaming APIs03:36
12.22 Constructing Columns in Structured Streaming03:07
12.23 Windowed Operations on Event-Time03:36
12.24 Use Cases01:24
12.25 Demo: Streaming Pipeline07:07
Spark Streaming
12.26 Key Takeaways00:17
Knowledge Check
Stream Processing Frameworks and Spark Streaming
Lesson 14 Spark GraphX28:43
13.1 Spark GraphX00:35
13.2 Introduction to Graph02:38
13.3 GraphX in Spark02:41
13.4 Graph Operators03:29
13.5 Join Operators03:18
13.6 Graph Parallel System01:33
13.7 Algorithms in Spark03:26
13.8 Pregel API02:31
13.9 Use Case of GraphX01:02
13.10 Demo: GraphX Vertex Predicate02:23
13.11 Demo: Page Rank Algorithm02:33
13.12 Key Takeaways00:17
Knowledge Check
Spark GraphX
13.13 Project Assistance02:17
Practice Projects
Car Insurance Analysis
Transactional Data Analysis
K-Means clustering for telecommunication domain

Course 3

PySpark Training Course
Get ready to add some Spark to your Python code with this PySpark certification training. This course gives you an overview of the Spark stack and lets you know how to leverage the functionality of Python as you deploy it in the Spark ecosystem. It helps you gain the skills required to become a PySpark developer.
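One idea this course covers (Lesson 2.4, Understanding Lazy Execution) is that Spark transformations build up a plan and nothing runs until an action forces it. That behavior can be mimicked in plain Python with generators, with no Spark installation assumed:

```python
# transformations are lazy: these generators describe work, they don't do it
data = range(1, 6)

mapped   = (x * 2 for x in data)          # analogous to .map(...)
filtered = (x for x in mapped if x > 4)   # analogous to .filter(...)

# the "action" — only materializing the result triggers the computation,
# much like .collect() does on an RDD or DataFrame
result = list(filtered)
assert result == [6, 8, 10]
```

The analogy is loose (Spark also optimizes and distributes the plan), but the lazy-until-action contract is the same one PySpark exposes.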

PySpark Training

Lesson 1 A Brief Primer on PySpark14:52
1.1 A Brief Primer on PySpark05:52
1.2 Brief Introduction to Spark02:04
1.3 Apache Spark Stack01:38
1.4 Spark Execution Process01:26
1.5 Newest Capabilities of PySpark01:56
1.6 Cloning GitHub Repository01:56
Lesson 2 Resilient Distributed Datasets38:44
2.1 Resilient Distributed Datasets01:49
2.2 Creating RDDs04:38
2.3 Schema of an RDD02:17
2.4 Understanding Lazy Execution02:11
2.5 Introducing Transformations – .map(…)03:57
2.6 Introducing Transformations – .filter(…)02:23
2.7 Introducing Transformations – .flatMap(…)06:14
2.8 Introducing Transformations – .distinct(…)03:27
2.9 Introducing Transformations – .sample(…)03:15
2.10 Introducing Transformations – .join(…)04:17
2.11 Introducing Transformations – .repartition(…)04:16
Lesson 3 Resilient Distributed Datasets and Actions35:27
3.1 Resilient Distributed Datasets and Actions05:43
3.2 Introducing Actions – .collect(…)02:15
3.3 Introducing Actions – .reduce(…) and .reduceByKey(…)02:59
3.4 Introducing Actions – .count()02:36
3.5 Introducing Actions – .foreach(…)01:51
3.6 Introducing Actions – .aggregate(…) and .aggregateByKey(…)04:55
3.7 Introducing Actions – .coalesce(…)02:05
3.8 Introducing Actions – .combineByKey(…)03:11
3.9 Introducing Actions – .histogram(…)01:50
3.10 Introducing Actions – .sortBy(…)02:38
3.11 Introducing Actions – Saving Data03:10
3.12 Introducing Actions – Descriptive Statistics02:14
Lesson 4 DataFrames and Transformations32:33
4.1 DataFrames and Transformations01:35
4.2 Creating DataFrames04:16
4.3 Specifying Schema of a DataFrame06:00
4.4 Interacting with DataFrames01:36
4.5 The .agg(…) Transformation03:19
4.6 The .sql(…) Transformation03:57
4.7 Creating Temporary Tables02:31
4.8 Joining Two DataFrames03:54
4.9 Performing Statistical Transformations03:55
4.10 The .distinct(…) Transformation01:30
Lesson 5 Data Processing with Spark DataFrames27:16
5.1 Data Processing with Spark DataFrames06:29
5.2 Filtering Data01:31
5.3 Aggregating Data02:34
5.4 Selecting Data02:24
5.5 Transforming Data01:40
5.6 Presenting Data01:34
5.7 Sorting DataFrames01:00
5.8 Saving DataFrames04:28
5.9 Pitfalls of UDFs03:38
5.10 Repartitioning Data01:58

Course 4

Apache Kafka
Learn to process huge amounts of data using different tools and empower your organization to better leverage Big Data analytics with the Apache Kafka certification course.
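A core idea the course builds on is how a producer chooses a partition for a keyed record: hash the key and take it modulo the partition count, so records with the same key preserve their order. Kafka's actual default partitioner uses murmur2; the hash below is an illustrative stand-in, and the partition count is made up.

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; a real topic declares this at creation

def partition_for(key: bytes) -> int:
    """Pick a partition for a keyed record, Kafka-style: hash(key) mod partitions."""
    return int(hashlib.sha1(key).hexdigest(), 16) % NUM_PARTITIONS

# records with the same key always land in the same partition,
# which is what gives Kafka its per-key ordering guarantee
assert partition_for(b"user-42") == partition_for(b"user-42")
assert 0 <= partition_for(b"user-7") < NUM_PARTITIONS
```

Unkeyed records are instead spread across partitions (round-robin or sticky batching, depending on client version), trading ordering for balance.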

Section 01 – Introduction to Apache Kafka
Lesson 01 – Course Introduction07:16
Course Introduction07:16
Lesson 02 – Big Data Overview03:07
Big Data Overview03:07
Lesson 03 – Big Data Analytics02:55
Big Data Analytics02:55
Lesson 04 – Messaging System05:48
Messaging System05:48
Lesson 05 – Kafka Overview08:33
Introduction and Features of Kafka04:26
Kafka Use Cases04:07
Lesson 06 – Kafka Components and Architecture09:16
Kafka Terminologies01:13
Kafka Components07:10
Kafka Architecture00:53
Lesson 07 – Kafka Clusters01:27
Kafka Clusters01:27
Lesson 08 – Kafka Industry Use Cases02:27
Kafka Industry Use Cases02:27
Lesson 09 – Demo: Install Kafka and Zookeeper04:58
Demo: Install Zookeeper03:17
Demo: Install Kafka01:41
Lesson 10 – Demo: Single Node Single-Multi Broker Cluster05:38
Demo: Setup a Single Node Single Broker Cluster02:39
Demo: Setup a Multi Node Single Broker Cluster02:59
Lesson 11 – Key Takeaways00:15
Key Takeaways00:15

Section 02 – Kafka Producer
Lesson 01 – Overview of Producer and Its Architecture04:51
Learning Objectives04:51
Lesson 02 – Kafka Producer Configuration14:33
Kafka Producer Configuration02:27
Kafka Producer Optional Configuration02:49
Kafka Producer Configuration Objects06:59
Demo: Create a Kafka Producer02:18
Lesson 03 – Send Messages04:50
Sending Messages04:50
Lesson 04 – Serializers13:51
Serializers and Custom Serializers02:57
Demo: Creating a Custom Serializer03:59
Serializers Challenges and Serializing using Apache Avro04:22
Demo: Serializing Using Apache Avro02:33
Lesson 05 – Partitions08:50
Demo: Setup Custom Partition04:12
Lesson 06 – Key Takeaways00:17
Key Takeaways00:17

Section 03 – Kafka Consumer
Lesson 01 – Kafka Consumer – Overview, Consumer Groups and Partitioners12:27
Overview of Kafka Consumers02:58
Consumer Groups05:05
Partition Rebalance and Creating a Consumer04:24
Lesson 02 – Poll Loop02:42
Poll Loop and Its Functioning02:42
Lesson 03 – Configuring Consumer12:26
Kafka Consumer Configuration – Part One04:35
Kafka Consumer Configuration – Part Two05:09
Demo: Create Kafka Consumer02:42
Lesson 04 – Commit and Offset13:59
Commit and Offset04:23
Ways of Committing Offset – Automatic Offset02:16
Ways of Committing Offset – Commit Current Offset01:34
Ways of Committing Offset – Asynchronous Commit02:15
Ways of Committing Offset – Combining Synchronous and Asynchronous Commits02:18
Ways of Committing Offset – Commit Specified Offset01:13
Lesson 05 – Rebalance Listeners01:45
Rebalance Listeners01:45
Lesson 06 – Consuming Records with Specific Offset04:13
Consuming Records with Specific Offset04:13
Lesson 07 – Deserializers05:32
Demo: Create and Use Custom Deserializer02:14
Lesson 08 – Key Takeaways00:30
Key Takeaways00:30
Section 04 – Kafka Operations and Performance Tuning
Lesson 01 – Learning Objectives04:46
Learning Objectives04:46
Lesson 02 – Replications14:53
Replication and Replica Types04:19
Preferred Leader, Requests and Request Processing04:24
Types of Requests06:10
Lesson 04 – Storage09:59
Partition Allocation, File Management and Segments05:21
File Format, Index and Compaction04:38
Lesson 05 – Configuration in a Reliable System18:18
Kafka Reliability and Reliability Methods01:34
Broker Configuration for Replication One04:41
Producer in Reliable System04:37
Consumer in Reliable System07:26
Lesson 06 – Key Takeaways00:28
Key Takeaways00:28

Section 05 – Kafka Cluster Architecture and Administering Kafka
Lesson 01 – Learning Objectives05:22
Learning Objectives05:22
Lesson 02 – Multi Cluster Architecture08:45
Hub-and-Spokes Architecture and Active-Active Architecture05:37
Active-Standby Architecture and Stretch Clusters03:08
Lesson 03 – MirrorMaker17:41
MirrorMaker Configuration04:27
MirrorMaker Deployment and Tuning06:33
Demo: Setting up MirrorMaker06:41
Lesson 04 – Administering Kafka09:50
Administering Kafka – Topic Operations05:50
Administering Kafka – Consumer Group Operations04:00
Lesson 05 – Dynamic Configuration Changes09:20
Dynamic Configuration Changes06:52
Partition Management02:28
Lesson 06 – Console Producer Tool01:27
Console Producer Tool01:27
Lesson 07 – Console Consumer Tool02:36
Console Consumer Tool02:36
Lesson 08 – Key Takeaways00:25
Key Takeaways00:25

Section 06 – Kafka Monitoring and Schema Registry
Lesson 01 – Monitoring47:23
Learning Objectives02:17
Server or Infrastructure Monitoring and Application Monitoring02:15
Kafka Monitoring03:16
Kafka Broker Metrics – Under Replicated Partitions06:25
Kafka Broker Metrics – Others06:42
Topic and Partition Specific Metrics01:56
Logging and Client Monitoring02:48
Producer and Consumer Metrics06:11
Quotas and Lag Monitoring05:53
Monitoring Dashboard03:33
Demo: Setting up Open Source Health Monitor06:07
Lesson 02 – Kafka Schema Registry and Avro06:27
Kafka Schema Registry06:27
Lesson 03 – Kafka Schema Registry Components08:14
Kafka Component and Architecture04:01
Kafka Schema Registry – Internal working and Use-cases04:13
Lesson 04 – Kafka Schema Registry Working08:25
Kafka Schema Registry Working04:37
Demo: Using Kafka Schema Registry With Kafka03:48
Lesson 05 – Key Takeaways00:17
Key Takeaways00:17

Section 07 – Kafka Streams and Kafka Connectors
Lesson 01 – Kafka Stream Overview09:49
Learning Objectives04:13
Kafka Stream05:36
Lesson 02 – Kafka Stream Architecture, Working and Components50:42
Kafka Stream Architecture and Working02:52
Kafka Stream Components04:28
Kafka Stream Architecture Tasks, Threading Model and Local State Store04:30
Kafka Stream Architecture – Record Buffer02:17
Memory Management and Streaming Data Pipeline06:15
Kafka Stream DSL06:41
KStream Operations09:54
KTable Operations04:00
Aggregation and Windowing05:41
Lesson 03 – Stream Concepts and Working15:30
Processor Topology and Stream Processor04:19
Stream and Processor APIs one07:09
Processor APIs – Create Topology04:02
Lesson 04 – Kafka Connectors06:08
Kafka Connectors06:08
Lesson 05 – Kafka Connector Configuration25:08
Standalone and Sink Connector Configuration04:07
Running Kafka Connect04:11
Kafka Connector Distributed Mode03:56
HTTP Rest Interface04:42
Demo: Kafka Connector02:41
Demo: Create an Application using Kafka Streams05:31
Lesson 06 – Key Takeaways00:20
Key Takeaways00:20

Section 08 – Integration of Kafka with Storm
Lesson 01 – Apache Storm09:10
Learning Objectives02:58
Real-time Analytics06:12
Lesson 02 – Apache Storm Architecture and Components08:34
Apache Storm Architecture04:05
Apache Storm Components04:29
Lesson 03 – Apache Storm Topology10:44
Apache Storm Topology05:05
Apache Storm Topology – Execution Plan05:39
Lesson 04 – Kafka Spout03:54
Kafka Spout03:54
Lesson 05 – Integration of Apache Storm and Kafka10:19
Integration of Apache Storm and Kafka06:52
Demo: Simple Standalone Application using Kafka and Storm03:27
Lesson 06 – Key Takeaways00:20
Key Takeaways00:20

Section 09 – Kafka Integration with Spark and Flume
Lesson 01 – Introduction to Spark and Its Components10:59
Learning Objectives03:44
Spark Components07:15
Lesson 02 – Basics of Spark – RDD, Data Sets, and Transformation and Actions24:46
RDD Operations – Transformation – Map, FlatMap and Filter05:12
RDD Operations – Transformation – Join, Distinct, First and Take05:28
RDD Operations – Actions02:29
Data Sets and Spark Session02:38
Data Sets and Spark Session Operations05:13
Lesson 03 – Spark Stream03:09
Spark Stream03:09
Lesson 04 – Spark Integration with Kafka06:26
Spark Integration with Kafka02:36
Demo: Running Small Standalone Application in Spark with Kafka03:50
Lesson 05 – Flume08:03
Flume Connectors02:38
Lesson 06 -Flume Kafka to HDFS Configuration13:28
Flume Kafka to HDFS Configuration05:57
Demo: Creating a Flume Agent to Send Data from Kafka to HDFS07:31
Lesson 07 – Key Takeaways00:29
Key Takeaways00:29

Section 10 – Admin Client and Securing Kafka
Lesson 01 – Admin Client11:59
Learning Objectives06:48
Demo: Perform Various Admin Tasks using Admin Client05:11
Lesson 02 – Kafka Security01:36
Kafka Security01:36
Lesson 03 – Kafka Security Components08:58
Kafka Security Components05:25
Lesson 04 – Configure SSL in Kafka01:50
Configure SSL in Kafka01:50
Lesson 05 – Secure using ACLs05:12
Secure using ACLs05:12
Lesson 06 – Key Takeaways00:22
Key Takeaways00:22

Course 5 Online Classroom Flexi Pass

MongoDB Developer and Administrator
More businesses are using MongoDB development services, the most popular NoSQL database, to handle their increasing data storage and handling demands. The MongoDB certification course equips you with the skills required to become a MongoDB Developer.
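The CRUD lessons in this course revolve around MongoDB's operator syntax, in which both filters and updates are expressed as documents. As a minimal sketch (the field names and values are illustrative assumptions, not from the course material), the dollar-prefixed operators covered in Lesson 3 look like this when built as plain Python dicts for the pymongo driver:

```python
# Sketch of MongoDB "query documents" built as plain Python dicts, the
# form the pymongo driver expects. Field names here are illustrative.

# Filter: age >= 30 AND (dept is "eng" OR dept is "ops") -- mirrors the
# comparison and $or operators covered in the CRUD lessons.
query = {
    "age": {"$gte": 30},
    "$or": [{"dept": "eng"}, {"dept": "ops"}],
}

# Update: $set a field and $inc a counter (the $set / $inc modifiers).
update = {"$set": {"active": True}, "$inc": {"logins": 1}}

# Against a live server this would run as, for example:
#   from pymongo import MongoClient
#   col = MongoClient()["hr"]["employees"]
#   col.update_many(query, update, upsert=False)
print(sorted(query))  # → ['$or', 'age']
```

The same document shapes are reused throughout the driver API, which is why the course spends several lessons on operators such as $in, $push, and $addToSet before moving to Java and Node.js clients.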

MongoDB Developer and Administrator
Lesson 0 – Course Introduction03:49
0.001 Course Introduction00:09
0.002 Table of Contents00:35
0.003 Objectives00:40
0.004 Course Overview00:54
0.005 Value to Professionals and Organizations00:59
0.006 Course Prerequisites00:17
0.007 Lessons Covered00:07
0.008 Conclusion00:08
Lesson 1 – Introduction to NoSQL databases31:44
1.001 Lesson1 NoSQL database introduction00:15
1.002 Objectives00:34
1.003 What is NoSQL01:01
1.004 What is NoSQL (contd.)00:27
1.005 Why NoSQL01:29
1.006 Difference Between RDBMS and NoSQL Databases02:22
1.007 Benefits of NoSQL04:41
1.008 Benefits of NoSQL (contd.)04:07
1.009 Types of NoSQL01:30
1.010 Key-Value Database01:31
1.011 Key-Value Database (contd.)01:28
1.012 Document Database00:51
1.013 Document Database Example00:55
1.014 Column-Based Database00:53
1.015 Column-Based Database (contd.)01:04
1.016 Column-Based Database (contd.)00:24
1.017 Column-Based Database Example00:21
1.018 Graph Database01:11
1.019 Graph Database (contd.)01:12
1.020 CAP Theorem00:28
1.021 CAP Theorem (contd.)01:04
1.022 Consistency00:49
1.023 Availability00:26
1.024 Partition Tolerance00:58
1.025 MongoDB as Per CAP00:49
1.026 Quiz
1.027 Summary00:44
1.028 Conclusion00:10
Lesson 2 – MongoDB A Database for the Modern Web49:33
2.001 Lesson 2 MongoDB-A Database for the Modern Web00:19
2.002 Objectives00:41
2.003 What is MongoDB01:11
2.004 JSON00:50
2.005 JSON Structure01:22
2.006 BSON01:27
2.007 MongoDB Structure01:25
2.008 Document Store Example00:33
2.009 MongoDB as a Document Database01:34
2.010 Transaction Management in MongoDB00:33
2.011 Easy Scaling00:52
2.012 Scaling Up vs. Scaling Out00:49
2.013 Vertical Scaling00:50
2.014 Horizontal Scaling01:30
2.015 Features of MongoDB01:42
2.016 Secondary Indexes00:40
2.017 Replication00:57
2.018 Replication (contd.)00:21
2.019 Memory Management00:43
2.020 Replica Set00:48
2.021 Auto Sharding00:57
2.022 Aggregation and MapReduce01:11
2.023 Collection and Database01:01
2.024 Schema Design and Modeling00:46
2.025 Reference Data Model01:17
2.026 Reference Data Model Example00:44
2.027 Embedded Data Model01:09
2.028 Embedded Data Model Example00:29
2.029 Data Types01:25
2.030 Data Types (contd.)02:03
2.031 Data Types (contd.)01:02
2.032 Core Servers of MongoDB01:27
2.033 MongoDB's Tools02:57
2.034 Installing MongoDB on Linux00:05
2.035 Installing MongoDB on Linux03:01
2.036 Installing MongoDB on Windows00:06
2.037 Installing MongoDB on Windows01:22
2.038 Starting MongoDB On Linux00:05
2.039 Starting MongoDB On Linux01:25
2.040 Starting MongoDB On Windows00:04
2.041 Starting MongoDB On Windows01:38
2.042 Use Cases02:40
2.043 Use Cases (contd.)02:28
2.044 Quiz
2.045 Summary00:52
2.046 Conclusion00:12
Lesson 3 – CRUD Operations in MongoDB01:03:47
3.001 Lesson 3 CRUD Operations in MongoDB00:22
3.002 Objectives01:07
3.003 Data Modification in MongoDB00:52
3.004 Batch Insert in MongoDB01:45
3.005 Ordered Bulk Insert01:49
3.006 Performing Ordered Bulk Insert00:06
3.007 Performing Ordered Bulk Insert01:57
3.008 Unordered Bulk Insert01:08
3.009 Performing Un-ordered Bulk Insert00:06
3.010 Performing Un-ordered Bulk Insert01:50
3.011 Inserts Internals and Implications01:13
3.012 Performing an Insert Operation00:06
3.013 Performing an Insert Operation01:51
3.014 Retrieving the documents00:47
3.015 Specify Equality Condition01:32
3.016 Retrieving Documents by Find Query00:07
3.017 Retrieving Documents by Find Query01:12
3.018 dollar in, AND Conditions01:36
3.019 dollar or Operator00:46
3.020 Specify AND or OR Conditions01:05
3.021 Retrieving Documents by Using FindOne, AND or OR Conditions00:09
3.022 Retrieving Documents by Using FindOne, AND or OR Conditions01:49
3.023 Regular Expression00:47
3.024 Array Exact Match00:45
3.025 Array Projection Operators00:48
3.026 Retrieving Documents for Array Fields00:05
3.027 Retrieving Documents for Array Fields01:52
3.028 dollar Where Query01:13
3.029 Cursor01:51
3.030 Cursor (contd.)01:49
3.031 Cursor (contd.)01:49
3.032 Retrieving Documents Using Cursor00:06
3.033 Retrieving Documents Using Cursor02:48
3.034 Pagination00:48
3.035 Pagination Avoiding Larger Skips00:49
3.036 Advance query option01:18
3.037 Update Operation01:02
3.038 Updating Documents in MongoDB00:06
3.039 Updating Documents in MongoDB01:23
3.040 dollar SET01:38
3.041 Updating Embedded Documents in MongoDB00:06
3.042 Updating Embedded Documents in MongoDB01:06
3.043 Updating Multiple Documents in MongoDB00:06
3.044 Updating Multiple Documents in MongoDB01:31
3.045 dollar Unset and dollar inc Modifiers01:02
3.046 Dollar inc modifier to increment and decrement00:07
3.047 Dollar inc modifier to increment and decrement02:35
3.048 Replacing Existing Document with New Document00:07
3.049 Replacing Existing Document with New Document01:14
3.050 dollar Push and dollar addToSet00:43
3.051 Positional Array Modification01:26
3.052 Adding Elements into Array Fields00:06
3.053 Adding Elements into Array Fields01:46
3.054 Adding Elements to Array Fields Using AddToSet00:06
3.055 Adding Elements to Array Fields Using AddToSet01:12
3.056 Performing AddToSet00:08
3.057 Performing AddToSet00:49
3.058 Upsert01:22
3.059 Removing Documents01:16
3.060 Performing Upsert and Remove Operation00:07
3.061 Performing Upsert and Remove Operation01:31
3.062 Quiz
3.063 Summary00:58
3.064 Conclusion00:11
Lesson 4 – Indexing and Aggregation01:14:15
4.001 Lesson 4 Indexing and Aggregation00:20
4.002 Objectives00:50
4.003 Introduction to Indexing01:08
4.004 Types of Index01:51
4.005 Properties of Index01:14
4.006 Single Field Index00:41
4.007 Single Field Index on Embedded Document00:37
4.008 Compound Indexes00:58
4.009 Index Prefixes01:02
4.010 Sort Order01:07
4.011 Ensure Indexes Fit RAM00:46
4.012 Multi-Key Indexes00:54
4.013 Compound Multi-Key Indexes00:44
4.014 Hashed Indexes01:01
4.015 TTL Indexes01:42
4.016 Unique Indexes01:17
4.017 Sparse Indexes01:23
4.018 Demo-Create Compound, Sparse, and Unique Indexes00:07
4.019 Demo-Create Compound, Sparse, and Unique Indexes01:52
4.020 Text Indexes01:20
4.021 Demo-Create Single Field and Text Index00:06
4.022 Demo-Create Single Field and Text Index02:29
4.023 Text Search01:30
4.024 Index Creation00:56
4.025 Index Creation (contd.)01:35
4.026 Index Creation on Replica Set01:34
4.027 Remove Indexes00:37
4.028 Modify Indexes00:53
4.029 Demo-Drop an Index from a Collection00:05
4.030 Demo-Drop an Index from a Collection01:29
4.031 Rebuild Indexes01:07
4.032 Listing Indexes00:37
4.033 Demo-Retrieve Indexes for a Collection and Database00:07
4.034 Demo-Retrieve Indexes for a Collection and Database01:24
4.035 Measure Index Use00:44
4.036 Demo-Use Mongo Shell Methods to Monitor Indexes00:07
4.037 Demo-Use Mongo Shell Methods to Monitor Indexes02:26
4.038 Control Index Use01:08
4.039 Demo-Use the Explain, Dollar Hint and Dollar Natural Operators to Create Index00:08
4.040 Demo-Use the Explain, Dollar Hint and Dollar Natural Operators to Create Index01:59
4.041 Index Use Reporting01:50
4.042 Geospatial Index01:22
4.043 Demo-Create Geospatial Index00:06
4.044 Demo-Create Geospatial Index03:34
4.045 MongoDB's Geospatial Query Operators01:08
4.046 Demo-Use Geospatial Index in a Query00:07
4.047 Demo-Use Geospatial Index in a Query02:19
4.048 Dollar GeoWithin Operator00:32
4.049 Proximity Queries in MongoDB00:46
4.050 Aggregation01:35
4.051 Aggregation (contd.)00:38
4.052 Pipeline Operators and Indexes01:03
4.053 Aggregate Pipeline Stages01:43
4.054 Aggregate Pipeline Stages (contd.)01:08
4.055 Aggregation Example01:17
4.056 Demo-Use Aggregate Function00:06
4.057 Demo-Use Aggregate Function01:37
4.058 MapReduce00:50
4.059 MapReduce (contd.)01:13
4.060 MapReduce (contd.)00:56
4.061 Demo-Use MapReduce in MongoDB00:06
4.062 Demo-Use MapReduce in MongoDB02:36
4.063 Aggregation Operations01:25
4.064 Demo-Use Distinct and Count Methods00:06
4.065 Demo-Use Distinct and Count Methods01:21
4.066 Aggregation Operations (contd.)00:39
4.067 Demo-Use the Group Function00:05
4.068 Demo-Use the Group Function00:54
4.069 Quiz
4.070 Summary01:07
4.071 Conclusion00:11
Lesson 5 – Replication and Sharding01:14:51
5.001 Replication and Sharding00:21
5.002 Objectives00:47
5.003 Introduction to Replication01:20
5.004 Master-Slave Replication00:40
5.005 Replica Set in MongoDB01:45
5.006 Replica Set in MongoDB (contd.)01:02
5.007 Automatic Failover00:54
5.008 Replica Set Members01:01
5.009 Priority 0 Replica Set Members01:11
5.010 Hidden Replica Set Members01:05
5.011 Delayed Replica Set Members01:07
5.012 Delayed Replica Set Members (contd.)00:56
5.013 Demo-Start a Replica Set00:05
5.014 Demo-Start a Replica Set02:40
5.015 Write Concern01:36
5.016 Write Concern (contd.)00:52
5.017 Write Concern Levels01:25
5.018 Write Concern for a Replica Set01:08
5.019 Modify Default Write Concern00:57
5.020 Read Preference01:03
5.021 Read Preference Modes01:03
5.022 Blocking for Replication01:29
5.023 Tag Set01:25
5.024 Configure Tag Sets for Replica set02:13
5.025 Replica Set Deployment Strategies01:54
5.026 Replica Set Deployment Strategies (contd.)01:40
5.027 Replica Set Deployment Patterns00:40
5.028 Oplog File01:29
5.029 Replication State and Local Database01:00
5.030 Replication Administration01:24
5.031 Demo-Check a Replica Set Status00:07
5.032 Demo-Check a Replica Set Status02:22
5.033 Sharding01:50
5.034 When to Use Sharding01:13
5.035 What is a Shard01:00
5.036 What is a Shard Key00:56
5.037 Choosing a Shard Key00:27
5.038 Ideal Shard Key01:29
5.039 Range-Based Shard Key01:20
5.040 Hash-Based Sharding00:57
5.041 Impact of Shard Keys on Cluster Operation01:48
5.042 Production Cluster Architecture01:42
5.043 Config Server Availability01:11
5.044 Production Cluster Deployment01:31
5.045 Deploy a Sharded Cluster01:34
5.046 Add Shards to a Cluster01:33
5.047 Demo-Create a Sharded Cluster00:06
5.048 Demo-Create a Sharded Cluster03:02
5.049 Enable Sharding for Database01:05
5.050 Enable Sharding for Collection00:52
5.051 Enable Sharding for Collection (contd.)00:36
5.052 Maintaining a Balanced Data Distribution00:34
5.053 Splitting00:40
5.054 Chunk Size01:30
5.055 Special Chunk Type00:57
5.056 Shard Balancing02:09
5.057 Shard Balancing (contd.)00:47
5.058 Customized Data Distribution with Tag Aware Sharding00:36
5.059 Tag Aware Sharding00:38
5.060 Add Shard Tags01:29
5.061 Remove Shard Tags01:12
5.062 Quiz
5.063 Summary01:16
5.064 Conclusion00:10
Lesson 6 – Developing Java and Node JS Application with MongoDB47:02
6.001 Developing Java and Node JS Application with MongoDB00:17
6.002 Objectives00:38
6.003 Capped Collection01:15
6.004 Capped Collection Creation00:57
6.005 Capped Collection Creation (contd.)00:53
6.006 Demo-Create a Capped Collection in MongoDB00:05
6.007 Demo-Create a Capped Collection in MongoDB01:55
6.008 Capped Collection Restriction01:11
6.009 TTL Collection Features00:57
6.010 Demo-Create TTL Indexes00:06
6.011 Demo-Create TTL Indexes02:14
6.012 GridFS01:03
6.013 GridFS Collection01:43
6.014 Demo-Create GridFS in MongoDB Java Application00:06
6.015 Demo-Create GridFS in MongoDB Java Application02:36
6.016 MongoDB Drivers and Client Libraries00:30
6.017 Develop Java Application with MongoDB00:56
6.018 Connecting to MongoDB from Java Program00:50
6.019 Create Collection From Java Program00:45
6.020 Insert Documents From Java Program00:39
6.021 Insert Documents Using Java Code Example00:42
6.022 Demo-Insert a Document Using Java00:04
6.023 Demo-Insert a Document Using Java02:41
6.024 Retrieve Documents Using Java Code00:29
6.025 Demo-Retrieve Document Using Java00:04
6.026 Demo-Retrieve Document Using Java01:38
6.027 Update Documents Using Java Code00:29
6.028 Demo-Update Document Using Java00:04
6.029 Demo-Update Document Using Java02:13
6.030 Delete Documents Using Java Code00:23
6.031 Demo-Delete Document Using Java00:05
6.032 Demo-Delete Document Using Java01:38
6.033 Store Images Using GridFS API00:56
6.034 Retrieve Images Using GridFS API00:35
6.035 Remove Image Using GridFS API00:14
6.036 Remove Image Using GridFS API (contd.)00:49
6.037 Connection Creation Using Node JS01:03
6.038 Insert Operations Using Node JS00:49
6.039 Insert Operations Using Node JS (contd.)01:02
6.040 Demo-Perform CRUD Operation in Node JS00:05
6.041 Demo-Perform CRUD Operation in Node JS02:29
6.042 Demo-Perform Insert and Retrieve Operations Using Node JS00:05
6.043 Demo-Perform Insert and Retrieve Operations Using Node JS01:07
6.044 Update Operations Using Node JS00:19
6.045 Retrieve Documents Using Node JS00:40
6.046 Using DB Cursor to Retrieve Documents00:26
6.047 Mongoose ODM Module in Node JS00:39
6.048 Defining Schema Using Mongoose00:50
6.049 Defining Schema Using Mongoose (contd.)00:58
6.050 Demo-Use Mongoose to Define Schema00:07
6.051 Demo-Use Mongoose to Define Schema01:27
6.052 Demo-How to Run Node JS Using Mongoose00:09
6.053 Demo-How to Run Node JS Using Mongoose00:55
6.054 Quiz
6.055 Summary01:03
6.056 Conclusion00:09
Lesson 7 – Administration of MongoDB Cluster Operations44:20
7.001 Administration of MongoDB Cluster Operations00:17
7.002 Objectives00:28
7.016 Memory-Mapped Files01:17
7.017 Journaling Mechanics01:25
7.018 Storage Engines00:35
7.019 MMAPv1 Storage Engine00:48
7.020 WiredTiger Storage Engine01:47
7.021 WiredTiger Compression Support00:56
7.022 Power of 2-Sized Allocations01:14
7.023 No Padding Allocation Strategy00:40
7.024 Diagnosing Performance Issues00:57
7.025 Diagnosing Performance Issues (contd.)02:07
7.026 Diagnosing Performance Issues (contd.)02:28
7.027 Demo-Monitor Performance in MongoDB00:05
7.028 Demo-Monitor Performance in MongoDB02:05
7.029 Optimization Strategies for MongoDB01:42
7.030 Configure Tag Sets for Replica Set00:49
7.031 Optimize Query Performance02:08
7.032 Monitoring Strategies for MongoDB00:49
7.033 MongoDB Utilities01:33
7.034 MongoDB Commands02:25
7.035 MongoDB Management service (MMS)00:26
7.036 Data Backup Strategies in MongoDB00:28
7.037 Copying Underlying Data Files01:59
7.038 Backup with MongoDump01:59
7.039 Fsync and Lock01:49
7.040 MongoDB Ops Manager Backup Software00:46
7.041 Security Strategies in MongoDB01:33
7.042 Authentication Implementation in MongoDB01:54
7.043 Authentication in a Replica set01:06
7.044 Authentication on Sharded Clusters01:30
7.045 Authorization02:11
7.046 End-to-End Auditing for Compliance00:46
7.047 Quiz
7.048 Summary01:08
7.049 Conclusion00:10

Course 6 Online Classroom Flexi Pass

AWS Big Data Certification Training
The AWS Big Data certification training prepares you for all aspects of hosting big data and performing distributed processing on the AWS platform and has been aligned to the AWS Certified Data Analytics – Specialty exam. This course is developed by industry leaders and aligned with the latest best practices.
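One recurring idea in the collection lessons below is how Amazon Kinesis routes records: it takes the MD5 hash of each record's partition key as a 128-bit integer and sends the record to the shard whose hash-key range contains it. A minimal sketch of that routing for an evenly split stream (the shard count and keys are illustrative assumptions):

```python
# Sketch: how Amazon Kinesis maps a record's partition key to a shard.
# Kinesis hashes the partition key with MD5 into a 128-bit hash-key
# space; each shard owns a contiguous range of that space. The shard
# count used here is an illustrative assumption.
import hashlib

NUM_SHARDS = 4
SPACE = 2 ** 128  # the Kinesis hash-key space is 128 bits wide

def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Index of the shard an evenly split stream would route this key to."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * num_shards // SPACE  # which equal-width range contains h

# The same key always hashes to the same shard.
assert shard_for("device-42") == shard_for("device-42")
for key in ("a", "b", "c"):
    print(key, shard_for(key))
```

Because records with the same partition key always land on the same shard, the choice of partition key determines how evenly throughput spreads across shards, a theme the Kinesis and DynamoDB lessons both return to.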

Section 1 – Self-paced Curriculum
Lesson 1 – Big Data on AWS Certification Course Overview07:01
1. Overview of Big Data on AWS Certification Course04:22
2. Course Introduction02:39
Lesson 2 – Big Data on AWS Introduction50:23
1. Learning Objective00:53
2. Cloud Computing and Its Advantages03:29
3.Cloud Computing Models05:02
4.Cloud Service Categories04:18
5. AWS Cloud Platform03:54
6.Design Principles – Part One04:16
7. Design Principles – Part Two03:57
8.Why AWS for Big Data – Reasons and Challenges01:48
9.Databases in AWS06:47
10.Data Warehousing in AWS02:00
11.Redshift, Kinesis and EMR05:33
12.DynamoDB, Machine Learning and Lambda05:01
13. Elasticsearch Service and EC202:51
14.Key Takeaways00:34
Lesson 3 – AWS Big Data Collection Services53:48
1.Learning Objective00:54
2.Amazon Kinesis and Kinesis Stream02:42
3.Kinesis Data Stream Architecture and Core Components04:01
4.Data Producer04:45
5.Data Consumer03:11
6.Kinesis Stream Emitting Data to AWS Services and Kinesis Connector Library04:30
7.Kinesis Firehose06:20
8.Transferring Data Using Lambda04:35
9.Amazon SQS, Lifecycle and Architecture06:25
10.IoT and Big Data03:25
11.IoT Framework04:21
12.AWS Data Pipelines and Data Nodes05:08
13.Activity, Pre-condition and Schedule02:47
14. Key Takeaways00:44
Lesson 4 – AWS Big Data Storage Services36:51
1.Learning Objective00:33
2.Amazon Glacier and Big Data03:49
3.DynamoDB Introduction05:41
4.DynamoDB and EMR01:41
5.DynamoDB Partitions and Distributions04:08
6.DynamoDB GSI LSI02:55
7.DynamoDB Stream and Cross Region Replication04:06
8.DynamoDB Performance and Partition Key Selection03:24
9.Snowball and AWS Big Data01:14
10.AWS DMS01:23
11.AWS Aurora in Big Data03:24
12.Demo – Amazon Athena Interactive SQL Queries for Data in Amazon S3 – Part 203:59
13.Key Takeaways00:34
Lesson 5 – AWS Big Data Processing Services49:39
1.Learning Objective00:42
2.Amazon EMR03:41
3.Apache Hadoop04:10
4.EMR Architecture06:53
5.EMR Releases and Cluster02:12
6.Choosing Instance and Monitoring06:07
7.Demo – Advance EMR Setting Options03:30
8.Hive on EMR01:06
9.HBase with EMR05:50
10.Presto with EMR02:07
11.Spark with EMR06:12
12.EMR File Storage02:48
13.AWS Lambda03:49
14.Key Takeaways00:32
Lesson 6 – Analysis59:48
1.Learning Objective00:44
2.Redshift Intro and Use cases03:02
3.Redshift Architecture06:17
4.MPP and Redshift in AWS Eco-System05:13
5.Columnar Databases03:37
6.Redshift Table Design – Part 205:06
7.Demo – Redshift Maintenance and Operations01:44
8.Machine Learning Introduction04:02
9.Machine Learning Algorithm04:09
10.Amazon SageMaker00:57
11.Amazon Elasticsearch04:34
12.Amazon Elasticsearch Services05:22
13.Demo – Loading Dataset into Elasticsearch01:18
14.Logstash and R Studio01:51
15.Demo – Fetching the File and Analyzing it using RStudio03:48
17.Demo – Running Query on S3 using the Serverless Athena04:57
18.Key Takeaways00:26
Lesson 7 – Visualization15:36
1.Learning Objective00:37
2. Introduction to Amazon QuickSight04:57
3.Visual Types03:48
5.Big Data Visualization03:33
6.Key Takeaways00:26
Lesson 8 – Security21:11
1.Learning Objective00:40
2.EMR Security and Security Group02:19
3.Roles and Private Subnet02:27
4.Encryption at Rest and In-transit03:09
5.Redshift Security05:33
6.Encryption at Rest using HSM02:39
7.Cloud HSM vs AWS KMS02:32
8.Limit Data Access01:24
9.Key Takeaways00:28

Section 2 – Live Virtual Class Curriculum
Lesson 01 – Course Introduction
Overview of AWS Certified Data Analytics – Specialty Course
Overview of the Certification
Overview of the Course
Project highlights
Course Completion Criteria
Lesson 02 AWS in Big Data Introduction
Introduction to Cloud Computing
Cloud Computing Deployments Models
Types of Cloud Computing Services
AWS Fundamentals
AWS Cloud Economics
AWS Virtuous Cycle
AWS Cloud Architecture Design Principles
Why AWS for Big Data – Challenges
Databases in AWS
Relational vs Non Relational Databases
Data Warehousing in AWS
AWS Services for collecting, processing, storing, and analyzing big data
Key Takeaways
Deploy a Data Warehouse Using Amazon Redshift
Lesson 03 Collection
AWS Big Data Collection Services
Fundamentals of Amazon Kinesis
Loading Data into Kinesis Stream
Assisted Practice: Loading Data into Amazon Storage
Kinesis Data Stream High-Level Architecture
Kinesis Stream Core Concepts
AWS Services and Amazon Kinesis Data Stream
How to Put Data into Kinesis Stream?
Kinesis Connector Library
Amazon Kinesis Data Firehose
Assisted Practice: Transfer Data into Delivery Stream using Firehose
Assisted Practice: Transfer VPC Flow log to Splunk using Firehose
Data Transfer using AWS Lambda
Assisted Practice: Backing up data in Amazon S3 using AWS Lambda
Amazon SQS
IoT and Big Data
Amazon IoT Greengrass
AWS Data Pipeline
Components of Data Pipeline
Assisted Practice: Export MySQL Data to Amazon S3 Using AWS Data Pipeline
Key Takeaways
Streaming Data with Kinesis Data Analytics
Lesson 04 Storage
AWS Big Data Storage Services
Data lakes and Analytics
Data Management
Data Life Cycle
Fundamentals of Amazon Glacier
Glacier and Big Data
DynamoDB Introduction
DynamoDB: Core Components
Assisted Practice: Perform operations on DynamoDB table
DynamoDB in AWS Eco-System
DynamoDB Partitions
Data Distribution
DynamoDB GSI and LSI
DynamoDB Streams
Use cases: Capturing Table Activity with DynamoDB Streams
Cross-Region Replication
Assisted Practice: Create a Global Table using DynamoDB
DynamoDB Performance: Deep Dive
Partition Key Selection
Snowball and AWS Big Data
Assisted Practice: Data Migration using AWS Snowball
AWS Aurora in Big Data
Assisted Practice: Create and Modify Aurora DB Cluster
Storing and Retrieving the Data from DynamoDB
Lesson 05 Processing I
AWS Big Data Processing Services
Overview of Amazon Elastic MapReduce (EMR)
EMR Cluster Architecture
Apache Hadoop
Apache Hadoop Architecture
Storage Options
EMR Operations
AWS Cluster
Assisted Practice: Create a cluster in S3
Assisted Practice: Monitor a Cluster in S3
Using Hue with EMR
Assisted Practice: Launch HUE Web Interface on Amazon EMR
Setup Hue for LDAP
Assisted Practice: Configure HUE for LDAP Users
Hive on EMR
Assisted Practice: Set Up a Hive Table to Run Hive Commands
Key Takeaways
Lesson 06: Processing II
Using HBase with EMR
HBase Architecture
Assisted Practice: Create a cluster with HBase
HBase and EMRFS
Presto with EMR
Presto Architecture
Fundamentals of Apache Spark
Apache Spark Architecture
Assisted Practice: Create a cluster with Spark
Apache Spark Integration with EMR
Fundamentals of EMR File System
Amazon Simple Workflow
AWS Lambda in Big Data Ecosystem
AWS Lambda and Kinesis Stream
AWS Lambda and RedShift
Key Takeaways
Real-Time Application with Apache Spark and AWS EMR
Lesson 07 ETL with Redshift
Introduction to AWS Big Data Analysis Services
Fundamentals of Amazon Redshift
Amazon RedShift Architecture
Assisted Practice: Launch a Cluster, Load Dataset, and Execute Queries
RedShift in the AWS Ecosystem
Columnar Databases
Assisted Practice: Monitor RedShift Maintenance and Operations
RedShift Table Design
Choosing the Distribution Style
Redshift Data types
RedShift Data Loading
COPY Command for Data Loading
RedShift Loading Data
Key Takeaways
Lesson 08: Analysis with Machine Learning
Fundamentals of Machine Learning
Workflow of Amazon Machine Learning
Use cases
Machine learning Algorithms
Amazon SageMaker
Machine learning with Amazon Sagemaker
Assisted Practice: Build, Train, and Deploy a Machine Learning Model
Amazon Elasticsearch Service
Zone Awareness
Assisted Practice: Fetch the File and Run Analysis using RStudio
Amazon Athena
Assisted Practice: Execute Interactive SQL Queries in Athena
AWS Glue
Key Takeaways
Fraud Detection Using Classification Algorithms on AWS Sagemaker
Lesson 09 Analysis and Visualization
Introduction to AWS Big Data Visualization Services
Amazon QuickSight
Amazon QuickSight – Workflow and Use Cases
Assisted Practice: Analyze the marketing campaign
Working with data
Assisted Practice: Analyze the marketing campaign using data from Amazon S3
Assisted Practice: Analyze the marketing campaign using data from Presto
Amazon QuickSight: Visualization
Assisted Practice: Create Visuals
Amazon QuickSight: Stories
Assisted Practice: Create a Storyboard
Amazon QuickSight: Dashboard
Assisted Practice: Create a Dashboard
Data Visualization: Other Tools
Assisted Practice: Create a Dashboard on Kibana
Key Takeaways
Exploratory Data Analysis Using AWS QuickSight
Lesson 10: Security
Introduction to AWS Big Data Security
EMR Security
EMR Security: Best Practices
Fundamentals of Redshift Security
Data Protection and Encryption
Master Key, Encryption, and Decryption Process
Amazon Redshift Database Encryption
Key Management Service (KMS) Overview
Encryption using Hardware Security Modules
STS and Cross Account Access
CloudTrail
Key Takeaways
Practice Projects
Practice Projects
Real-time Analytics on Streaming Data
Truegate S3 Replication Big Data Assignment

Course 7 Online Classroom Flexi Pass

Big Data Capstone
Simplilearn’s Big Data Capstone project gives you an opportunity to apply the skills you learned in the Big Data Engineer master’s program. With dedicated mentoring sessions, you’ll learn how to solve a real industry-aligned problem. The project is the final step in the learning path and will help you showcase your expertise to employers.

Data Engineer Capstone

Lesson 01: Data Engineer Capstone
Data Engineer Capstone


  • Access to a vast selection of courses and labs
  • Unlimited access from all devices
  • Learn from industry expert instructors
  • Assessment quizzes and progress tracking
  • Blended learning with virtual classes
  • Access to new courses every quarter
  • 100% satisfaction guarantee

You will receive certification after completing this course.

Instructor-Led Lectures
All IT Tutor Pro (formerly IT Nuggets) courses replicate a live class experience with an instructor on screen delivering the course’s theories and concepts. These lectures are pre-recorded and available to the user 24/7; they can be repeated, rewound, and fast-forwarded.
Visual Demonstrations, Educational Games & Flashcards
IT Tutor Pro (formerly IT Nuggets) recognizes that not all students learn alike and that different delivery mediums are needed to achieve success for a large student base. With that in mind, we deliver our content in a variety of ways to ensure that students stay engaged and productive throughout their courses.
Mobile Optimization & Progress Tracking
Our courses are optimized for all mobile devices allowing students to learn on the go whenever they have free time. Students can access their courses from anywhere and their progress is completely tracked and recorded.
Practice Quizzes And Exams
IT Tutor Pro (formerly IT Nuggets) Online’s custom practice exams prepare you for your exams more effectively than the traditional exam preps on the market. Students take practice quizzes after each module to ensure they are confident in the topic they are learning.
World Class Learning Management System
IT Tutor Pro (formerly IT Nuggets) provides a next-generation learning management system (LMS): an experience that combines the feature set of traditional learning management systems with advanced functionality designed to make learning management easy and online learning engaging from the user’s perspective.

Frequently Asked Questions

How does online education work on a day-to-day basis?
Instructional methods, course requirements, and learning technologies can vary significantly from one online program to the next, but the vast majority use a learning management system (LMS) to deliver lectures and materials, monitor student progress, assess comprehension, and accept student work. LMS providers design these platforms to accommodate a multitude of instructor needs and preferences.
Is online education as effective as face-to-face instruction?
Online education may seem relatively new, but years of research suggest it can be just as effective as traditional coursework, and often more so. According to a U.S. Department of Education analysis of more than 1,000 learning studies, online students tend to outperform classroom-based students across most disciplines and demographics. Another major review published the same year found that online students had the advantage 70 percent of the time, a gap the authors projected would only widen as programs and technologies evolve.
Do employers accept online degrees?
All new learning innovations are met with some degree of scrutiny, but skepticism subsides as methods become more mainstream. Such is the case for online learning. Studies indicate employers who are familiar with online degrees tend to view them more favorably, and more employers are acquainted with them than ever before. The majority of colleges now offer online degrees, including most public, not-for-profit, and Ivy League universities. Online learning is also increasingly prevalent in the workplace as more companies invest in web-based employee training and development programs.
Is online education more conducive to cheating?
The concern that online students cheat more than traditional students is perhaps misplaced. When researchers at Marshall University conducted a study to measure the prevalence of cheating in online and classroom-based courses, they concluded, “Somewhat surprisingly, the results showed higher rates of academic dishonesty in live courses.” The authors suggest the social familiarity of students in a classroom setting may lessen their sense of moral obligation.
How do I know if online education is right for me?
Choosing the right course takes time and careful research no matter how one intends to study. Learning styles, goals, and programs always vary, but students considering online courses must consider technical skills, ability to self-motivate, and other factors specific to the medium. Online course demos and trials can also be helpful.
What technical skills do online students need?
Our platform is designed to be as user-friendly as possible: intuitive controls, clear instructions, and tutorials guide students through new tasks. However, students still need basic computer skills to access and navigate these programs. These skills include: using a keyboard and a mouse; running computer programs; using the Internet; sending and receiving email; using word processing programs; and using forums and other collaborative tools. Most online programs publish such requirements on their websites. If not, an admissions adviser can help.