Big Data with Amazon Cloud, Hadoop/Spark and Docker is unfortunately unavailable

Thankfully we have 13 other Big Data Classes for you to choose from. Check our top choices below or see all classes for more options.

Data Analytics Technologies Bootcamp

NYC Career Centers - Virtually Online

Master the top data analytics tools and gain actionable insights through hands-on projects. Unlock your potential as a data analyst with this comprehensive bootcamp.

(679) All levels 18 and older
$1,949

8 sessions

Python for Data Science Bootcamp

Noble Desktop - Virtually Online

Uncover the power of Python for analyzing real-world data sets in this hands-on course at Noble Desktop. Explore Python fundamentals and learn how to create programs, work with data, visualize insights, and develop machine learning models. Elevate your data science skills with the Python for Data Science Bootcamp.

(372) All levels 18 and older
$1,495

5 sessions

Python Data Science & Machine Learning Bootcamp

Noble Desktop - Virtually Online

Learn how to apply Python to analyze data, create predictive models using machine learning, and automate tasks in this comprehensive data science course. Gain the necessary programming skills to excel in entry-level data science and Python engineering positions.

(372) Beginner 18 and older
$3,495

15 sessions

Tableau Bootcamp

Noble Desktop - Virtually Online

Learn to transform raw data into informative visuals with Tableau, the industry standard for creating charts, graphs, and maps. Master the art of data visualization and gain control over the look and feel of your creations, allowing you to present data in a visually stunning and meaningful way. Elevate your data analysis skills today!

(372) All levels 18 and older
$499

2 sessions

Data Analytics Technologies Bootcamp

Noble Desktop - Virtually Online

In this course, students will master Excel, SQL, and Tableau, some of the top data analytics tools. Here, students will gain the skills to organize, analyze, summarize, and visualize data, presenting actionable insights for effective decision-making. Comprehensive classroom training in Midtown Manhattan.

(372) All levels 18 and older
$1,949

8 sessions

Big Data with Amazon Cloud, Hadoop/Spark and Docker

  • Beginner
  • 18 and older
  • $2,840.50
  • NYC Data Science Academy
  • 30 hours over 12 sessions

Class Description

What you'll learn in this big data training:

This is a 6-week evening program providing a hands-on introduction to the Hadoop and Spark ecosystem of Big Data technologies. The course will cover these key components of Apache Hadoop: HDFS, MapReduce with streaming, Hive, and Spark. Programming will be done in Python, and the course will begin with a review of the Python concepts needed for our examples. The course format is interactive, so students will need to bring laptops to class. We will do our work on AWS (Amazon Web Services); instructions on how to obtain an account and connect to AWS will be provided ahead of time.

What is Hadoop?

Hadoop is a set of open-source programs that run on computer clusters and simplify the handling of large amounts of data. Originally, Hadoop consisted of a distributed file system tuned for large data sets and an implementation of the MapReduce parallelism paradigm, but it has since expanded in many ways. It now includes database systems, languages for parallelism, libraries for machine learning, its own job scheduler, and much more. Furthermore, MapReduce is no longer the only parallelism framework; Spark is an increasingly popular alternative. In summary, Hadoop is a very popular and rapidly growing set of cluster computing solutions that is becoming an essential tool for data scientists.
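
To give a feel for the MapReduce paradigm mentioned above, here is a minimal single-machine sketch of a word count in plain Python. It is only an illustration, not course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example, and a real Hadoop job would distribute each phase across a cluster rather than run it in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Chain the three phases together on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the course.
    sample = ["big data with spark", "big data with hadoop"]
    print(word_count(sample))
```

The split into three small functions mirrors the way a framework handles grouping between the map and reduce steps, so the user-written code only has to describe what happens to one record or one key at a time.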

Prerequisites

To get the most out of the class, you need to be familiar with Linux file systems, the Linux command line interface (CLI), and basic Linux commands such as cd, ls, and cp. You also need basic programming skills in Python and should be comfortable with a functional programming style, for example, using the map() function to split a list of strings into a nested list. Object-oriented programming (OOP) in Python is not required.
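
As a rough self-check for the functional programming prerequisite, the map() example mentioned above might look like the sketch below. The helper name split_records and the sample strings are hypothetical and chosen only for illustration.

```python
def split_records(lines, sep=None):
    """Split each string in `lines` into a list of fields.

    With sep=None, str.split breaks on any run of whitespace.
    """
    return list(map(lambda line: line.split(sep), lines))


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the course.
    records = ["alice 30 nyc", "bob 25 boston"]
    print(split_records(records))

    csv_like = ["a,b,c", "d,e"]
    print(split_records(csv_like, sep=","))
```

If this reads comfortably, including the lambda and the list() call around map(), you likely have the Python background the class expects.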

Syllabus

Unit 1 – Introduction to Hadoop

1. Data Engineering Toolkits

  • Running Linux using Docker containers
  • Linux CLI commands and Bash scripts
  • Python basics

2. Hadoop and MapReduce

  • Big Data Overview
  • HDFS
  • YARN
  • MapReduce

Unit 2 – MapReduce

3. MapReduce using MRJob 1

  • Protocols for Input & Output
  • Filtering

4. MapReduce using MRJob 2

  • Top n
  • Inverted Index
  • Multi-step Jobs

Unit 3 – Apache Hive

5. Apache Hive 1

  • Databases for Big Data
  • HiveQL and Querying Data
  • Windowing and Analytics Functions
  • MapReduce Scripts

6. Apache Hive 2

  • Tables in Hive
  • Managed Tables and External Tables
  • Storage Formats
  • Partitions and Buckets

Unit 4 – Apache Pig

7. Apache Pig 1

  • Overview
  • Pig Latin: Data Types
  • Pig Latin: Relational Operators

8. Apache Pig 2

  • More Pig Latin: Relational Operators
  • More Pig Latin: Functions
  • Compiling Pig to MapReduce
  • The Parallel Clause
  • Join Optimizations

Unit 5 – Apache Spark and AWS

9. Apache Spark – Spark Core

  • Spark Overview
  • Running Spark using Databricks Notebooks
  • Working with PySpark: RDDs
  • Transformations and Actions

10. Apache Spark – Spark SQL

  • Spark DataFrame
  • SQL Operations using Spark SQL

11. Apache Spark – Spark ML

  • ML Pipeline using PySpark

12. Amazon Elastic MapReduce

  • Overview
  • Amazon Web Services: IAM, EC2, S3
  • Creating EMR Cluster
  • Submitting Jobs
  • Intro to AWS CLI

Project: Data Engineering Project

Remote Learning

This course is available for remote learning to anyone with an internet-connected device that has a microphone (this includes most computers and tablets). Classes take place with a live instructor at the listed dates and times.

Upon registration, the instructor will send additional information about how to log on and participate in the class.

School Notes: We offer a certification licensed by the NYS Board of Education.

Refund Policy

Any student wishing to withdraw from a program must notify CourseHorse in writing. The date of withdrawal for refund purposes is the last date of physical attendance. The failure of a student to notify us in writing of withdrawal may delay the refund of tuition due pursuant to Sections 5001 and 5002 of the Education Law.

Any student requesting cancellation within seven days after signing the Enrollment Agreement but before instruction begins will be refunded all money paid, less a 5% cancellation fee. Thereafter, in the event of cancellation or termination by the school, refunds will be prorated based on the student's last date of attendance.

Reviews of Classes at NYC Data Science Academy (31)

(4.4-star rating across 31 reviews)

Review Summary by CourseHorse

Students who attended classes at NYC Data Science Academy on Big Data with Hadoop/Spark and Docker found the courses to be well-organized and informative. The curriculum covered a range of topics, including R and Python programming, statistical analysis, machine learning, and big data tools like Hadoop and Spark. The breadth of topics and difficulty level were suitable for those with academic backgrounds, and the courses prepared them well for further exploration in data science.

The bootcamp was described as intense, both physically and mentally, but students found support from instructors, TAs, and classmates. The job assistance provided by the academy was also praised, with services including networking, resume editing, and mock interviews. Overall, students felt that attending NYC Data Science Academy was a valuable decision that provided them with the skills and confidence to pursue careers in data science.

Quotes:

1. "The bootcamp courses weren't supposed to teach you everything, but they did prepare me very well if I wanted to explore further data science topics."
2. "I was able to see people applying data science tools to their expertise brilliantly, fashion, marketing, IT, health care... It was very helpful for me who was looking to step outside of academia."
3. "I appreciate the knowledge, skills, and support I acquired from NYC Data Science Academy. I highly recommend NYC Data Science Academy to anyone interested in this career."

Benefits of Booking Through CourseHorse

  • Booking is safe. When you book with us your details are protected by a secure connection.
  • Lowest price guaranteed. Classes on CourseHorse are never marked up.
  • This class will earn you 28405 points. Points give you money off your next class!

NYC Data Science Academy

NYC Data Science Academy is a program designed to teach those who wish to learn.

Through hands-on projects and real-world applications, our students develop the skills they will need to pursue data science as both a hobby and profession. We also organize the NYC Open Data Meetup, which means that by...

CourseHorse Approved

This school has been carefully vetted by CourseHorse and is a verified online educator.

Give This Course as a Gift Card

  • Thousands of classes
  • No expiration
  • Unique and memorable gifts for any occasion
  • Personalized
  • Explore a passion, gain a new skill, discover a new hobby, engage in a memorable experience
  • Instant delivery
  • Lock in a price with the Inflation Buster Gift Card Price Adjuster™

Buy a Gift Card
