EMR & Glue

EMR

Run Hadoop clusters on AWS (Big Data)

Clusters can be made up of hundreds of EC2 instances, with auto scaling

Runs within a VPC in a single AZ, which keeps network traffic between nodes fast - data can be stored in S3 so it persists after the cluster is terminated

Every cluster consists of a Master, Core, and optional Task nodes, each with a specific role.

Cost optimization: use On-Demand EC2 instances for core nodes, which hold HDFS data. Use Spot EC2 instances for task nodes, which only run computation and can tolerate interruption
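
The node roles and the cost tip above can be sketched as an instance-group layout. This is a minimal illustration, not a full cluster definition: the function name, group names, counts, and instance type are hypothetical, and putting the master node on On-Demand is an assumption the notes do not state. The dictionary keys follow the shape that boto3's EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`.

```python
def build_instance_groups(core_count=2, task_count=4, instance_type="m5.xlarge"):
    """Return a list of EMR instance groups: On-Demand core, Spot task.

    All defaults are illustrative placeholders, not recommendations.
    """
    return [
        {
            "Name": "master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",  # assumption: master kept on On-Demand
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",  # core nodes hold HDFS data
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
        {
            "Name": "task",
            "InstanceRole": "TASK",
            "Market": "SPOT",  # task nodes are optional, compute only
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        },
    ]


groups = build_instance_groups()
for group in groups:
    print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

With AWS credentials configured, a list like this would be passed as part of the `Instances` argument to `boto3.client("emr").run_job_flow(...)`, alongside the other required settings such as the release label and IAM roles.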

Glue

Managed ETL service that prepares data for analytics - it can also build a data catalog from metadata extracted from the data sources
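
As a rough sketch of the data catalog side, the snippet below builds the request for a Glue crawler that scans an S3 path and writes table metadata into a catalog database. The helper name, crawler name, role ARN, database, and bucket path are all hypothetical placeholders. The keys match the parameters of boto3's Glue `create_crawler` call.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for a Glue crawler over one S3 path."""
    return {
        "Name": name,
        "Role": role_arn,  # IAM role the crawler assumes
        "DatabaseName": database,  # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


# All values below are placeholders for illustration only.
request = build_crawler_request(
    name="example-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_db",
    s3_path="s3://example-bucket/raw/",
)
print(request["Targets"])
```

With credentials configured, the request could be sent with `boto3.client("glue").create_crawler(**request)` and the crawler then run with `start_crawler(Name="example-crawler")`.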