From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep

Data pipelines overview

- [Instructor] The AWS Data Engineer Associate exam primarily tests whether you know how to implement, optimize, and operate a data pipeline on AWS. To answer these questions, we need to start with the basics and define some terminology. A data pipeline consists of a series of steps to ingest, store, and prepare data for analysis. Data is collected from multiple sources, such as applications or IoT devices, using appropriate ingestion tools. The raw data can be stored in a data lake as is, or it can first be transformed and then stored. This transformation could include sorting, reformatting, or combining the data with other sources, so that data analysts and visualization solutions can make better use of it. Data pipelines that collect, transform, and then store data are called ETL pipelines, which stands for extract, transform, and load. However, with the advent of inexpensive cloud storage, many enterprises are storing the raw data as is and then transforming it later for…
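To make the three ETL stages concrete, here is a minimal, service-agnostic sketch in plain Python. It is not an AWS implementation: the sample IoT readings, the device registry standing in for a second application source, and the in-memory list standing in for the target store are all hypothetical. The transform step performs the three operations named above: reformatting (strings to typed values), combining with another source (adding a location from the registry), and sorting (by timestamp).

```python
from datetime import datetime


def extract():
    """Extract: pull raw records from two hypothetical sources."""
    # Hypothetical raw IoT readings; values arrive as strings, unordered.
    readings = [
        {"device_id": "d2", "ts": "2024-01-01T10:05:00", "value": "21.5"},
        {"device_id": "d1", "ts": "2024-01-01T10:00:00", "value": "19.0"},
    ]
    # Hypothetical application data used to enrich the readings.
    device_registry = {"d1": "warehouse-a", "d2": "warehouse-b"}
    return readings, device_registry


def transform(readings, registry):
    """Transform: reformat, combine with the registry, then sort."""
    rows = []
    for r in readings:
        rows.append({
            "device_id": r["device_id"],
            # Reformat: ISO-8601 string -> datetime, string -> float.
            "timestamp": datetime.fromisoformat(r["ts"]),
            "value": float(r["value"]),
            # Combine: join each reading with the second source.
            "location": registry.get(r["device_id"], "unknown"),
        })
    # Sort chronologically so downstream consumers get ordered data.
    rows.sort(key=lambda row: row["timestamp"])
    return rows


def load(rows, store):
    """Load: write the prepared rows to the target store (a list here)."""
    store.extend(rows)
    return len(rows)


def run_etl(store):
    """Run the stages in ETL order: extract, then transform, then load."""
    readings, registry = extract()
    return load(transform(readings, registry), store)


if __name__ == "__main__":
    target = []
    count = run_etl(target)
    print(count, [row["device_id"] for row in target])
```

Note that only the transformed rows ever reach `target`. Running `load` on the raw readings first and transforming them afterwards would correspond to the alternative the paragraph's last sentence begins to describe, where raw data is stored as is.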