From the course: Data Engineering on AWS: Data Cataloging, Processing, Analytics, and Visualization

Data engineering pipeline overview

- [Instructor] As we learned in part one, data engineering sits at the heart of a data lake and comprises various stages, from data sourcing through visualization. In part one of this course, we discussed data sources and how data arrives from them. For example, data can come from connected devices such as IoT devices, from web logs, or from other on-premises applications and databases. Then we discussed various ingestion patterns. First, we covered hybrid scenarios, where an organization already has a technology stack running on-premises and now wants to move to the cloud to leverage scalability and resiliency. There we discussed using Direct Connect and Storage Gateway to implement this. Then we talked about the one-time data ingestion approach, that is, migration, where we discussed Snowmobile and Database Migration Service. And finally, we discussed real-time ingestion patterns, where we primarily learned about using Kinesis. Then we moved on to the storage layer, where we discussed various storage options for different needs and covered S3 in detail.

Now, in this course, that is, part two of data engineering, we are going to focus on S3 data cataloging, where we will learn about AWS Glue in depth. Then we will learn to process this data using Lambda, EMR, and Lake Formation, followed by analyzing the processed data using Athena, Kinesis Analytics, OpenSearch, and Redshift. Once we are done with analytics, it's time to visualize this data using QuickSight, which enables you to make more efficient business decisions. So let's get started on this interesting and insightful learning journey.
