From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep


State management and replayability

- [Presenter] An important characteristic of a data pipeline is how it handles and recovers from failures. Amazon CTO Werner Vogels famously said, "Everything fails all the time." No computer system can operate flawlessly without errors, so you have to design failure-handling strategies into your pipeline. In this lesson, we're going to cover two of these: maintaining state and replayability. A data pipeline is a type of workflow in which data passes from one stage to the next. For operational efficiency, you want to be able to resume the workflow from the last successful state rather than start over when an error occurs. Workflow management systems coordinate the work between the various services; they keep track of successes and failures and implement branching logic to handle any errors. Many serverless data collection and processing services store checkpoints in an external repository in order to maintain state. For example, a stream processor keeps track…
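To make the checkpointing idea concrete, here is a minimal sketch in Python. It assumes a simple file-based checkpoint store and a hypothetical `process_stream` function; a real pipeline would typically persist its checkpoint in an external repository such as a database table. The point is the pattern: commit the position only after a record succeeds, so a rerun resumes from the last successful state instead of starting over.

```python
import json
from pathlib import Path

def process_stream(records, checkpoint_path):
    """Process records in order, committing a checkpoint after each
    success so a rerun resumes from the last successful position.
    (Illustrative sketch, not a real AWS API.)"""
    path = Path(checkpoint_path)
    # Load the last committed offset (the maintained state), if any.
    start = json.loads(path.read_text())["offset"] if path.exists() else 0
    processed = []
    for offset in range(start, len(records)):
        processed.append(records[offset].upper())  # stand-in for real work
        # Commit only after this record succeeds; a crash before this
        # line means the record is replayed on the next run.
        path.write_text(json.dumps({"offset": offset + 1}))
    return processed
```

If the process crashes partway through, the next invocation reads the checkpoint and skips the records that already completed; a second run over fully processed input does no work at all.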
