From the course: Data Management with Apache NiFi (2023)
Data management with Apache NiFi - Apache NiFi Tutorial
- [Instructor] Hi, and welcome to this course on data management with Apache NiFi. Now, Apache NiFi is simple enough that you learn by doing. You just need a quick overview of what NiFi is all about and then you can dive right into the demos and get started.
So what exactly is Apache NiFi? It's a data flow system based on the concepts of flow-based programming. NiFi is used to automate and manage the flow of information between systems. Apache NiFi is an open source data integration tool that allows users to efficiently move, transform, and process data from different sources, route it to multiple destinations, and perform data integration tasks.
NiFi is built on the fundamental concepts of flow-based programming. Flow-based programming refers to a programming paradigm where tasks are defined as networks. Your application or task is essentially a network of black box processes. These processes exchange data across predefined connections by passing messages, and the connections are specified externally to the processes themselves. Flow-based programming represents your application as a directed graph with nodes that are interconnected using edges.
Now, let's talk about data flow automation and management because this is extremely important in enterprises. Enterprises are constantly producing and storing data in all kinds of different systems and you always need to move data between systems. Now, this data may need to be transformed and cleaned while you are moving it from the source system to the destination system. And this is what Apache NiFi helps you automate. It automates data flows between source and destination systems.
Now, data flows in NiFi are represented as directed graphs in a web-based user interface. So you basically write little to no code when you're using Apache NiFi. Even without writing any code, NiFi is powerful enough to support data routing, transformation, and all of the system mediation logic that you require.
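
To make the flow-based programming idea above a bit more concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API. The process names (`parse`, `to_cents`), the `run_network` helper, and the sample records are all made up for illustration. The point it tries to show is that each process is a black box that only sees the message it receives, while the connections between processes are declared separately, as the edges of a directed graph.

```python
from collections import deque

# Black box processes: each one only knows about the packet handed to it.
def parse(record):
    name, amount = record.split(",")
    return {"name": name.strip(), "amount": float(amount)}

def to_cents(packet):
    return {**packet, "amount_cents": round(packet["amount"] * 100)}

def run_network(processes, connections, source, inputs):
    """Push each input through the graph, starting at `source`.

    Packets that reach a node with no outgoing edge are collected as outputs.
    """
    outputs = []
    pending = deque((source, item) for item in inputs)
    while pending:
        node, packet = pending.popleft()
        result = processes[node](packet)
        targets = connections.get(node, [])
        if not targets:
            outputs.append(result)
        for target in targets:
            pending.append((target, result))
    return outputs

# Nodes of the graph (hypothetical names).
PROCESSES = {"parse": parse, "to_cents": to_cents}

# Edges of the graph, specified outside the processes themselves.
CONNECTIONS = {"parse": ["to_cents"]}

results = run_network(PROCESSES, CONNECTIONS, "parse", ["alice, 1.50", "bob, 2.25"])
for packet in results:
    print(packet)
```

Notice that rewiring the flow only means editing `CONNECTIONS`; the process functions stay untouched. In NiFi you would do the equivalent by dragging connections between processors in the web-based user interface.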
Now, why is it a challenge to build such an automated data flow management system? Well, anytime you're dealing with systems of different kinds, all systems are prone to failure and these failures are often unexpected. Also, when you're managing data flows, data may be produced at one rate and consumed at a different rate, and this is hard to manage. Also, data is inherently unpredictable. Data may change constantly. The schema of the data may also change. All of these have to be managed by our data flow system. Also, data in the real world is noisy, corrupt, and unclean, with missing values and a whole bunch of other problems.
When you are moving data from one system to another, you have to take into account the fact that these systems evolve at different rates. One system might be updated faster, another may be updated much more slowly. Also, anytime you are moving data, you need to ensure that your data flows are compliant and secure.
As you might imagine, automated data flow management is a huge topic and we won't be able to cover everything that Apache NiFi has to offer. Here's what we'll cover in this course. We'll see how we can build, connect up, and configure a simple data flow using NiFi. We'll run data transformation operations using SQL queries. We'll see how we can integrate NiFi with PostgreSQL, HTTP, and Amazon S3. We'll understand how concurrency, scheduling, queues, and back pressure work in Apache NiFi. And finally, we'll see how we can monitor data flows and configure email alerts in NiFi.
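
As a small preview of the queue and back pressure idea mentioned above, here is a rough simulation in plain Python of a producer that is faster than its consumer. It is only a sketch under stated assumptions, not how NiFi is implemented: the `simulate` function and all of its parameters are hypothetical. It assumes the upstream side is simply not scheduled while the queue sits at or above a threshold, so the queue can overshoot the threshold slightly within a single run but cannot grow without bound.

```python
from collections import deque

def simulate(ticks, produce_per_tick, consume_per_tick, threshold):
    """Simulate a fast producer and a slow consumer joined by a queue."""
    queue = deque()
    produced = 0
    paused_ticks = 0
    max_depth = 0
    for _tick in range(ticks):
        # Back pressure: the producer only runs while the queue is below
        # the threshold (a soft limit, checked before each run).
        if len(queue) < threshold:
            for _ in range(produce_per_tick):
                queue.append(produced)
                produced += 1
        else:
            paused_ticks += 1
        max_depth = max(max_depth, len(queue))
        # The consumer drains at its own, slower rate.
        for _ in range(min(consume_per_tick, len(queue))):
            queue.popleft()
    return {
        "produced": produced,
        "paused_ticks": paused_ticks,
        "max_depth": max_depth,
        "final_depth": len(queue),
    }

stats = simulate(ticks=10, produce_per_tick=3, consume_per_tick=1, threshold=5)
print(stats)
```

Without the threshold check, the queue in this example would keep growing by two items every tick. With it, the producer is held back on some ticks and the queue depth stays bounded.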