From the course: Data-Centric Visual AI
The feedback loop
- [Instructor] In the last module, we highlighted the shift from model-centric to data-centric AI, but what are the benefits? In this lesson, we'll show how data-centric AI creates a feedback loop in your ML pipeline that enables rapid improvements across your whole stack and gives you a great deal of flexibility. Let's look at that model life cycle again. Pause the video for a second and try this: what can go wrong in the cycle, and which areas are most crucial to get right? You might be tempted to say model training, but experience shows that a model's quality depends far more on the data going in than on hyperparameter tuning. Let's start with data curation. Remember, models are replaceable, and for good reason: models are getting better each year across the board. If you discover a critical vulnerability, or a stunning new breakthrough appears, the speed at which you can pivot is what's paramount. Let's look at some examples. One of the easiest ways to use data curation in the model feedback loop, and to get insight into where your datasets need improvement, is to train quickly and pivot quickly. One such example is finding labeling mistakes using your model's predictions. Training a model for even a short amount of time can reveal where annotations in your dataset are missing or poor. In this example, we query for high-confidence false positives to find where our model was highly confident but still marked incorrect. Here, we can see that the model is correctly detecting this broccoli. However, our ground truths have a different understanding of what broccoli is. We probably want to clean up our broccoli labels, or at least make sure that, in our next training run, our model and our ground truths are on the same page about what broccoli looks like. Similarly, in our example with the cat, although the image already has many ground truths, our model finds a cup that was not previously labeled. 
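The high-confidence false-positive query described above can be sketched in a few lines. This is a minimal illustration, not the course's actual tooling: the detection and ground-truth dictionaries, the `iou` helper, and the thresholds are all assumed for the example.

```python
# Hypothetical sketch of mining high-confidence false positives:
# detections the model scored highly that match no ground-truth box.
# These are the first places to look for missing or wrong annotations.

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def high_confidence_false_positives(predictions, ground_truths,
                                    conf_thresh=0.8, iou_thresh=0.5):
    """Return confident predictions that overlap no same-class label."""
    suspects = []
    for pred in predictions:
        if pred["score"] < conf_thresh:
            continue
        matched = any(
            gt["label"] == pred["label"]
            and iou(gt["box"], pred["box"]) >= iou_thresh
            for gt in ground_truths
        )
        if not matched:
            suspects.append(pred)
    # Most confident mistakes first -- likeliest annotation gaps.
    return sorted(suspects, key=lambda p: -p["score"])
```

In the cat-and-cup example, the unlabeled cup would surface at the top of this list: the model scored it highly, but no ground-truth box matches it.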
We can add this cup to our dataset to confirm that the model correctly detected it and that the prediction is not a misclassification. These are just a few examples of how a quick model training run can generate insights, letting you work the cycle from training to curation to improve your dataset and, ultimately, your model's performance. Evaluation is where we check all of our edge cases. A powerful way to do this is with embeddings: we can find the outliers in our dataset, understand its distributions, and see where our model is highly confident and where it isn't. Take the example of these fast-moving dogs. Our model easily finds all these cases, including some dogs that occasionally look like giraffes, so we can see exactly what the model is thinking even on these outliers. In future modules, we'll cover how to find and evaluate these specific edge cases. But what if a new model has just been released? If the benchmarks suggest the new architecture might actually be better overall, you can quickly compare it against your existing evaluations to stay on top of the game. Once again, the speed at which you can pivot from an old model architecture to a new one is pivotal to keeping the best model in production, and having the best model in production is how you get a leg up on your competition. To reiterate, data-centric AI is all about putting data at the center of your stack. By doing so and investing in your datasets, you'll improve quality, save time, and get to production faster. 
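The embedding-based outlier search mentioned above can be sketched with a simple distance-from-centroid heuristic. This is an assumed, minimal approach: real pipelines typically compute image embeddings with a pretrained backbone and may use more robust methods, but the idea of ranking samples by how far they sit from the bulk of the data is the same.

```python
import numpy as np

# Hypothetical sketch: flag embedding-space outliers by their distance
# from the dataset's mean embedding. The embedding vectors themselves
# are assumed inputs (e.g. from a pretrained vision backbone).

def find_outliers(embeddings, k=5):
    """Return indices of the k samples farthest from the centroid."""
    emb = np.asarray(embeddings, dtype=float)
    centroid = emb.mean(axis=0)
    dists = np.linalg.norm(emb - centroid, axis=1)
    # Largest distances first: these are the edge cases to review.
    return np.argsort(dists)[::-1][:k]
```

Reviewing the top-ranked samples is how you'd surface edge cases like the giraffe-looking dogs: they sit far from the rest of the "dog" cluster in embedding space.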
Long gone are the days of building models yourself, diving into individual layers to squeeze out a little more accuracy, wrangling your data by hand, and chasing down every error those models throw. Now, by putting data at the center of your stack and using state-of-the-art models with the best open-source tools available, you can get to production, and get there fast.