From the course: Databricks for Data Analysts
Opening a Databricks Notebook: What is different? - Databricks Tutorial
Okay friends, now we are inside Databricks. If it's your first time here, I know it might be a little overwhelming with all the features you see on your screen, but that's totally okay, because we're going to learn everything step by step throughout the course.
If you look at the left side, you'll find a lot of features and tools. That's because Databricks can do many things: data engineering, data analytics, data science, and AI engineering. Since we're talking about data analysts, I'm going to show you the features that are most relevant for you.
The first thing to look at is the Catalog. If you go over here, this is what we call Unity Catalog, and this is where your data lives. At the start, everything is going to be empty, because we haven't created anything yet.
The second place to look at is the Workspace. As a data analyst, you're going to create a few things, like an SQL query or maybe a dashboard, and all of your work and projects can be stored inside the Workspace. Again, everything here is empty for now.
The other things you see here, like Jobs and Pipelines and Compute, are mainly for data engineering, so we can skip them for now.
And here comes the most important section for analytics: the SQL section. As you can see, there are a lot of things underneath it. We have the SQL Editor, Queries, Dashboards, Genie, Alerts, and Query History, where we can analyze the queries we've run. We also have our compute, the SQL warehouse. This is the processing power we use in Databricks, and without it we cannot run anything, because something has to execute our commands. If you are using the Free Edition, you can use only one warehouse, which is more than enough to practice Databricks.
But if you have a premium workspace, like I have here, you can create more warehouses and configure them the way you need, and Databricks makes this very simple. If I go over here, all I have to do is give the warehouse a name and define the size of the compute. As you can see, we have sizes like X-Small, Small, Medium, and Large. I'm going to stick with the smallest one, and if things get too slow, I'll move up to the next size. In the Free Edition, you'll stick with whatever Databricks has prepared for you.
For now, I won't create anything, because I already have one. If you look at it, you can see that its status is Stopped. Starting it is very simple: there's a play button here, so we click it to start the warehouse. It usually takes only a few seconds, and then Databricks spins up the warehouse for us to execute our queries. Now the warehouse is running and ready, and that's really all there is to it.
Of course, before we can use any of those SQL features to do analytics, we first need data. Without data we can't do anything, so let's fix that. We go to Unity Catalog over here. Databricks organizes data in a hierarchy that is very similar to classical databases. The highest level is my organization, the company, and inside it you'll find multiple catalogs. By default, you get the workspace and system catalogs. If you expand the workspace catalog, you'll find schemas. A schema is just a logical way to organize multiple objects in one place. Here we have default and information_schema. The information_schema is a built-in, read-only schema where Databricks exposes technical metadata about the objects in the catalog, such as its tables, columns, and privileges.
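To make the hierarchy concrete: every table, view, or volume in Unity Catalog is addressed by a three-level name, catalog.schema.object. The following Python sketch is illustrative only. `ObjectRef`, `quote_identifier`, `schema_paths`, `STARTING_LAYOUT`, and the table name `customers` are hypothetical names, not part of any Databricks API. The quoting rule it uses (wrap names in backticks and double any backtick inside a name) follows Databricks SQL identifier syntax.

```python
from dataclasses import dataclass
from typing import Dict, List


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick (Databricks SQL style)."""
    return "`" + name.replace("`", "``") + "`"


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical helper: one object (table, view, or volume) in the hierarchy."""
    catalog: str  # e.g. "workspace"
    schema: str   # e.g. "default"
    name: str     # the table, view, or volume itself

    def qualified_name(self) -> str:
        # Three levels joined with dots: catalog.schema.object
        return ".".join(quote_identifier(p) for p in (self.catalog, self.schema, self.name))

    @classmethod
    def from_dotted(cls, dotted: str) -> "ObjectRef":
        # Only handles plain, unquoted names such as "workspace.default.orders"
        parts = dotted.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.object, got {dotted!r}")
        return cls(*parts)


# Hypothetical snapshot of the workspace catalog as it looks at the start
STARTING_LAYOUT: Dict[str, List[str]] = {
    "workspace": ["default", "information_schema"],
}


def schema_paths(layout: Dict[str, List[str]]) -> List[str]:
    """List every schema in the layout as catalog.schema."""
    return [f"{catalog}.{schema}" for catalog, schemas in layout.items() for schema in schemas]


if __name__ == "__main__":
    print(schema_paths(STARTING_LAYOUT))
    print(ObjectRef("workspace", "default", "customers").qualified_name())
```

Running it prints `['workspace.default', 'workspace.information_schema']` and then `` `workspace`.`default`.`customers` ``. The same three-part form is how you would refer to a table from the SQL editor once data exists.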
If you go inside the default schema, you'll see that it's empty at the start, so there's no data in it yet. Inside a schema, we can store our data as different types of objects, like tables, views, or volumes. So here we have to make our first decision: either put our data inside the default schema or create our own schema.
Creating one is very simple. If you click on the workspace catalog, you'll find a Create schema button on the right side. Click it, and all you have to do is give the schema a name. I'm going to call it sales_data and leave everything else as the default. You can also add a description for the schema, for example: "Schema for sales data that contains customers, orders, and products." This is optional, but let's go ahead and create it. With that, we have our third schema. If you don't see it yet, refresh the catalog and it should appear.
Now we have an empty schema where we can put our own data and analyze it later, which means we're ready to start importing our data into Databricks.
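The Create schema button also has a SQL equivalent. This is a minimal sketch for readers who would rather type the statement into the SQL editor. The hypothetical Python helper builds a Databricks SQL `CREATE SCHEMA IF NOT EXISTS ... COMMENT ...` statement. The function names are illustrative, and the name check is deliberately stricter than what Unity Catalog actually allows.

```python
import re
from typing import Optional

# Conservative subset of legal names: letters, digits, underscores.
# (Unity Catalog allows more; this sketch keeps things simple.)
_SIMPLE_NAME = re.compile(r"[A-Za-z0-9_]+")


def sql_string_literal(text: str) -> str:
    """Quote text as a SQL string literal, backslash-escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def create_schema_sql(catalog: str, schema: str, comment: Optional[str] = None) -> str:
    """Build a CREATE SCHEMA statement that mirrors the Create schema dialog."""
    for part in (catalog, schema):
        if not _SIMPLE_NAME.fullmatch(part):
            raise ValueError(f"unsupported name for this sketch: {part!r}")
    statement = f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}"
    if comment:  # the description field is optional
        statement += f" COMMENT {sql_string_literal(comment)}"
    return statement


if __name__ == "__main__":
    print(create_schema_sql(
        "workspace",
        "sales_data",
        "Schema for sales data that contains customers, orders, and products",
    ))
```

Paste the printed statement into the SQL editor while the warehouse is running. It should produce the same schema as the clicks described above. `IF NOT EXISTS` makes it safe to run the statement a second time.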