From the course: Data Quality Testing with Great Expectations
Creating a data asset - Great Expectations Tutorial
A data source in GX gives us the connection to a database, but we still need to specify which tables we want to access. To access a specific table through GX, we create a data asset. Data assets are collections of records within a data source. A SQL data source supports two types of data asset: a table data asset, which consists of the records in a specific table, and a query data asset, which consists of the records returned by a SQL query. In our case, we want to create a table data asset for the full January taxi data table.

Since we've already defined a data source, we need three more parameters to create a table data asset. First, the table name, which is the actual name of the SQL table in the Postgres database that we want to access. Second, the name of the database schema that we loaded the tables into. And third, the asset name, which is the name we'll use to reference the table data asset within GX.

Let's get back to our Jupyter Notebook to create a table data asset for our taxi data. Remember that we created a data source earlier; we now add a new data asset to it in our data context. This statement creates a new data asset called TaxiJan2025Table, based on the yellow trip data 2025-01 table in the taxi data schema of our Postgres database. Let's execute this.

We can verify that the asset was added to the data context correctly by printing the list of all assets. You should see the TaxiJan2025Table asset in that list. We now have a data source that connects to the Postgres database and a data asset that references the January taxi data table. Let's move on to the last step and create a batch definition.
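To make the difference between the two kinds of data asset concrete without a Postgres server or GX installed, here is a minimal sketch that uses Python's built-in sqlite3 module rather than GX itself. An attached in-memory database stands in for the taxi data schema, and the table name yellow_tripdata_2025_01, its three columns, and the sample rows are all hypothetical stand-ins for the real January table. A table data asset corresponds to every record of one schema-qualified table, while a query data asset corresponds to whatever rows a SQL query returns. In GX itself, SQL data sources offer methods such as add_table_asset and add_query_asset for these two cases; check the GX documentation for your version for their exact parameters.

```python
import sqlite3

# In-memory SQLite stands in for the course's Postgres database.
# Assumption: this only illustrates the concepts; it is not GX code.
conn = sqlite3.connect(":memory:")

# Attaching a second in-memory database under the name "taxi_data" lets us
# address tables as schema.table, similar to a Postgres schema.
conn.execute("ATTACH DATABASE ':memory:' AS taxi_data")

# Hypothetical, heavily simplified stand-in for the January trip table.
SCHEMA_NAME = "taxi_data"
TABLE_NAME = "yellow_tripdata_2025_01"
conn.execute(
    f'CREATE TABLE "{SCHEMA_NAME}"."{TABLE_NAME}" ('
    "trip_id INTEGER PRIMARY KEY, passenger_count INTEGER, fare_amount REAL)"
)
conn.executemany(
    f'INSERT INTO "{SCHEMA_NAME}"."{TABLE_NAME}" VALUES (?, ?, ?)',
    [(1, 1, 12.5), (2, 3, 30.0), (3, 2, 8.75), (4, 1, 55.0)],
)


def table_asset_records(schema_name: str, table_name: str):
    """Rows a table data asset refers to: every record of one table."""
    # Identifiers cannot be bound as query parameters, so they are quoted
    # and interpolated; this assumes trusted, hard-coded names.
    sql = f'SELECT * FROM "{schema_name}"."{table_name}"'
    return conn.execute(sql).fetchall()


def query_asset_records(query: str):
    """Rows a query data asset refers to: whatever the SQL query returns."""
    return conn.execute(query).fetchall()


all_trips = table_asset_records(SCHEMA_NAME, TABLE_NAME)
expensive_trips = query_asset_records(
    f'SELECT trip_id, fare_amount FROM "{SCHEMA_NAME}"."{TABLE_NAME}" '
    "WHERE fare_amount > 20 ORDER BY trip_id"
)

print(len(all_trips))   # 4
print(expensive_trips)  # [(2, 30.0), (4, 55.0)]
```

Note that the table-based lookup needs both the schema name and the table name to locate its records, which mirrors two of the three parameters described above; the asset name is only the label used to refer to the asset inside GX.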