From the course: Learning BigQuery

Introducing BigQuery - BigQuery Tutorial

Introducing BigQuery

- [Instructor] Hello, and welcome to this course on learning BigQuery. BigQuery is a cloud-hosted data warehouse on the Google Cloud Platform, and in this course you will be introduced to its features and use cases. This first section serves as a quick overview of the technology.

Before we dive in, here are the course prerequisites. I assume that you already have a working knowledge of SQL and the Google Cloud Platform, since we will be using both of these heavily in the demos of this course. It will also help if you are clear on some of the fundamentals of big data, since BigQuery is essentially a big data technology. Towards the end of this course we will make use of a BigQuery command-line utility, so some prior experience working with a shell will help as well.

With that said, let's answer a basic question: what exactly is BigQuery? The official documentation refers to it as a serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility. Let's take the key words in this definition one at a time. Serverless means that users can make use of BigQuery without having to provision any servers on their own. Highly scalable means that BigQuery will automatically scale to handle up to petabytes of data, so you never really have to worry about the size of the data you're working with. Being cost-effective and able to work across cloud platforms are also strong points. And finally, BigQuery is essentially a data warehouse.

Let's quickly remind ourselves what a data warehouse is, just to put things into context. A data warehouse is a system for reporting and data analysis, and warehouses are known for their ability to handle very large volumes of data compiled from many disparate sources. The primary goal of using a data warehouse is to extract meaningful insights that can drive your business decisions, and BigQuery is more than capable of fulfilling this role.
Data warehouses have been around for decades, so you might ask, why should I use BigQuery over all of the others? Well, for one, this is a serverless platform. Servers are in fact running in the background, but they are entirely abstracted away from the user. Significantly, users don't have to worry about the overhead of managing servers. BigQuery is also highly available. You don't have to worry about servers going down, since this is taken care of by the service.

We already touched upon the scalability of BigQuery. This includes the autoscaling of clusters based on demand for the data, and the scaling is capable of coping with petabytes of data. These are features not available in most traditional warehouses.

As with many other warehouses, BigQuery is capable of working with many different data sources. You can pull in data from your own file system, from Google Cloud Storage, or even from Amazon S3 buckets. You can then query that data using either standard SQL or even legacy SQL if you really need to. The performance in either case is excellent. Query results are usually cached for 24 hours, so that subsequent runs of the same query only need to fetch data from the cache rather than from disk.

And significantly, the total cost of ownership for BigQuery is relatively low compared to many of the equivalent offerings on other cloud platforms. Speaking of which, here are some of the alternatives to Google's BigQuery. On Amazon Web Services there is Redshift, on the Azure cloud there is SQL Data Warehouse, and then there is also the Snowflake platform.
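
To make the 24-hour result cache mentioned above a little more concrete, here is a minimal toy sketch in Python. It is not BigQuery's actual implementation and does not call any BigQuery API. The class `ToyQueryCache`, the helper `fake_scan`, and the table name in the query are all hypothetical, and the sketch assumes that a cached result is reused only when the query text is identical and the entry is less than 24 hours old.

```python
from datetime import datetime, timedelta

# Lifetime of a cached result, as mentioned in the course (assumed fixed here).
CACHE_TTL = timedelta(hours=24)


class ToyQueryCache:
    """Toy model of a result cache keyed by query text (illustration only)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that performs the "expensive" scan
        self._entries = {}           # query text -> (time stored, rows)

    def query(self, sql, now):
        """Return (rows, cache_hit) for the given query at time `now`."""
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < CACHE_TTL:
            return entry[1], True    # fresh entry: serve from the cache
        rows = self._run_query(sql)  # missing or expired: scan again
        self._entries[sql] = (now, rows)
        return rows, False


scans = []  # records every query that actually went to "disk"


def fake_scan(sql):
    """Hypothetical stand-in for a real table scan."""
    scans.append(sql)
    return [("row", len(scans))]


cache = ToyQueryCache(fake_scan)
start = datetime(2024, 1, 1, 9, 0)
sql = "SELECT name FROM dataset.table"  # hypothetical query text

first = cache.query(sql, start)                         # first run: scans the data
second = cache.query(sql, start + timedelta(hours=1))   # within 24 hours: cache hit
third = cache.query(sql, start + timedelta(hours=25))   # after 24 hours: scans again

print(first[1], second[1], third[1])
```

In this sketch only the first and third runs reach `fake_scan`; the second is answered from the cache. The real service applies further conditions before reusing a result, which this toy model leaves out.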
