BigQuery and DataLab
Photo by Ján Jakub Naništa on Unsplash

BigQuery and DataLab

We can work directly with BigQuery from Jupyter notebook in DataLab. Can do the pip install as !pip install google-cloud-bigquery and we are good to go.

In the Jupyter notebook cell we can run the SQL query as:

%%bigquery --project $PROJECT
SELECT fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality
FROM `<project>.winequality.winequality`
ORDER BY quality DESC        

The project is the current GCP project we are working. The table winequality contains the Wine quality data from Kaggle. This table is within the dataset of the same name. Once the query completes the output is displayed similar to pandas dataframe.

No alt text provided for this image

We can run a parameterized query as:

PARAMS = {"quality_val": 4}

%%bigquery --project $PROJECT --params $PARAMS
SELECT 
fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality
FROM `<project>.winequality.winequality`
WHERE quality = @quality_val
LIMIT 10        

During the query runtime the value 4 is passed to the parameter quality_val and the query result comprise of data where quality = 4

No alt text provided for this image

We can directly load the query result into pandas dataframe with " %%bigquery df"

%%bigquery df --project $PROJECT
SELECT 
fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality
FROM `<project>.winequality.winequality`
ORDER BY quality DESC

df.describe()        

The describe function of the pandas dataframe return the mean,max,min etc for all the numerical columns.

No alt text provided for this image

The boxplot of the columns density and quality as

df.boxplot('density','quality')        
No alt text provided for this image


To view or add a comment, sign in

More articles by Ruma Sinha

  • BigqueryML and Logistic Regression

    In this article will explore the Logistic Regression with SQL in BigQueryML, using the Heart dataset from Kaggle. In…

  • Loading data with csv file in Google Bigquery

    In Google Cloud BigQuery we can load data from the csv file. We can create a dataset under the Google Project.

Explore content categories