SQL Data Scientist Interview (Part 1)

SQL Data Scientist Interview (Part 1)

SQL is the the most fundamental skill for a data analyst/data scientist-analytics/insights/product (whatever you want to call it). It is how you talk to the data. Not surprisingly, it is also an essential part of the interview process. Actually, it is a part of the tech screen and without that it is not possible to get to an onsite, where there is an additional SQL round.

As someone who used to to give interviews for fun, there were a couple of interesting SQL questions I got wrong and they exposed me to some recurring concepts. I was able to use these failures not only for my future interviews but I think it also helped me in my job.

I feel a lot of people currently interviewing would also benefit from these concepts as they are tested in almost every advanced SQL interview.

The first concept is a SELF JOIN and I will explain with an example:

No alt text provided for this image

The way to approach it would be to break it down and bring everything to user level.

First find users who bought coffee:

SELECT DISTINCT user_id
FROM T
WHERE LOWER(category)='coffee'        

This will leave us with this dataset:

No alt text provided for this image

Then find users who bought Starbucks:

SELECT DISTINCT user_id
FROM T
WHERE LOWER(brand)='starbucks'
        

This will leave us with this dataset:

No alt text provided for this image

Combine these 2 pieces of information with a JOIN on user_id, essentially a join to self aka SELF JOIN:

WITH coffee_buyers AS
(
SELECT DISTINCT user_id
FROM T
WHERE LOWER(category)='coffee'
),
starbucks_buyers AS 
(
SELECT DISTINCT user_id
FROM T
WHERE LOWER(brand)='starbucks'
)

SELECT DISTINCT coffee_buyers.user_id
FROM coffee_buyers 
LEFT JOIN starbucks_buyers 
ON coffee_buyers.user_id=starbucks_buyers.user_id        

This will leave us with this dataset:

No alt text provided for this image

Since we want users who bought coffee but did NOT buy Starbucks, we can filter for users with NULL for starbucks_buyers.user_id. That can be done by adding this line to code above:

WHERE starbucks_buyers.user_id IS NULL        

That brings us to the end of this question.

There are other variations of this question one can come across:

  • Find users who had a transaction in Sep '22 but not in May '22
  • Find users who worked at Wayne Enterprises before working at Stark Industries

In the next part we will look at lead/lag and window functions.

Interesting problem and neat solution. Thank you for posting this.

To view or add a comment, sign in

More articles by Mehdi Mujtaba

  • LinkedIn Recruiter Spam

    One somewhat frustrating thing about this platform is getting messages in Focused Inbox for contract roles despite…

    1 Comment
  • Creating Business Value as a Product Data Scientist

    While a Machine Learning Data Scientist might be able to create measurable business value by, say, improving product…

  • Djokovic's Tiebreak 'Lockdown Mode' - By the Numbers

    Delving deeper into Djokovic's recent French Open win, one stat was absolutely mind-blowing: not only did Djokovic win…

  • SQL Data Scientist Interview (Part 2)

    In the last article we went over SELF JOINS: Apart from them, there was another type of rather interesting and…

    1 Comment
  • Automating (aspects of) Job Search - LinkedIn Scraping

    In my last article I (automatically) went through my Gmail Inbox to find recruiter email addresses and send…

  • Automating (aspects of) Job Search

    After 2 years at Quotient Technology (formerly Coupons.com) I was recently back on the job market: One aspect of this…

  • Federer vs. Djokovic : A data-driven Analysis

    Since 2011 Djokovic has been the dominant player on the Mens’ Tour, which is an understatement. Not even the foremost…

  • Spend Tracking and Analysis

    For the first few years after starting a job I barely kept track of spending. There were a couple of half-hearted…

    1 Comment
  • San Jose to Honululu Airfare Analysis

    One of the perks of working on the West Coast is closer access to places like Hawaii and Alaska. Having already…

    2 Comments
  • Data Science Resources for Beginners

    In my coursework and projects I have had to search for many resources (practice datasets, intuitive explanations of…

    3 Comments

Explore content categories