SQL Data Scientist Interview (Part 2)

SQL Data Scientist Interview (Part 2)

In the last article we went over SELF JOINS:

Apart from them, there was another type of rather interesting and recurring question I came across in interviews involving window and lead/lag functions. The questions went something like this:

No alt text provided for this image

Coming to the solution, essentially we need to create 2 more columns from the data above after which getting to the final answer will be very simple. The first column is the previous transaction date and second one is the the difference in days b/w current and previous transaction:

No alt text provided for this image

We can add the first column with the help of this code:

SELECT *

,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS 
last_transaction_date

FROM T



        

To get the second column an additional line of code is needed:

WITH data_with_last_transaction_date AS (

SELECT *

,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS last_transaction_date

)

SELECT *

,DATEDIFF('day',last_transaction_date,transaction_date) AS
days_between_this_and_last_transaction

FROM data_with_last_transaction_date 

        

If you have reached this stage in your solution, all the heavy lifting is already done and you are close to acing this. A simple AVG and GROUP BY will take us over the line:

WITH T_last_trans_date AS (

SELECT 

,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS last_transaction_date

,DATEDIFF('day',last_transaction_date,transaction_date) AS days_between_this_and_last_transaction

FROM T

),
T_last_trans_date_diff AS (

SELECT 

,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS last_transaction_date

,DATEDIFF('day',last_transaction_date,transaction_date) AS days_between_this_and_last_transaction

FROM T_last_trans_date 

)





SELECT user_id, AVG(days_between_this_and_last_transaction) AS average_days_between_transactions

FROM T_last_trans_date_diff 

GROUP BY 1        
No alt text provided for this image

Another variation of this question: find users who had transactions on three successive days.

Thank you for posting this problem and solution showing window and LEAD/LAG functions.

Like
Reply

To view or add a comment, sign in

More articles by Mehdi Mujtaba

  • LinkedIn Recruiter Spam

    One somewhat frustrating thing about this platform is getting messages in Focused Inbox for contract roles despite…

    1 Comment
  • Creating Business Value as a Product Data Scientist

    While a Machine Learning Data Scientist might be able to create measurable business value by, say, improving product…

  • Djokovic's Tiebreak 'Lockdown Mode' - By the Numbers

    Delving deeper into Djokovic's recent French Open win, one stat was absolutely mind-blowing: not only did Djokovic win…

  • SQL Data Scientist Interview (Part 1)

    SQL is the the most fundamental skill for a data analyst/data scientist-analytics/insights/product (whatever you want…

    1 Comment
  • Automating (aspects of) Job Search - LinkedIn Scraping

    In my last article I (automatically) went through my Gmail Inbox to find recruiter email addresses and send…

  • Automating (aspects of) Job Search

    After 2 years at Quotient Technology (formerly Coupons.com) I was recently back on the job market: One aspect of this…

  • Federer vs. Djokovic : A data-driven Analysis

    Since 2011 Djokovic has been the dominant player on the Mens’ Tour, which is an understatement. Not even the foremost…

  • Spend Tracking and Analysis

    For the first few years after starting a job I barely kept track of spending. There were a couple of half-hearted…

    1 Comment
  • San Jose to Honululu Airfare Analysis

    One of the perks of working on the West Coast is closer access to places like Hawaii and Alaska. Having already…

    2 Comments
  • Data Science Resources for Beginners

    In my coursework and projects I have had to search for many resources (practice datasets, intuitive explanations of…

    3 Comments

Explore content categories