SQL Data Scientist Interview (Part 2)
In the last article we went over SELF JOINS:
Apart from them, there was another type of rather interesting and recurring question I came across in interviews involving window and lead/lag functions. The questions went something like this:
Coming to the solution, essentially we need to create 2 more columns from the data above after which getting to the final answer will be very simple. The first column is the previous transaction date and second one is the the difference in days b/w current and previous transaction:
We can add the first column with the help of this code:
SELECT *
,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS
last_transaction_date
FROM T
To get the second column an additional line of code is needed:
WITH data_with_last_transaction_date AS (
SELECT *
,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS last_transaction_date
)
SELECT *
,DATEDIFF('day',last_transaction_date,transaction_date) AS
days_between_this_and_last_transaction
FROM data_with_last_transaction_date
If you have reached this stage in your solution, all the heavy lifting is already done and you are close to acing this. A simple AVG and GROUP BY will take us over the line:
WITH T_last_trans_date AS (
SELECT
,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS last_transaction_date
,DATEDIFF('day',last_transaction_date,transaction_date) AS days_between_this_and_last_transaction
FROM T
),
T_last_trans_date_diff AS (
SELECT
,LAG(transaction_date) OVER (PARTITION BY user_id ORDER BY transaction_date) AS last_transaction_date
,DATEDIFF('day',last_transaction_date,transaction_date) AS days_between_this_and_last_transaction
FROM T_last_trans_date
)
SELECT user_id, AVG(days_between_this_and_last_transaction) AS average_days_between_transactions
FROM T_last_trans_date_diff
GROUP BY 1
Another variation of this question: find users who had transactions on three successive days.
Thank you for posting this problem and solution showing window and LEAD/LAG functions.