LeetCode Curated SQL Solutions and Discussion — Week 1

Ankur Ranjan
6 min readMar 21, 2022

--

SQL is a must when you are in the domain of Data. Let it be Data Engineering, Big Data, Data Analyst or BI Developer, everyone who is working with Data should have a good understanding of SQL. I feel that reading SQL theoretically is not gonna help that much. So I have come up with the idea of solving and discussing some good problems of SQL with wonderful communities of Data Engineers, Big Data Developers, Data Analysts, BI Developers. I will be posting some of the good questions of SQL per week. I do like Leetcode for practice and this platform has created a very curated list of SQL questions. I am also a youtube creator and I will be posting the video link of all these questions too.

Let’s try with a very simple table and try to solve some of the SQL questions which can be formed from this table.

Let’s suppose we are having one table named Activity which will have the following column and properties.

Table: Activity

+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.


Input:Activity table:
-----------+-----------+------------+--------------+
player_id | device_id | event_date | games_played |
-----------+-----------+------------+--------------+
1 | 2 | 2016-03-01 | 5 |
1 | 2 | 2016-05-02 | 6 |
2 | 3 | 2017-06-25 | 1 |
3 | 1 | 2016-03-02 | 0 |
3 | 4 | 2018-07-03 | 5 |
-----------+-----------+------------+--------------+

Leetcode has multiple SQL questions w.r.t. this table and they are as follows.

  1. Write an SQL query to report the first login date for each player. [Easy]
  2. Write an SQL query to report the device that is first logged in for each player. [Easy]
  3. Write an SQL query to report for each player and date, how many games played so far by the player. That is the total number of games played by the player until that date. [Medium]
  4. Write an SQL query to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players. [Medium]

Let’s start solving the very first question i.e. to find the first login date for each player. We want output like this.

Output: 
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+

It is a very easy problem to solve. We can just group by on player_id and find the minimum of event_date. Let’s try to write the solution then.

select 
player_id,
min(event_date) as first_login
from activity
group by player_id

Let’s go to the second question now i.e. to report the device that is first logged in for each player. So the output will be the following.

Output: 
+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
+-----------+-----------+

This is also a very easy problem to solve. Here we will use the windows function and intermediate table to solve this problem. First of all, we will create one new column named `rank_col` using the dense_rank windows function. As we have to find a device that is first logged in for each player, we have to partition on player_id and then we have to order by event_date in ascending order. Once this is done then we can save these results using the `with` statement and create one intermediate table and after that, we can just filter all those results whose `rank_col` column has a value equal to one. Let’s look into SQL queries for better understanding.

with rank_table as(
select
player_id,
device_id,
dense_rank() over(
partition by player_id
order by event_date asc
) as rank_col
from activity
)

select player_id, device_id from rank_table where rank_col = 1

Let’s try to solve the third question now i.e. to report for each player and date, how many games played so far by the player. Here basically, we have to find the total number of games played by the player until that date. So the final output will be like this.

Output: 
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
+-----------+------------+---------------------+
Explanation:
For the player with id 1, 5 + 6 = 11 games played by 2016-05-02, and 5 + 6 + 1 = 12 games played by 2017-06-25.
For the player with id 3, 0 + 5 = 5 games played by 2018-07-03.
Note that for each player we only care about the days when the player logged in.

We can easily solve this problem by using windows. We will apply sum on games_played column and we will sum i.e. by applying partition on player_id and order by event_date ascending. Let’s look into the query for a better understanding.

select 
player_id,
event_date,
sum(games_played) over(
partition by player_id order by event_date asc
) as games_played_so_far
from activity

Now let’s move to the fourth question i.e. to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. Here the final output will be as follows.

Output: 
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+

Explanation:
Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33

Here the first focus of our solution should be finding the difference between the first login of players to their event date. We can easily achieve it by applying the `datadiff(date_1, date_2)` function, where the date_1 will be event_date and date_2, will be first_login which we can calculate by applying the `min()` function on the event_date column and applying partition on player_id. We can store the above calculation using an intimidatory table. After that, we can simply find the fraction by counting the distinct player from the original table i.e. activity table and counting the distinct players from the intimidatory table where date_diff is equal to 1. Let’s look into SQL queries for better understanding.

with activity_stats as (
select
player_id,
event_date,
datediff(
event_date,
min(event_date) over(partition by player_id)
) as date_diff
from activity
)


select
round(
count(distinct player_id) /
(select count(distinct player_id) from activity)
, 2) as fraction
from activity_stats
where date_diff = 1

One can find links to all these solutions on my youtube channel. Please like and subscribe to my channel, if you have learned to form it. Please hit the bell icon to never miss the update from my The Big Data Show Youtube channel. One can take the ready create and insert script from the description box of my Youtube video.

Originally published at https://www.linkedin.com.

--

--

Ankur Ranjan
Ankur Ranjan

Written by Ankur Ranjan

Data Engineer III @Walmart | Contributor of The Big Data Show

No responses yet