SQL Queries in Corresponding R Languages
Since I use SQL and R interchangeably at work, I wanted to showcase the beauty of both. Below is a list of "commands" that can be run by both languages.
Table Overview
We will be using this dataset called "df".
1. View Top n=5 Rows
# MySQL
SELECT *
FROM df
LIMIT 5
# R Language
head(df, n=5)
In R, we can also use "tail(df, n=5)" to get the last 5 rows in the data.
2. Select Rows with Conditions
# MySQL
SELECT *
FROM df
WHERE revenue > 10
# R Language
rev_greater_10 <- df[which(df$revenue > 10 ), ]
3. Select Rows with Multiple Conditions
# MySQL
SELECT *
FROM df
WHERE revenue BETWEEN 10 AND 15
# R Language
df[which(df$revenue > 10 & df$revenue <15 ), ]
4. Select Rows with Conditions (Sort by)
# MySQL
SELECT *
FROM df
WHERE revenue > 10
ORDER BY revenue DESC
# R Language
rev_greater_10 <- df[which(df$revenue > 10 ), ]
rev_greater_10[order(-rev_greater_10$revenue), ] # Descending
5. Select a Subset of Columns by Names
# MySQL
SELECT company, revenue
FROM df
# R Language
df[ ,c("company", "revenue")]
6. Group By (Average)
Recommended by LinkedIn
# MySQL
SELECT company, AVG(revenue)
FROM df
GROUP BY company
# R Language
aggregate(revenue ~ company, data = df, FUN = mean)
There are many other ways to aggregate data such as COUNT(), SUM(), etc. In R, we replace the "FUN = " to "length" (for COUNT), "sum" (for SUM).
7. Join Tables
Table 1: df
Table 2: df2
After two tables have left joined ^
# MySQL
SELECT *
FROM df2
LEFT JOIN df ON df2.company = df.company
# R Language
merge(x=df2, y=df, by = "company", all.x = TRUE) # Left Join
merge(x=df2, y=df, by = "company", all.y = TRUE) # Right Join
merge(x=df2, y=df, by = "company", all = TRUE) # Outer Join
8. Subsetting Data using Wildcards
# MySQL
SELECT *
FROM df
WHERE company LIKE 't%' ## Starts with t
# R Language
df[which(startsWith(df$company, "t")), ]
There are many variations of using wildcards. This is an overview from W3school.com
For R, we can use Regex to perform more complicated string matching tasks. The link below is a cheat sheet for using regular expressions. "Stringr" R package is another useful tool as well.
9. Finding Unique variable
# MySQL
SELECT DISTINCT company
FROM df
# R Language
unique(df$company)
Thanks for reading this!
Amazing!
Love this