SQL Queries in Corresponding R Languages

SQL Queries in Corresponding R Languages

Since I use SQL and R interchangeably at work, I wanted to showcase the beauty of both. Below is a list of "commands" that can be run by both languages.

Table Overview

No alt text provided for this image

We will be using this dataset called "df".

1. View Top n=5 Rows

No alt text provided for this image
# MySQL

SELECT *
FROM df
LIMIT 5

# R Language

head(df, n=5)
        

In R, we can also use "tail(df, n=5)" to get the last 5 rows in the data.

2. Select Rows with Conditions

No alt text provided for this image
# MySQL

SELECT *
FROM df
WHERE revenue > 10

# R Language

rev_greater_10 <- df[which(df$revenue > 10 ), ]        

3. Select Rows with Multiple Conditions

No alt text provided for this image
# MySQL

SELECT *
FROM df
WHERE revenue BETWEEN 10 AND 15

# R Language

df[which(df$revenue > 10 & df$revenue <15 ), ]
        

4. Select Rows with Conditions (Sort by)

No alt text provided for this image
# MySQL

SELECT *
FROM df
WHERE revenue > 10
ORDER BY revenue DESC

# R Language

rev_greater_10 <- df[which(df$revenue > 10 ), ]
rev_greater_10[order(-rev_greater_10$revenue), ]   # Descending
        

5. Select a Subset of Columns by Names

No alt text provided for this image
# MySQL

SELECT company, revenue
FROM df

# R Language

df[ ,c("company", "revenue")]
        

6. Group By (Average)

No alt text provided for this image
# MySQL

SELECT company, AVG(revenue) 
FROM df
GROUP BY company

# R Language

aggregate(revenue ~ company, data = df, FUN = mean)
        

There are many other ways to aggregate data such as COUNT(), SUM(), etc. In R, we replace the "FUN = " to "length" (for COUNT), "sum" (for SUM).

7. Join Tables

No alt text provided for this image

Table 1: df

No alt text provided for this image

Table 2: df2

No alt text provided for this image

After two tables have left joined ^

# MySQL

SELECT *
FROM df2
LEFT JOIN df ON df2.company = df.company 

# R Language

merge(x=df2, y=df, by = "company", all.x = TRUE)   # Left Join
merge(x=df2, y=df, by = "company", all.y = TRUE)   # Right Join
merge(x=df2, y=df, by = "company", all = TRUE)   # Outer Join
        

8. Subsetting Data using Wildcards

No alt text provided for this image
# MySQL

 
SELECT *
FROM df
WHERE company LIKE 't%'   ## Starts with t

# R Language

df[which(startsWith(df$company, "t")), ]        

There are many variations of using wildcards. This is an overview from W3school.com

No alt text provided for this image

For R, we can use Regex to perform more complicated string matching tasks. The link below is a cheat sheet for using regular expressions. "Stringr" R package is another useful tool as well.

9. Finding Unique variable

No alt text provided for this image
# MySQL

SELECT DISTINCT company
FROM df


# R Language

unique(df$company)        

Thanks for reading this!

To view or add a comment, sign in

Others also viewed

Explore content categories