Learn with Lantana: R data transformations for SAS programmers

In this Learn with Lantana blog post, we compile a list of common data wrangling tasks and provide R code examples for each one. The post is geared toward readers with SAS programming experience and uses the tidyverse suite of R packages.

Summary of data transformation examples in R.

The following examples use the sample dataset "iris", which is built into every installation of R.

1. Filtering rows in a dataset

Limit a dataset to a subset of rows that meet a condition.

SAS: Use a WHERE clause in a DATA step or PROC to limit the rows that are read in from the dataset. Alternatively, use a subsetting IF statement in a DATA step, which reads in every row and then filters out the unwanted rows before the new dataset is saved.

R: use dplyr::filter
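
A minimal sketch of row filtering, assuming the dplyr package is installed. The species and the length threshold are illustrative choices, and iris_filtered is a name made up for this example.

```r
library(dplyr)

# Keep only setosa flowers whose sepal length exceeds 5.
# Conditions separated by commas are combined with AND,
# much like chaining conditions in a SAS WHERE clause.
iris_filtered <- iris %>%
  filter(Species == "setosa", Sepal.Length > 5)

head(iris_filtered)
```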

2. Selecting columns

Use “select” when you want to simplify a dataset and only save the columns that you need.

SAS: use KEEP or DROP in a DATA step to indicate fields to keep or drop

R: use dplyr::select
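
A short sketch of both styles, assuming dplyr is installed: naming the columns to keep (like KEEP) and naming the columns to drop (like DROP). The result names iris_selected and iris_no_petal are hypothetical.

```r
library(dplyr)

# Keep only the listed columns (similar to KEEP in SAS)
iris_selected <- iris %>%
  select(Species, Sepal.Length, Sepal.Width)

# Drop the listed columns and keep everything else (similar to DROP in SAS)
iris_no_petal <- iris %>%
  select(-Petal.Length, -Petal.Width)

names(iris_selected)
names(iris_no_petal)
```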

3. Creating cut points (binning, discretizing)

Create categories from a continuous variable.

SAS: Use PROC FORMAT for user-defined cut points, PROC RANK for equal-sized groups such as quintiles, and PROC HPBIN for equal-width bins.

R: Use dplyr::mutate together with a binning function. Several options for cut points are available, depending on how the bins should be defined.

3.1 User Defined Cut Point
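
One possible approach, sketched here with base R's cut() inside dplyr::mutate. The break values (5 and 6) and the labels are arbitrary choices for illustration, and sepal_length_group is a hypothetical column name.

```r
library(dplyr)

# Assign each flower to a category using hand-picked cut points.
# Intervals are closed on the right: (-Inf, 5], (5, 6], (6, Inf).
iris_binned <- iris %>%
  mutate(sepal_length_group = cut(
    Sepal.Length,
    breaks = c(-Inf, 5, 6, Inf),
    labels = c("short", "medium", "long")
  ))

table(iris_binned$sepal_length_group)
```

dplyr::case_when is an alternative when the categories are easier to express as explicit conditions than as numeric breaks.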

3.2 Equal Size Bins
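
A sketch using dplyr::ntile, which splits rows into a requested number of groups of (nearly) equal size, comparable to PROC RANK with GROUPS=. Five groups (quintiles) and the column name sepal_length_quintile are assumptions made for this example.

```r
library(dplyr)

# Rank flowers by sepal length and split them into 5 equal-sized groups.
# Group 1 holds the smallest values and group 5 the largest.
iris_quintiles <- iris %>%
  mutate(sepal_length_quintile = ntile(Sepal.Length, 5))

table(iris_quintiles$sepal_length_quintile)
```

Because ntile balances group sizes, rows with tied values can land in different groups.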

3.3 Equal-range Bins
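
A sketch using base R's cut() with a single number for breaks, which divides the observed range into that many intervals of equal width. Four bins and the column name sepal_length_bin are illustrative assumptions.

```r
library(dplyr)

# Split the range of Sepal.Length into 4 intervals of equal width.
# cut() widens the outer limits slightly so the minimum and maximum
# both fall inside a bin.
iris_equal_width <- iris %>%
  mutate(sepal_length_bin = cut(Sepal.Length, breaks = 4))

table(iris_equal_width$sepal_length_bin)
```

Unlike equal-size bins, the number of rows in each bin will generally differ.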

4. Log Transformation

Apply a natural log transformation to a continuous variable.

SAS: use LOG() in a DATA step

R: use dplyr::mutate with log()
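
A minimal sketch, assuming dplyr is installed. In both SAS and R, log() is the natural logarithm. The column name log_sepal_length is a hypothetical choice.

```r
library(dplyr)

# Add a new column holding the natural log of sepal length
iris_log <- iris %>%
  mutate(log_sepal_length = log(Sepal.Length))

head(iris_log)
```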

5. Standardizing (z-score)

Standardize a variable by calculating a z-score.

SAS: use PROC STANDARD

R: use dplyr::mutate with scale()
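
A sketch of one way to do this, assuming dplyr is installed. scale() returns a one-column matrix, so the example wraps it in as.numeric() to store a plain numeric column. The name sepal_length_z is made up for illustration.

```r
library(dplyr)

# Subtract the mean and divide by the standard deviation.
# as.numeric() converts the matrix returned by scale() into a vector.
iris_z <- iris %>%
  mutate(sepal_length_z = as.numeric(scale(Sepal.Length)))

# The standardized column should have mean 0 and standard deviation 1
mean(iris_z$sepal_length_z)
sd(iris_z$sepal_length_z)
```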

6. Min/Max Scaling (normalization)

Rescale a continuous variable to the range 0 to 1 (min/max normalization).

SAS: use PROC STDIZE with METHOD=RANGE for min/max scaling.

R: use dplyr::mutate with a formula to normalize (min/max scaling).
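
A minimal sketch of the formula (x - min) / (max - min), assuming dplyr is installed. The column name sepal_length_scaled is a hypothetical choice.

```r
library(dplyr)

# Rescale sepal length so the smallest value maps to 0 and the largest to 1
iris_scaled <- iris %>%
  mutate(
    sepal_length_scaled =
      (Sepal.Length - min(Sepal.Length)) /
      (max(Sepal.Length) - min(Sepal.Length))
  )

range(iris_scaled$sepal_length_scaled)
```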

7. Transposing Data Wide To Long

Convert wide data to long.

SAS: Use PROC TRANSPOSE

R: Use tidyr::pivot_longer to convert data from wide format to long format, turning multiple columns into key-value pairs
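
A sketch assuming dplyr and tidyr are installed. Since iris has no row identifier, the example adds one (flower_id) so each flower can still be traced after reshaping. The names flower_id, measurement, and value are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Stack the four measurement columns into name/value pairs,
# producing one row per flower per measurement.
iris_long <- iris %>%
  mutate(flower_id = row_number()) %>%
  pivot_longer(
    cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
    names_to = "measurement",
    values_to = "value"
  )

head(iris_long)
```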

8. Transposing Data Long To Wide

Convert data from long to wide.

SAS: use PROC TRANSPOSE

R: use tidyr::pivot_wider to convert data from long format to wide.
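
A sketch assuming dplyr and tidyr are installed. To keep the example self-contained, it first builds a long version of iris and then spreads it back out. The names flower_id, measurement, and value are hypothetical.

```r
library(dplyr)
library(tidyr)

# Build a long dataset: one row per flower per measurement
iris_long <- iris %>%
  mutate(flower_id = row_number()) %>%
  pivot_longer(
    cols = -c(Species, flower_id),
    names_to = "measurement",
    values_to = "value"
  )

# Spread the measurement names back into separate columns,
# giving one row per flower again.
iris_wide <- iris_long %>%
  pivot_wider(names_from = measurement, values_from = value)

head(iris_wide)
```

The identifier column matters here: without it, pivot_wider could not tell which values belong to the same flower.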

9. Summarize/Aggregate

Summarize data to get descriptive statistics from a dataset by group.

SAS: Use PROC MEANS, and add a CLASS statement to summarize by group.

R: Use dplyr::summarize for statistical descriptions of a dataset and add dplyr::group_by to do this by group.

Overall
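
A minimal sketch of an overall summary, assuming dplyr is installed. The statistics chosen and the output column names are illustrative.

```r
library(dplyr)

# One row of descriptive statistics for the whole dataset
iris_summary <- iris %>%
  summarize(
    n = n(),
    mean_sepal_length = mean(Sepal.Length),
    sd_sepal_length = sd(Sepal.Length),
    min_sepal_length = min(Sepal.Length),
    max_sepal_length = max(Sepal.Length)
  )

iris_summary
```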

By Group
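
A sketch of a grouped summary, assuming dplyr 1.0 or later is installed (for the .groups argument). Grouping by Species plays the role of the CLASS statement in PROC MEANS. The output column names are illustrative.

```r
library(dplyr)

# One row of descriptive statistics per species
iris_by_species <- iris %>%
  group_by(Species) %>%
  summarize(
    n = n(),
    mean_sepal_length = mean(Sepal.Length),
    sd_sepal_length = sd(Sepal.Length),
    .groups = "drop"  # return an ungrouped data frame
  )

iris_by_species
```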
