Learn with Lantana: R data transformations for SAS programmers
In this Learn with Lantana blog post, we’ve compiled a list of common data wrangling tasks and provided R code examples for each one. The post is geared towards readers with SAS programming experience and uses the tidyverse suite of R packages.
Summary of data transformation examples in R.
1. Filtering rows in a dataset
Limit a dataset to a subset of rows that meet a condition.
SAS: Use a WHERE clause in a DATA step or PROC to limit the rows that are read in from the dataset. Alternatively, use an IF statement in a DATA step, which reads in all rows of the dataset and then filters out rows before the new dataset is saved.
R: use dplyr::filter
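As a minimal sketch of row filtering, the example below builds a small made-up `patients` data frame (the dataset and its column names are assumptions for illustration, not from the original post) and keeps only the rows that meet a condition.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
patients <- data.frame(
  id  = 1:6,
  age = c(34, 51, 67, 45, 72, 29),
  sex = c("F", "M", "F", "F", "M", "M")
)

# Keep rows where age is at least 50 (comparable to WHERE age >= 50 in SAS)
older <- patients %>%
  filter(age >= 50)

# Conditions separated by commas are combined with AND
older_women <- patients %>%
  filter(age >= 50, sex == "F")
```

Unlike the SAS WHERE/IF distinction, `filter()` always works on a data frame that is already in memory.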
2. Selecting columns
Use “select” when you want to simplify a dataset and only save the columns that you need.
SAS: use KEEP or DROP in a DATA step to indicate fields to keep or drop
R: use dplyr::select
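A short sketch of column selection follows, again using a hypothetical `patients` data frame whose columns are made up for this example. Naming columns keeps them, and a leading minus sign drops them.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
patients <- data.frame(
  id        = 1:3,
  age       = c(34, 51, 67),
  sex       = c("F", "M", "F"),
  weight_kg = c(61.2, 88.5, 70.1)
)

# Keep only the named columns (comparable to KEEP in SAS)
kept <- patients %>%
  select(id, age)

# Drop a column by prefixing it with a minus sign (comparable to DROP in SAS)
dropped <- patients %>%
  select(-weight_kg)
```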
3. Creating cut points (binning, discretizing)
Create categories from a continuous variable.
SAS: Use PROC FORMAT for user-defined cut points, PROC RANK for equal-sized groups like quintiles, and PROC HPBIN for equal-width bins.
R: Use dplyr::mutate with a binning function such as cut() or dplyr::ntile(). Note: various options for cut points are available.
3.1 User Defined Cut Point
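One way to apply user-defined cut points is base R's `cut()` inside `mutate()`. The sketch below assumes a made-up `age` column and illustrative break values; neither comes from the original post.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
patients <- data.frame(age = c(5, 17, 18, 40, 64, 65, 90))

binned <- patients %>%
  mutate(
    age_group = cut(
      age,
      breaks = c(0, 18, 65, Inf),           # cut points chosen for illustration
      labels = c("0-17", "18-64", "65+"),   # one label per interval
      right  = FALSE                        # intervals are [lower, upper)
    )
  )
```

Setting `right = FALSE` makes each interval include its lower bound and exclude its upper bound, so an age of exactly 18 falls in the "18-64" group.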
3.2 Equal Size Bins
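For groups with (roughly) equal numbers of rows, `dplyr::ntile()` is one option. This sketch uses an invented `income` column and splits it into quintiles.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(income = c(12, 55, 23, 91, 38, 47, 64, 70, 19, 83))

# Assign each row to one of 5 groups of (approximately) equal size,
# ordered from lowest to highest income
ranked <- df %>%
  mutate(income_quintile = ntile(income, 5))
```

Note that `ntile()` numbers groups starting at 1, whereas PROC RANK with GROUPS= numbers them starting at 0.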
3.3 Equal-range Bins
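For bins of equal width, one possible approach is to pass a single number of bins to base R's `cut()`. The `score` column below is made up for illustration, and `labels = FALSE` is used so the result is a simple bin number rather than an interval label.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(score = c(0, 10, 24, 30, 49, 60, 80, 100))

# Split the observed range of score into 4 intervals of equal width
# and record the bin number (1 to 4) for each row
binned <- df %>%
  mutate(score_bin = cut(score, breaks = 4, labels = FALSE))
```

Because the bins are based on the range of the data rather than on row counts, some bins may hold more rows than others.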
4. Log Transformation
Transform a continuous variable by taking its natural logarithm.
SAS: use LOG() in a DATA step
R: use dplyr::mutate with log()
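A minimal sketch of a log transformation is shown below, assuming a made-up, strictly positive `cost` column. In R, `log()` computes the natural logarithm by default.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(cost = c(1, 10, 100, 1000))

# Add a new column holding the natural log of cost
logged <- df %>%
  mutate(log_cost = log(cost))
```

If the variable can contain zeros, `log1p()` (which computes log(1 + x)) is a common alternative, since `log(0)` returns `-Inf`.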
5. Standardizing (z-score)
Standardize a variable by calculating a z-score.
SAS: use PROC STANDARD
R: use dplyr::mutate with scale()
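The sketch below standardizes an invented `score` column. Because `scale()` returns a one-column matrix, wrapping it in `as.numeric()` is one way to keep the new column as a plain numeric vector.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(score = c(2, 4, 4, 4, 5, 5, 7, 9))

# z-score: subtract the mean and divide by the sample standard deviation
standardized <- df %>%
  mutate(score_z = as.numeric(scale(score)))
```

The resulting column has a mean of 0 and a standard deviation of 1.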
6. Min/Max Scaling (normalization)
Rescale a continuous variable to the range 0 to 1 using min/max normalization.
SAS: PROC STDIZE with method=range for min/max scaling.
R: use dplyr::mutate with a formula to normalize (min/max scaling).
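Min/max scaling can be written directly as (x - min) / (max - min). The sketch below applies that formula to a made-up `score` column.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(score = c(10, 20, 25, 40, 50))

# Rescale so the smallest value becomes 0 and the largest becomes 1
normalized <- df %>%
  mutate(score_norm = (score - min(score)) / (max(score) - min(score)))
```

If the column contains missing values, `min()` and `max()` would need `na.rm = TRUE` to avoid returning `NA` for every row.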
7. Transposing Data Wide To Long
Convert wide data to long.
SAS: Use PROC TRANSPOSE
R: Use tidyr::pivot_longer to convert data from wide format to long format, turning multiple columns into key-value pairs
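As an illustration, the sketch below reshapes a small invented dataset with one column per visit into one row per patient per visit. The column names (`visit1` to `visit3`, `sbp`) are assumptions made for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per patient, one column per visit
wide <- data.frame(
  id     = 1:2,
  visit1 = c(120, 135),
  visit2 = c(118, 130),
  visit3 = c(115, 128)
)

# Stack the visit columns: old column names go to "visit",
# and their values go to "sbp"
long <- wide %>%
  pivot_longer(
    cols      = starts_with("visit"),
    names_to  = "visit",
    values_to = "sbp"
  )
```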
8. Transposing Data Long To Wide
Convert data from long to wide.
SAS: use PROC TRANSPOSE
R: use tidyr::pivot_wider to convert data from long format to wide.
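The reverse reshape can be sketched as follows, using an invented long dataset with one row per patient per visit (the column names are assumptions for illustration).

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per patient per visit
long <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  visit = rep(c("visit1", "visit2", "visit3"), times = 2),
  sbp   = c(120, 118, 115, 135, 130, 128)
)

# Spread the visits into columns: values of "visit" become column names,
# and values of "sbp" fill those columns
wide <- long %>%
  pivot_wider(names_from = visit, values_from = sbp)
```

The `names_from` and `values_from` arguments play roughly the roles of the ID and VAR statements in PROC TRANSPOSE.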
9. Summarize/Aggregate
Summarize data to get descriptive statistics from a dataset by group.
SAS: Use PROC MEANS, add CLASS if grouping.
R: Use dplyr::summarize for statistical descriptions of a dataset and add dplyr::group_by to do this by group.
Overall
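A minimal sketch of overall summary statistics is shown below, using a small made-up data frame (the `group` and `value` columns are assumptions for illustration).

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  value = c(10, 20, 6, 15, 27)
)

# One row of descriptive statistics for the whole dataset
overall <- df %>%
  summarize(
    n          = n(),
    mean_value = mean(value),
    sd_value   = sd(value),
    min_value  = min(value),
    max_value  = max(value)
  )
```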
By Group
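Adding `group_by()` before `summarize()` produces one row per group, much like adding CLASS to PROC MEANS. The data frame below is made up for illustration.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
df <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  value = c(10, 20, 6, 15, 27)
)

# One row of descriptive statistics per level of group
by_group <- df %>%
  group_by(group) %>%
  summarize(
    n          = n(),
    mean_value = mean(value)
  )
```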