Performing Conditional Processing
This article describes the tools that allow programs to “make decisions” based on data values. For example, you may want to read a value of Age and create a variable that represents age groups. You may want to determine if values for a particular variable are within predefined limits. Programs that perform any of these operations require conditional processing—the ability to make logical decisions based on data values.
The IF and ELSE IF Statements
Two of the basic tools for conditional processing are the IF and ELSE IF statements. To understand how these statements work, suppose you have collected data on a group of students based on the following variables:
The first task is to create a new variable that represents age groups. To do this we run the following code:
libname daily "/home/u41108928/Ron Cody Data";
data daily.conditional;
length Gender $ 1
Quiz $ 2;
input Age Gender Midterm Quiz FinalExam;
if missing(Age) then AgeGroup = .;
else if Age lt 20 then AgeGroup = 1;
else if Age lt 40 then AgeGroup = 2;
else if Age lt 60 then AgeGroup = 3;
else if Age ge 60 then AgeGroup = 4;
datalines;
21 M 80 B- 82
. F 90 A 93
35 M 87 B+ 85
48 F . . 76
59 F 95 A+ 97
15 M 88 . 93
67 F 97 A 91
. M 62 F 67
35 F 77 C- 77
49 M 59 C 81
;
title "Listing of CONDITIONAL";
proc print data= daily.conditional noobs;
run;
The output will be :
When you write a program like this, you need to make sure your logic is correct. For example, what if a person is 25 years old? The first IF statement is false because Age is not missing. The next ELSE IF statement is evaluated and found to be false as well. Finally, the third IF statement is evaluated and AgeGroup is set equal to 2. Because this IF statement is true, all the remaining ELSE IF statements are skipped.
If you are working with very large data sets and want to squeeze every last drop out of the efficiency tank, you should place the IF statements in order, from the ones most likely to have a true condition to the ones least likely to have a true condition. This increases efficiency because SAS skips testing all the ELSE conditions when a previous IF condition is true.
If you want to take a subset of data from a raw data file or an existing SAS data set, you can use a special form of an IF statement called a subsetting IF.
data daily.females;
set daily.conditional;
if Gender eq 'F';
run;
title "Listing of FEMALES";
proc print data=daily.females noobs;
run;
The IN Operator
If you want to test if a value is one of several possible choices, you can use multiple OR statements, like this:
if Quiz = 'A+' or Quiz = 'A' or Quiz = 'B+' or Quiz = 'B'
then QuizRange = 1;
else if Quiz = 'B-' or Quiz = 'C+' or Quiz = 'C'
then QuizRange = 2;
else if not missing(Quiz) then QuizRange = 3;
These statements can be simplified by using the IN operator, like this:
if Quiz in ('A+' 'A' 'B+' 'B') then QuizRange = 1;
else if Quiz in ('B-' 'C+' 'C') then QuizRange = 2;
else if not missing(Quiz) then QuizRange = 3;
The list of values in parentheses following the IN operator can be separated by blanks or commas. The first line could also be written like this:
if Quiz in ('A+','A','B+','B') then QuizRange = 1;
You can also use the IN operator with numeric variables. For example, if you had a numeric variable called Subject (stored as a numeric value) and you wanted to list observations for Subject numbers 10, 22, 25, and 33, the following WHERE statement could be used:
where Subject in (10 22 25 33);
As with the example using character values, you may separate the values with spaces or commas. You can also specify a range of numeric values, using a colon to separate values in the list. For example, to list observations where Subject is a 10, 22–25, or 30, you can write:
where Subject in (10,22:25,30);
Using a SELECT Statement for Logical Tests
A SELECT statement provides an alternative to a series of IF and ELSE IF statements. Here is one way to use a SELECT statement:
select (AgeGroup);
when (1) Limit = 110;
when (2) Limit = 120;
when (3) Limit = 130;
otherwise;
end;
The expression following the SELECT statement is referred to as a select-expression; the expression following a WHEN statement is referred to as a when-expression. In this example, the select-expression (AgeGroup) is compared to each of the when-expressions. If the comparison is true, the statement following the when-expression is executed and control skips to the end of the SELECT group. If the comparison is false, the next when-expression is compared to the select-expression. If none of the comparisons is evaluated to be true, the expression following the OTHERWISE statement is executed. As you can see in this example, the otherwise-expression can be a null statement. It is good to include an OTHERWISE statement because the program will terminate if you omit it and none of the preceding comparisons is true. You can place more than one value in the when-expression, like this:
select (AgeGroup);
when (1) Limit = 110;
when (2) Limit = 120;
when (3,5) Limit = 130;
otherwise;
end;
In this example, AgeGroup values of 3 or 5 will set a Limit equal to 130. To help clarify this concept, let’s follow some scenarios:
Recommended by LinkedIn
If AgeGroup is equal to 1, the Limit will be 110. If Agegroup is equal to 3, the Limit will be equal to 130. If AgeGroup is equal to 4, the Limit will be a missing value (because it is set to a missing value in the PDV at each iteration of the DATA step and it is never assigned a value). If you do not supply a select-expression, each WHEN statement is evaluated to determine if the when-expression is true or false.
data daily.conditional_select;
set daily.conditional;
select;
when (missing(Age)) AgeGroup = .;
when (Age lt 20) AgeGroup = 1;
when (Age lt 40) AgeGroup = 2;
when (Age lt 60) AgeGroup = 3;
when (Age ge 60) Agegroup = 4;
otherwise;
end;
run;
title "Listing of CONDITIONAL using SELECT";
proc print data=conditional_select noobs;
run;
Notice that there is no select expression in this SELECT statement. Each when-expression is evaluated and, if true, the statement following the expression is executed.
Using Boolean Logic (AND, OR, and NOT Operators)
You can combine various logical operators (also known as Boolean operators) to form fairly complex statements. As an example, the data set SASHELP.HEART contains information on status, age weight, height, smoking_status, etc. A program to list all patients whose status is alive, have either high cholesterol or high blood pressure or are overweight, or are either very heavy smokers or heavy smokers:
title "Example of Boolean Expressions";
proc print data= sashelp.heart;
where Status = 'Alive' and
(Chol_Status = 'High' or
BP_Status = 'High' or
Weight_Status = 'Overweight' or
Smoking_Status = 'Heavy(16-25)' or Smoking_Status = 'Very Heavy (>25)') ;
var Status Sex Chol_Status BP_Status Smoking_Status;
run;
Notice the parentheses around the two statements separated by OR. The AND operator has precedence over the OR operator.
All these people having any of the staus on the higher side need to work on their lifestyle to improve their health.
The WHERE Statement
If you are reading data from a SAS data set, you can use a WHERE statement also to subset your data. For example, you can use WHERE statement to subset female data from conditional data set :
data daily.females_where;
set daily.conditional;
where Gender eq 'F';
run;
title "Listing of FEMALES using WHERE";
proc print data= daily.females_where noobs;
run;
In this example, you could use either a WHERE or a subsetting IF statement. There are sometimes advantages to using a WHERE statement instead of a subsetting IF statement. You have a larger choice of operators that can be used with a WHERE statement and, if the input data set is indexed, the WHERE statement might be more efficient.
You may also use a WHERE statement in a SAS procedure to subset the data being processed. IF statements are not allowed inside SAS procedures.
NOTE 1: A complete list of the logical comparison operators is displayed in the following table.
NOTE 2: The table here lists some of the useful operators that you can use with a WHERE statement.
Important points to note:
Now let's solve some challenging problems as well to understand the concept:
Challenge Problem 1:
Using the Sales.xls data:
Challenge Problem 2:
Using the Blood.txt data, create a new, temporary SAS data set containing all the variables in Blood plus a new variable called CholGroup. Define this new variable as follows:
Low: Low – 110 ; Medium: 111 – 140 ; High : 141 – High
Do try them out and happy learning!