Performing Conditional Processing

Performing Conditional Processing

This article describes the tools that allow programs to “make decisions” based on data values. For example, you may want to read a value of Age and create a variable that represents age groups. You may want to determine if values for a particular variable are within predefined limits. Programs that perform any of these operations require conditional processing—the ability to make logical decisions based on data values.

The IF and ELSE IF Statements

Two of the basic tools for conditional processing are the IF and ELSE IF statements. To understand how these statements work, suppose you have collected data on a group of students based on the following variables:

  • Age (in years)
  • Gender (recorded as M or F)
  • Midterm (grade on the midterm exam)
  • Quiz (quiz grade from F to A+)
  • FinalExam (grade on the final exam)

The first task is to create a new variable that represents age groups. To do this we run the following code:

libname daily "/home/u41108928/Ron Cody Data";

data daily.conditional; 
length Gender $ 1 
Quiz $ 2; 
input Age Gender Midterm Quiz FinalExam; 
if missing(Age) then AgeGroup = .; 
else if Age lt 20 then AgeGroup = 1; 
else if Age lt 40 then AgeGroup = 2; 
else if Age lt 60 then AgeGroup = 3; 
else if Age ge 60 then AgeGroup = 4; 
datalines;
 21 M 80 B- 82 
 . F 90 A 93 
 35 M 87 B+ 85
 48 F . . 76 
 59 F 95 A+ 97 
 15 M 88 . 93 
 67 F 97 A 91 
 . M 62 F 67 
 35 F 77 C- 77 
 49 M 59 C 81 
; 


title "Listing of CONDITIONAL"; 
proc print data= daily.conditional noobs; 
run;
        

The output will be :

No alt text provided for this image

When you write a program like this, you need to make sure your logic is correct. For example, what if a person is 25 years old? The first IF statement is false because Age is not missing. The next ELSE IF statement is evaluated and found to be false as well. Finally, the third IF statement is evaluated and AgeGroup is set equal to 2. Because this IF statement is true, all the remaining ELSE IF statements are skipped. 

If you are working with very large data sets and want to squeeze every last drop out of the efficiency tank, you should place the IF statements in order, from the ones most likely to have a true condition to the ones least likely to have a true condition. This increases efficiency because SAS skips testing all the ELSE conditions when a previous IF condition is true. 

If you want to take a subset of data from a raw data file or an existing SAS data set, you can use a special form of an IF statement called a subsetting IF

data daily.females; 
set daily.conditional;
if Gender eq 'F'; 
run; 
 
title "Listing of FEMALES"; 
 
proc print data=daily.females noobs;          

run;        
No alt text provided for this image

The IN Operator 

If you want to test if a value is one of several possible choices, you can use multiple OR statements, like this: 

if Quiz = 'A+' or Quiz = 'A' or Quiz = 'B+' or Quiz = 'B'
then QuizRange = 1;

else if Quiz = 'B-' or Quiz = 'C+' or Quiz = 'C'
then QuizRange = 2;


else if not missing(Quiz) then QuizRange = 3;         

These statements can be simplified by using the IN operator, like this: 

if Quiz in ('A+' 'A' 'B+' 'B') then QuizRange = 1;

else if Quiz in ('B-' 'C+' 'C') then QuizRange = 2;


else if not missing(Quiz) then QuizRange = 3;         

The list of values in parentheses following the IN operator can be separated by blanks or commas. The first line could also be written like this: 

if Quiz in ('A+','A','B+','B') then QuizRange = 1; 
        

You can also use the IN operator with numeric variables. For example, if you had a numeric variable called Subject (stored as a numeric value) and you wanted to list observations for Subject numbers 10, 22, 25, and 33, the following WHERE statement could be used:

where Subject in (10 22 25 33); 
        

As with the example using character values, you may separate the values with spaces or commas. You can also specify a range of numeric values, using a colon to separate values in the list. For example, to list observations where Subject is a 10, 22–25, or 30, you can write:

where Subject in (10,22:25,30); 
        

Using a SELECT Statement for Logical Tests 

A SELECT statement provides an alternative to a series of IF and ELSE IF statements. Here is one way to use a SELECT statement: 

select (AgeGroup);
when (1) Limit = 110;
when (2) Limit = 120;
when (3) Limit = 130;
otherwise;


end;         

The expression following the SELECT statement is referred to as a select-expression; the expression following a WHEN statement is referred to as a when-expression. In this example, the select-expression (AgeGroup) is compared to each of the when-expressions. If the comparison is true, the statement following the when-expression is executed and control skips to the end of the SELECT group. If the comparison is false, the next when-expression is compared to the select-expression. If none of the comparisons is evaluated to be true, the expression following the OTHERWISE statement is executed. As you can see in this example, the otherwise-expression can be a null statement. It is good to include an OTHERWISE statement because the program will terminate if you omit it and none of the preceding comparisons is true. You can place more than one value in the when-expression, like this:

select (AgeGroup); 

 when (1) Limit = 110; 

 when (2) Limit = 120; 

 when (3,5) Limit = 130; 

 otherwise; 


end;         

In this example, AgeGroup values of 3 or 5 will set a Limit equal to 130. To help clarify this concept, let’s follow some scenarios:

If AgeGroup is equal to 1, the Limit will be 110. If Agegroup is equal to 3, the Limit will be equal to 130. If AgeGroup is equal to 4, the Limit will be a missing value (because it is set to a missing value in the PDV at each iteration of the DATA step and it is never assigned a value). If you do not supply a select-expression, each WHEN statement is evaluated to determine if the when-expression is true or false. 

data daily.conditional_select; 
set daily.conditional;
select; 
 when (missing(Age)) AgeGroup = .; 
 when (Age lt 20) AgeGroup = 1; 
 when (Age lt 40) AgeGroup = 2; 
 when (Age lt 60) AgeGroup = 3; 
 when (Age ge 60) Agegroup = 4; 
otherwise; 
end; 
run; 
 
title "Listing of CONDITIONAL using SELECT"; 
 
proc print data=conditional_select noobs;         

run;        

Notice that there is no select expression in this SELECT statement. Each when-expression is evaluated and, if true, the statement following the expression is executed. 

Using Boolean Logic (AND, OR, and NOT Operators) 

You can combine various logical operators (also known as Boolean operators) to form fairly complex statements. As an example, the data set SASHELP.HEART contains information on status, age weight, height, smoking_status, etc. A program to list all patients whose status is alive, have either high cholesterol or high blood pressure or are overweight, or are either very heavy smokers or heavy smokers: 

title "Example of Boolean Expressions";
 
proc print data= sashelp.heart;
   where Status = 'Alive' and
          (Chol_Status = 'High' or 
           BP_Status = 'High'  or 
           Weight_Status = 'Overweight' or 
           Smoking_Status = 'Heavy(16-25)' or Smoking_Status = 'Very Heavy (>25)') ;
    var  Status Sex Chol_Status BP_Status Smoking_Status;

run;        

Notice the parentheses around the two statements separated by OR. The AND operator has precedence over the OR operator. 

No alt text provided for this image

All these people having any of the staus on the higher side need to work on their lifestyle to improve their health.

The WHERE Statement

If you are reading data from a SAS data set, you can use a WHERE statement also to subset your data. For example, you can use WHERE statement to subset female data from conditional data set :

data daily.females_where; 
 set daily.conditional; 
 where Gender eq 'F'; 
run;

title "Listing of FEMALES using WHERE"; 
 
proc print data= daily.females_where noobs;          

run;        

In this example, you could use either a WHERE or a subsetting IF statement. There are sometimes advantages to using a WHERE statement instead of a subsetting IF statement. You have a larger choice of operators that can be used with a WHERE statement and, if the input data set is indexed, the WHERE statement might be more efficient.

You may also use a WHERE statement in a SAS procedure to subset the data being processed. IF statements are not allowed inside SAS procedures. 

NOTE 1: A complete list of the logical comparison operators is displayed in the following table.  

No alt text provided for this image

NOTE 2: The table here lists some of the useful operators that you can use with a WHERE statement.

No alt text provided for this image

Important points to note:

  1. The LIKE expression uses two wildcard operators. The underscore (_) is a placeholder; enter as many underscores as you need to stand for the same number of characters. The percent (%) matches nothing or a string of any length. 
  2. The IS NULL or IS MISSING expression matches a character or a numeric missing value. 
  3. The BETWEEN AND expression matches all the values greater than or equal to the first value and less than or equal to the second value. This works with character as well as numeric variables. 
  4. The CONTAINS expression matches any character value containing the given string.  

Now let's solve some challenging problems as well to understand the concept:

Challenge Problem 1:

Using the Sales.xls data:

  1. List the observations for employee numbers (EmpID) 9888 and 0177. 
  2. Create a new, temporary SAS data set containing Region and TotalSales and a new variable called Weight with values of 1.5 for the North Region, 1.7 for the South Region, and 2.0 for the West and East Regions. 
  3. List all the observations where Region is North and Quantity is less than 60. Include in this list any observations where the customer name (Customer) is Pet's are Us.


Challenge Problem 2:

Using the Blood.txt data, create a new, temporary SAS data set containing all the variables in Blood plus a new variable called CholGroup. Define this new variable as follows:

Low: Low – 110 ; Medium: 111 – 140 ; High : 141 – High 


Do try them out and happy learning!

To view or add a comment, sign in

More articles by Sunayana Pati

  • Working with Dates: Part 2

    In continuation from the previous article on working with dates, today's article will cover various date functions in…

  • Working with Dates: Part 1

    Most data sets contain date information, such as date of birth or transaction date. SAS can read dates in almost any…

  • Performing Iterative Processing: Part 2

    In continuation from the previous article on iterative processing, today's article will cover DO UNTIL statements and…

  • Performing Iterative Processing

    Many programming tasks require that blocks of code be run more than once. SAS provides several ways to accomplish this.

    2 Comments
  • Reding data from Excel files

    It is quite common to be given a Microsoft Office Excel spreadsheet as your data source. Luckily, SAS has several…

  • Storing user-defined formats permanently

    If you have user-defined formats with permanent SAS data sets, it is important to make your formats permanent also…

  • Formats and Labels in SAS

    LABELS If you are using SAS to produce listings and reports for others, you will want to make the output more readable…

  • SAS Libraries

    When you write a DATA statement such as data test; SAS creates a temporary SAS data set called test. When you close…

  • Reading data using formatted inputs

    Till now we have been reading data from external files; we used DATALINES to place data directly into the SAS program…

  • Reading data from Fixed-columns

    You know, how to read data from external files. Today I discuss how to read data from fixed columns.

Others also viewed

Explore content categories