Performing Iterative Processing: Part 2
Continuing from the previous article on iterative processing, today's article covers the DO UNTIL and DO WHILE statements.
Instead of choosing a stopping value for an iterative DO loop, you can stop a loop when a condition is met or while a condition is true. Enter the DO UNTIL and DO WHILE statements.
DO UNTIL Statements
Let’s revisit the compound interest problem from the previous article. Instead of asking how much money you have after x years, you want to know how many years you need to keep your $100 in the bank at 3.75% interest to double your money. Let us look at the program:
data double;
   Interest = .0375;
   Total = 100;
   do until (Total ge 200);
      Year + 1;
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
title "Listing of DOUBLE";
proc print data=double noobs;
run;
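The condition is placed in parentheses following the keyword UNTIL. In this example, the loop continues to repeat until the value of Total is greater than or equal to 200. The listing contains one observation per year and ends at Year 19, the first year in which Total reaches $200 (about $201.27).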
An important point to remember about DO UNTIL is that the condition, placed in parentheses after the keyword UNTIL, is tested at the bottom of the loop. Therefore, a DO UNTIL loop always executes at least once.
To make this clear, suppose you started with $300. What happens when you run the program?
data double;
   Interest = .0375;
   Total = 300;
   do until (Total gt 200);
      Year + 1;
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
The condition is true even before the loop starts, but because the condition is tested at the bottom of the loop, this program outputs one observation, with Year equal to 1 and Total equal to $311.25.
DO WHILE Statements
An alternative to DO UNTIL is DO WHILE. As you might expect, a DO WHILE loop iterates as long as the condition following WHILE is true. There is another difference between DO WHILE and DO UNTIL: the WHILE condition is tested at the top of the loop rather than at the bottom. So, unlike a DO UNTIL block that always iterates at least once, a DO WHILE block does not execute even once if the condition is false at the start. You can rewrite the original doubling program (the one that starts with $100) using a DO WHILE statement, like this:
data double;
   Interest = .0375;
   Total = 100;
   do while (Total le 200);
      Year + 1;
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
proc print data=double noobs;
   title "Listing of DOUBLE";
run;
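The block of code between the DO WHILE and END statements executes as long as Total is less than or equal to 200. Strictly speaking, the exact complement of the earlier UNTIL condition (Total ge 200) is Total lt 200, but because Total never equals exactly 200 in this example, the output from this program is identical to the output from the first DO UNTIL program.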
To reinforce the idea that DO WHILE conditions are tested at the top of the loop, look at this program:
data double;
   Interest = .0375;
   Total = 300;
   do while (Total lt 200);
      Year + 1;
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
Because the WHILE condition is never true, the statements inside the DO WHILE block never execute and the data set Double has no observations.
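To see the two tests side by side, here is a minimal sketch that runs both loops from a starting value of $300 and counts the iterations. The counter names UntilCount and WhileCount are made up for this illustration and do not appear in the programs above; the results are written to the SAS log with a PUT statement rather than to a data set.

```sas
data _null_;
   Interest = .0375;

   /* DO UNTIL: the condition is tested at the bottom, so the body runs once */
   Total = 300;
   UntilCount = 0;              /* hypothetical counter, for illustration only */
   do until (Total ge 200);
      UntilCount + 1;
      Total = Total + Interest*Total;
   end;

   /* DO WHILE: the condition is tested at the top, so the body never runs */
   Total = 300;
   WhileCount = 0;              /* hypothetical counter, for illustration only */
   do while (Total lt 200);
      WhileCount + 1;
      Total = Total + Interest*Total;
   end;

   /* Write both counters to the log */
   put UntilCount= WhileCount=;
run;
```

The log should show UntilCount=1 and WhileCount=0, which matches the behavior described in the two sections above.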
A Caution When Using DO UNTIL Statements
It is very important that the condition you place on a DO UNTIL statement becomes true at some point. For example, if you change the DO UNTIL statement to read as follows, the condition is never true and you have what is called an infinite loop:
do until (Total eq 200);
Depending on whether or not you are paying for your computer time, this could be a bad (expensive) thing. The lesson here is to be very careful when using a DO UNTIL statement: make sure the condition you specify eventually becomes true. One safeguard is to combine an iterative DO loop with an UNTIL condition, as in this program:
data double;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100 until (Total gt 200);
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
There are two advantages to this structure: first, even if the UNTIL condition never becomes true, the loop ends when Year reaches 100, and second, you don’t have to assign a value to Year inside the loop.
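The same safeguard can be applied to a WHILE condition. The following is a sketch only, and the data set name double_while is chosen here for illustration. It assumes the same starting balance and interest rate as the program above.

```sas
data double_while;             /* hypothetical data set name */
   Interest = .0375;
   Total = 100;
   /* Year is capped at 100; the WHILE condition is tested at the top of each iteration */
   do Year = 1 to 100 while (Total lt 200);
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
```

Because the OUTPUT statement sits inside the loop, this version should write the same yearly observations as the UNTIL version, and the upper bound of 100 still guarantees that the loop ends.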
LEAVE and CONTINUE Statements
The LEAVE statement inside a DO loop shifts control to the statement following the END statement at the bottom of the loop. The CONTINUE statement skips the remaining statements in the current iteration of the DO loop and resumes with the next iteration. The following program uses a LEAVE statement:
data leave_it;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100;
      Total = Total + Interest*Total;
      output;
      if Total ge 200 then leave;
   end;
   format Total dollar10.2;
run;
In this program, the loop continues until Total is greater than or equal to 200. At this point, the LEAVE statement terminates the loop.
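LEAVE can also serve as a safety cap on a loop whose condition might never become true, such as the DO UNTIL (Total eq 200) example from the caution above. The sketch below is one way to do this. The data set name guarded and the cap of 100 years are assumptions made for this illustration.

```sas
data guarded;                          /* hypothetical data set name */
   Interest = .0375;
   Total = 100;
   do until (Total eq 200);            /* never true for these values */
      Year + 1;
      Total = Total + Interest*Total;
      output;
      if Year ge 100 then leave;       /* assumed safety cap: stop after 100 years */
   end;
   format Total dollar10.2;
run;
```

Without the LEAVE statement this loop would never end. With it, the DATA step stops after writing 100 observations.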
To demonstrate a CONTINUE statement, take a look at the following program:
data continue_on;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100 until (Total ge 200);
      Total = Total + Interest*Total;
      if Total le 150 then continue;
      output;
   end;
   format Total dollar10.2;
run;
As long as Total is less than or equal to 150, the CONTINUE statement causes execution to drop to the bottom of the loop (skipping the OUTPUT statement) and the loop continues. When Total is greater than 150, output occurs and the loop continues until Total is greater than or equal to 200.
Thus, this program outputs values of Total greater than 150 until Total reaches or exceeds 200. The resulting data set contains the observations for Years 12 through 19.
As you saw in this article, iterative statements in SAS can make your programs shorter and easier to understand. They also allow you to write DATA steps that generate data for creating tables or plotting functions.
Now try the following challenge problems to reinforce these concepts:
Challenge Problem 1:
Modify the program here so that each observation contains a subject number (Subj), starting with 1:
data test;
   input Score1-Score3;
   /* add your line(s) here */
datalines;
90 88 92
75 76 88
88 82 91
72 68 70
;
Challenge Problem 2:
Run the program here to create a temporary SAS data set (MonthSales):
data monthsales;
   input month sales @@;
   /* add your line(s) here */
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500
9 5100 10 5700 11 6500 12 7500
;
Modify this program so that a new variable, SumSales, representing Sales to date, is added to the data set. Be sure that the missing value for Sales in month 3 does not result in a missing value for SumSales.
Challenge Problem 3:
Count the number of missing values for the variables A, B, and C in the Missing data set. Add the cumulative number of missing values to each observation (use variable names MissA, MissB, and MissC). Use the MISSING function to test for the missing values.
Challenge Problem 4:
Create and print a data set with variables N and LogN, where LogN is the natural log of N (the function is LOG). Use a DO loop to create a table showing values of N and LogN for values of N going from 1 to 20.
Challenge Problem 5:
You have the following seven values for temperatures for each day of the week, starting with Monday: 70, 72, 74, 76, 77, 78, and 85. Create a temporary SAS data set (Temperatures) with a variable (Day) equal to Mon, Tue, Wed, Thu, Fri, Sat, and Sun and a variable called Temp equal to the listed temperature values. Use a DO loop to create the Day variable.
Challenge Problem 6:
You are testing three speed-reading methods (A, B, and C) by randomly assigning 10 subjects to each of the three methods. You are given the results as three lines of reading speeds, each line representing the results from each of the three methods, respectively. Here are the results:
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
Create a temporary SAS data set from these three lines of data. Each observation should contain Method (A, B, or C), and Score. There should be 30 observations in this data set. Use a DO loop to create the Method variable and remember to use a single trailing @ in your INPUT statement. Provide a listing of this data set using PROC PRINT.
Challenge Problem 7:
You have daily temperatures for each hour of the day for two cities (Dallas and Houston). The 48 temperature values are strung out in several lines like this:
80 81 82 83 84 84 87 88 89 89
91 93 93 95 96 97 99 95 92 90 88
86 84 80 78 76 77 78
80 81 82 82 86
88 90 92 92 93 96 94 92 90
88 84 82 78 76 74
The first 24 values represent temperatures from Hour 1 to Hour 24 for Dallas and the next 24 values represent temperatures for Hour 1 to Hour 24 for Houston. Using the appropriate DO loops, create a data set (Temperature) with 48 observations, each observation containing the variables City, Hour, and Temp.
Challenge Problem 8:
You place money in a fund that returns a compound interest of 4.25% quarterly. You deposit $1,000 every year. How many years will it take to reach $30,000? Do not use compound interest formulas. Rather, use “brute force” methods with DO WHILE or DO UNTIL statements to solve this problem.
Do try them out and happy learning!