The Complete Guide to Do-loop, Do-while and Do-Until

Do you often need to execute the same statements over and over again in your SAS programs? Are you looking to become a faster, more efficient SAS programmer?
 
Iterative DO loops, DO UNTIL loops and DO WHILE loops give you several ways to repeat actions on your SAS datasets without writing duplicate code or manually running the same statements multiple times.
 
In this article, we will discuss the differences between iterative DO loops and conditional DO loops, providing examples of how to use each one along the way. SAS arrays will also be discussed, as they are used in the most complex form of DO loops.
 
The main topics discussed will include:
  1. Iterative Do Loops
  2. Conditional Do Loops (DO UNTIL vs DO WHILE)
  3. Combining Conditional and Iterative Do Loops
  4. Arrays
    1. Iterative Do Loops
    2. DO OVER
 

Software

Before we continue, make sure you have access to SAS Studio. It's free!

Data Sets

This article uses the following datasets found in the SASHELP library: FISH, BASEBALL and PRICEDATA.

Iterative DO Loops

Iterative DO loops are the simplest form of DO loops that can be executed within a SAS Data Step. The actions of an iterative DO loop are unconditional, meaning that if you define a loop to execute 50 times, it will execute 50 times without stopping (unless an error occurs during processing).
 
Basic iterative DO loops are most useful for incremental counting or repetitive calculation exercises. Let’s look at a couple of examples where a basic iterative DO loop can be used.

Example 1 – Incremental Bank Account Balance

In this example, you would like to calculate what your account balance will be after your next 6 pay cheques of $1,000 are deposited to your account.
 
Assume that your starting balance is $2,000, so we set balance = 2000 in the program below.
 
The next step is to specify the index or counter variable. A small “i” is the most commonly used index variable, but any variable name that follows the SAS variable naming convention rules can be used.
 
Since we want to iterate 6 times to determine the balance after 6 pay cheques in this example, the index variable is set to iterate from 1 to 6. By default, the index variable will iterate by 1 until it reaches the maximum value specified in the do loop statement (which is 6 in this example):

do i = 1 to 6;

Once the loop has been set up, you need to specify what should happen during each iteration. In this case, we would like to take the current account balance and add 1000 during each iteration:

balance + 1000;

The last step is to close off the DO loop with an END statement. Here is the complete syntax:

data account;
 balance = 2000;
 do i = 1 to 6;
  balance + 1000;
 end;
run;

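In the output dataset, you will see two variables. The first variable is “balance”, which holds your account balance after 6 pay cheques (6 iterations), or 8000. The second variable is “i”, which shows the final value of the index variable after all iterations are complete. Note that this value is 7 rather than 6: SAS increments i at the end of each iteration and stops the loop only once i exceeds the upper bound of 6.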

In many cases, you will want to drop the index variable i using a DROP statement at the end of your Data Step code, but for the purposes of the example we left the i variable in our WORK.ACCOUNT dataset.
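
As a minimal sketch of that last point, the same Data Step can drop the index variable once the loop has finished. The dataset name account_final is only an illustrative choice, not something used elsewhere in this article:

```sas
data account_final;
 balance = 2000;       /* starting balance */
 do i = 1 to 6;        /* one iteration per pay cheque */
  balance + 1000;      /* sum statement: add 1000 to balance */
 end;
 drop i;               /* keep only balance in the output dataset */
run;
```

The resulting dataset should contain a single record with one variable, balance, equal to 8000.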

Example 2 – Generating Records with an Incremental DO Loop

In addition to calculating a final number, a DO loop can be used to generate records, storing the result of each iteration in a new record.
 
In this example, you would like to know what the balance of your mortgage will be after 1 year of bi-weekly payments of $500. You would also like to see the balance of your mortgage after each bi-weekly payment is made. For simplicity's sake, this example ignores any interest that would typically be factored into a mortgage calculation.
 
To create a dataset that shows the balance of your mortgage after each bi-weekly payment, we can use an iterative DO loop.
 
First, we create a new variable, balance, and set it equal to 300000 (assuming the balance of your mortgage is $300,000). Next, we would like to loop through 52 weeks (1 year) with bi-weekly payments. So, in SAS terms, we will iterate the index variable “i” from 1 to 52 by 2, since we want bi-weekly payments spread across 52 weeks. This gives 26 iterations, with i taking the values 1, 3, 5 and so on up to 51.
 
As we would like to keep a record for each iteration in our dataset, we also include an OUTPUT statement within the DO loop before closing it off with an end statement:

data mortgage;
 balance = 300000; 
 do i = 1 to 52 by 2;
  balance = balance - 500;
  output;
 end; 
run;  

As you can see in the output dataset shown partially below, the balance at each bi-weekly interval is calculated and stored in a record, along with the current value of i:
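
Because the index variable steps by 2, i takes the values 1, 3, 5 and so on rather than counting the payments themselves. If you would rather have a payment number in each record, one possible variation is to loop over the 26 payments directly and derive the week from the counter. The names mortgage_numbered, payment and week below are assumptions made for this sketch:

```sas
data mortgage_numbered;
 balance = 300000;
 do payment = 1 to 26;        /* 26 bi-weekly payments in 52 weeks */
  week = 2 * payment - 1;     /* matches the values of i in the original loop */
  balance = balance - 500;    /* apply one $500 payment */
  output;                     /* write one record per payment */
 end;
run;
```

After the final iteration the balance should be 287000, the same as in the original version, since 26 payments of $500 total $13,000.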

Example 3 – Nested DO Loops

When a DO loop is iterating within another DO loop, it is known as a nested DO loop. Nested DO loops are easier to understand with a simple example.
 
In this example, we will create 3 records for each value of ID, where ID is a number from 1 to 4. For each ID, there should be a record for each SUBID value, ranging from 1 to 3. In total, we would like to have 12 records, with 3 records for each value of ID.
 
To do this, first start by looping through the index variable i from 1 to 4. Within this loop, we set ID = i so that the ID variable takes the value of i during each iteration:

do i = 1 to 4;
ID = i;

The second DO loop uses the index variable j, starting at 1 and ending at 3, where SUBID takes the values of j after each iteration:

do j = 1 to 3;
SUBID = j;

The OUTPUT statement is included so that a new record is created each time there is an iteration. Both loops are closed off with an END statement. The values of i and j are then dropped from the final WORK.IDS dataset.
 
Here is the complete syntax:

data ids;
 do i = 1 to 4;
  ID = i;

  do j = 1 to 3;
   SUBID = j;
   output;
  end;

 end;

 drop i j;
run;

Running the above code will produce the desired output dataset with a total of 2 variables (ID and SUBID) and 12 rows (3 records for each of the 4 possible values of ID), as shown here:
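
If you do not need separate index variables, a shorter variation is to use ID and SUBID themselves as the index variables of the two loops. This is a sketch rather than a replacement for the version above, and the dataset name ids_direct is hypothetical:

```sas
data ids_direct;
 do ID = 1 to 4;          /* outer loop: one pass per ID */
  do SUBID = 1 to 3;      /* inner loop: three SUBID values per ID */
   output;                /* write one record per ID/SUBID pair */
  end;
 end;
run;
```

This should produce the same 12 records, and no DROP statement is needed because no extra index variables are created.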

Conditional DO Loops

The other type of DO loop that you can run in a SAS Data Step is the conditional DO loop. There are two forms of conditional DO loops: DO UNTIL loops and DO WHILE loops. DO UNTIL loops continue executing until the condition you have specified becomes true. DO WHILE loops continue executing while the condition you have specified remains true.
 
Let’s start by looking at a simple example for both DO UNTIL and DO WHILE.
 
Example 1 – Calculate Number of Payments for Car Loan using DO UNTIL 

Using a DO UNTIL loop, you can easily calculate the number of payments it would take to pay off a $30,000 car loan.
 
To start, we set loan=30000 for the $30,000 car loan and payments equal to 0 since no payments have been made yet. Next, a DO UNTIL statement is used, followed by the condition that needs to be met before the loop will stop executing. In this case, we would like the loop to keep executing until the value for loan is equal to zero, so we enclose loan = 0 in parentheses immediately after the DO UNTIL to tell SAS that we would like the loop to run until the value of loan reaches exactly zero.
 
Inside the DO UNTIL loop, we subtract 500 from the value of loan since we assume a $500 payment is being made each month. The value for payments is then incremented by 1 so that we can count the number of payments while the loan is being paid off. Lastly, the DO UNTIL loop is closed off with an END statement.

data carloan_until;
 loan = 30000;
 payments = 0;
 
 do until (loan = 0);
  loan = loan - 500;
  payments = payments + 1;
 end;
run;

As you can see in the output dataset, the loan reaches zero after 60 payments:
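
One thing to keep in mind with a condition such as loan = 0 is that it only stops the loop if the balance lands on exactly zero. With a payment that does not divide evenly into the loan, the balance would jump from a positive to a negative value and the loop would never end. A more defensive sketch, assuming a hypothetical $700 monthly payment and the made-up dataset name carloan_safe, is to stop as soon as the loan is less than or equal to zero:

```sas
data carloan_safe;
 loan = 30000;
 payments = 0;

 /* Stop once the loan is paid off, even if the last payment overshoots */
 do until (loan <= 0);
  loan = loan - 700;          /* hypothetical $700 payment */
  payments = payments + 1;    /* count the payment */
 end;
run;
```

Here the loop should end after 43 payments, with loan equal to -100, because the final payment overshoots the remaining balance.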

Example 2 – Calculate Number of Payments for Car Loan using DO WHILE 

Although slightly different, a DO WHILE loop can also be used to obtain the same result. Let’s start by taking the same code we used in the example above, but replacing the DO UNTIL with a DO WHILE:

data carloan_while1;
 loan = 30000;
 payments = 0;
 
 do while (loan = 0);
  loan = loan - 500;
  payments = payments + 1;
 end;
run;   

When you look at the output data (shown below), you will notice that the loop did not iterate at all as the value for loan is still the original amount of 30000 and the payments are equal to zero:

The main difference between the DO WHILE and DO UNTIL loop is that the DO UNTIL expression is evaluated at the bottom of the loop while the DO WHILE is evaluated at the top of the loop. The result is that a DO UNTIL loop will always execute at least once, whereas a DO WHILE may not execute at all if the DO WHILE expression is not true from the start.
 
In this example, we told SAS to execute the loop while loan is equal to zero. Since loan is equal to 30000 at the start of the program, the loop does not execute at all and the loan and payment values remain unchanged.
 
For a DO WHILE loop, the loop will continue to execute while the condition is true, so we need to modify the condition so that SAS keeps running the loop while the loan balance is still greater than zero. To do this, we simply change the condition within the parentheses from loan = 0 to loan > 0:

data carloan_while;
 loan = 30000;
 payments = 0;
 
 do while(loan > 0);
 
  loan = loan - 500;
  payments = payments + 1;
 end;
run;

As you can see in the output shown below, the results are now the same as the DO UNTIL results; the loan has reached a value of 0 after 60 payments. 
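
To see the top-versus-bottom evaluation in isolation, the following sketch runs both kinds of loop in a situation where there is no work to do from the start. The dataset name eval_order and the variable names x, n_while and n_until are made up for this illustration:

```sas
data eval_order;
 x = 10;
 n_while = 0;
 n_until = 0;

 /* Checked at the top: x < 5 is false, so the body never runs */
 do while (x < 5);
  n_while + 1;
 end;

 /* Checked at the bottom: the body runs once before x >= 5 is tested */
 do until (x >= 5);
  n_until + 1;
 end;
run;
```

In the output, n_while should remain 0 while n_until should be 1.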

Combining Conditional and Iterative DO Loops

 
DO loops can be further enhanced by combining both conditional and iterative DO loops into a single loop.
 
Example 1 – Determine Number of Years to save $20,000 in a College Fund 

In this example, you would like to know how much money you need to set aside and how many years it would take to save $20,000 for a college fund. You also don’t want the number of years to exceed 18 since your child plans to go to college at 18 years old.
 
To set up the SAS program, we start by setting the value of fund=0 since you don’t have any money in the savings account yet. The dollar8.2 format is then applied to the fund variable so that the output values are formatted as currency.
 
For the DO loop, we start by setting the index variable i to start at 1 and end after 18, since we don’t want the loop to go beyond 18 years. Since we are only aiming to save $20,000 we add the until condition:

do i = 1 to 18 until(fund >= 20000);

With these conditions in place, SAS will execute the loop until either 18 years have been reached or the fund reaches or exceeds $20,000.
 
You decide to set aside $1000 a year in a savings account that provides a 1% annual interest rate. To account for this, the value of “fund” is incremented by 1000 with each iteration, and then multiplied by 1.01 to add 1% in interest to the value of the fund. The variable “Years” is set equal to the index variable i so we can easily track how many years it will take to reach $20,000.

data college_fund;

 fund = 0;
 format fund dollar8.2;

 do i = 1 to 18 until(fund >= 20000);
  fund = (fund + 1000) * 1.01;
  years = i;
  output;
 end;

 drop i;

run;

As you can see in the output shown below, by setting aside $1000 a year in a savings account that earns 1% interest annually, you would be just shy of the $20,000 mark after 18 years:
 
By modifying the amount of the annual deposit to 1200, we can see a different result:
 

data college_fund;

 fund = 0;
 format fund dollar8.2;

 do i = 1 to 18 until(fund >= 20000);
  fund = (fund + 1200) * 1.01;
  years = i;
  output;
 end;

 drop i;

run;

As you can see in the updated dataset, the loop now stops executing after 16 iterations, since 16 years is sufficient for the college fund to exceed the $20,000 threshold.
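
The same idea also works with a WHILE condition in place of UNTIL. In the sketch below, which reuses the $1,200 deposit, the condition is reversed to fund < 20000 and is checked before each iteration rather than after it. The dataset name college_fund_while is an assumption for this example:

```sas
data college_fund_while;
 fund = 0;
 format fund dollar8.2;

 /* Run for at most 18 years, and only while the target has not been reached */
 do i = 1 to 18 while (fund < 20000);
  fund = (fund + 1200) * 1.01;   /* deposit, then add 1% interest */
  years = i;
  output;
 end;

 drop i;
run;
```

This should also produce 16 records, since the fund passes $20,000 in year 16 and the WHILE condition is false when SAS checks it at the start of the next iteration.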

Using Arrays with DO Loops


SAS Array processing is a method by which we can perform the same action on more than one variable at a time. Array processing can be helpful for a variety of tasks such as performing repetitive calculations on multiple variables or creating multiple variables with the same attributes.
 
Let’s look at a few examples where arrays can be useful.
 
Example 1 – Using an Array and DO Loop to Convert Units in Multiple Variables 

The SASHELP.FISH dataset contains multiple recorded lengths and a width measurement for a variety of fish. The lengths and width are recorded in centimeters and stored in 4 different variables, Length1, Length2, Length3 and Width.
 
For this example, you would like to convert the lengths and width from centimeters to inches and store the converted values in 4 new variables named Length1_in, Length2_in, Length3_in and Width_in. To convert centimeters to inches, you multiply by 0.3937. In the code below we also add a FORMAT statement to display only 2 decimal places and then drop the original Length1, Length2, Length3 and Width variables:

data fish_inches;
 set sashelp.fish;
 
 length1_in = length1 * 0.3937;
 length2_in = length2 * 0.3937;
 length3_in = length3 * 0.3937;
 width_in = width * 0.3937;
 
 format length1_in length2_in length3_in width_in 6.2;
 
 drop length1 length2 length3 width;
 
run;

As you can see in the output dataset shown partially below, you now have 4 new variables in your dataset, length1_in, length2_in, length3_in and width_in:

While this Data Step code produces the desired result, having to type out similar code to perform the same calculation on 4 similar variables is not very efficient programming. This becomes especially true when you need to perform repeated calculations on an even larger number of variables.
 
Now, let’s look at how we can enhance this SAS code with an array.
 
First, we need to define an array with an ARRAY statement, which gives the name of the array and the number of elements in it. After that, you list the variables that belong to the array.
 
Since in this example we want to create 4 new variables from 4 existing variables, we will need to define 2 arrays.
 
The first array we will call original. After the array name, we enclose a 4 in the parentheses because the array will contain 4 variables. Finally, we list the variables that will go into the array that we want to do our calculation on:

array original(4) length1 length2 length3 width;

The second array we will call inches, enclose a 4 in the parentheses because the array will contain 4 variables and then list the new variable names that we want our calculated values to be stored in: 

array inches(4) length1_in length2_in length3_in width_in;

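To understand the different array elements and the variables they refer to at each iteration, you can refer to this table:

i   original(i)   inches(i)
1   length1       length1_in
2   length2       length2_in
3   length3       length3_in
4   width         width_in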

Now that our ARRAY statements have been established, we can use a simple iterative DO loop to loop through all 4 elements of each array and perform the cm to inches conversion using the following code:

data fish_array_inches;
 set sashelp.fish;
 
 array original(4) length1 length2 length3 width;
 array inches(4) length1_in length2_in length3_in width_in;
 
 do i = 1 to 4;
  inches(i) = original(i) * 0.3937;
 end;
 
 format length1_in length2_in length3_in width_in 6.2;
 
 drop length1 length2 length3 width;
 
run;

As you can see in the output dataset shown partially below, the results are the same as the previous example without the array.

In this example, there are only 4 elements to the array and so the amount of programming reduced is minimal. However, you can see how situations with more than 4 variables would greatly benefit from arrays.
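
One way to make the loop less dependent on a hard-coded element count is the DIM function, which returns the number of elements in an array. The sketch below declares both arrays with an asterisk so that SAS counts the listed variables, then loops from 1 to dim(original). The dataset name fish_dim_inches is hypothetical:

```sas
data fish_dim_inches;
 set sashelp.fish;

 /* (*) lets SAS work out the number of elements from the variable list */
 array original(*) length1 length2 length3 width;
 array inches(*) length1_in length2_in length3_in width_in;

 /* dim(original) returns the number of elements in the array */
 do i = 1 to dim(original);
  inches(i) = original(i) * 0.3937;
 end;

 format length1_in length2_in length3_in width_in 6.2;

 drop length1 length2 length3 width i;
run;
```

With this approach, adding another measurement only requires extending the two variable lists, and the loop bounds adjust on their own.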

Example 2 – Using an Array and Do Loop to Create Multiple Variables 

Rather than specifying your own variables, you can let SAS create and name variables for you based on an array. The advantage of letting SAS create the variables for you is that they can be easily grouped for further analysis. For example, if SAS creates the variables VAR1 to VAR10, you can use a variable list such as “var1-var10” in various SAS procedures to reduce the amount of coding required when you analyze multiple variables at the same time.
 
Let’s look at a more in-depth example using the SASHELP.BASEBALL dataset. In this example, you would like to calculate the per game average of a number of baseball statistics. You assume that the number of games played in a season is 162 and the BASEBALL dataset represents a single season.
 
To do this, you would like to create a new set of variables that can be easily grouped and further analyzed.
 
We start by defining our first array which contains a list of the original 9 variable names. The array will be called “original”, a 9 is placed inside the parentheses to indicate that we have 9 elements in the array, and then variables that will become part of the array follow:

array original(9) nAtBat nHits nHome nRuns nRBI nBB nOuts nAssts nError;

Next, since we don’t want to define specific variable names for the newly created variables, we specify an array called stat with 9 elements and no variable list. SAS then creates the variables stat1 to stat9 for us:

array stat(9);

Once the arrays are defined, you can use a DO Loop to iterate from 1 through 9, dividing each element of the array by 162 to get the average statistic per game.
 
To check that the new variables were created, we run a simple PROC CONTENTS after the Data Step on the WORK.BASEBALL_ARRAY dataset. To verify that the array was created properly and demonstrate how to take advantage of the identically prefixed variables, we can also run a PROC MEANS to calculate the mean of each statistic variable using the PROC MEANS code shown in the SAS program below:

data baseball_array;
 set sashelp.baseball;
 
 array original(9) nAtBat nHits nHome nRuns nRBI nBB nOuts nAssts nError;
 array stat(9);
 
 do i = 1 to 9;
  stat(i) = original(i)/162;
 end;
 
run;
 
proc contents data=baseball_array;
run;
 
proc means data=baseball_array mean;
 var stat1-stat9; 
run;

As you can see in the partial PROC CONTENTS output and PROC MEANS output shown below, we now have 9 new variables STAT1-STAT9 that we can conduct further analyses on:

Example 3 – Using DO OVER to Simplify your DO Loops with Arrays

To alleviate the need for manual counting of array elements, SAS also provides the DO OVER loop option to work with non-indexed arrays. As you start building larger arrays with more and more variables, non-indexed arrays can be a helpful tool. A non-indexed array is similar to a regular indexed array except that you don’t need to specify the number of elements in the array. The DO OVER statement then alleviates the need for you to specify the start and end values for your loop.
 
This example uses the SASHELP.PRICEDATA dataset which contains multiple price variables.
 
Consider a scenario where you would like to increase all your prices (and thus all your price variables) by 3%. Because there are a large number of price variables, this scenario is a good candidate for both arrays and more specifically a non-indexed array.
 
As with indexed arrays, we start with an ARRAY statement followed by the array name. However, we don’t specify the number of elements and simply list the variables. In this case, the original price variables are named price1 to price17, so we can simply specify the range to further shorten our code:

array origprice price1-price17;

Next we define the names for the new variables in the array “newprice” which will be adjusted for the 3% increase:

array newprice newprice1-newprice17;

After the arrays are defined, a DO OVER statement is used to tell SAS to loop through the entire array. At each iteration, the original price is multiplied by 1.03 to increase it by 3%.
 
Using a PROC PRINT, we can quickly compare two of the newly created newprice# variables with their matching price# variables to confirm that they were created correctly.
 
Here is the complete syntax:

data pricedata_array;
 set sashelp.pricedata;
 
 array origprice price1-price17;
 array newprice newprice1-newprice17;
 
 do over newprice;
  newprice = origprice * 1.03;
 end;
 
run;   
 
proc print data=pricedata_array(obs=10);
 var price1 newprice1 price2 newprice2;
run;

The PROC PRINT output shown below confirms that the loop and price adjustments worked correctly:
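
DO OVER only works with non-indexed arrays, and it is generally regarded as older syntax. If you prefer explicitly indexed arrays, a roughly equivalent sketch uses the DIM function for the upper bound of an iterative DO loop. The dataset name pricedata_indexed is an assumption for this example:

```sas
data pricedata_indexed;
 set sashelp.pricedata;

 /* (*) lets SAS count the elements from the variable lists */
 array origprice(*) price1-price17;
 array newprice(*) newprice1-newprice17;

 /* Loop over every element without hard-coding the count */
 do i = 1 to dim(origprice);
  newprice(i) = origprice(i) * 1.03;   /* 3% price increase */
 end;

 drop i;
run;
```

The newprice1 to newprice17 values should match those produced by the DO OVER version, which you can confirm with the same PROC PRINT step pointed at the new dataset.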
